1 Introduction

Complex systems often consist of multiple agents (or components) interacting with each other and their environment to achieve certain objectives. For example, teams of robots are employed to perform tasks such as monitoring, surveillance, and disaster response in different domains including search and rescue [1], object transportation [2], and formation control [3]. With the growing complexity of autonomous systems and their safety-critical nature, the need for automated and reliable design and analysis methods and tools is increasing. To this end, an ambitious goal in system design and control is to automatically synthesize controllers for the controllable parts of the system that guarantee the satisfaction of the specified objectives. Given a model of the system describing the interaction of a controllable plant with its environment, and an objective in a formal language such as linear temporal logic (LTL), the controller synthesis problem seeks to construct a finite-state controller that ensures that the system satisfies the objective, regardless of how its environment behaves. In this paper we consider the controller synthesis problem for multi-agent systems.

One of the main challenges in automated synthesis of systems is scalability. This issue becomes more evident for multi-agent systems, as adding each agent can increase the size of the state space exponentially. The pioneering work by Pnueli et al. [4] showed that reactive synthesis from LTL specifications is intractable, which has long discouraged practitioners from applying synthesis algorithms in practice. Distributed reactive synthesis [5] and multi-player games of incomplete information [6] are undecidable in general. Despite these discouraging results, recent advances in this growing research area have enabled automatic synthesis of interesting real-world systems [7], indicating the potential of synthesis algorithms for solving realistic problems. The key insight is to consider more restricted yet practically useful subclasses of the general problem, and in this paper we take a step in this direction.

The main motivation for our work is the growing interest in robotic motion planning from rich high-level specifications, e.g., LTL [8–10]. In most of these works, all agents are controlled and operate in static and fully-observable environments, and the applications of synthesis algorithms are restricted to very small examples due to the well-known state explosion problem. Since reactive synthesis from LTL specifications is intractable, no algorithm can be efficient for all problems. Nevertheless, one can observe that in many application domains such as robot motion planning, the systems are structured, a fact that can be exploited to achieve better scalability.

In this paper, we consider a special class of multi-agent systems that are referred to as decoupled and are inspired by the robot motion planning, decentralized control [11, 12], and swarm robotics [13, 14] literature. Intuitively, in a decoupled multi-agent system the transition relations (or dynamics) of the agents are decoupled, i.e., at any time-step, agents can decide what action to take based on their own local state. For example, an autonomous vehicle can decide to slow down or speed up based on its position, velocity, etc. However, decoupled agents are coupled through objectives, i.e., an agent may need to cooperate with other agents or react to their actions to fulfill a given objective (e.g., it would not be a wise decision for an autonomous vehicle to speed up when the vehicle in front brakes, if collision avoidance is an objective). In our framework, multi-agent systems consist of a set of controlled and uncontrolled agents. Controlled agents may need to cooperate with each other and react to the actions of uncontrolled agents in order to fulfill their objectives. Besides, controlled agents may be imperfect in the sense that they can only partially observe their environment, for example due to limitations in their sensors. The goal is to synthesize controllers for each controlled agent such that the objectives are enforced in the resulting system.

To solve the controller synthesis problem for multi-agent systems, one can directly construct the model of the system by composing those of the agents, and solve the problem centrally for the given objectives. However, this centralized method lacks flexibility, since any change in one of the components requires repeating the synthesis process for the whole system. Besides, the resulting system might be exponentially larger than the individual parts, making this approach infeasible in practice. Compositional reactive synthesis aims to exploit the structure of the system by breaking the problem into smaller and more manageable pieces and solving them separately. Solutions to the sub-problems are then merged and analyzed to find a solution for the whole problem. The existing structure in multi-agent systems makes them a potential application area for compositional synthesis techniques.

To this end, we propose a compositional framework for decoupled multi-agent systems based on automatic decomposition of objectives and compositional reactive synthesis using maximally permissive strategies [15]. We assume that the objective of the system is given in conjunctive form. We observe that in many cases, each conjunct of the global objective only refers to a small subset of agents in the system. We take advantage of this structure to decompose the synthesis problem: for each conjunct of the global objective, we only consider the agents that are involved, and compute the maximally permissive strategies for those agents with respect to the considered conjunct. We then intersect the strategies to remove potential conflicts between them, project the resulting constraints back to the subproblems, and solve them again with the updated constraints, repeating this process until the strategies reach a fixed point.

We implement the algorithms symbolically using binary decision diagrams (BDDs) and apply them to a robot motion planning case study where multiple robots are placed on a grid-world with static obstacles and other dynamic, uncontrolled, and potentially adversarial robots. We consider different objectives such as collision avoidance, keeping a formation, and bounded reachability. We show that by taking advantage of the structure of the system, the proposed compositional synthesis algorithm can significantly outperform the centralized synthesis approach, in both time and memory usage, and can solve problems where the centralized algorithm is infeasible. Furthermore, using the compositional algorithms we managed to solve synthesis problems for systems with more agents, more complex objectives, and much larger grid-worlds than the cases considered in similar works. Our findings show the potential of symbolic and compositional reactive synthesis methods as planning algorithms in the presence of dynamically changing and possibly adversarial environments.

Contributions. The main contributions of the paper are as follows. We propose a framework for modular specification and compositional controller synthesis for multi-agent systems with imperfect controlled agents. We implement the methods symbolically using BDDs and apply them to a robot motion planning case study. We report on our experimental results and show that the compositional algorithm can significantly outperform the centralized approach.

Related Work. Compositional reactive synthesis has been considered in some recent works. Kupferman et al. [16] propose a compositional algorithm for LTL realizability and synthesis based on a Safraless approach that transforms the synthesis problem into a Büchi game. Baier et al. [17] give a compositional framework for treating multiple linear-time objectives inductively. Sohail et al. [18] propose an algorithm to compositionally construct a parity game from conjunctive LTL specifications. Alur et al. [19] show how local specifications of components can be refined compositionally to ensure satisfaction of a global specification. Lustig et al. [20] study the problem of LTL synthesis from libraries of reusable components. Alur et al. [21] propose a framework for compositional synthesis from a library of parametric and reactive controllers. Filiot et al. [15] reduce the LTL realizability problem to solving safety games. They show that, for LTL specifications written as conjunctions of smaller LTL formulas, the problem can be solved compositionally by first computing winning strategies for each conjunct. Moreover, they show that compositional algorithms can handle fairly large LTL specifications. To the best of our knowledge, the algorithms in [15] are the most successful application of compositional synthesis in practice.

Two-player games of imperfect information are studied in [22–25], and it is shown that they are often harder to solve than games of perfect information. The algorithmic difference is exponential, due to a subset construction that turns a game of imperfect information into an equivalent game of perfect information. In this paper, we build on the results of [15, 25] and extend and adapt their methods to treat multi-agent systems with imperfect agents. To the best of our knowledge, compositional reactive synthesis has not been studied in the context of multi-agent systems and robot motion planning.

The controller synthesis problem for systems with multiple controllable agents from a high-level temporal logic specification is also considered in many recent works (e.g., [8, 26, 27]). A common theme is to first compute a discrete controller satisfying the LTL specification over a discrete abstraction of the system, which is then used to synthesize continuous controllers guaranteed to fulfill the high-level specification. In many of these works (e.g., [28, 29]) the agents’ models are composed (either from the beginning or incrementally) to obtain a central model. The product of the central model with the specification automaton is then constructed and analyzed to compute a strategy. In [9], the authors present a compositional motion planning framework for multi-robot systems based on a reduction to satisfiability modulo theories. However, their model cannot handle uncertain or dynamic environments. In [8, 30] it is proposed that systems with multiple components can be treated in a decentralized manner by considering one component as a part of the environment of another component. However, these approaches cannot address the need for joint decision making and cooperative objectives. In this paper we consider compositional and symbolic algorithms for solving games in the presence of a dynamic and possibly adversarial environment.

2 Preliminaries

Linear temporal logic (LTL). We use LTL to specify system objectives. LTL is a formal specification language with two types of operators: logical connectives (e.g., \(\lnot \) (negation) and \(\wedge \) (conjunction)) and temporal operators (e.g., \(\bigcirc \) (next), \(\mathcal {U}\) (until), \(\Diamond \) (eventually), and \(\Box \) (always)). Let \(\mathcal {V}\) be a finite set of Boolean variables. A formula with no temporal operator is a Boolean formula or a predicate. Given a predicate \(\phi \) over variables \(\mathcal {V}\), we say \(s \in 2^\mathcal {V}\) satisfies \(\phi \), denoted by \(s \models \phi \), if the formula obtained from \(\phi \) by replacing all variables in s by \(\mathtt {true}\) and all other variables by \(\mathtt {false}\) is valid. We call the set of all possible assignments to the variables \(\mathcal {V}\) states and denote them by \(\varSigma _\mathcal {V}\), i.e., \(\varSigma _\mathcal {V}= 2^\mathcal {V}\). An LTL formula over variables \(\mathcal {V}\) is interpreted over infinite words \(w \in {(\varSigma _{\mathcal {V}})}^\omega \). The language of an LTL formula \(\varPhi \), denoted by \(\mathcal {L}(\varPhi )\), is the set of infinite words that satisfy \(\varPhi \), i.e., \(\mathcal {L}(\varPhi )=\left\{ w \in {(\varSigma _{\mathcal {V}})}^\omega ~|~w \models \varPhi \right\} \). We assume the reader has some familiarity with LTL. We often use predicates over \(\mathcal {V}\cup \mathcal {V}'\) where \(\mathcal {V}'\) is the set of primed versions of the variables in \(\mathcal {V}\). Given a subset of variables \(\mathcal {X} \subseteq \mathcal {V}\) and a state \(s \in \varSigma _\mathcal {V}\), we denote by \(s_{| \mathcal {X}}\) the projection of s to \(\mathcal {X}\). For a set \(\mathcal {Z} \subseteq \mathcal {V}\), let \(Same(\mathcal {Z},\mathcal {Z}')\) be a predicate specifying that the values of the variables in \(\mathcal {Z}\) stay unchanged during a transition. Ordered binary decision diagrams (OBDDs) can be used to obtain concise representations of sets and relations over finite domains [31]. If R is an n-ary relation over \(\left\{ 0,1 \right\} \), then R can be represented by the BDD for its characteristic function: \(f_{R}(x_1, \cdots , x_n) = 1 \) if and only if \(R(x_1,\cdots ,x_n) = 1\). With a slight abuse of notation, and when it is clear from the context, we treat sets and functions as their corresponding predicates.
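
To make the characteristic-function view concrete, here is a minimal explicit-state sketch (ours, in Python; the paper's implementation instead builds BDDs with JDD, cf. Sect. 5): a predicate over \(\mathcal {V}\) is identified with the set of assignments that satisfy it.

```python
# Minimal explicit-state sketch (ours): a predicate over Boolean
# variables V is identified with its characteristic function, i.e.,
# with its set of satisfying assignments. The paper represents the
# same objects symbolically as BDDs.
from itertools import product

V = ["x", "y", "z"]  # a finite set of Boolean variables

def states(variables):
    """Sigma_V: all assignments, each encoded as the set of true variables."""
    return [frozenset(v for v, b in zip(variables, bits) if b)
            for bits in product([False, True], repeat=len(variables))]

# Example predicate phi = x AND (NOT z), as a Python function ...
phi = lambda s: "x" in s and "z" not in s

# ... and the set of states with s |= phi (its characteristic set).
sat = [s for s in states(V) if phi(s)]
```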

Game Structures. A game structure \(\mathcal {G}\) of imperfect information is a tuple \(\mathcal {G}= (\mathcal {V}, \varLambda , \tau , \mathcal {OBS}, \gamma )\) where \(\mathcal {V}\) is a finite set of variables, \(\varLambda \) is a finite set of actions, \(\tau \) is a predicate over \(\mathcal {V}\cup \varLambda \cup \mathcal {V}'\) defining \(\mathcal {G}\)’s transition relation, \(\mathcal {OBS}\) is a finite set of observable variables, and \(\gamma : \varSigma _\mathcal {OBS}\rightarrow 2^{\varSigma _\mathcal {V}} \backslash \left\{ \emptyset \right\} \) maps each observation to its corresponding set of states. We assume that the set \(\left\{ \gamma (o) ~|~o \in \varSigma _\mathcal {OBS} \right\} \) partitions the state space \(\varSigma _\mathcal {V}\) (this assumption can be weakened to a covering of the state space where observations can overlap [24, 25]). A game structure \(\mathcal {G}\) is called a game structure of perfect information if \(\mathcal {OBS}= \mathcal {V}\) and \(\gamma (s) = \left\{ s \right\} \) for all \(s \in \varSigma _\mathcal {V}\). We omit \((\mathcal {OBS}, \gamma )\) in the description of games of perfect information.

In this paper, we consider two-player turn-based game structures where player-1 and player-2 alternate in taking turns. Let \(t \in \mathcal {V}\) be a special variable with domain \(\left\{ 1,2 \right\} \) determining which player’s turn it is during the game. Without loss of generality, we assume that player-1 always starts the game. Let \(\varSigma _\mathcal {V}^i = \left\{ s \in \varSigma _\mathcal {V}~|~s_{| t} = i \right\} \) for \(i=1,2\) denote player-i’s states in the game structure. At any state \(s \in \varSigma _\mathcal {V}^i\), player-i chooses an action \(\ell \in \varLambda \) such that there exists a successor state \(s' \in \varSigma _{\mathcal {V}'}\) where \((s,\ell ,s') \models \tau \). Intuitively, at a player-i state, she chooses an available action according to the transition relation \(\tau \) and the next state of the system is chosen from the possible successor states. For every state \(s \in \varSigma _\mathcal {V}\), we define \(\varGamma (s) = \left\{ \ell \in \varLambda ~|~ \exists s' \in \varSigma _{\mathcal {V}'}. ~ (s,\ell ,s') \models \tau \right\} \) to be the set of available actions at that state. A run in \(\mathcal {G}\) from an initial state \(s_{init} \in \varSigma _\mathcal {V}\) is a sequence of states \(\pi =s_0s_1s_2\cdots \) such that \(s_0 = s_{init}\) and for all \(i > 0\), there is an action \(\ell _i \in \varLambda \) with \((s_{i-1}, \ell _i, s'_i) \models \tau \), where \(s'_i\) is obtained by replacing the variables in \(s_i\) by their primed copies. A run \(\pi \) is maximal if either it is infinite or it ends in a state \(s \in \varSigma _\mathcal {V}\) where \(\varGamma (s) = \emptyset \). The observation sequence of \(\pi \) is the unique sequence \(Obs(\pi ) = o_0 o_1 o_2\cdots \) such that for all \(i \ge 0\), we have \(s_i \in \gamma (o_i)\).
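
These definitions translate directly into a small data structure. The sketch below (ours; the class and method names are illustrative, not from the paper) makes the roles of \(\tau \) and \(\varGamma (s)\) explicit:

```python
# Explicit-state sketch (ours) of a turn-based game structure of
# perfect information; all names here are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GameStructure:
    states: List[object]                         # Sigma_V, incl. the turn variable t
    actions: List[str]                           # Lambda
    tau: Callable[[object, str, object], bool]   # (s, l, s') |= tau

    def successors(self, s, l):
        return [t for t in self.states if self.tau(s, l, t)]

    def available(self, s):
        """Gamma(s): actions with at least one tau-successor from s."""
        return {l for l in self.actions if self.successors(s, l)}
```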

Strategies. A strategy \(\mathtt{{S}}\) in \(\mathcal {G}\) for player-i, \(i \in \left\{ 1,2 \right\} \), is a function \(\mathtt{{S}}: (\varSigma _\mathcal {V})^*.\varSigma _\mathcal {V}^i \rightarrow \varLambda \). A strategy \(\mathtt{{S}}\) in \(\mathcal {G}\) for player-2 is observation-based if for all prefixes \(\rho _1, \rho _2 \in (\varSigma _\mathcal {V})^*.\varSigma _\mathcal {V}^2\), if \(Obs(\rho _1) = Obs(\rho _2)\), then \(\mathtt{{S}}(\rho _1) =\mathtt{{S}}(\rho _2)\). In this paper, we are interested in the existence of observation-based strategies for player-2. Given two strategies \(\mathtt{{S}}_1\) and \(\mathtt{{S}}_2\) for player-1 and player-2, the possible outcomes \(\varOmega _{\mathtt{{S}}_1,\mathtt{{S}}_2}(s)\) from a state \(s \in \varSigma _\mathcal {V}\) are runs: a run \(s_0s_1s_2\cdots \) belongs to \(\varOmega _{\mathtt{{S}}_1,\mathtt{{S}}_2}(s)\) if and only if \(s_0=s\) and for all \(j \ge 0\), either \(s_j\) has no successor, or \(s_j \in \varSigma _\mathcal {V}^i\) for some \(i \in \left\{ 1,2 \right\} \) and \((s_j,\mathtt{{S}}_i(s_0\cdots s_j),s'_{j+1}) \models \tau \).

Winning Condition. A game \((\mathcal {G}, \phi _{init}, \varPhi )\) consists of a game structure \(\mathcal {G}\), a predicate \(\phi _{init}\) specifying an initial state \(s_{init} \in \varSigma _\mathcal {V}\), and an LTL objective \(\varPhi \) for player-2. A run \(\pi = s_0s_1\cdots \) is winning for player-2 if it is infinite and \(\pi \in \mathcal {L}(\varPhi )\). Let \(\Pi \) be the set of runs that are winning for player-2. A strategy \(\mathtt{{S}}_2\) is winning for player-2 if for all strategies \(\mathtt{{S}}_1\) of player-1, we have \(\varOmega _{\mathtt{{S}}_1,\mathtt{{S}}_2}(s_{init}) \subseteq \Pi \), that is, all possible outcomes are winning for player-2. Note that we assume the nondeterminism is always on player-1’s side. We say the game \((\mathcal {G}, \phi _{init}, \varPhi )\) is realizable if and only if player-2 (the system) has a winning strategy in it.

Constructing the Knowledge Game Structure. For a game structure \(\mathcal {G}= (\mathcal {V}, \varLambda , \tau , \mathcal {OBS}, \gamma )\) of imperfect information, a game structure \(\mathcal {G}^K\) of perfect information can be obtained using a subset construction procedure such that for any objective \(\varPhi \), there exists a deterministic observation-based strategy for player-2 in \(\mathcal {G}\) with respect to \(\varPhi \) if and only if there exists a deterministic winning strategy for player-2 in \(\mathcal {G}^K\) for \(\varPhi \) [22, 25]. Intuitively, each state in \(\mathcal {G}^K\) is a set of states of \(\mathcal {G}\) that represents player-2’s knowledge about the possible states in which the game can be after a sequence of observations. In the worst case, the size of \(\mathcal {G}^K\) is exponentially larger than the size of \(\mathcal {G}\). We refer to \(\mathcal {G}^K\) as the knowledge game structure corresponding to \(\mathcal {G}\). In the rest of this section, we only consider game structures of perfect information.
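
A rough explicit-state rendering of this construction is sketched below (ours; post, gamma, and actions_of are assumed helpers, and the actual construction additionally tracks whose turn it is). Each reachable knowledge set is refined by the observation received after a step:

```python
# Sketch (ours) of the knowledge construction. `post(K, l)` is the set
# of tau-successors of the states in K under action l; `gamma(o)` is
# the set of states denoted by observation o; `actions_of(K)` gives
# the actions available somewhere in K.
from collections import deque

def knowledge_game(init_states, actions_of, post, observations, gamma):
    K0 = frozenset(init_states)
    seen, frontier, edges = {K0}, deque([K0]), []
    while frontier:
        K = frontier.popleft()
        for l in actions_of(K):
            succ = post(K, l)
            for o in observations:
                K2 = frozenset(succ & gamma(o))  # refine by the observation
                if K2:
                    edges.append((K, l, K2))
                    if K2 not in seen:
                        seen.add(K2)
                        frontier.append(K2)
    return seen, edges  # worst case: exponentially many knowledge sets
```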

Solving Games. In this paper, we use the bounded synthesis approach [15, 32] to solve synthesis problems from LTL specifications. In [15], it is shown how LTL formulas can be reduced to safety games. Formally, a safety game is a game \((\mathcal {G}, \phi _{init}, \varPhi )\) with the special safety objective \(\varPhi = \Box (\mathtt {True})\). That is, any infinite run in the game structure \(\mathcal {G}\) starting from an initial state \(s \models \phi _{init}\) is winning for player-2. We drop \(\varPhi \) from the description of safety games as it is implicitly defined. Intuitively, in a safety game, the goal of player-2 is to avoid the dead-end states, i.e., states where no action is available. We refer the readers to [15, 33] for the details of reducing LTL formulas to safety games and solving them. The composition of two game structures \(\mathcal {G}_{1} =(\mathcal {V}^{1}, \varLambda ^{1}, \tau ^{1}), \mathcal {G}_{2} =(\mathcal {V}^{2}, \varLambda ^{2}, \tau ^{2})\) of perfect information, denoted by \(\mathcal {G}^\otimes = \mathcal {G}_1 \otimes \mathcal {G}_2\), is a game structure \(\mathcal {G}^\otimes = (\mathcal {V}^{\otimes }, \varLambda ^{\otimes }, \tau ^{\otimes })\) of perfect information where \(\mathcal {V}^\otimes = \mathcal {V}^1 \cup \mathcal {V}^2\), \(\varLambda ^\otimes = \varLambda ^1 \cup \varLambda ^2\), and \(\tau ^\otimes = \tau ^1 \wedge \tau ^2\). To solve a game \((\mathcal {G}, \phi _{init}, \varPhi )\), we first obtain the game structure \(\mathcal {G}^\varPhi \) corresponding to \(\varPhi \) using the methods proposed in [15], and then solve the safety game \((\mathcal {G}\otimes \mathcal {G}^\varPhi , \phi _{init})\) to determine the winner of the game and compute a winning strategy for player-2, if one exists.

Maximally Permissive Strategies. Safety games are memory-less determined, i.e., player-2 wins the game if and only if there exists a memory-less winning strategy \(\mathtt{{S}}: \varSigma _\mathcal {V}^2 \rightarrow \varLambda \). Intuitively, a memory-less strategy only depends on the current state and is independent of the history of the game. Let \((\mathcal {G}, \phi _{init})\) be a safety game, where \(\mathcal {G}_{} =(\mathcal {V}^{}, \varLambda ^{}, \tau ^{})\) is a game structure of perfect information. Let \(W \subseteq \varSigma _\mathcal {V}\) be the set of winning states for player-2, i.e., from any state \(s \in W\) there exists a strategy \(\mathtt{{S}}_2\) such that for any strategy \(\mathtt{{S}}_1\) chosen by player-1, all possible outcomes \(\pi \in \varOmega _{\mathtt{{S}}_1,\mathtt{{S}}_2}(s)\) are winning. The maximally permissive strategy \(\mathcal {S}: \varSigma ^2_\mathcal {V}\rightarrow 2^\varLambda \) for player-2 is defined as follows: for all \(s \in \varSigma ^2_\mathcal {V}\), \(\mathcal {S}(s) = \{\ell \in \varLambda ~|~ \forall r \in \varSigma _{\mathcal {V}'}. ~ (s,\ell ,r) \models \tau \rightarrow r \in W\}\), i.e., the set of actions \(\ell \) all of whose \(\ell \)-successors belong to the set of winning states. It is well-known that \(\mathcal {S}\) subsumes all winning strategies of player-2 in the safety game \((\mathcal {G}, \phi _{init})\). The composition of two maximally permissive strategies \(\mathcal {S}_1, \mathcal {S}_2: \varSigma ^2_\mathcal {V}\rightarrow 2^\varLambda \), denoted by \(\mathcal {S}=\mathcal {S}_1 \otimes \mathcal {S}_2\), is defined as \(\mathcal {S}(s) = \mathcal {S}_1(s) \cap \mathcal {S}_2(s)\) for any \(s \in \varSigma _\mathcal {V}\), i.e., the set of actions allowed by \(\mathcal {S}\) at any state \(s \in \varSigma _\mathcal {V}\) is the intersection of the actions allowed by \(\mathcal {S}_1\) and \(\mathcal {S}_2\). The restriction of the game structure \(\mathcal {G}\) with respect to its maximally permissive strategy \(\mathcal {S}\) is the game structure \(\mathcal {G}[\mathcal {S}]=(\mathcal {V},\varLambda , \tau \wedge \phi _\mathcal {S})\) where \(\phi _\mathcal {S}\) is the predicate encoding \(\mathcal {S}\), i.e., for all \((s,\ell ) \in \varSigma ^2_\mathcal {V}\times \varLambda \), \((s,\ell ) \models \phi _\mathcal {S}\) if and only if \(\ell \in \mathcal {S}(s)\). Intuitively, \(\mathcal {G}[\mathcal {S}]\) is the same as \(\mathcal {G}\) except that player-2’s actions are restricted according to \(\mathcal {S}\).
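
The winning region W and the maximally permissive strategy admit a simple fixed-point computation. The sketch below (ours, explicit-state; the paper performs the same computation symbolically over BDDs) follows the definitions above:

```python
# Explicit-state sketch (ours) of solving a safety game and extracting
# the maximally permissive strategy.

def solve_safety(states, p2_states, actions, succ):
    """succ(s, l) returns the set of s' with (s, l, s') |= tau.
    Returns the winning region W and the maximally permissive strategy."""
    W = set(states)
    changed = True
    while changed:
        changed = False
        for s in list(W):
            if s in p2_states:
                # player-2 needs SOME action all of whose successors stay in W
                ok = any(succ(s, l) and succ(s, l) <= W for l in actions)
            else:
                # player-1 is adversarial: EVERY successor must stay in W,
                # and a dead-end (finite run) is losing for player-2
                all_succ = set().union(*(succ(s, l) for l in actions))
                ok = bool(all_succ) and all_succ <= W
            if not ok:
                W.remove(s)
                changed = True
    # keep every action that cannot leave W
    perm = {s: {l for l in actions if succ(s, l) and succ(s, l) <= W}
            for s in W & set(p2_states)}
    return W, perm
```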

3 Multi-agent Systems

In this section we describe how we model multi-agent systems and formally state the problem that is considered in the rest of the paper. Typically, game structures arise from descriptions of open systems in a particular language [34]. In our framework, we use agents to specify a system in a modular manner. An agent \(\mathtt{{a}}_{} = (\mathtt{{type}}_{}, \mathcal {I}_{}, \mathcal {O}_{}, \varLambda _{}, \tau _{}, \mathcal {OBS}_{}, \gamma _{})\) is a tuple where \(\mathtt{{type}}\in \left\{ \text {controlled}, \text {uncontrolled} \right\} \) indicates whether the agent can be controlled or not, \(\mathcal {O}\) (\(\mathcal {I}\)) is a set of output (input) variables that the agent can (cannot, respectively) control by assigning values to them, \(\varLambda \) is a set of actions for the agent, \(\tau \) is a predicate over \(\mathcal {I}\cup \mathcal {O}\cup \varLambda \cup \mathcal {O}'\) that specifies the possible transitions of the agent, where \(\mathcal {O}'\) is the set of primed copies of the variables in \(\mathcal {O}\), \(\mathcal {OBS}\) is a set of observable variables, and \(\gamma : \varSigma _\mathcal {OBS}\rightarrow 2^{\varSigma _{\mathcal {I}\cup \mathcal {O}}}\) is the observation function that maps the agent’s observations to their corresponding sets of states. Intuitively, \(\tau \) defines what actions an agent can choose at any state \(s \in \varSigma _{\mathcal {I}} \times \varSigma _{\mathcal {O}}\) and what the possible next valuations of the agent’s output variables are for the chosen action. That is, \((i,o,\ell ,o') \models \tau \) for \(i \in \varSigma _{\mathcal {I}}\), \(o \in \varSigma _{\mathcal {O}}\), \(\ell \in \varLambda \), and \(o' \in \varSigma _{\mathcal {O}'}\) means that at any state s of the system with \(s_{| \mathcal {I}} = i\) and \(s_{| \mathcal {O}} = o\), the agent can take action \(\ell \), and a state with component \(o'\) is a possible successor. A perfect agent is an agent with \(\mathcal {OBS}= \mathcal {I}\cup \mathcal {O}\) and \(\gamma (s) = \left\{ s \right\} \) for all \(s \in \varSigma _\mathcal {I}\times \varSigma _\mathcal {O}\). We omit \((\mathcal {OBS}, \gamma )\) in the description of perfect agents. An agent \(\mathtt{{a}}\) is called local if and only if its transition relation \(\tau \) is a predicate over \(\mathcal {O}\cup \varLambda \cup \mathcal {O}'\), i.e., it does not depend on any uncontrolled variable \(v \in \mathcal {I}\).

A multi-agent system \(\mathcal {M}= \left\{ \mathtt{{a}}_1, \mathtt{{a}}_2, \cdots , \mathtt{{a}}_n \right\} \) is defined as a set of agents \(\mathtt{{a}}_{i} = (\mathtt{{type}}_{i}, \mathcal {I}_{i}, \mathcal {O}_{i}, \varLambda _{i}, \tau _{i}, \mathcal {OBS}_{i}, \gamma _{i})\) for \(1 \le i \le n\). Let \(\mathcal {V}= \bigcup _{i=1}^n \mathcal {O}_i\) be the set of agents’ output variables. We assume that the sets of output variables of the agents are pairwise disjoint, i.e., \(\forall 1\le i < j \le n.~ \mathcal {O}_i \cap \mathcal {O}_j = \emptyset \), and the set of input variables \(\mathcal {I}_i\) for each agent \(\mathtt{{a}}_i \in \mathcal {M}\) is a subset of variables controlled by other agents, i.e., \(\mathcal {I}_i \subseteq \mathcal {V}\backslash \mathcal {O}_i\). We further make some simplifying assumptions. First, we assume that all uncontrolled agents are perfect, i.e., each uncontrolled agent has perfect information about the state of the system at any time-step. Second, we assume that all controlled agents are cooperative while uncontrolled ones can play adversarially, i.e., the controlled agents cooperate with each other and make joint decisions to enforce the global objective. Finally, we assume that the observation variables of the controlled agents are pairwise disjoint, i.e., \(\forall 1\le i < j \le n.~ \mathcal {OBS}_i \cap \mathcal {OBS}_j = \emptyset \), and that each controlled agent has perfect knowledge about other controlled agents’ observations. That is, controlled agents share their observations with each other. Intuitively, it is as if the communication between controlled agents is instantaneous and error-free, i.e., they have perfect communication and tell each other what they observe. This assumption helps us preserve the two-player game setting and stay in a decidable subclass of the more general problem of multi-player games with partial information. Note that multi-player games of incomplete information are undecidable in general [6].

In this paper we focus on a special setting where all agents are local. A multi-agent system \(\mathcal {M}= \left\{ \mathtt{{a}}_1, \mathtt{{a}}_2, \cdots , \mathtt{{a}}_n \right\} \) is dynamically decoupled (or decoupled for short) iff all agents \(\mathtt{{a}}\in \mathcal {M}\) are local. Intuitively, agents in a decoupled multi-agent system can choose their actions based on their own local state, regardless of the local states of other agents in the system. That is, the availability of actions for each agent in any state of the system is only a function of the agent’s local state. Such a setting arises in many applications, e.g., robot motion planning, where the possible transitions of agents are independent of each other. For example, how a robot moves around a room usually depends on its own characteristics and motion primitives [9]. Note that this does not mean that the controlled agents are completely decoupled, as the objectives might concern different agents in the system, e.g., a collision avoidance objective for a system consisting of multiple controlled robots, which requires cooperation between agents.

In our framework, the user describes the agents and specifies the objective as a conjunctive LTL formula. From the description of the agents, a game structure is obtained that encodes how the state of the system evolves. Formally, given a decoupled multi-agent system \(\mathcal {M}= \mathcal {M}^\mathtt{{u}}\biguplus \mathcal {M}^\mathtt{{c}}\) partitioned into a set \(\mathcal {M}^\mathtt{{u}}= \left\{ \mathtt{{u}}_1, \cdots , \mathtt{{u}}_m \right\} \) of uncontrolled agents and a set \(\mathcal {M}^\mathtt{{c}}= \left\{ \mathtt{{c}}_1, \cdots , \mathtt{{c}}_n \right\} \) of controlled agents, the turn-based game structure \(\mathcal {G}^\mathcal {M}\) induced by \(\mathcal {M}\) is defined as \(\mathcal {G}^\mathcal {M}= (\mathcal {V}^{}, \varLambda ^{}, \tau ^{}, \mathcal {OBS}^{}, \gamma ^{})\) where \(\mathcal {V}= \left\{ t \right\} \cup \bigcup _{\mathtt{{a}}\in \mathcal {M}} \mathcal {O}_{\mathtt{{a}}}\) is the set of all variables in \(\mathcal {M}\) with t as a turn variable, \(\varLambda = \bigcup _{\mathtt{{a}}\in \mathcal {M}} \varLambda _{\mathtt{{a}}}\) is the set of actions, \(\mathcal {OBS}= \bigcup _{\mathtt{{c}}\in \mathcal {M}^\mathtt{{c}}} \mathcal {OBS}_{\mathtt{{c}}}\) is the set of all observation variables of the controlled agents (recall that we assume all uncontrolled agents are perfect), and \(\tau \) and \(\gamma \) are defined as follows:

$$\begin{aligned} \begin{aligned} \tau&=\tau _e \vee \tau _s\\ \tau _e&= t=1 \wedge t'=2 \wedge \bigwedge _{\mathtt{{u}}\in \mathcal {M}^\mathtt{{u}}} \tau _\mathtt{{u}}\wedge \bigwedge _{\mathtt{{c}}\in \mathcal {M}^\mathtt{{c}}} Same(\mathcal {O}_{\mathtt{{c}}}, {\mathcal {O}}'_{\mathtt{{c}}})\\ \tau _s&= t=2 \wedge t'=1 \wedge \bigwedge _{\mathtt{{c}}\in \mathcal {M}^\mathtt{{c}}} \tau _\mathtt{{c}}\wedge \bigwedge _{\mathtt{{u}}\in \mathcal {M}^\mathtt{{u}}} Same(\mathcal {O}_{\mathtt{{u}}}, \mathcal {O}'_{\mathtt{{u}}})\\ \gamma&= \bigwedge _{\mathtt{{c}}\in \mathcal {M}^\mathtt{{c}}} \gamma _{\mathtt{{c}}} \end{aligned} \end{aligned}$$

Intuitively, at each step, uncontrolled agents take actions consistent with their transition relations, and their variables get updated while the controlled agents’ variables stay unchanged. Then the controlled agents react simultaneously by taking actions according to their transition relations, and their corresponding variables get updated while the uncontrolled agents’ variables stay unchanged.
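
Read operationally, \(\tau _e\) and \(\tau _s\) become two straightforward checks; the following sketch (ours; states are maps from variables to values, and same is an assumed helper for the \(Same\) predicate) mirrors the equations above:

```python
# Sketch (ours) of the induced turn-based transition relation. A state
# is a dict from variables (including the turn variable "t") to values;
# `same(Z, s, s2)` is an assumed helper encoding Same(Z, Z'); each
# agent is assumed to expose `.tau` and `.outputs`.

def tau(s, l, s2, uncontrolled, controlled, same):
    tau_e = (s["t"] == 1 and s2["t"] == 2
             and all(u.tau(s, l, s2) for u in uncontrolled)
             and all(same(c.outputs, s, s2) for c in controlled))
    tau_s = (s["t"] == 2 and s2["t"] == 1
             and all(c.tau(s, l, s2) for c in controlled)
             and all(same(u.outputs, s, s2) for u in uncontrolled))
    return tau_e or tau_s
```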

Fig. 1. Grid-world with static obstacles

Example 1

Let \(R_1\) and \(R_2\) be two robots in an \(n \times n\) grid-world similar to the one shown in Fig. 1. Assume \(R_1\) is an uncontrolled robot, whereas \(R_2\) can be controlled. In the sequel, let i range over \(\left\{ 1,2 \right\} \). At each time-step, any robot \(R_i\) can move to one of its neighboring cells by taking an action from the set \(\varLambda _i = \left\{ up_i, down_i, right_i, left_i \right\} \). Furthermore, assume that \(R_2\) has imperfect sensors and can only observe \(R_1\) when \(R_1\) is in one of its adjacent cells. Let \((x_i, y_i)\) represent the position of robot \(R_i\) in the grid-world at any time. We define \(\mathcal {O}_i = \left\{ x_i, y_i \right\} \) and \(\mathcal {I}_i = \mathcal {O}_{3-i}\) as the output and input variables, respectively. Note that the variables controlled by one agent are the input variables of the other agent. The transition relation \(\tau _i = \bigvee _{\ell \in \varLambda _i} \tau _{\ell }\) is defined as the disjunction of four parts, one per action, where

$$\begin{aligned} \begin{aligned} \tau _{up_i}&= (y_i > 1) \wedge up_i \wedge (y_i' \leftrightarrow y_i - 1) \wedge Same(x_i, x_i')\\ \tau _{down_i}&= (y_i < n) \wedge down_i \wedge (y_i' \leftrightarrow y_i + 1) \wedge Same(x_i, x_i')\\ \tau _{left_i}&= (x_i > 1) \wedge left_i \wedge (x_i' \leftrightarrow x_i - 1) \wedge Same(y_i, y_i')\\ \tau _{right_i}&= (x_i < n) \wedge right_i \wedge (x_i' \leftrightarrow x_i + 1) \wedge Same(y_i, y_i')\\ \end{aligned} \end{aligned}$$

Intuitively, each \(\tau _\ell \) for \(\ell \in \varLambda _i\) specifies whether the action is available in the current state and what its possible successors are. For example, \(\tau _{up_i}\) indicates that if \(R_i\) is not at the top row \((y_i > 1)\), then the action \(up_i\) is available and, if applied, in the next state the value of \(y_i\) is decremented by one and the value of \(x_i\) does not change. Next we define the observation function \(\gamma _2\) for \(R_2\). It is easier and more intuitive to define the inverse map \(\gamma _2^{-1}\); since observations partition the state space, \(\gamma _2\) is then recovered as \(\gamma _2 = (\gamma _2^{-1})^{-1}\). Formally,

$$ \gamma _2^{-1}(a,b,c,d) = {\left\{ \begin{array}{ll} (a,b, c, d) &{} \text { if } a-1 \le c \le a+1 \wedge b-1 \le d \le b+1\\ (\perp , \perp , c, d) &{} \text { otherwise} \end{array}\right. } $$

Let \(\mathcal {OBS}_2 = \left\{ x_1^o, y_1^o, x_2^o, y_2^o \right\} \) where \(x_1^o, y_1^o \in \left\{ \perp , 1, 2, \cdots , n \right\} \) and \(x_2^o, y_2^o \in \left\{ 1, \cdots , n \right\} \). Intuitively, \(R_2\) observes its own local state perfectly. Furthermore, if \(R_1\) is in one of its adjacent cells, its position is observed perfectly; otherwise, \(R_1\) is too far away and its location cannot be observed. \(\gamma _2\) can be symbolically encoded as \(\bigvee _{o \in {\varSigma _{\mathcal {OBS}}}} (o \wedge \phi _{\gamma (o)})\) where \(\phi _{\gamma (o)}\) is the predicate specifying the set \(\gamma (o)\). Finally, we let \(R_1=(\text {uncontrolled}, \mathcal {I}_1, \mathcal {O}_1, \varLambda _1, \tau _1)\) and \(R_2 = (\text {controlled}, \mathcal {I}_2, \mathcal {O}_2, \varLambda _2, \tau _2, \mathcal {OBS}_2, \gamma _2)\). Note that \(R_1\) (\(R_2\)) is modeled as a perfect (imperfect, respectively) local agent.

The game structure \(\mathcal {G}^\mathcal {M}\) of imperfect information corresponding to multi-agent system \(\mathcal {M}= \left\{ R_1, R_2 \right\} \) is a tuple \(\mathcal {G}^\mathcal {M}= (\mathcal {V}^{}, \varLambda ^{}, \tau ^{}, \mathcal {OBS}^{}, \gamma ^{})\) where \(\mathcal {V}= \left\{ t \right\} \cup \mathcal {O}_1 \cup \mathcal {O}_2\), \(\varLambda = \varLambda _1 \cup \varLambda _2\), \(\tau = \tau _e \vee \tau _s\), \(\tau _e = t=1 \wedge t'=2 \wedge \tau _1 \wedge Same(\mathcal {O}_2, \mathcal {O}_2')\), \(\tau _s = t=2 \wedge t'=1 \wedge \tau _2 \wedge Same(\mathcal {O}_1, \mathcal {O}_1')\), \(\mathcal {OBS}= \mathcal {OBS}_2\), and \(\gamma = \gamma _2\).    \(\square \)
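
For concreteness, the movement predicates and the observation function of this example can be written out directly (our rendering; \(\perp \) is represented as None):

```python
# Our explicit rendering of Example 1 (coordinates 1..n, row 1 at the top).

def moves(x, y, n):
    """Available actions of a robot at (x, y) and their successors,
    mirroring tau_up, tau_down, tau_left, tau_right above."""
    out = {}
    if y > 1: out["up"] = (x, y - 1)
    if y < n: out["down"] = (x, y + 1)
    if x > 1: out["left"] = (x - 1, y)
    if x < n: out["right"] = (x + 1, y)
    return out

def gamma2_inv(a, b, c, d):
    """R_2's observation of state ((a, b), (c, d)): R_1's position (a, b)
    is visible only from an adjacent cell; bot is rendered as None."""
    if a - 1 <= c <= a + 1 and b - 1 <= d <= b + 1:
        return (a, b, c, d)
    return (None, None, c, d)
```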

We now formally define the problem we consider in this paper.

Problem 1

Given a decoupled multi-agent system \(\mathcal {M}= \mathcal {M}^\mathtt{{u}}\biguplus \mathcal {M}^\mathtt{{c}}\) partitioned into uncontrolled \(\mathcal {M}^\mathtt{{u}}= \left\{ \mathtt{{u}}_1, \cdots , \mathtt{{u}}_m \right\} \) and controlled agents \(\mathcal {M}^\mathtt{{c}}= \left\{ \mathtt{{c}}_1, \cdots , \mathtt{{c}}_n \right\} \), a predicate \(\phi _{init}\) specifying an initial state, and an objective \(\varPhi = \varPhi _1 \wedge \cdots \wedge \varPhi _k\) as conjunction of \(k \ge 1\) LTL formulas \(\varPhi _i\), compute strategies \(\mathtt{{S}}_1, \cdots , \mathtt{{S}}_n\) for controlled agents such that the strategy \(\mathtt{{S}}= \mathtt{{S}}_1 \otimes \cdots \otimes \mathtt{{S}}_n\) defined as composition of the strategies is winning for the game \((\mathcal {G}^\mathcal {M}, \phi _{init}, \varPhi )\), where \(\mathcal {G}^\mathcal {M}\) is the game structure induced by \(\mathcal {M}\).

4 Compositional Controller Synthesis

We now explain our solution approach for Problem 1 stated in Sect. 3. Algorithm 1 summarizes the steps for compositional synthesis of strategies for controlled agents in a multi-agent system. It has three main parts. First, the synthesis problem is automatically decomposed into subproblems by taking advantage of the structure in the multi-agent system and the given objective. Then the subproblems are solved separately and their solutions are composed. The composition may restrict the actions that are available to agents at some states. The composition is then projected back to each subproblem, and the subproblems are solved again with the new restrictions. This process is repeated until either a subgame becomes unrealizable, or the computed solutions to the subproblems reach a fixed point. Finally, a set of strategies, one for each controlled agent, is extracted by decomposing the strategy obtained in the previous step. Next, we explain Algorithm 1 in more detail.

Algorithm 1 (listing)
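
The following skeleton (ours, in Python; every helper name, such as involved, to_safety_game, and project_strategy, is an assumption standing in for the procedures described in Sects. 4.1–4.3) summarizes the control flow of Algorithm 1:

```python
# Our Python skeleton of Algorithm 1; all helpers are assumptions
# mirroring the prose below, not an API from the paper's implementation.

def compositional_synthesis(M, phi_init, conjuncts):
    # 4.1: one safety subgame per conjunct of the global objective
    games = []
    for Phi_i in conjuncts:
        inv = involved(Phi_i, M)                 # agents mentioned by Phi_i
        G_i = create_game_structure(inv)
        G_i = knowledge_game(G_i)                # skipped if all agents are perfect
        games.append((to_safety_game(G_i, Phi_i), project(phi_init, inv)))

    # 4.2: fixed point over maximally permissive strategies
    while True:
        strategies = [solve_subgame(G, init) for G, init in games]
        if any(S is None for S in strategies):
            return None                          # some subgame is unrealizable
        S = compose(strategies)                  # intersect allowed actions
        projections = [project_strategy(S, G) for G, init in games]
        if projections == strategies:
            return extract_agent_strategies(S)   # 4.3: fixed point reached
        games = [(restrict(G, C), init)          # restrict and re-solve
                 for (G, init), C in zip(games, projections)]
```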

4.1 Decomposition of the Synthesis Problem

The synthesis problem is decomposed into subproblems in lines \(2-9\) of Algorithm 1. The main idea behind the decomposition is that in many cases, each conjunct \(\varPhi _i\) of the objective \(\varPhi \) only refers to a small subset of agents. This observation is utilized to obtain a game structure from the description of the agents involved in \(\varPhi _i\), i.e., only the relevant agents are considered when forming and solving a game with respect to \(\varPhi _i\). In other words, each subproblem corresponds to a conjunct \(\varPhi _i\) of the global objective \(\varPhi \) and the game structure obtained from the agents involved in \(\varPhi _i\).

For each conjunct \(\varPhi _i\), \(1 \le i \le k\), Algorithm 1 first obtains the set \(\mathcal {INV}_i\) of involved agents using the procedure Involved. Formally, let \(\mathcal {V}_{\varPhi _i} \subseteq \mathcal {V}\) be the set of variables appearing in \(\varPhi _i\). The involved agents are those whose controlled variables appear in the conjunct, i.e., \(\mathbf {Involved}(\varPhi _i) = \left\{ a \in \mathcal {M}~|~\mathcal {O}_a \cap \mathcal {V}_{\varPhi _i} \not = \emptyset \right\} \).

A game structure \(\mathcal {G}_i\) is then obtained from the description of the agents in \(\mathcal {INV}_i\) using the procedure CreateGameStructure, as explained in Sect. 3. The projection \(\phi _{init}^i\) of the predicate \(\phi _{init}\) with respect to the involved agents is computed next. The procedure Project takes a predicate \(\phi \) over variables \(\mathcal {V}_\phi \) and a subset \(\mathcal {X} \subseteq \mathcal {V}_\phi \) of variables as input, and projects the predicate onto the given subset. Formally, \(\mathbf {Project}(\phi ,\mathcal {X}) = \left\{ s_{|\mathcal {X}} ~|~s \in \varSigma _{\mathcal {V}_\phi } \text { and } s \models \phi \right\} \).
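
In explicit form, Project keeps the \(\mathcal {X}\)-part of every satisfying assignment; symbolically it is an existential quantification over the variables outside \(\mathcal {X}\) (our sketch):

```python
# Our explicit-state sketch of Project; symbolically this is just an
# existential quantification over the variables outside X.

def project(phi_sat, X):
    """phi_sat: the satisfying assignments of phi, each a dict over V_phi.
    Returns their restrictions to the variables in X."""
    return {frozenset((v, s[v]) for v in X) for s in phi_sat}
```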

The knowledge game structure \(\mathcal {G}^K_i\) corresponding to \(\mathcal {G}_i\) is obtained at line 7. Note that this step is not required if the system only includes perfect agents that can observe the state of the game perfectly at any time-step. Finally, the objective \(\varPhi _i\) is transformed into a game structure using the algorithms in [15, 33] and composed with \(\mathcal {G}^K_i\) to obtain a safety game \((\mathcal {G}^d_i,\phi ^i_{init})\). The result of the decomposition phase is k safety games \(\left\{ (\mathcal {G}^d_1, \phi ^1_{init}), \cdots , (\mathcal {G}^d_k, \phi ^k_{init}) \right\} \) that form the subproblems for the compositional synthesis phase.

Example 2

Let \(R_i\) for \(i=1,\cdots ,4\) be four robots in an \(n \times n\) grid-world, where \(R_4\) is uncontrolled and the other robots are controlled. For simplicity, assume that all agents are perfect. At each time-step, any robot \(R_i\) can move to one of its neighboring cells by taking an action from the set \(\left\{ up_i, down_i, right_i, left_i \right\} \) with their obvious meanings. Consider the following objective \(\varPhi = \varPhi _1 \wedge \varPhi _2 \wedge \varPhi _3 \wedge \varPhi _{12} \wedge \varPhi _{23} \) where \(\varPhi _i\) for \(i=1,2,3\) specifies that \(R_i\) must not collide with \(R_4\), and \(\varPhi _{12}\) (\(\varPhi _{23}\)) specifies that \(R_1\) and \(R_2\) (\(R_2\) and \(R_3\), respectively) must avoid collision with each other. Sub-formulas \(\varPhi _i\), \(i=1,2,3\), only involve agents \(R_i \text { and } R_4\), i.e., \(\mathcal {INV}(\varPhi _i)=\left\{ R_i,R_4 \right\} \). Therefore, the game structures \(\mathcal {G}_i\) induced by agents \(R_i\) and \(R_4\) are composed with the game structure computed for \(\varPhi _i\) to form a sub-problem as a safety game. Similarly, we obtain safety games for objectives \(\varPhi _{12}\) and \(\varPhi _{23}\) with \(\mathcal {INV}(\varPhi _{12}) = \left\{ R_1, R_2 \right\} \) and \(\mathcal {INV}(\varPhi _{23})=\left\{ R_2, R_3 \right\} \), respectively.    \(\square \)

Remark 1

The decomposition method used here is not the only way to decompose the problem, nor is it necessarily optimal. More efficient decomposition techniques could be used to obtain quicker convergence in Algorithm 1, for example by grouping the conjuncts differently. Nevertheless, the decomposition technique explained above is simple, and it was effective in our experiments.

4.2 Compositional Synthesis

The safety games obtained in the decomposition phase are compositionally solved in lines \(9-21\) of Algorithm 1. At each iteration of the main loop, the subproblems \((\mathcal {G}^d_i, \phi ^i_{init})\) are solved, and a maximally permissive strategy \(\mathcal {S}^d_i\) is computed for each of them, if one exists. The computed strategies are then composed in line 11 of Algorithm 1 to obtain a strategy \(\mathcal {S}\) for the whole system. The strategy \(\mathcal {S}\) is then projected back to the sub-games, and it is checked whether all the projected strategies are equivalent to the strategies computed for the subproblems. If so, the main loop terminates, and \(\mathcal {S}\) is winning for the game \((\mathcal {G}^d, \phi _{init})\), where \((\mathcal {G}^d, \phi _{init})\) is the safety game associated with the multi-agent system \(\mathcal {M}\) and objective \(\varPhi \). Otherwise, at least one of the subproblems needs to be restricted. Each sub-game is restricted by the computed projection, and the process is repeated. The loop terminates either if at some iteration a subproblem becomes unrealizable, or if the permissive strategies \(\mathcal {S}_1, \cdots , \mathcal {S}_k\) reach a fixed point. In the latter case, a set of strategies, one for each controlled agent, is extracted from \(\mathcal {S}\) as explained below.

4.3 Computing Strategies for the Agents

Let \(\mathcal {V}^\otimes = \bigcup _{i=1}^k \mathcal {V}_{\mathcal {G}^d_i}\) be the set of all variables used to encode the game structures \(\mathcal {G}^d_i\), and \(\varLambda ^c = \varLambda _{\mathtt{{c}}_1} \times \cdots \times \varLambda _{\mathtt{{c}}_n}\) be the set of controlled agents’ actions. Once a permissive strategy \(\mathcal {S}: \varSigma _{\mathcal {V}^\otimes } \rightarrow 2^{{\varLambda }^c}\) is computed, a winning strategy \(\mathtt{{S}}_d: \varSigma _{\mathcal {V}^\otimes } \rightarrow \varLambda ^c\) is obtained from \(\mathcal {S}\) by restricting the non-deterministic action choices of the controlled agents to a single action. The strategy \(\mathtt{{S}}_d\) is then decomposed into strategies \(\mathtt{{S}}_1:\varSigma _{\mathcal {V}^\otimes } \rightarrow \varLambda _{\mathtt{{c}}_1}, \cdots ,\mathtt{{S}}_n: \varSigma _{\mathcal {V}^\otimes } \rightarrow \varLambda _{\mathtt{{c}}_n}\) for the agents simply by projecting the actions over system transitions to their corresponding agents. Formally, for any \(s \in \varSigma _{\mathcal {V}^\otimes }\) such that \(\mathcal {S}(s)\) is defined, let \(\mathtt{{S}}_d(s)=\sigma \in \mathcal {S}(s)\) where \(\sigma =(\sigma _1, \cdots , \sigma _n) \in \varLambda ^c\) is an arbitrary action chosen from possible actions permitted by \(\mathcal {S}\) in the state s. Agents’ strategies are defined as \(\mathtt{{S}}_i(s)=\sigma _i\) for \(i=1,\cdots ,n\). Note that we assume each controlled agent has perfect knowledge about other controlled agents’ observations. The following theorem establishes the correctness of Algorithm 1.
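The extraction step admits a compact reading (our sketch; any allowed joint action may be picked in each state):

```python
# Our sketch of the extraction step: fix one joint action per state,
# then hand component i of the joint action to controlled agent c_i.

def extract_agent_strategies(S, n):
    """S maps each state to a nonempty set of joint actions (a_1, ..., a_n)."""
    S_d = {s: next(iter(allowed)) for s, allowed in S.items() if allowed}
    return [{s: joint[i] for s, joint in S_d.items()} for i in range(n)]
```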

Theorem 1

Algorithm 1 is sound.

Proof

Note that Algorithm 1 always terminates: either a fixed point over the strategies is eventually reached, or a sub-game becomes unrealizable, which indicates that the objective cannot be enforced. Consider the permissive strategies \(\mathcal {S}^d_i\) and their projections \(\mathcal {C}_i\). We have \(\mathcal {C}_i(s) \subseteq \mathcal {S}^d_i(s)\) for any \(s \in \varSigma _\mathcal {V}\), and as a result of composing and projecting the intermediate strategies, we obtain increasingly restricted sub-games. As the state space and the set of available actions in any state are finite, at some point either a sub-game becomes unrealizable, because the system player becomes too restricted and cannot win the game, or all strategies reach a fixed point. Therefore, the algorithm always terminates.

We now show that Algorithm 1 is sound, i.e., if it computes strategies \((\mathtt{{S}}_1, \cdots , \mathtt{{S}}_n)\), then the strategy \(\mathtt{{S}}=\bigotimes _{i=1}^n \mathtt{{S}}_i\) is a winning strategy in the game \((\mathcal {G}^\mathcal {M}, \phi _{init}, \varPhi )\), where \(\mathcal {G}^\mathcal {M}\) is the game structure induced by \(\mathcal {M}\). Let \(\mathcal {S}^* = \bigotimes _{i=1}^k \mathcal {S}^d_i\) be the fixed point reached over the strategies. First note that any run in \(\mathcal {G}^d_i[\mathcal {S}^d_i]\) starting from a state \(s \models \phi ^i_{init}\) for \(1 \le i \le k\) satisfies the conjunct \(\varPhi _i\) since \(\mathcal {S}^d_i\) is winning in the corresponding safety game. That is, the restriction of the game structure \(\mathcal {G}^d_i\) to the strategy \(\mathcal {S}^d_i\) satisfies \(\varPhi _i\). Consider any run \(\pi =s_0s_1s_2\cdots \) in the restricted game structure \(\mathcal {G}^d[\mathcal {S}^*]\) starting from the initial state \(s_0 \models \phi _{init}\) where \(\mathcal {G}^d = \bigotimes _{i=1}^k \mathcal {G}^d_i\). Let \(\pi ^i=s^i_0s^i_1s^i_2\cdots \) for \(1 \le i \le k\) be the projection of \(\pi \) with respect to variables \(\mathcal {V}^d_i\) of the game structure \(\mathcal {G}^d_i\), i.e., \(s^i_j = s_{j_{|\mathcal {V}^d_i}}\) for \(j \ge 0\). Since \(s^i_0 \models \phi ^i_{init}\) and \(\mathcal {S}^d_i\) is equivalent to the projection of \(\mathcal {S}^*\) with respect to variables and actions in the game structure \(\mathcal {G}^d_i\), it follows that \(\pi ^i\) is a winning run in the safety game \((\mathcal {G}^d_i[\mathcal {S}^d_i],\varPhi _i)\), i.e., \(\pi ^i \models \varPhi _i\). As \(\pi ^i \models \varPhi _i\) for \(1 \le i \le k\), we have \(\pi \models \varPhi = \bigwedge _{i=1}^k \varPhi _i\). It follows that \(\mathcal {S}^*\) is winning in the safety game \((\mathcal {G}^d,\phi _{init})\). Moreover, \(\mathcal {S}^*\) is also winning with respect to the original game as \((\mathcal {G}^d, \phi _{init})\) is the safety game associated with \((\mathcal {G}^\mathcal {M}, \phi _{init},\varPhi )\) [15]. It is easy to see that the set \((\mathtt{{S}}_1, \cdots , \mathtt{{S}}_n)\) of strategies extracted from \(\mathcal {S}^*\) by Algorithm 1 is winning for the game \((\mathcal {G}^\mathcal {M}, \phi _{init}, \varPhi )\).    \(\square \)

Remark 2

Algorithm 1 differs from the compositional algorithm proposed in [15] in two ways. First, it composes maximally permissive strategies, in contrast to composing game structures as proposed in [15]. The advantage is that strategies usually have more compact symbolic representations than game structures. Second, in the compositional algorithm in [15], sub-games are composed and a symbolic step, i.e., a post or pre-image computation, is performed over the composite game. In our experiments, performing a symbolic step over the composite game resulted in poor performance, often worse than the centralized algorithm. Algorithm 1 removes this bottleneck, as such a step is not required in our setting. This leads to a significant improvement in the algorithm’s performance, since image and pre-image computations are typically the most expensive operations performed by symbolic algorithms [35].

5 Case Study

We now demonstrate the techniques on a robot motion planning case study similar to those found in the related literature (e.g., [8–10]). Consider a square grid-world with some static obstacles, similar to the one depicted in Fig. 1. We consider a multi-agent system \(\mathcal {M}= \left\{ \mathtt{{u}}_1, \cdots , \mathtt{{u}}_m, \mathtt{{c}}_1, \cdots , \mathtt{{c}}_n \right\} \) with uncontrolled robots \(\mathcal {M}^u = \left\{ \mathtt{{u}}_1, \cdots , \mathtt{{u}}_m \right\} \) and controlled ones \(\mathcal {M}^c = \left\{ \mathtt{{c}}_1, \cdots , \mathtt{{c}}_n \right\} \). At any time-step, any controlled robot \(\mathtt{{c}}_i\) for \(1 \le i \le n\) can move to one of its neighboring cells by taking one of the actions \(up_i, down_i, left_i, \text { and } right_i\), or it can stay put by taking the action stop. Any uncontrolled robot \(\mathtt{{u}}_j\) for \(1 \le j \le m\) stays on the row where it is initially positioned, and at any time-step can move to its left or right neighboring cell by taking the actions \(left_j \text { and } right_j\), respectively. We consider the following objectives for the system: \((\varPhi _1)\) collision avoidance, i.e., controlled robots must avoid collision with static obstacles and other robots; \((\varPhi _2)\) formation maintenance, i.e., each controlled robot \(\mathtt{{c}}_i\) must keep a linear formation (same horizontal or vertical coordinate) at all times with the subsequent controlled robot \(\mathtt{{c}}_{i+1}\) for \(1 \le i < n\); \((\varPhi _3)\) bounded reachability, i.e., controlled robots must reach the bottom row in a pre-specified number of steps. We consider two settings. First we assume all agents are perfect, i.e., all agents have full knowledge of the state of the system at any time-step. Then we assume controlled agents are imperfect and can observe uncontrolled robots only if they are nearby, occupying an adjacent cell, similar to Example 1.
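
One plausible LTL rendering of these objectives is sketched below (ours, for illustration; \(B\) denotes the given step bound, \(\bigcirc ^j\) abbreviates j nested next operators, the bottom row is \(y = n\) as in Example 1, and collision with static obstacles is encoded analogously to \(\varPhi _1\)):

$$\begin{aligned} \varPhi _1&= \Box \bigwedge _{\mathtt{{a}}\ne \mathtt{{b}}} \lnot \left( (x_\mathtt{{a}}= x_\mathtt{{b}}) \wedge (y_\mathtt{{a}}= y_\mathtt{{b}}) \right) \\ \varPhi _2&= \Box \bigwedge _{1 \le i < n} \left( (x_{\mathtt{{c}}_i} = x_{\mathtt{{c}}_{i+1}}) \vee (y_{\mathtt{{c}}_i} = y_{\mathtt{{c}}_{i+1}}) \right) \\ \varPhi _3&= \bigwedge _{1 \le i \le n} \bigvee _{j=0}^{B} \bigcirc ^j \left( y_{\mathtt{{c}}_i} = n \right) \end{aligned}$$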

We apply two different methods to synthesize strategies for the agents. In the Centralized method, a game structure for the whole system is obtained first, and then a winning strategy is computed with respect to the considered objective. In the Compositional approach, the strategy is computed compositionally using Algorithm 1. We implemented the algorithms in Java using the BDD package JDD [36]. The experiments are performed on an Intel Core i7 3.40 GHz machine with 16 GB memory. In our experiments, we vary the number of uncontrolled and controlled agents, the size of the grid-world, and the objective of the system as shown in Tables 1 and 2. The columns show the number of uncontrolled and controlled robots, the considered objective, the size of the grid-world, the number of variables in the system, and the time and memory usage of the different approaches, respectively. Furthermore, we define \(\varPhi _{12} = \varPhi _1 \wedge \varPhi _2\), \(\varPhi _{13} = \varPhi _1 \wedge \varPhi _3\), and \(\varPhi = \varPhi _1 \wedge \varPhi _2 \wedge \varPhi _3\).

Table 1. Experimental results for systems with perfect agents
Table 2. Experimental results for systems with imperfect agents

Multi-agent Systems with Perfect Agents. Table 1 shows some of our experimental results for the setting where all agents are perfect (more experimental data is provided in the technical report). Note that the compositional algorithm does not always perform better than the centralized alternative. Indeed, if the conjuncts of the objective involve a large subset of the agents, the compositional algorithm's performance approaches that of the centralized algorithm. Intuitively, if the agents are “strongly” coupled, the overhead introduced by the compositional algorithm does not pay off, and the centralized algorithm performs better. For example, when the system consists of one controlled robot and one uncontrolled robot along with a single safety objective, the compositional algorithm coincides with the centralized one, and the centralized algorithm performs slightly better. However, if the sub-problems are “loosely” coupled, which is the case in many practical problems, the compositional algorithm significantly outperforms the centralized one, in both time and memory usage, as we increase the number of agents and make the objectives more complex, and it can solve problems where the centralized algorithm is infeasible.

Multi-agent Systems with Imperfect Controlled Agents. Not surprisingly, scalability is a bigger issue for games with imperfect information due to the subset construction procedure, which gives yet another reason for the compositional algorithm to perform better than the centralized alternative. Table 2 shows some of our experimental results for the setting where controlled agents are imperfect. While the centralized approach fails to compute the knowledge game structure due to the state explosion problem, the compositional algorithm performs significantly better by decomposing the problem and performing the subset construction on smaller and more manageable game structures of imperfect information.

6 Conclusions and Future Work

We proposed a framework for controller synthesis for multi-agent systems. We showed that by taking advantage of the structure in the system to compositionally synthesize the controllers, and by representing and exploring the state space symbolically, we can achieve better scalability and solve more realistic problems. Our preliminary results show the potential of reactive synthesis as a planning approach in the presence of dynamically changing and adversarial environments.

In our implementation, we performed the subset construction procedure symbolically, and we only constructed the part of the knowledge game structure that is reachable from the initial state. One of our observations was that for more structured observation functions, such as the ones considered in our case study where the robots exhibit “local” observation behavior, the worst-case exponential blow-up in the constructed knowledge game structure does not occur in practice. In future work, we plan to investigate how considering more restricted yet practical observation functions can enable us to handle larger systems with imperfect agents.