1 Introduction

Multi-agent systems are being used in many applications. One of the newer research areas with great potential is the integration of multi-agent systems with classical planning capabilities, which has given rise to the subfield of multi-agent planning (MAP). Automated planning deals with the task of finding sequences of actions that achieve a set of goals from an initial state. A key requirement for most automated planning techniques is domain independence: the same problem solver is able to generate solutions in different domains by receiving as input a domain model expressed in a declarative language, PDDL (Planning Domain Definition Language) [36]. Automated planning has been useful in many real-world applications, ranging from rovers' operations [1] to transportation logistics [34] or fire extinction [29]. In most of these applications, planning involves selecting plans for a set of agents (e.g., rovers, trucks or firemen). Recently, there has been interest in handling these classical planning tasks by explicitly considering their multi-agent nature. In addition, the new approaches deal with the task of preserving the privacy of information among the agents.

There have been two main approaches to solving multi-agent planning tasks: centralized and distributed [24, 26]. The centralized approach aims at generating the complete plan for all agents in the same common search episode. A problem with this approach is that, in general, its complexity grows exponentially with the number of agents. Also, a naïve centralized approach could have difficulties maintaining the privacy of some of the agents' internal knowledge. This private information can refer to any of the planning components: states, goals and actions. For instance, in a transportation logistics application, a company might have divided its operation into several branches [34]. Each branch might receive its list of specific services (goals) to be addressed. The central branch might also receive some common services to be planned for. So, there is a mixture of public and private goals. In some branches, they might use public trains to transport goods, while in others they might use private ships. Thus, there can be private and public objects as well. The same can be said about information on states; for instance, the locations of the drivers of other branches.

In distributed planning, each agent solves its own planning task. However, given that in many domains there are public interacting goals, one agent, \(\phi _i\), might generate a solution that invalidates another agent's, \(\phi _j\), solution, since it did not take into account \(\phi _j\)'s private and public goals and the plans to solve them. Potential solutions are plan merging [32, 50] or plan coordination [22], which can be as hard as centralized planning and are usually only useful for loosely coupled tasks [21]. If agents can achieve their goals with few (or no) interactions with the plans of other agents, the tasks are called loosely coupled. As the number of interactions increases, the tasks become more tightly coupled. Interactions among agents' plans can be either positive (one agent achieves a set of (sub)goals that are also achieved by another agent) or negative (an agent deletes a subset of (sub)goals achieved by another agent). Brafman and Domshlak have recently shown that the complexity of MAP can scale polynomially in the number of agents if some parameters related to the coupling level are fixed [13].

We are interested in a deterministic MAP setting, where we have a set of agents for which we have to find a solution to a collaborative MAP problem with private information [67]. Thus, agents are not self-interested and there is no reasoning about agents' utility. MAP has two main features. The first one can become an advantage for planning: if we know just a bit more than in standard planning tasks, namely who the agents are, then we can exploit that information to naturally decompose planning tasks. This can lead to big improvements in planning efficiency, as with other decomposition methods [11]. The second one can be seen as a disadvantage for planning: we have to keep some level of privacy among agents, such that no agent should be able to infer the private information of other agents.

In this paper, we deal with both issues. The user declares as input some information: the agents and their privacy requirements. So, privacy is defined by the user, as opposed to other techniques where privacy is computed from the structure of the planning task [12]. Also, as opposed to other MAP techniques [12, 66], the agents share not only their public knowledge, but also their private knowledge. However, private knowledge is obfuscated to maintain privacy. By sharing the obfuscated private information, we maintain privacy, but at the same time we are able to design more efficient MAP approaches than those that only share public information. We have devised several privacy-preserving methods that balance the privacy level and planning efficiency.

We first propose mapr (Multi-Agent Planning by plan Reuse). This approach occupies a middle ground between distributed and centralized planning. We have been inspired by iterative MAP techniques [42]. It starts from a PDDL description of the planning task (domain and problem) and generates an obfuscated version for each agent. Currently, we deal with PDDL2.1. Then, it assigns a subset of public goals to each agent, while each agent might have an additional set of private goals. Afterward, mapr calls the first agent to provide a solution (plan) that takes into account its private and public goals. mapr iteratively calls each agent with the solutions provided by the previous agents, augmented with the domain and problem components needed to reproduce those solutions (goals, literals from the state and actions). Each agent receives its own goals plus the goals of the previous agents. Thus, each agent solves its own problem, but taking into account the previous agents' augmented solutions. Since previous solutions might contain private information, all private information from an agent is obfuscated for the next ones. We have called it planning by reuse, given that each agent reuses information from the previous agents' planning episodes. Since each agent receives the plan of the previous agent, which implicitly considers the solutions of all previous agents, it can reuse the whole previous plan or only a subset of its actions instead of starting the search from scratch. Therefore, we can use any recent work on planning by reuse [7, 33]. Sharing the obfuscated private and public information of previous agents also solves the potential problem of invalidating previous agents' plans, by forcing each agent to either regenerate previous agents' plans or provide an alternative subplan to achieve previous agents' goals.

The second approach is a centralized version, cmap (Centralized Multi-Agent Planning). It shares the same first step with mapr: each agent obfuscates its initial planning task (domain and problem), and a central agent assigns a subset of public goals to some agents. This goal allocation indirectly generates a smaller set of agents (those to which cmap has assigned goals). Then, each agent sends its obfuscated planning task to a common agent. This agent joins all problem descriptions and performs centralized planning over the obfuscated planning tasks. Experiments show that both techniques outperform state-of-the-art MAP techniques. One of the key differences of our approaches with respect to current MAP techniques lies in our focus on the division of labor. Both mapr and cmap explicitly reason about efficient ways of splitting the MAP task among agents, while other techniques focus on the collaboration of agents to achieve goals.

A previous version of this paper was published as an extended abstract in [4] and as a workshop paper [5]. Both approaches also participated in the First Competition of Distributed and Multiagent Planners (CoDMAP) [6]. The main contributions of this paper are:

  • Detailed definition of two new MAP algorithms, mapr and cmap, that explicitly consider (preserve) agents' privacy.

  • Definition of seven goal assignment strategies for both mapr and cmap. We report on three new ones with respect to our previous papers.

  • Definition of four strategies for ordering agents. We only considered one before.

  • Definition (and implementation) of three methods to preserve privacy.

  • Definition of two new domains, Port and Depots-robots, to test some of the properties of the algorithms. We had already introduced Port before.

  • Evaluation of all the combinations of the previous algorithms and strategies against state-of-the-art MAP and centralized planners. We have greatly extended the experimentation.

  • Evaluation of privacy preservation on different domains.

The paper is structured as follows. Section 2 presents the MAP task we are dealing with. Next, Sect. 3 defines the notion of privacy we use in this paper and the methods we propose for maintaining it. Section 4 describes mapr, and Sect. 5 presents cmap. Section 6 describes the experimental setup, shows the results and discusses them. Section 7 presents relevant state-of-the-art approaches. Finally, Sect. 8 draws some conclusions and presents future work.

2 Single and multi-agent planning tasks

We deal here with multi-agent classical planning tasks. We start by providing a standard definition of a single-agent planning task in the propositional setting. A common alternative setting nowadays is SAS+ [20]. Our approaches follow the propositional setting, so we will use it to describe our algorithms. We mention the SAS+ representation because the base planner used for the experiments relies on it, as we describe later.

Definition 1

(Single-agent classical planning task) A single-agent classical planning task is a tuple \(\Pi =\{F,A,I,G\}\), where F is a set of propositions, A is a set of instantiated actions, \(I\subseteq F\) is an initial state, and \(G\subseteq F\) is a set of goals.

Each action \(a\in A\) is described by a set of preconditions (pre(a)), literals that must be true in a state to execute the action, and a set of effects (eff(a)), literals that are added (add(a) effects) or removed (del(a) effects) from the state after executing the action. The definition of each action might also include a cost c(a) (the default cost is one). The application of an action a in a state s is defined by a function \(\gamma \), such that \(\gamma (s,a)=(s{\setminus }\text{ del }(a))\cup \text{ add }(a)\) if pre(a)\(\subseteq s\), and s otherwise (the action cannot be applied). The planning task should generate as output a sequence of actions \(\pi =(a_1,\ldots ,a_n)\) such that, if applied in order from the initial state I, it results in a state \(s_n\) where the goals are true, \(G\subseteq s_n\). Plan cost is commonly defined as \(C(\pi )=\sum _{a_i\in \pi } c(a_i)\). Even though mapr and cmap can deal with the same cost functions that current planners can, in this paper we will only report on quality measured as plan length. Thus, \(c(a_i)=1\; \forall a_i\in A\).
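For illustration, these definitions can be encoded directly as follows (a minimal Python sketch of the propositional model, not the representation used by our planners, which take PDDL as input as described next):

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset    # literals that must hold to apply the action
    add: frozenset    # literals added by the action
    dele: frozenset   # literals deleted by the action ("del" is a reserved word in Python)
    cost: int = 1     # default cost is one

def apply(state: frozenset, a: Action) -> frozenset:
    """gamma(s, a): successor state, or s unchanged if a is not applicable."""
    if not a.pre <= state:
        return state
    return (state - a.dele) | a.add

def plan_cost(plan) -> int:
    """C(pi): sum of the costs of the actions in the plan."""
    return sum(a.cost for a in plan)

def is_solution(plan, I: frozenset, G: frozenset) -> bool:
    """Checks that executing the plan in order from I reaches a state containing all goals."""
    s = I
    for a in plan:
        if not a.pre <= s:
            return False
        s = apply(s, a)
    return G <= s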

In order to represent planning tasks compactly, the automated planning community uses the standard language PDDL [36]. A planning task \(\Pi \) is automatically generated from the PDDL description of a domain D and a problem P. The domain in PDDL is a tuple \(D=\{Ty,Co,Pr,Fn,Op\}\), where: Ty is a hierarchy of types (to characterize the problem objects); Co is a set of constants that are used by all problems in the domain; Pr and Fn are sets of definitions of predicates and functions, respectively, whose instantiations generate the facts in F; and Op is a set of operator schemas, or generalized actions, defined using variables as parameters, par(a). The instantiations of those operators with problem objects generate the actions in A. A planning problem in PDDL is a tuple \(P=\{D,Ob,I,G,Me\}\), where: D is the domain; Ob is a set of objects (instances of the types in the domain); I is the initial state; G is the set of goals; and Me is an optional metric that defines the optimization criterion (most commonly, minimizing plan cost). We will present some results on the quality of the solutions that use a cost metric.

In MAP, a set of m agents, \(\Phi =\{\phi _1,\ldots ,\phi _m\}\) have to solve the planning task \(\Pi \).

Definition 2

(MAP task) A MAP task is a set of planning subtasks, one for each agent, \(M=\{\Pi _1,\ldots ,\Pi _m\}\). Each planning subtask can be defined as a single-agent planning task, \(\Pi _i=\{F_i,A_i,I_i,G_i\}\). An alternative, equivalent lifted representation of each single-agent planning task in PDDL would be a pair (domain, problem): \(\Pi _i=\{D_i,P_i\}\).

In the definition, we do not require the sets \(A_i\) to be disjoint, so there can be domains where agents share a subset of actions. For instance, in the Driverlog domain, if we set the agents to be the drivers (a common way to agentify, i.e., model, the domain), there are actions (load and unload) that do not have an agent in their parameters. In those cases, these actions will be in the sets of all agents. As we will see, this is a major modeling difference with respect to other MAP approaches, which assume that all actions have to be performed by at least one agent (an agent has to be in the parameter list of each action) and thus require the sets \(A_i\) to be disjoint [12, 54].

The components of each \(\Pi _i\) have a public part that can be shared with other agents, and a private part. We assume that both the complete initial state, \(I=\cup _{i=1}^m I_i\), and the set of goals, \(G=\cup _{i=1}^m G_i\), are consistent; that is, they are conflict-free (there are no mutexes). Other MAP approaches allow conflicts among goals [70].

3 Preserving privacy

One of the key requirements in MAP is privacy preservation. How to represent privacy and what levels of privacy preservation are achievable are still open issues. We will first review some definitions of privacy preservation and then define the methods we propose and use in our MAP algorithms. Nissim and Brafman [54] define weak privacy-preserving (wpp) planning algorithms as those that do not exchange private information among the agents, and strong privacy-preserving (spp) algorithms as those where agents cannot infer more isomorphic models of other agents' information than the ones that can be directly inferred from the public information. Their approaches lie in between these two extremes. Recent work has formally defined privacy leakage in terms of the inferred transition systems [64]. Our approach is based on the idea of actually sharing information while still being able to preserve privacy at a level equivalent to that of Nissim and Brafman, as we will discuss later. In previous work, we defined a simple way to preserve privacy [5]. Here, we define several methods for preserving privacy with different privacy-preservation properties. Next, we discuss them in more detail.

3.1 Definition of privacy

The best alternative for defining MAP tasks would be to have a standard language, as PDDL is for single-agent planning tasks. However, the community is still small and there is not yet an agreement on such a standard, even though there have already been some attempts at defining one, such as mapl [15], ma-pddl [44] (the one used in the recent MAP comparison CoDMAP), or the language used by Torreño et al. [68]. Most of these approaches require the user to manually augment the PDDL (or equivalent) definitions of domains and problems to incorporate the extra multi-agent components into the corresponding language.

The most used approach, ma-strips, defines a multi-agent model as a rewriting of the single-agent task [12]. In this case, the multi-agent task is redefined as \(M=\{F,\cup _i A_i,I,G\}\), where each \(A_i\) is the set of actions of agent \(\phi _i\). Systems using ma-strips only take as input the set of agents. Then, instantiated actions are assigned to the corresponding agent, and an atom is inferred to be private if it is only handled by one agent's actions. So, in their privacy model, the focus is on defining agents' actions and then inferring the privacy of atoms. The fact that an atom (piece of information) is private is a side effect of the fact that an action belongs to an agent.
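For illustration, this inference of private atoms can be sketched as follows (a simplified illustration using the Action encoding sketched in Sect. 2, not the actual ma-strips implementation):

from collections import defaultdict

def infer_private_atoms(agent_actions):
    """agent_actions: dict mapping each agent to its set of grounded Action objects.
    An atom is inferred to be private to an agent if only that agent's actions mention it."""
    mentioned_by = defaultdict(set)            # atom -> agents whose actions mention it
    for agent, actions in agent_actions.items():
        for a in actions:
            for atom in a.pre | a.add | a.dele:
                mentioned_by[atom].add(agent)
    private = defaultdict(set)
    for atom, agents in mentioned_by.items():
        if len(agents) == 1:                   # handled by a single agent: inferred private
            private[next(iter(agents))].add(atom)
    return dict(private)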

We follow a different approach. We prefer to attach the property of privacy to the information available to the agents (states, goals and objects). While ma-strips defines privacy at the instantiated level (propositions), we define it at the lifted representation level (PDDL). Intuitively, and also from our experience in projects, privacy relates to knowledge (state or goals) that each agent does not want to share openly with the rest of the agents when planning. In particular, we can attach the property of being private to predicates and types, and it will be inherited by their instantiations (atoms and objects). A side effect of our approach is that we can deal with private goals, which other approaches currently cannot without further reformulation, and we can deal with actions with no associated agent.

To showcase a difference between the two models, suppose, for instance, a logistics domain where a company is structured in some branches, one per city. Each branch owns some trucks to move packages within the same city, and there are common airplanes that can carry packages from one city to another. Assume that, in a given problem, a truck, truck1, is the only one that can move a package p1 from location locA to location locB (it is the only truck in the city of both locations). ma-strips would assign action move(truck1,p1,locA,locB) to truck1. Then, it would infer that the location of p1 at locA is private to truck1. However, in the real world, sometimes the location of the package might be considered as public (when we want the user to be able to track the position of the package) and sometimes as private (internal) by the whole company. In our case, the user can decide at planning time to define the predicate at as either private or public. So, in the case of ma-strips, the definition of privacy is operational for decomposing the problem into subproblems and deciding what to share or not with the rest of agents. Instead, our definition of privacy relates to the common concept used by most people/organizations, and can be parameterized by the user by setting the predicates as private or public, as explained next.

In the experiments, we have used as agents the standard choices of most MAP works (e.g., rovers, satellites, or aircraft), and that selection of agents leads to using as private predicates and types exactly the same ones that ma-strips would generate. So, in most cases, both definitions generate the same privatization model. Torreño et al. [68] use a richer model of privacy that allows the user to specify which predicate is public for each agent. Bonisoli et al. [3] also describe a richer model of privacy where agents might not need to know about the presence of other agents. Among these three privacy models, the richer the model is, the more inputs it requires from the user.

Since PDDL does not allow us to define privacy or agents, and we wanted to minimally change the input descriptions of domains and problems with respect to the single-agent case, we have developed a compiler that translates any classical single-agent planning task into a multi-agent planning task. The compiler takes as input a domain D and a problem P in standard PDDL. In order to define the agents and the privacy level, the compiler also takes as input three lists: the agents' domain types AT, the private predicates and functions PP, and the private types PT. We call the particular assignment of values to these three variables the agentification of a domain. The compiler generates as output a domain and a problem file for each agent, which correspond to the lifted representation of the MAP task of Definition 2. For example, in the logistics domain mentioned before, AT={truck}, PP={at} and PT={}. If trucks had any measurement instrument, then PT could be {instrument}, as it would be private to each truck. AT, PP and PT are mainly used for creating the domain and problem definitions from the point of view of each agent. In particular, AT is used to guide the processes that define the particular domain and problem of each agent, and the ones that try to preserve privacy, as explained later.

The domain file of a given agent contains only those actions that can be carried out by that agent or that are not carried out by any agent. That is, if the agent's type is t, it contains those actions that have a parameter of type \(t'\) such that either \(t'=t\) or \(t'\) is a super-type of t in the PDDL type hierarchy, plus those actions that do not contain any parameter of a type in AT. The problem file of a given agent contains only the parts of the state and goals that are either public or private to that agent. Thus, the compiler removes all references to objects of types in \(AT\cup PT\) belonging to one agent from the domain and problem files of the other agents.
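For illustration, the action-filtering rule can be sketched as follows (a simplified sketch; operators are assumed to carry the list of their parameter types, and supertypes maps each type to its direct parent):

def is_subtype(t, t_prime, supertypes):
    """True if t equals t_prime or t_prime is a (transitive) super-type of t."""
    while t is not None:
        if t == t_prime:
            return True
        t = supertypes.get(t)
    return False

def actions_for_agent(operators, agent_type, AT, supertypes):
    """Keeps the operators an agent of type agent_type can carry out, plus the operators
    that mention no agent type at all (these are shared by all agents)."""
    kept = []
    for op in operators:                       # op.param_types: the types of its parameters
        if any(is_subtype(agent_type, t, supertypes) for t in op.param_types):
            kept.append(op)                    # the agent can fill one of the parameters
        elif not any(t in AT for t in op.param_types):
            kept.append(op)                    # no agent parameter: kept in every agent's domain
    return kept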

A constraint that users should take into account is that the definition of those inputs has to be consistent. Thus, if two agents can modify the value of the same grounded literal, the corresponding predicate cannot be private; i.e., it cannot be in PP. In our experience, defining these inputs so that this property holds is straightforward for the domains we have used in the experiments. Appendix A.1 shows the settings we have used in the experiments for each domain.

In case the input domain and problem files are specified in the unfactored MA-PDDL language (as in the CoDMAP competition), the compiler takes the input MA-PDDL domain and problem files and defines the mapping from MA-PDDL to our model as follows:

  • MA-PDDL private predicates: they become PP

  • MA-PDDL action definition includes the agent of that action: the union of the types of those agents becomes AT

  • MA-PDDL private objects: their types become PT

The additional information required to convert a single-agent problem into a multi-agent problem is the same in both cases. MA-PDDL includes it in the domain and problem files, while we provide it as a separate input to the algorithm.

Nevertheless, this compilation is only needed until we have a standard language for specifying MAP tasks. In the real world, each agent would supply its own domain and problem definitions and we would not need this compilation. Therefore, the real MAP problem solving starts in Sect. 4.

Since we will need it later, we now define public goals.

Definition 3

(Public goal) A goal \(g\in G\) is public in the lifted (PDDL) representation if its predicate is not a private predicate and it does not have as argument an object of a private type. The set of public goals, PG, can be defined as: \(PG=\{g\mid g\in G,\ g=p(o_1, o_2,\ldots , o_n),\ p\notin PP,\ \not \exists \, o_i : \text{type}(o_i)\in PT\}\), where type() returns the type of a given object.

We could have dropped the condition of not including a private type, but in that case the only agents able to achieve such a goal would be the “owners” of the objects of that private type. So, we directly define those goals as private.
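For illustration, Definition 3 can be transcribed directly (a sketch; goals are encoded as tuples (predicate, object, ...), and type_of is an illustrative map from objects to their types):

def public_goals(G, PP, PT, type_of):
    """Returns the public goals: the predicate is not private and no argument has a private type."""
    return {g for g in G
            if g[0] not in PP
            and all(type_of[o] not in PT for o in g[1:])}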

We now define the three methods for privacy preservation that we have implemented: obfuscation by random substitution, generation and sharing of macro-operators, and generation of zero-arity predicates. We believe these approaches are general enough to be used by other MAP algorithms. For a better understanding of the three methods, we advance some details of our MAP algorithms, which are described in the following sections. In our MAP approaches, the process starts when each agent takes as input the compiler's output (its domain and problem files) and generates as output a first obfuscation of both files. In cmap, a centralized planner solves the planning task generated by merging all these agents' obfuscated files. In mapr, the first agent solves its obfuscated problem and sends the next agent the solution plan together with the domain and problem components needed to reproduce the solution. These components include the description of the actions in the solution plan in terms of preconditions and effects, and the goals (where private information is obfuscated). Then, the second agent integrates the received information into its planning task and generates a plan that achieves its own goals and the previous agent's goals. The same process is repeated iteratively with the following agents.

3.2 Obfuscation by random substitution

Since information about other agents is shared in mapr and cmap, each agent needs to obfuscate its private information when sharing it with other agents (mapr) or with a central agent (cmap). Thus, each agent takes as input the compiler's output (its domain and problem files) and generates as output a first obfuscation of both files. There can potentially be many algorithms for encrypting/obfuscating the information. We first define a simple alternative. The obfuscation function, denoted as @, can be described as a three-step process.

First, a random substitution is generated for the names of all private information: each agent reads its input domain and problem files and generates a random substitution \(\sigma \) for everything considered for obfuscation. The elements of the PDDL files that are considered for obfuscation are inferred from the inputs to the compiler, AT, PP and PT. They are: types in \(AT\cup PT\); constants of a type in \(AT\cup PT\); predicates or functions in PP or whose arguments are in \(AT\cup PT\); literals from the state, goals, and preconditions and effects of actions in PP; actions that include any of the previous; and objects of types in \(AT\cup PT\). Second, each agent \(\phi _i\) applies its substitution to its PDDL domain and problem, generating two new files, \(D^@_{\phi _i}\) and \(P^@_{\phi _i}\). Third, each agent removes from \(D^@_{\phi _i}\) and \(P^@_{\phi _i}\) any parameter referring to its own type from actions, predicates and functions.
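The first two steps can be sketched as follows (a minimal illustration; the per-agent prefix mirrors the anon/nona prefixes of the example below, while in practice the generated names are randomized):

import itertools

def make_substitution(symbols_to_obfuscate, prefix="anon"):
    """Step 1: map every name considered for obfuscation (private predicates, functions,
    types, constants and objects) to a fresh, meaningless symbol."""
    counter = itertools.count()
    return {sym: f"{prefix}{next(counter)}" for sym in symbols_to_obfuscate}

def obfuscate_literal(literal, sigma):
    """Step 2: rename the predicate and any private objects of a grounded literal
    (predicate, arg1, arg2, ...); public tokens are left untouched."""
    return tuple(sigma.get(token, token) for token in literal)

For instance, make_substitution(["free", "at-robot", "carries"]) reproduces the substitution shown below for agent R1.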

Let us see an example of the combined effect of the compilation and the obfuscation in a simple multi-agent robot domain (Depots-robots), inspired by the Kiva robots used by Amazon. These robots move on a grid inside a depot. They have to move inventory pods from their storage places to human workers that fill orders by picking items from the pods. In our simplified version, a set of robots and a set of pods are placed on a grid. Robots can move to adjacent cells (vertical or horizontal movements only). They can move empty or with one pod (by placing themselves under the pod and taking it). Figure 1 (left) shows an example of an initial state in this domain with two robots (R1 and R2) and three pods (P1, P2 and P3). The goals (right) would be that the pods are used by humans, so they have to be at the same locations as the human workers (H1–H4). Note that P2 cannot be moved in the initial state before moving P1 or P3. A plan to solve the problem would move R2 to take pod P3 to H4, while robot R1 would move to take pods P1 and P2 to H1 and H3, respectively.

Fig. 1 Example of an initial state (left) in a robot domain with two robots (\(\hbox {R}_i\), circles), three pods (\(\hbox {P}_i\), squares) and four human workers (\(\hbox {H}_i\)). The goals (right) would be that these three pods have been moved to service humans

Fig. 2 On the left, we show some parts of the input domain file of the Depots-robots domain. On the right, we show the obfuscated version for agent R1

In Fig. 2 we show some parts of the input domain definition and their corresponding obfuscated versions for agent R1 (agents are read from the problem file). The actions are: moving a robot (four actions when the robot is free and four for taking the pods); dropping a pod (one action); and a human using a pod (one action). The agent types in this domain are AT={robot}. The private predicates are PP={at-robot, carries, free}. All goals are public. In our simplified domain, there are no private types. In a more complex scenario, we could model other issues, such as the robots' batteries, which would be private types, while the battery level would be a private predicate/function. Figure 3 shows some parts of the problem file and their corresponding obfuscated versions.

Fig. 3 On the left, we show some parts of an input problem file of the Depots-robots domain. On the right, we show the obfuscated version for agent R1

The resulting substitution applicable to R1 is:

  • \(\sigma =\){(free / anon0), (at-robot / anon1), (carries / anon2), ...}

Similarly, a possible substitution applicable to R2 would be:

  • \(\sigma =\){(free / nona0), (at-robot / nona1), (carries / nona2), ...}

Any stronger obfuscation function @ could be used instead, without loss of generality of the rest of the approach. The main constraint is that the technique has to maintain the standard properties of planning tasks. Examples of such properties are:

  1. if the obfuscation technique converts an action \(a\in A\) of an agent \(\phi \) into its obfuscated version, \(a^@\), then for all states s such that a is applicable in s, \(a^@\) is also applicable in the corresponding \(s^@\);

  2. if a of an agent \(\phi \) is applied in a state s, resulting in a state \(s_1\), then applying \(a^@\) in \(s^@\) must result in \(s_1^@\).

It is easy to see that the obfuscation technique proposed in this section fulfills these properties. Each action a of a given agent \(\phi \) is mapped into one action \(a^@\) by applying the substitution @ to its preconditions and effects, followed by removing the agent variable from all the lifted literals in pre(a)\(\cup \)add(a)\(\cup \)del(a). We also apply the same operation to I, resulting in \(I^@\). Given that we perform the same mapping operation on each action a and on I, each literal will either not change, or change in the same way in I and in a. Thus, for all actions a such that pre(a)\(\subseteq I\), we have pre(\(a^@\))\(\subseteq I^@\) (if an action was applicable in I, the mapped action will also be applicable in \(I^@\)). Now, for all actions a applicable in I, the state after applying a in I is \(s_1=\gamma (I,a)\). We apply the same mapping to the effects of a, so the result of applying \(a^@\) in \(I^@\), \(s_2=\gamma (I^@,a^@)\), satisfies \(s_2=s_1^@\). Therefore, the properties hold in the case of the initial state. Applying this argument recursively, they hold in all states reachable from I.

In relation to space complexity, the size of the input PDDL domain file, |D|, is similar to (slightly bigger than) the size of each agent's PDDL domain file, \(|D|\sim |D^@_{\phi _i}|\), \(i=1..m\); we now have m such files. On the other hand, the size of each agent's PDDL problem file is usually much smaller than the original PDDL problem file, \(|P^@_{\phi _i}|\sim \frac{|P|}{m}\), \(i=1..m\). In the obfuscated problem file of each agent, there is only private information of that agent, plus the public information. Therefore, the overhead is proportional (around m times) to the size of the public information. Seen from each agent's perspective (the planning tasks we will actually solve in mapr), the size of each MAP task \(\Pi _i\) is around m times smaller in the private part than the original one, \(\Pi \). This is also relevant for the branching factor, which could be reduced by a factor of m (since each action that contains an agent as a parameter generates m times fewer instantiations), resulting in potentially exponential benefits in terms of search. In summary, one can see the compilation and obfuscation processes as a problem decomposition technique based on domain-dependent features (agent types, private predicates and private types) where space complexity is only linearly augmented.

In mapr, agents also exchange plans. Next, we study further obfuscation techniques that can be applied once plans have been generated and that improve privacy preservation. We start with an obfuscation technique with stronger privacy-preserving properties, based on generating macro-operators.

3.3 Generation and sharing of macro-operators

In mapr, agents generate individual plans and share them with the next agents, as discussed later on. Thus, we have devised a method to further obfuscate the information shared among agents by generating macro-operators from the agents' plans. Given a plan \(\pi \), a macro-operator is defined as a new action \(m_{\pi }\) whose preconditions are those conditions of actions in the plan that are not achieved by previous actions in the plan, and whose effects are the ones added or deleted by executing the actions in \(\pi \) in sequence. Macro-operators were introduced in the strips planner and have been successfully used as a learning technique [30, 31]. The idea of sharing macro-operators rather than individual actions is that details of the underlying plan are missing and thus cannot be inferred by the next agents. We have defined two alternative methods, which work in combination only with mapr; a code sketch of both follows the list.

  • Generate one macro-operator for the complete plan of each agent, which is communicated to the rest. Thus, if agent \(\phi _i\) generates the plan \(\pi =(a_1,\ldots ,a_n)\), then we generate its corresponding macro-operator \(m_{\pi }\) using the standard procedure, though it is left fully grounded; that is, we do not perform the final generalization step [31]. Sharing the macro-operator instead of the primitive actions restricts the next agents' capability to interleave primitive actions from the previous agents in their new plans. So, it introduces a trade-off between privacy preservation and the completeness/quality of solutions.

  • Generate several macro-operators. This method traverses the actions in the plan from the first one, maintaining a list of actions to be included in the next macro-operator. At each step, if action \(a_i\) contains private information, it is added to the list of actions. If not (all its information is public), a macro-operator is generated with all the collected actions, the list is emptied and the traversal continues. Once it finishes, the agent sends the next agents the list of generated macro-operators. Nissim and Brafman [54] mentioned that they tried a variation of this idea, but it was not useful for them due to the utility problem [53]. Very recently, a variation of that approach has been published [51]. The main difference with our proposal is that they create and use the macros online (during the search for a solution), whereas in mapr agents build them after their search for a solution has finished, and the macro-operators are used by the following agents.
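For illustration, the two methods can be sketched as follows (a simplified Python illustration reusing the Action encoding sketched in Sect. 2; the exact effect bookkeeping and the segmentation rule follow our reading of the worked example below and may differ in details from the actual implementation):

def build_macro(plan, name="macro"):
    """First method: one grounded macro-operator for a whole plan. Its preconditions are the
    conditions not achieved by earlier actions of the plan; its effects are the net
    adds/deletes of executing the whole sequence."""
    pre, produced = set(), set()
    for a in plan:
        pre |= (a.pre - produced)       # must already hold before the macro starts
        produced = (produced - a.dele) | a.add
    end = set(pre)
    for a in plan:                      # simulate the sequence to obtain the net effects
        end = (end - a.dele) | a.add
    return Action(name, frozenset(pre), frozenset(end - pre), frozenset(pre - end),
                  cost=sum(a.cost for a in plan))

def split_into_macros(plan, is_fully_private):
    """Second method: fully private actions are accumulated and grouped into a macro together
    with the next action that touches public information; actions preceded by no private
    actions are kept as they are."""
    result, buffer = [], []
    for a in plan:
        if is_fully_private(a):
            buffer.append(a)
        elif buffer:
            result.append(build_macro(buffer + [a], name=f"macro{len(result)}"))
            buffer = []
        else:
            result.append(a)
    if buffer:                          # trailing private actions, if any
        result.append(build_macro(buffer, name=f"macro{len(result)}"))
    return result

On the transportation example below, with move as the only fully private action, split_into_macros returns the plan [load, \(m_1\), \(m_2\), \(m_3\)].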

Suppose, for instance, a transportation problem where some trucks have to move packages from one place to another using a road network. They have three actions: load, unload, and move. The private component of the state is the location of each truck, so loading and unloading involve a mixture of public and private information, and move is completely private. Assume truck tr1 generates a plan such as:

load(tr1,p1,n1), move(tr1,n1,n2), move(tr1,n2,n3), unload(tr1,p1,n3),

move(tr1,n3,n4), load(tr1,p2,n4), move(tr1,n4,n3), unload(tr1,p2,n3)

Then, the first method would build and share a single macro-operator m with the following contents (that would be further obfuscated as described in Sect. 4):

pre(m)={at(tr1,n1),at(p1,n1),at(p2,n4),conn(n1,n2),...,conn(n4,n3)}

eff(m)={at(p1,n3),at(p2,n3),not(at(tr1,n1)),at(tr1,n3)}

n2 disappears from the dynamic predicates. Thus, in case there are different ways to go from n1 to n3, the fact of having gone through n2 would be hidden from the next agent. The second method would generate the following macro-operators:

\(m_1(tr1,p1,n1,n3)\)=move-move-unload, \(m_2(tr1,n3,n4,p2)\)=move-load, and

\(m_3(tr1,n4,n3,p2)\)=move-unload. It would share the plan \(\pi \)=[load,\(m_1\),\(m_2\),\(m_3\)] plus some extra information (including the macro-operators' models), as detailed in Sect. 4. Again, the fact of having gone through n2 would be hidden from the next agent.

This second method is similar, in terms of privacy preservation, to that of distributed MAP approaches (fmap [66] or mafs [54]). In their case, every time an agent changes some public component of the state, it broadcasts that information to the rest (mafs broadcasts states, and fmap also includes information on the partial-order plan, such as causal links, preconditions and effects related to the changes). Thus, what each agent receives from the others can be seen as a sequence of states. Suppose that agent \(\phi _i\) receives from another agent \(\phi _j\) the sequence of states \(\langle s_0,s_1,\ldots ,s_n\rangle \) during a MAP search (each pair of consecutive states corresponds to the execution of a sequence of private actions of \(\phi _j\)). \(\phi _i\) can compute the differences between the consecutive states it has received from \(\phi _j\) (and similarly for the other agents). So, it can translate the input sequence of states into \(\langle s_0,\Delta _1,\ldots ,\Delta _n\rangle \), where \(\Delta _i=(s_i\setminus s_{i-1})\cup \text{ not } (s_{i-1}\setminus s_i)\); that is, the set of literals that are true in \(s_i\) that were not true in \(s_{i-1}\) (adds) and the set of literals that were true in \(s_{i-1}\) and are not true in \(s_i\) (deletes). Each of these deltas can be seen as the effects of a macro-operator that explains the changes generated by \(\phi _j\)'s private actions applied between two consecutive public actions. The only difference with a macro-operator is that preconditions cannot be computed directly from each delta. However, since we have a set of deltas, some inductive procedure could be used to derive such preconditions [78, 79]. Furthermore, in the case of fmap, those preconditions are shared.
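For illustration, the delta computation can be sketched as follows (states encoded as Python sets of literals):

def state_deltas(states):
    """Given a received sequence of public states <s0, s1, ..., sn>, returns for each step
    the literals that became true (adds) and the literals that became false (deletes)."""
    return [{"add": cur - prev, "del": prev - cur}
            for prev, cur in zip(states, states[1:])]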

Therefore, the information exchanged among agents when learning and sharing a set of macro-operators can be seen as similar to the information that distributed approaches exchange, given that in those approaches all agents broadcast to the rest all changes in the public state. In our case, each agent only receives as information the obfuscated plans and the reduced domain and problem descriptions of the previous agents (a subset of all agents). Given that the information of all the previous agents has been merged, the following agents cannot determine from which previous agent they are receiving each part of the plan and of the reduced domain and problem descriptions. In other MAP approaches, agents do know which agent has made which changes to the public part of the state. The first method for learning macro-operators leaks even less information, since it removes all intermediate changes made in the public part of the state that are no longer true at the end of the plan (they will not appear as effects in the corresponding macro-operator).

3.4 Generation of zero-arity predicates

The next privacy-preserving method is also applied by mapr when agents share their plans with the next agent. It consists of the following steps:

  • Since all actions in the plan are instantiated, we generate a second level of obfuscation by removing public objects from the private literals in the preconditions and effects and generating a new symbol for each such literal's name (a code sketch of this second obfuscation is given after this list). Private objects were already removed from all literals in the first obfuscation. So, after this step, the only objects left in the action are public objects that appear in public literals. Also, we remove the parameters from the action. Suppose, for instance, that in a Logistics domain there is an action a1=drive-truck(?tr,?l1,?l2):

    • drive-truck(?tr,?l1,?l2)

    • pre={at(?tr,?l1), connected(?l1,?l2)}

    • eff={not(at(?tr,?l1)), at(?tr,?l2)}

    Since at is private, the first obfuscation process would have generated the following model for an agent tr1 (the one used by mapr to search for a solution):

    • anon1(?l1,?l2)

    • pre={anon2(?l1), connected(?l1,?l2)}

    • eff={not(anon2(?l1)), anon2(?l2)}

    Suppose now that agent tr1 has used an instantiation of that action in a plan, such as anon1(A,B). A second obfuscation step is performed converting it into:

    • anon3()

    • pre={anon4, connected(A,B)}

    • eff={not(anon4), anon5}

    Obviously, all appearances of anon2(A) in the actions of the plan would have to be obfuscated with the same substitution (anon4), and the same applies to anon2(B) and anon5. Thus, the semantics of the drive actions could only be inferred from the predicate connected; we deal with this in the next step.

  • All static predicates (both public and private) in the preconditions of the actions are removed (such as the connected predicate in the previous example). Since the action was applied, its preconditions were true in the initial state, in particular the static predicates. Moreover, no action can change their truth value, so they can be safely removed from the actions' preconditions. As an example, the previous action would be shared with the remaining agents as follows; we can observe that this step has removed all the semantics of the original action:

    • anon3()

    • pre={anon4}

    • eff={not(anon4), anon5}

    Let us now see an example where these previous steps do not completely remove the semantics of the original domain. Suppose we are in the Depots-robots domain and a robot has used an instance of the move-right action (in Fig. 2), such as anon3(C31,C41). The previous steps would translate this action into:

    • anon27()

    • pre={anon28, empty(C41)}

    • eff={not(anon28), anon29, empty(C31), not(empty(C41))}

    In this case, given that empty is a public predicate, mapr cannot obfuscate its literals (other agents might need to know whether cell C41 is empty or not). Thus, this second obfuscation step removes all the semantics of actions with no public literals, and it keeps an amount of semantics proportional to the number of public predicates in the preconditions or effects of the actions. Again, the same applies to other MAP approaches (mafs or fmap): the other agents would see that empty(C41) is true and in the next state it is false, while empty(C31) becomes true.

  • As explained before, mapr can learn and share macro-operators, so privacy preservation is further augmented by not sharing the individual private actions (either those between two public actions, or all of them). For instance, in the case of the previous example in the Logistics domain, the truck would not communicate all its individual movements (private literals referring to the truck being at intermediate locations).
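A minimal sketch of this second obfuscation step (reusing the Action encoding sketched in Sect. 2, with literals encoded as tuples (predicate, arg1, ...); the fresh-name generation and the memo table are illustrative):

import itertools

def second_obfuscation(action, private_preds, static_preds, memo, counter=itertools.count()):
    """Replaces each private grounded literal with a fresh zero-arity symbol (memo keeps the
    mapping consistent across the whole plan, and the shared counter keeps names fresh across
    calls), drops static preconditions, and obfuscates the action name; public dynamic
    literals are kept untouched."""
    def fold(lit):
        if lit[0] in private_preds:
            if lit not in memo:
                memo[lit] = (f"anon{next(counter)}",)
            return memo[lit]
        return lit
    pre = frozenset(fold(l) for l in action.pre if l[0] not in static_preds)
    add = frozenset(fold(l) for l in action.add)
    dele = frozenset(fold(l) for l in action.dele)
    return Action(f"anon{next(counter)}", pre, add, dele, cost=action.cost)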

4 Multi-agent planning in mapr

The main steps of the mapr algorithm are to first assign public goals to agents and then iteratively solve each agent's problem. Once an agent solves its problem, it communicates an augmented solution, in which the private components are obfuscated, to the next agent. In turn, the next agent solves its own problem augmented with the obfuscated private part and the public part of the solutions of the previous agents.

4.1 mapr algorithm

Figure 4 shows a high-level description of the algorithm. It takes as input a set of agents \(\Phi \), a MAP task (a set of agents' planning tasks), the lists of private predicates and types, a goal assignment strategy, an agent ordering scheme, the planner to be used by the first agent, and a second planner (it might be the same one) to be used by the following agents. The reason for using two planners is that the second planner might be a replanning system. Since all inputs and outputs are in PDDL, we can use any state-of-the-art planner. The mapr algorithm requires the execution of a virtual central agent that orchestrates all the relevant calls. It starts by computing the set of public goals (from M, PP and PT, following Definition 3) and a first ordering of agents. mapr is then composed of eight main steps: goal assignment; a second ordering of agents; a first planning episode; building of the augmented solution; communication of the augmented solution to the next agent; merging of the information of the prior agents with the current agent's planning problem; subsequent planning episodes; and termination. We now explain each step in more detail.

Fig. 4 High-level description of the mapr planning algorithm. In the second and following iterations, when \(j=1\), then \(j-1=N\), where N is the number of agents in \(\Phi '\)
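To complement Fig. 4, the main loop can be sketched as follows (an illustrative Python sketch; each *_fn parameter is a placeholder for one of the steps detailed in the following subsections, and the resource bound is abstracted as a maximum number of rounds):

def mapr_loop(selected_agents, agent_tasks, plan_fn, merge_fn, augment_fn,
              solves_all_fn, max_rounds=10):
    """Iterates planning episodes over the ordered, selected agents. Each agent plans over
    its own task merged with the augmented obfuscated solution of the previous agents; the
    loop ends when a plan achieving all goals is found or the resource bound is reached."""
    augmented = None
    for _ in range(max_rounds):
        for agent in selected_agents:
            task = merge_fn(agent_tasks[agent], augmented)  # add previous goals/actions/state
            plan = plan_fn(agent, task, augmented)          # planner or replanner episode
            if plan is not None and solves_all_fn(plan):
                return plan
            augmented = augment_fn(agent, plan, task)       # obfuscate and pass along
    return None                                             # resources exhausted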

4.2 Goal assignment

In the input MAP task M, each agent's problem definition includes all public goals. But it would be inefficient to devote all agents to achieving all goals, given that it is a collaborative task. Thus, given the total set of public goals PG and a set of agents \(\Phi \), mapr first assigns a subset of goals to each agent. For each public goal \(g\in PG\), each agent \(\phi \in \Phi \) computes the cost of the relaxed plan, \(c(g,\phi )\), from its initial state, \(I_i\), following the well-known relaxed plan heuristic of ff [40]. The relaxed plan heuristic computes the cost of a plan that could reach the goals from the initial state without considering the delete effects of actions. If the relaxed plan heuristic detects a dead end, then \(c(g,\phi )=\infty \). Each agent communicates each \(c(g,\phi )\) to a central agent, which defines a cost matrix, \(c(PG,\Phi )\). We have devised seven goal assignment strategies: all-achievable, rest-achievable, best-cost, load-balance, contract-net, all, and subset (a sketch of two of them is given after the list).

  • all-achievable: assigns each goal g to all agents \(\phi _i\) such that \(c(g,\phi _i)<\infty \); that is, if the relaxed plan heuristic estimates g could be reached from the initial state of agent \(\phi _i\), then g is assigned to \(\phi _i\). So, it can assign the same public goal to more than one agent.

  • rest-achievable: assigns goals iteratively. It first assigns to the first agent \(\phi _1\) all the goals that it can reach (cost less than \(\infty \)). Then, it removes those goals from the goal set and assigns to the second agent all the goals that it can reach from the remaining set of goals. It continues until the goal set is empty. The agents' order might be relevant; thus, we have defined several orderings, as specified later.

  • best-cost: assigns each goal g to the agent that can potentially achieve it with the least cost: \(\arg \min _{\phi _i\in \Phi } c(g,\phi _i)\).

  • load-balance: tries to keep a good work balance among agents. It first computes the average number of goals per agent, \(k=\lceil \frac{\mid PG\mid }{m}\rceil \). Then, it starts assigning goals to agents as in best-cost. When it has assigned k goals to an agent, it stops assigning goals to that agent; the next goals that could be assigned to this agent are redirected to the second best agent for each goal. At the end, either all agents will have k goals, or \(m-1\) agents will have k goals and one agent will have the remaining \(\mid PG\mid -k\times (m-1)\) goals.

  • contract-net: it is inspired by the well-known negotiation scheme from the multi-agent literature [62]. Under this setting, the virtual central agent takes the first public goal \(g_1\in PG\) and assigns it to the best bidding agent, where each bid is \(c(g_1,\phi _i)\). Let us assume it is \(\phi _{b1}\). Then, it selects the second goal \(g_2\) and assigns it to the best bidding agent as before. However, in order to compute \(c(g_2,\phi _{b1})\), it takes into account that \(g_1\) has already been assigned to \(\phi _{b1}\): \(\phi _{b1}\) computes the estimated cost of achieving both \(g_1\) and \(g_2\), while the rest of the agents only compute the cost of achieving \(g_2\). Usually, but not necessarily, it is more expensive for \(\phi _{b1}\) to achieve both \(g_1\) and \(g_2\) than to use another agent \(\phi _{b2}\) to achieve only \(g_2\). In summary, at each iteration (over PG), contract-net assigns the current goal to the best agent, taking into account all previous assignments.

  • all: assigns each goal g to all agents, independently of whether each agent can achieve the goal or not. It is a naïve strategy that allows mapr and cmap to use all agents, instead of using a subset of them.

  • subset: as explained in the previous section, there are domains where agents must collaborate; e.g., one agent has to achieve a subgoal for another agent, such as moving a package to a given location. The subset goal assignment iterates over all goals. For each public goal \(g_i\in G\), it computes the relaxed plan as in the previous goal assignment strategies. However, instead of just computing the cost, it computes the subset of agents that appear in the relaxed plan for that goal, \(\Phi _i\subseteq \Phi \). Those are the agents that could potentially be needed to achieve the goal \(g_i\); they are computed as the agents that appear as arguments of any action in the relaxed plan. Then, it assigns \(g_i\) to all agents in \(\Phi _i\). A side effect of this strategy is that the set of selected agents for the combined planning task is the union of the agent subsets for all goals: \(\Phi '=\bigcup _{g_i\in G} \Phi _i\).
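For illustration, two of the strategies can be sketched as follows (a simplified sketch over the cost matrix, where cost[(g, a)] stands for \(c(g,\phi )\) and ties are broken by the order of the agents list):

import math

def best_cost(cost, goals, agents):
    """Assigns each goal to the agent with the cheapest relaxed-plan estimate."""
    assignment = {a: [] for a in agents}
    for g in goals:
        assignment[min(agents, key=lambda a: cost[(g, a)])].append(g)
    return assignment

def load_balance(cost, goals, agents):
    """Like best-cost, but no agent receives more than k = ceil(|PG| / m) goals."""
    k = math.ceil(len(goals) / len(agents))
    assignment = {a: [] for a in agents}
    for g in goals:
        candidates = [a for a in agents if len(assignment[a]) < k]
        assignment[min(candidates, key=lambda a: cost[(g, a)])].append(g)
    return assignment

On the cost matrix of Table 1, both sketches assign P1 to R1 and P2 and P3 to R2, as in the example below.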

Table 1 Example of the estimated costs for two agents, \(\Phi \)={R1,R2}, and three goals (humans using P1, P2 and P3)

Let us see how each goal assignment strategy works in the example of the Depots-robots domain. The estimated cost of achieving each goal would be computed by each agent using the relaxed plan, generating Table 1. For instance, for R2 to achieve P2, it first has to move P3 out of the way:

(take-right(R2,P3,C33,C43), move-left(R2,C43,C33)),

and then move to P2, take it and carry it to C31 where the human will use it. In the relaxed plan, R2 does not need to drop P3 before moving, since it did not delete (free R2) when it took P3.

The following is the set of goals that each strategy would assign to each agent. Given that there is a tie in the estimated cost of achieving P1, if needed we will assume R1 is selected for P1 given that it is the first agent in the problem. We discuss agent ordering techniques in the next subsection.

  • all-achievable: R1(P1,P2,P3), R2(P1,P2,P3).

  • rest-achievable: R1(P1,P2,P3), R2().

  • best-cost: R1(P1), R2(P2,P3), due to tie breaking on P1.

  • load-balance: R1(P1), R2(P2,P3), due to tie breaking on P1.

  • contract-net: assigns P1 to the best agent. Since there is a tie, let us assume it will select R1. Then, it recomputes the estimated cost of each agent moving P2, given that now R1 also has to move P1. And assigns P2 to the best agent. The cost of moving P2 for R2 would be 6, while the cost of R1 moving both P1 and P2 would be 12 (7 for moving P1 and 5 for moving P2 afterward). So, it would assign P2 to R2. Now, contract-net recomputes the estimated cost of each agent moving P3, given that R1 has to move also P1, and R2 has to move also P2. The estimated cost for R1 would be 14. It would need 7 actions to achieve P1. Then, when going to take P3, it can start in C12, given that it passed through it and did not delete the fact that it was at C12. Thus, it would require 7 more actions to take P3 to H4. Similarly, the estimated cost for R2 would be 10. Thus, the final assignment would be: R1(P1), R2(P2,P3). As load-balance, it indirectly also tries to obtain a good balance.

  • all: R1(P1,P2,P3), R2(P1,P2,P3). In general, the result would be different than the one of all-achievable; for instance, if there is a given agent that cannot achieve any goal, all would select it, while all-achievable would not.

  • subset: computes the relaxed plan for achieving each goal and selects all agents in all relaxed plans. The relaxed plans of the goals would include R1 for P1 (due to tie breaking) and R2 for the other two pods, since the relaxed plan does not delete the initial position of R2 which is closer to all pods. Thus, subset would assign goals as: R1(P1), R2(P2,P3).

In the configurations rest-achievable, best-cost and contract-net, there can be agents to which mapr does not assign any public goal; see, for instance, how rest-achievable did not assign goals to R2 in the previous example. If they do not have private goals, those agents will not be used by mapr for planning. The output of this step is a set of \(N\le m\) pairs (N agent assignments), where each pair is an agent \(\phi _i\in \Phi \) and the set of public goals assigned to it, \(G_i\subseteq PG\). The N selected agents (those that have been assigned at least one public goal or have at least one private goal) compose the subset \(\Phi '\subseteq \Phi \). Only agents in \(\Phi '\) will be used in the next steps of the algorithm.

4.3 Ordering

The ordering of agents might be relevant for mapr. We have defined four ordering schemes: name, agents are ordered by their given names in the problem description (useful in case the user has defined the agents in a specific order); max-goals, agents with more assigned goals are ordered first; min-goals, agents with fewer assigned goals are ordered first; and random, agents are randomly ordered.

Agent ordering is relevant in two steps of mapr: (1) before assigning goals; and (2) before iterating over the selected agents to generate subplans. In relation to (1), only name and random are used, since the other two ordering schemes depend on the assigned goals. In relation to (2), we can use all four schemes. In case of a tie, such as two agents being assigned the same number of goals, mapr uses the name ordering.

4.4 Planning

Once goals have been assigned to the subset of N agents in \(\Phi '\), planning starts by calling the first agent to solve its planning task. The task will be composed of its private planning task and its assigned public goals. If it does not solve the problem, it just passes the empty plan to the next agent. This could be either because there is no such plan, or because its plan needs some propositions to be achieved by the plans of the remaining agents. So, either the rest of the agents solve the problem, or, eventually, it will be called again to solve the planning problem, but with some extra information coming from the other agents' planning episodes. The following planning episodes can use either a planner or a replanner. In the latter case, apart from the domain and problem definitions, replanners take a previous solution as input [7, 33].

Let us continue with the Depots-robots domain. Assume that we have used the load-balance goal assignment, leading to the assignment of P1 to R1, and of P2 and P3 to R2. Thus, mapr would first call R1 to generate a plan for achieving the goal (used H1 P1). The plan would be the one shown in Table 2, where we show the unobfuscated version on the left and the obfuscated one on the right. It is obfuscated because agents work on the obfuscated version generated at the start. As we have seen, there are further obfuscation methods that are applied before sharing it with the following agents.

Table 2 R1 plan for achieving the goal (used H1 P1). On the left, we show the unobfuscated plan, and on the right we show the actual plan generated by R1

4.5 Building an augmented solution

If the first agent solves the problem, it passes the relevant information to the other agents. Since domains and problems are already obfuscated, in this step, at each iteration, the corresponding agent builds an augmented obfuscated solution to be used by the following agents. An augmented obfuscated solution is a solution found by an agent, augmented with the domain and problem components needed by the other agents to reuse it.

A key issue for mapr is what one agent \(\phi _j\) should pass to the next agent \(\phi _{j+1}\). First, it has to pass the goals \(G_j\) it was assigned to achieve (both public and private), plus the goals of all previous agents that were passed to \(\phi _j\) (let us call \(\mathcal{G}_j\) the set of all these goals, including its own). Second, \(\phi _{j+1}\) might not be able to generate a plan for those goals (because some of them might be private goals of previous agents), or it might find that the actions used by \(\phi _{j}\) are preferable to its own. Therefore, \(\phi _{j}\) also passes the instantiated descriptions of the actions in the plan that \(\phi _j\) used to achieve all the goals in \(\mathcal{G}_j\). Given that we could use a second planner that is able to reuse the previous plan, it also passes the plan that \(\phi _{j}\) used to achieve \(\mathcal{G}_j\); thus, a replanner might spend less time planning when reusing the previous plan.

Finally, \(\phi _{j+1}\) will need the private part of \(\phi _{j}\)'s initial state (and that of all the previous agents) to be able to regenerate \(\phi _{j}\)'s plan. But, in order to preserve privacy by providing other agents as little information as possible, only the relevant part of that initial state is shared: those literals that are preconditions of actions in \(\phi _{j}\)'s plan. An alternative would be to regress over the goals to obtain the literals from the initial state that are really needed, discarding those literals that are added by some action in the plan, \(a_p\), needed by another action, \(a_c\), and not deleted by any action executed between \(a_p\) and \(a_c\).

Therefore, an augmented obfuscated solution \(S^@_j\) consists of the obtained plan and the set of components that are needed by the other agents to regenerate that solution if needed. More specifically, if agent \(\phi _j\) generates the plan \(\pi _j=(a_1,\ldots ,a_t)\), it communicates \(S^@_j=\{\mathcal{A}_j^@,\pi _j^@,\mathcal{I}_j^@,\mathcal{G}_j^@\}\) to the next agent, \(\phi _{j+1}\). We explain next each component of \(S^@_j\).

Actions \(\mathcal{A}_j^@=\{\text{PDDL}(a_i) \mid a_i\in \pi _j, \text{not original}(a_i)\}\), where PDDL(\(a_i\)) is the instantiated (no variables) PDDL description of action \(a_i\) in the plan. There can be three kinds of actions in the plan: those that were shared by previous agents, \(\mathcal{A}^@_{j-1}\); those whose name was obfuscated by \(\phi _j\) given that they had a private predicate or a variable of a private type related to \(\phi _j\); and the original ones (those whose definition has not been changed by any agent). Given that actions of the third kind are in the action sets of all agents, mapr does not share them with the following agents. When \(\phi _j\) obfuscates its private actions, it performs the second level of obfuscation explained in Sect. 3.4.

Plan \(\pi _j^@=(a_1^@,\ldots ,a_t^@)\) is the obfuscated plan, where each \(a_i^@\) is the result of applying the previous obfuscation steps. If macro-operators are learned, then \(\pi _j^@\) will be formed either by just one action or by a smaller set of actions than the original t actions.

Goals \(\mathcal{G}_j^@=\mathcal{G}_{j-1}^@\cup G_j^@\) is the set of all goals (private and public, including the goals of previous agents) accumulated up to agent \(\phi _j\).

Initial state \(\mathcal{I}_j^@\) is the relevant initial state. Since mapr only needs to pass to \(\phi _{j+1}\) the relevant private part of the state, it only considers the literals that are preconditions of some action in the plan. Since static predicates have already been removed from action preconditions, no private static literal from the initial state will be shared. Therefore, it is computed as:

$$\begin{aligned} \mathcal{I}_j^@=\{f\mid f\in I_j^@, a_i\in \pi _j, f\in \text{ pre }(a_i)\} \end{aligned}$$
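A minimal sketch of how the relevant initial state and the whole augmented obfuscated solution could be assembled from the components just described; the helper names, the dictionary layout and the action attributes (preconditions, is_original) are ours, not the actual implementation.

    def relevant_initial_state(initial_state, plan):
        """Keep only the initial-state literals that are preconditions of some
        action in the plan (static predicates are assumed already removed)."""
        needed = set()
        for action in plan:
            needed |= set(action.preconditions)
        return {f for f in initial_state if f in needed}

    def augmented_solution(plan, initial_state, own_goals, received):
        """Assemble S_j^@ = {A_j^@, pi_j^@, I_j^@, G_j^@} for the next agent.
        'received' is the augmented solution obtained from the previous agent
        (or None for the first agent)."""
        previous_goals = received['goals'] if received else set()
        return {
            # only obfuscated actions are shared; the original ones are already
            # in every agent's action set
            'actions': {a for a in plan if not a.is_original},
            'plan': list(plan),
            'init': relevant_initial_state(initial_state, plan),
            'goals': previous_goals | set(own_goals),
        }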

4.6 Communication

Each agent communicates \(S^@_j\) to the next agent. We assume there is no noise in the communication. The size of the messages depends on: the plan size (number of actions in the generated plans \(\pi _j\)); the number of goals, where often \(\mid \mathcal{G}_j\mid <\mid \pi _j\mid \); and the size of the initial state, \(\mid \mathcal{I}_j\mid \). Thus, the size of messages is linear with respect to the plan size and initial state size. Since mapr only communicates after each planning episode, the communication cost is proportional to the number of planning episodes and the size of each communication. If the number of selected agents is \(N=\mid \Phi '\mid \), the number of planning episodes ranges from N (the problem is solved in the first iteration, after all agents have generated their plans) to kN (k iterations are performed until the problem is solved or the resources, time or memory, are exhausted). So, the communication cost is in the order of \(O(k N (\mid \pi \mid +\mid G\mid +\mid I\mid ))\). Furthermore, if mapr learns only one macro-operator, then \(\mid \pi \mid =1\). This is a low communication overhead compared with other approaches that broadcast search decisions [54, 68].
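As a purely illustrative back-of-the-envelope instance (all numbers are made up), with \(N=2\) selected agents, a single iteration (\(k=1\)), and per-agent plans of about 10 actions, 3 goals and 20 relevant initial-state literals, each message carries on the order of

$$\begin{aligned} \mid \pi _j\mid +\mid \mathcal{G}_j\mid +\mid \mathcal{I}_j\mid \approx 10+3+20=33 \end{aligned}$$

items plus the shared action descriptions, and the whole run exchanges roughly \(k N\times 33\approx 66\) items, far below the cost of broadcasting individual search decisions.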

4.7 Merging

Each agent \(\phi _{j+1}\) receives \(S_j^@\) and builds a new planning problem by adding (performing the union of sets) the instantiated actions to its action set, the goals to its own goals, the previous agents' private initial state to its own initial state, and all new propositions to its own proposition set. So:

$$\begin{aligned} \Pi _{j+1}=\{F'_{j+1},A_{j+1}\cup \mathcal{A}_j^@,G_{j+1}\cup \mathcal{G}_j^@, I_{j+1}\cup \mathcal{I}_j^@\} \end{aligned}$$

where \(F'_{j+1}=F_{j+1}\cup \mathcal{G}_j^@\cup \mathcal{I}_j^@\cup L(\mathcal{A}_{j}^@)\), and \(L(\mathcal{A}_{j}^@)\) are all the literals in preconditions and effects of actions in \(\mathcal{A}_{j}^@\),

$$\begin{aligned} L(\mathcal{A}_{j}^@)=\{l\mid a_i\in \mathcal{A}_j^@, l\in (\text{ pre }(a_i)\cup \text{ eff }(a_i))\} \end{aligned}$$
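A sketch of this merge, under the assumption that the planning task is represented as sets of propositions, actions, goals and initial-state literals; PlanningTask and the attribute names below are illustrative, not mapr's actual data structures.

    def merge(own_task, received):
        """Build Pi_{j+1} from the agent's own task and the received S_j^@
        by pure set unions (sketch)."""
        shared_literals = set()
        for a in received['actions']:
            # every literal in preconditions and effects of the shared actions
            shared_literals |= set(a.preconditions) | set(a.effects)
        # F'_{j+1}: own propositions plus received goals, received initial-state
        # literals, and the literals of the shared actions
        propositions = (own_task.propositions | received['goals']
                        | received['init'] | shared_literals)
        return PlanningTask(propositions,
                            own_task.actions | received['actions'],
                            own_task.goals | received['goals'],
                            own_task.init | received['init'])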

\(\phi _{j+1}\) now calls the planner to generate a new plan that achieves the goals in \(\mathcal{G}_{j+1}\), which also takes into account \(\phi _{j}\)'s goals. It can reuse parts of \(\phi _{j}\)'s plan, or it can generate a plan from scratch that does not use \(\phi _{j}\)'s actions, for instance by using only \(\phi _{j+1}\)'s actions to achieve all goals.

4.8 Termination

Given that each planning task incorporates all previous goals, including the private ones of the previous agents, as soon as the last agent finds a plan achieving all goals, the whole planning process finishes. If the last agent does not find a plan and there is still time, mapr iterates over all agents again, accumulating the goals. Starting in the second iteration, as soon as any agent finds a solution, the whole planning task finishes, since its task incorporates all goals from all agents. The planning process terminates with failure only if the time or memory bounds are reached, or a maximum number of iterations is performed. We set the maximum number of iterations to five. We have found experimentally that, in the tested domains, mapr only performs more than one iteration when the problem cannot be solved by mapr at all.
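Putting the pieces together, the overall control loop could look as follows; this is an illustrative sketch only, reusing the hypothetical planning_episode and augmented_solution helpers sketched above, and 'achieves' is a hypothetical plan-validation helper.

    def mapr_loop(agents, planner, max_iterations=5):
        """Round-robin over the selected agents (sketch). Any plan that achieves
        the accumulated set of all goals ends the whole process; otherwise mapr
        keeps iterating up to the iteration/time/memory bounds."""
        all_goals = set()
        for a in agents:
            all_goals |= set(a.private_task.goals) | set(a.assigned_goals)
        received = None
        for _ in range(max_iterations):
            for agent in agents:
                plan = planning_episode(agent, received, planner)
                own_goals = set(agent.private_task.goals) | set(agent.assigned_goals)
                received = augmented_solution(plan, agent.private_task.init,
                                              own_goals, received)
                if plan and achieves(plan, all_goals):
                    return plan          # solution achieving every goal
        return None                      # failure: iteration/resource bound reached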

4.9 Plans parallelization

mapr generates totally ordered (sequential) plans. However, in MAP, plans are going to be executed by a set of agents. Therefore, parallel plans are preferred over sequential plans, so that agents do not have to wait to execute their next actions while another agent is executing an action of the sequential plan. We have implemented an algorithm to transform a sequential plan into a parallel plan. First, a suboptimal algorithm generates a partially ordered plan from a totally ordered one, using an algorithm similar to that of Veloso et al. [73]. Then, a parallel plan is extracted from the partially ordered plan. The parallelization algorithm is planner independent. It receives two inputs: a planning task, \(\Pi \), and a sequential plan, \(\pi \), that solves the task. It outputs a parallel plan that is one of the potential parallelizations of the sequential plan.
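The sketch below illustrates the general idea, not the exact algorithm of Veloso et al.: it keeps an ordering between two actions whenever one provides or deletes a precondition of the other, or deletes one of its add effects, and then schedules actions into the earliest step compatible with those orderings. Action objects are assumed to expose preconditions, add and delete lists.

    def parallelize(plan):
        """Turn a totally ordered plan into a parallel plan: a list of sets of
        actions that can start simultaneously (mutex actions never share a
        step). Both passes are quadratic in the plan length."""
        levels = [0] * len(plan)
        for j, b in enumerate(plan):
            for i in range(j):
                a = plan[i]
                ordered = (set(a.add) & set(b.preconditions)
                           or set(a.delete) & set(b.preconditions)
                           or set(b.delete) & (set(a.preconditions) | set(a.add)))
                if ordered:
                    # b must be scheduled at least one step after a
                    levels[j] = max(levels[j], levels[i] + 1)
        steps = {}
        for action, level in zip(plan, levels):
            steps.setdefault(level, []).append(action)
        return [steps[level] for level in sorted(steps)]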

4.10 Properties

First, as mentioned before, mapr performs suboptimal planning. Second, mapr is incomplete mainly in two scenarios: when at least one goal needs more than one agent to be achieved, and when there is a dead-end due to a strong interaction among agents. For the first case, consider, for instance, the Logistics domain. In almost all problems, more than one agent has to partially contribute to the achievement of single goals. For instance, in the standard agentification (definition of the sets AT, PP and PT) of the domain, trucks and airplanes are agents. If a package is initially at the post-office of a city, city1, and has to be delivered to the airport of another city, city2, we first need a truck to move the package from the source post-office to the airport of city1 and then an airplane to move it to city2. Thus, the goal of having the package at city2 cannot be achieved by either of these two agents in isolation, and mapr returns no solution even if at least one exists. A potential solution to this problem could be to assign the same goal to both trucks and airplanes. However, this does not work without further changes in the algorithms: the truck in the first city would need to achieve a subgoal of the goal (that the package is at the airport of the first city, so that the airplane can actually achieve the goal). However, given the same domain, by just using a different agentification, such as airplanes as the only agents (as in a real-world application for an airline company), mapr can solve all IPC problems.

The second case of incompleteness can be observed in very tightly coupled domains. Suppose a domain with two agents A1 and A2, two resources R1 and R2 (both can only be used once), and two goals G1 and G2. A1 can use both resources, but can only achieve G1. A2 can only use R1 and can only achieve G2. The goal assignment strategy would assign G1 to A1 and G2 to A2. Suppose that the ordering scheme decides to start with A1 and it uses R1 to achieve G1. In A2's turn, A2 will fail, since it can only use R1, which has already been used by A1 to achieve G1. It cannot make A1 use R2 instead (since A1 did not pass that action), and there is no way for A2 to generate a valid plan for both G1 and G2. When planning comes back to A1, it will also fail for similar reasons. No IPC domain shows this kind of strong interaction.

Third, mapr is sound if the first and second planners are. Intuitively, given that all goals (public and private) are propagated, if the last agent solves the problem (in the first iteration) or any agent solves the problem (in the following iterations), the plan must be applicable from the propagated initial state and its application must result in a state that achieves all goals. Experimentally, we ran the IPC softwareFootnote 9 [47], which automatically validates solutions. It validated all mapr solutions.

Fourth, mapr generates a totally ordered plan. But, we have seen how to convert a total-order plan into an equivalent parallel plan in case it is needed for concurrent execution. The complexity of the first part of the conversion (from totally ordered to partially ordered) is quadratic in the number of actions in the plan. The complexity of the second part (from partially ordered to parallel plan) is again quadratic in the number of actions in the plan. In case macro-operators are shared among agents, there is no potential conflict among agents' plans (e.g., two robots occupying the same cell in Depots-robots). When plans are sequential, actions of other robots have to wait until previous actions have been executed. The effects of each macro-operator summarize the changes made to the state by all actions included in the macro-operator. When plans are parallelized, the parallelization code ensures there is no conflict among the parallelized actions (macro-operators in this case), so that no action deletes a literal that is needed by the other actions running in parallel (mutex actions cannot be executed in parallel).

Fifth, in relation to privacy preservation, the base method used by mapr consists of a combination of: obfuscation by substitution, removal of private types and agents from literals, removal of objects from private predicates, and removal of static predicates. Also, the actions and components of the initial state that are shared are a subset of the whole domain model and planning task. When an agent \(\phi _j\) receives from another agent \(\phi _i\) the augmented obfuscated solution, \(S^@_i\), it receives a partial view of \(\phi _i\)'s knowledge. The privacy of \(\phi _i\) will be assured if \(\phi _j\) cannot gain knowledge about the private knowledge of \(\phi _i\). If we analyze separately the information provided in \(S^@_i\), the private literals of \(\phi _i\) have been converted into propositions. Since there is no relation in the initial state among literals (no common objects as arguments, as they are propositions), inferring the relations among them becomes harder. The same applies to private goals.

In relation to actions, we have described several techniques for increasing the level of privacy preservation. In real-world situations, static predicates contain much of the private data of agents, such as different operation costs, times, providers, or agents' preferences. So, by removing them, mapr can effectively preserve a large component of privacy. Another example of how this step greatly improves privacy preservation over simple obfuscation is in domains where agents traverse a network (e.g., roads, or connected depots). One truck (company) agent will not be able to infer the places another agent has visited in order to deliver all goods. It will only be able to infer the places where it had to pick up and drop the goods (since those actions require/modify the location of the goods). Thus, just removing static predicates gets rid of almost all semantics in some domains (road networks, costs, preferences, ...). Also, if we add the learning of macro-operators, we remove further semantics of actions (they become compact representations of subplans). A minor negative side effect of macro-operators is that they tend to make mapr slightly less efficient, as the experimental results show.

Finally, since some of the goal assignment strategies can reduce the number of agents to be used, each individual (agent) problem to be solved is much smaller than the original, even if we have \(N\le m\) such problems. strips planning is PSPACE-complete [18]. Thus, both the original problem and each subproblem are still PSPACE-complete in the worst case. However, as the experimental results presented in Sect. 6 show, by decomposing the problem into subproblems, solving all these N subproblems often takes less time than solving the original problem, even if we deal with privacy. This is especially true in the case of harder planning instances. In order to achieve these benefits, mapr takes as input a domain-dependent characterization of agents and privacy that, as a side effect, allows mapr to easily compute the decomposition.

5 Multi-agent planning in cmap

In order to make our approach complete, we have devised a centralized variation of mapr, named cmap (Centralized Multi-Agent Planning). mapr uses a decomposition scheme for MAP; thus, as explained in the introduction, it takes into account two aspects of MAP: privacy and problem decomposition. An alternative way of using some of mapr's ideas consists of a centralized approach that also takes privacy into account.

5.1 cmap algorithm

The main difference with respect to mapr is that cmap first selects a subset of agents, then combines all their obfuscated domains and problems into a single planning task, and finally calls another centralized agent that performs a single planning episode with the combined problem. Figure 5 presents the algorithm. It takes as input the set of agents, \(\Phi \), and the MAP task M, which was generated by the previously described MAP compilation. It also takes as input the list of private predicates and types (for computing the public goals), the goal assignment strategy, the ordering scheme, the planner to be used, and the virtual agent \(\phi _{m+1}\) that will perform the planning step. We comment next on two issues relative to cmap: the planning episode and its formal properties. The rest of the steps are common to those of mapr.

Fig. 5 High-level description of the cmap planning algorithm

5.2 Obfuscation and planning

After the goal assignment, each selected agent sends its planning task (obfuscated domain and problem) to a centralized planning agent, \(\phi _{m+1}\). This agent merges all planning tasks and performs a centralized planning step with all the obfuscated information. Therefore, cmap takes into account the privacy issue (by obfuscation) and benefits from the problem decomposition by using only a subset of the agents.
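Schematically, the centralized step boils down to the sketch below; the task container, its attributes and the planner interface are hypothetical, not the actual cmap code, and consistent with the merge sketch of Sect. 4.7.

    def cmap_plan(selected_agents, planner):
        """The virtual agent phi_{m+1} merges the obfuscated task of every
        selected agent into a single planning task and runs one planning
        episode over it (illustrative sketch)."""
        merged = PlanningTask(set(), set(), set(), set())
        for agent in selected_agents:
            task = agent.obfuscated_task            # obfuscated domain + problem
            merged.propositions |= task.propositions
            merged.actions |= task.actions
            merged.goals |= set(task.goals) | set(agent.assigned_goals)
            merged.init |= task.init
        return planner(merged)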

5.3 Properties

cmap shares some of its properties with mapr. It is sound, suboptimal and generates a totally ordered plan (which can be converted to a parallel one). It is suboptimal because we use a suboptimal planner (this could easily be fixed by using an optimal planner) and because of the reduction of the agents considered for planning (unless we use the all goal assignment). It is also incomplete for the second reason of suboptimality: given the reduction of agents considered, there will be tasks it is not able to solve. However, cmap has the advantage over mapr that it can be made complete by using the all goal assignment strategy together with a complete planner, and it can be made optimal if an optimal planner is used. We report experiments on completeness (by using cmap and all), but we do not report experiments on optimal planning.

One of the added benefits of cmap is that any state-of-the-art planner can be used to solve the compiled version, so we obtain the benefits of advances in the state of the art in classical planning. Also, there is almost no communication cost (just the initial communication of the domain and problem of each agent to the central one). Its efficiency and the previous good properties come at the cost of a weaker privacy-preserving behavior than mapr, since it cannot use the second obfuscation, the elimination of static private predicates, or the sharing of macro-operators.

The main assumption of cmap in relation to privacy preservation is that we can protect privacy by obfuscation. Thus, cmap indirectly assumes that either the centralized agent (the one performing the computation) can be trusted, in which case we would have strong privacy-preserving properties; or, otherwise, that the obfuscated versions of the domains and problems do not allow agents to practically infer knowledge about the private components (states and actions) of other agents. In the second case, each agent is indirectly providing a finite state machine (FSM, through its states and actions), and other agents could infer part of that agent's knowledge by matching the public components and then applying some kind of backwards analysis from the known knowledge to infer the related knowledge in the corresponding FSMs.

6 Experiments and results

In this section, we describe the experiments we have performed to test our approach. We consider three sets of experiments: (1) comparing different parameter configurations, considering the two MAP algorithms, the goal assignment strategies and the ordering of agents schemes; (2) analyzing scalability of our approaches; and (3) comparing our work with similar work. We next explain the common parts of the experimental setups while each subsection describes the corresponding details. The section concludes with an analysis of the level of privacy achieved by mapr.

6.1 Experimental setup description

We present here some common aspects of the experiments reported in the next subsections.

6.1.1 Metrics

Our objective in this paper is to improve planning efficiency in MAP tasks. We use the time-score metric of IPC'11. In particular, we use time1, which computes the score of a planner p for a given planning task \(\Pi \) as:Footnote 10

$$\begin{aligned} \text{ time1 }(p,\Pi )=\left\{ \begin{array}{ll} \frac{1}{1+\log \left( \frac{T_{p,\Pi }}{T^*_{\Pi }}\right) } &{} \text{ if } \text{ planner } \text{ solved } \text{ the } \text{ task }\\ 0 &{} \text{ otherwise }\\ \end{array} \right. \end{aligned}$$

where \(T^*_{\Pi }\) is the minimum time required by any planner to solve problem \(\Pi \), and \(T_{p,\Pi }\) is the time it took planner p to solve \(\Pi \). Any \(T_{p,\Pi }<1\) second is treated as one second.Footnote 11

Although our main focus is planning efficiency, since the quality of the solutions is a common measure in planning, we include some results on the length of the solution plans, as well as on the number of parallel steps of the parallel plans (usually called makespan). The IPC assigns the following quality score to each configuration (planner) p and problem \(\Pi \):

$$\begin{aligned} \text{ quality }(p,\Pi )=\left\{ \begin{array}{ll} \frac{Q^*_{\Pi }}{Q_{p,\Pi }} &{} \text{ if } \text{ planner } \text{ solved } \text{ the } \text{ task }\\ 0 &{} \text{ otherwise }\\ \end{array} \right. \end{aligned}$$

where \(Q^*_{\Pi }\) is the best quality obtained by any configuration and \(Q_{p,\Pi }\) is the quality obtained by p in problem \(\Pi \). We use the same equation to report both plan length and makespan scores.
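For reference, both scores can be computed directly from the definitions above; the helper names are ours, runtimes under one second are clamped to one second as stated, and we assume the base-10 logarithm used by the IPC'11 scoring scripts.

    import math

    def time1_score(t_planner, t_best):
        """time1 score for a solved problem; the caller assigns 0 to unsolved
        problems. Runtimes under one second are treated as one second."""
        t_planner = max(t_planner, 1.0)
        t_best = max(t_best, 1.0)
        return 1.0 / (1.0 + math.log10(t_planner / t_best))

    def quality_score(q_planner, q_best):
        """Quality score: best known quality divided by the planner's quality."""
        return q_best / q_planner

For instance, a planner matching the best time obtains 1.0, while one ten times slower obtains 0.5.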

6.1.2 Planners

We used the Fast-Downward code [39] to build a simplified version of lama-2011, the winner of the sequential satisficing track of IPC'11.Footnote 12 We use it as the base planner of mapr and cmap and also as a centralized planner to compare against.Footnote 13 The sequential satisficing track aims at finding a solution to planning tasks that are deterministic and fully observable. lama-2011 first runs a greedy best-first search with the ff and lm-count heuristics, preferred operators, and unit costs (all actions are assumed to have unit cost). The goal of this first run is to find a plan as quickly as possible. Once a plan is found, it searches for progressively better solutions using a combination of greedy best-first search and weighted A\(^*\). As the aim of our work is to study our MAP algorithms, focusing on planning efficiency, we have configured lama-2011 to apply only the first search. We refer to this configuration as lama-first throughout this section. lama-mk refers to executing the lama-first planner and then parallelizing its solution.

We have used lama-first with unit costs for generating the plan for the first agent, and either lama-first or lpg-adapt [33] for the successive planning episodes. While lpg-adapt is a replanning technique, lama-first is not. We still call planning by reuse the configuration that uses lama-first, because it can reuse the actions in the previous plans. lpg-adapt uses stochastic local search. We followed current practice (as in the IPC), running lpg-adapt only once per problem.

Our agents have been coded as function calls, so there is no overhead due to communication delays. Since the information that is being exchanged among agents is linear with respect to the size of plans and initial states, we do not expect a big overhead in planning time when transmitting it through a different communication channel. In any case, we have analyzed the time it takes each agent in mapr to communicate its augmented solution to the next agent and it is always below 0.01 seconds.

6.1.3 Domains

We have used several IPC domains adapted for MAP that have been used in other MAP papers. Specifically, Elevators and Transport from IPC 2011; Rover, Zenotravel, Driverlog, Satellite, and Depots from IPC 2002; and Logistics from IPC 2000.Footnote 14 We used the 20 problems defined in the corresponding IPC. We have used strips models without action costs.Footnote 15

The compilation of single-agent PDDL tasks (domain and problem) into MAP tasks requires three extra inputs, as we explained before. Appendix A.1 shows the inputs we have used to define the agents and the privacy level of each domain. In the first set of experiments, we have chosen agentifications that allow mapr to solve some of the problems. For example, in the Logistics domain the agents are the airplanes. If the trucks are considered as well, more than one agent is needed to achieve most of the individual goals and only cmap with the subset and all goal assignment strategies can solve the problems. In the Elevators domain, there are two possible agent types: fast elevators, which move people quickly among blocks of floors; and slow elevators, which stop at every floor of a block. A passenger usually needs a fast elevator to reach the required block, and then a slow elevator places him/her on the target floor. Thus, considering only the fast elevators as agents allows mapr to solve most of the problems.
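As a purely illustrative example of these three extra inputs (the actual per-domain values are the ones listed in Appendix A.1; the names below are hypothetical), such a characterization could be written as:

    # Hypothetical agentification inputs for a transportation-like domain;
    # the real values for each domain used in the experiments are in Appendix A.1.
    agent_types        = {'truck'}                    # AT: object types acting as agents
    private_predicates = {'fuel-level', 'driving'}    # PP: predicates whose literals are private
    private_types      = {'garage'}                   # PT: object types whose objects are private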

We have also defined two new domains: Depots-robots (the one we have used throughout the paper) and Port. The Depots-robots domain shows how our approaches work in a domain where all agents are able to achieve all goals. The Port domain deals with hoists that load crates into ships (Appendix A.2 provides the details of this domain). It is an example of the other extreme: how our approaches work in a domain where agents have only private goals (there are no public goals). Given that all goals are private, all goal assignment strategies perform the same assignment: a null assignment of public goals to each agent.

6.1.4 mapr and cmap Configurations

We will use: the two MAP algorithms, mapr and cmap; the seven goal assignment strategies (all-achievable (aa), rest-achievable (ra), load-balance (lb), best-cost (bc), contract-net (cn), all (all) and subset (sub)); and the four schemes for ordering the agents (name, min-goals (min), max-goals (max) and random (ran)). Agent ordering is performed before goal assignment and at planning time. name and random are the only meaningful ordering schemes before goal assignment, since goals have not been assigned yet. At planning time, ordering agents is only meaningful for mapr, since cmap does not perform an ordered iteration over the selected agents. So, mapr can work with all ordering schemes, while only name and random make sense for cmap. When using cmap with all or subset as the goal assignment strategy, the ordering schemes are irrelevant: all selects all agents, independently of their order, and subset selects the union of the subsets of agents in the relaxed plans of all goals, which is again independent of the agents' ordering. Besides, given the way most problem generators work, the name and random strategies are equivalent; thus, we do not report results on agent ordering for cmap.

We name the systems as \(\langle \)algorithm\(\rangle \)-\(\langle \)goal assignment\(\rangle \)-\(\langle \)ordering of agents\(\rangle \), omitting the last suffix (-\(\langle \)ordering of agents\(\rangle \)) when the ordering scheme is name. mlpg refers to mapr with lpg-adapt as a replanning system, while the name mapr denotes that the replanning system is lama-first. Finally, when using macros, mapr-macros-oo and mapr-macros refer to mapr using macro-operators, sharing only-one-macro in the former and several macros in the latter.

6.1.5 Computational resources

We have used a time limit of 1800 seconds, as in the IPC. We performed most of the following experiments using the IPC'11 software [47]. Up to 6 GB of RAM and 750 GB of hard disk were available for each system. We ran the experiments in a cluster of Intel Xeon 2.93 GHz Quad Core (64 bits) computers under Linux. The reported times include the whole process (unless specified); that is, the time taken to: (1) perform the MAP compilation, generating the domain and problem of each agent; (2) assign the public goals to agents; (3) find the sequential plan; and (4) generate the parallel plan. In order to better grasp where the solving time is spent, we have computed the average time spent in the used domains in these four steps. In the case of the IPC problems, each of steps 1, 2 and 4 takes less than one second in all domains and problems for most goal assignment strategies. Exceptions are described later. All solutions have been validated using VAL [41], given that we use the IPC'11 software, which includes the call to VAL.

6.2 Comparison among different configurations of mapr and cmap

In the first set of experiments the goal was to compare the different configurations of mapr and cmap.

6.2.1 Ordering the agents

First, we analyze the impact of the agent ordering in mapr. Appendix A.3 shows the time-score metric using mapr with the different goal assignment strategies and ordering schemes. For each goal assignment strategy, the differences among the results of the ordering schemes are not significantly large. So, we can conclude that the ordering of agents has little influence on planning efficiency. The min-goals scheme is slightly better than the rest. Henceforth, unless specified otherwise, we use that ordering scheme in the remaining experiments. As discussed before, we do not report on agent orderings for cmap, since they are irrelevant to it.

6.2.2 Goal assignment strategy

Next, we analyze the impact of the goal assignment strategies in cmap and mapr. We include also a comparison for reference with lama-first (including a later step of parallelizing the solution). Table 3 shows the time-score results.

Table 3 Time-score metric using mapr, cmap and lama-first, with the different goal assignment strategies

The rest-achievable strategy is mostly the best one, while all-achievable and contract-net are the worst, regardless of the algorithm used, mapr or cmap. This result underlines the correlation between planning efficiency and the number of agents involved in the planning process, due to the effect of problem decomposition. Appendix A.4 shows the number of agents selected by each goal assignment technique. In these domains, most of the agents can achieve all or almost all the goals by themselves. Therefore, the fewer agents selected, the better. Since rest-achievable is the strategy that usually selects the smallest set of agents, it is more efficient than the others. At the opposite end, the all-achievable and contract-net strategies usually involve the participation of most agents, and they obtain worse performance. all-achievable selects all agents that can achieve at least one goal. contract-net tends to create a balance among agents, so as a side effect it tends to select most agents. load-balance also involves most agents, but it is faster than contract-net, due to implementation issues.Footnote 16 These differences become clearer in the scalability experiments. The best-cost strategy usually reduces the number of agents required to solve the planning task as well, and it is also well placed in the ranking. As a side effect, some of our privacy-preserving MAP algorithms perform better by a huge margin than lama-first, which does not preserve privacy.

cmap-all differs from lama-mk in the obfuscation process performed by the former. Thus, the input domain and problem given to cmap-all and lama-mk are different, and the pre-processing step of Fast-Downward can generate different compilations into SAS+. That explains the variations in the scores of the two systems.

The Port domain maintains a similar behavior among the different configurations, because goals are private and the goal assignment always returns the empty list of public goals assigned to each agent. Depots-robots is a challenging domain: most of the configurations solved fewer than 15 problems, where the harder problems are defined over a \(10\times 10\) grid. At the opposite extreme, all the configurations solved the 20 test problems in the Rover, Satellite, Zenotravel, Driverlog and Logistics domains. They also solved all the problems in the Elevators domain, except for the contract-net approaches, which could not solve one problem. All the configurations solved more than 16 problems in the Port and Depots domains, while the number of solved problems in the Transport domain ranged from 13 to 20. Appendix A.5 shows the number of problems solved by the different configurations.

6.2.3 Quality

Even if the focus of this paper is not on improving the quality of solutions, we report here the solution quality obtained by mapr and cmap. Table 4 shows the quality-score metric when the quality metric is plan length. We also compare the results with lama-2011, with the original code that participated in IPC'11, as a reference. lama-2011 improves solution quality until it runs out of time, while our approaches stop as soon as one solution is found.

Table 4 Quality-score metric for the 20 test problems using mapr, cmap, lama-first and lama-2011, with different goal assignment strategies

The score differences between lama-first and the mapr approaches are small. In fact, mapr-best-cost outperforms lama-first. This shows that our algorithms do not significantly penalize the quality of the plans. The quality metric assigns a zero to unsolved problems. In the domains where the score of mapr-best-cost is higher than that of lama-first, the former solved more problems, except in the case of Driverlog, where both systems solved the 20 problems. In theory, the solution quality of contract-net should be better than that of load-balance, since the estimation of costs is more precise. However, in practice there is not much difference in the scores of these two configurations. The main reason is that the contract-net approaches usually solve fewer problems, since contract-net takes more time to compute the goal assignment than load-balance.

We include now a study of the behavior of our algorithms in relation to the makespan of the generated plans, since most MAP approaches use makespan to measure plan quality. As we explained before, mapr and cmap generate sequential plans, but we can transform a sequential plan into a parallel plan. Table 5 shows the quality-score metric when the quality metric is makespan. lama-first scores better than our approaches: even though it is guided toward achieving goals as soon as possible, the sequential plans it generates indirectly yield good makespans. Given that cmap performs centralized planning, it has a comprehensive view of the whole planning process, and thus can more easily improve the quality of plans. Therefore, the cmap configurations obtain better results than the mapr ones. Among those, the goal assignment strategies that work best are the ones that try to obtain a good balance (load-balance and contract-net) and the ones that select more agents (all-achievable and all), indirectly allowing more agents to participate in the plan and thus reducing the makespan.

Table 5 Quality-score metric for the 20 test problems using mapr, cmap, and lama-first, with different goal assignment strategies

6.3 Scalability study

In this section, we present a study on scaling up the difficulty of the problems in each domain, except for Transport, Depots-robots and Port. We did not use those domains in this study, given that their planning tasks were already difficult for the planners. We have generated problems harder than the ones used in the IPC: 20 new random problems in every domain, increasing the number of agents and goals. In order to define the complexity of the new problems, we used an approach similar to the one used by the organizers of IPC'11. For each domain, we increased the number of agents and goals until some configurations exhausted the computer resources (time or memory) when solving the problems. The difficulty of these harder problem instances depends on the domain. For example, in the Depots domain, some configurations have difficulties solving problems with 6 agents and 13 goals, while in the Rover domain more than 100 agents and 120 goals are needed to exhaust the computer resources. The experimental setup is similar to the previous ones, but we have only tested the min-goals scheme for ordering agents. Appendix A.6 shows a summary of the characteristics of the tested problems. Tables 6 and 16 show the results of the experiments on scalability: Table 6 shows the time score on these new problems and Table 16 shows the coverage.

Table 6 Time-score metric for the hard problems using mapr, cmap and lama-first with the different goal assignment strategies

The rest-achievable strategy remains the fastest one by a large margin. However, mapr scales much better than cmap (there is a difference of more than 25 points between mapr-ra and cmap-ra). There is little variation in the ranking of the goal assignment strategies, but the differences among the scores are bigger now than before. For example, the score of our best approach exceeds lama-first's (lama-mk) score by 31 points. Hence, the differences on solving harder instances come from the fact that mapr solves smaller problems at each iteration, while cmap has to solve much bigger problems than the IPC ones. Much of the score difference comes from unsolved problems. In the worst configurations, those using contract-net, this is mainly due to the time it takes to perform goal assignment, which increases a lot in these harder instances. So, it is better to use load-balance than contract-net if we are aiming at a better load balance.

Appendix A.5 shows the number of problems solved by the different configurations, which depends on the domain. In the Zenotravel, Driverlog and Depots domains, the contract-net strategy has the scalability problems described above: we have observed that the assignment of goals in these domains consumes almost all the available time, while for the other strategies it takes less than one second. As we can also see, mapr is much faster than cmap with rest-achievable: it solves approximately the same number of problems, but its time score is much higher.

To better understand the scalability of mapr with the rest-achievable strategy when the number of agents increases, Fig. 6 reports the total time taken to solve the 20 hard problems in two representative domains: Driverlog and Rover. The x-axis shows the number of agents of each problem (Driverlog/Rover). In the Rover domain, the total time remains practically constant as the number of agents increases. However, in the Driverlog domain, the time increases notably. The peaks of the graph are due to the fact that the complexity of planning problems does not depend only on the number of agents.

Fig. 6 Total time taken to solve the 20 hard problems of the Driverlog and Rover domains

6.4 Replanning algorithm

In this section, we want to study the impact of using a replanning system, mlpg (mapr using lpg-adapt as the second planner). We compare it against the configurations that obtained the best time scores on the hard problems of the previous experiments and on the problems of the challenging domains in the goal assignment experiments (Transport, Depots-robots and Port). We include again the different orderings in the comparison, since they have some impact, as discussed below. Table 7 shows the results.

Table 7 Time-score metric for the hard problems using mlpg and the different goal assignment strategies and ordering schemes

mapr's total score with the rest-achievable strategy is almost 52 points higher than mlpg's. The cmap-rest-achievable score is also much higher than mlpg's. As we can see, even with the advantage of reusing a past plan, mlpg's performance is worse than that of a plan-from-scratch system such as lama-first. mlpg uses lama-first for generating the plan for the first agent and lpg-adapt for the successive planning episodes, while mapr always uses lama-first. As a stand-alone planner, lama-first is more efficient than lpg-adapt, due to all the improvements it has over lpg (use of SAS\(^+\), dual open lists, use of preferred operators, greedy best-first search, \(\ldots \)).

The only advantage of using lpg-adapt over lama-first in the context of mapr is that the successive planning episodes reuse the solutions communicated by the previous agent. That explains why max-goals is the best ordering scheme for mlpg: the first planning episode solves more goals, and the rest of the planning episodes only have to add some additional plan steps for the new goals. Given how max-goals works, the second and later planning episodes plan for smaller sets of goals than the first one. Therefore, the previous plans can be mostly reused and planning is faster. When using a replanner, the goal assignment strategy has little influence on planning efficiency; all the strategies obtain very similar scores under the same ordering schemes. Take, for instance, rest-achievable and all-achievable. In rest-achievable, the second and later planning episodes add a few new goals to the problem, so the planning tasks are small. In all-achievable, the first episode (performed by lama-first) takes care of most goals (since we are dealing with domains where most agents can achieve most goals), and the following episodes again add very few goals. So, the expected behavior of rest-achievable and all-achievable is very similar when using replanning (lpg-adapt), while it is very different when using planning from scratch (lama-first).

6.5 Comparison with state-of-the-art MAP

In this section, we compare against the state of the art in MAP. We first present a comparison with a non-privacy-preserving MAP approach, and then a comparison with several privacy-preserving MAP approaches.

6.5.1 Comparison with other non privacy-preserving MAP Approaches

Crosby et al. proposed the Agent Decomposition-based Planner (adp), which automatically detects the agents in planning tasks and then performs an iterative search over the discovered agents [23]. As adp does not preserve privacy, we first run our obfuscation method to convert the input domain and problem into the corresponding obfuscated versions. Then, we give those obfuscated versions to adp as input, as we did with lama-first in the previous experiments. We named this configuration padp (Privacy-preserving adp). Thus, we convert a non-privacy-preserving multi-agent planner (adp) into a weak privacy-preserving one. We have compared it with mapr and cmap. We used the same IPC domains as previously, as well as the two new domains Port and Depots-robots with their 20 problems. Table 8 shows the results of the comparison.

Table 8 Time and quality (makespan, Mk) scores and coverage (C) of padp, mapr-rest-achievable-min-goals and cmap-all

The total scores of mapr and cmap are higher than the scores of padp. padp did not solve a significant number of problems; four of them were not solved because they are problems with a single agent and the algorithm returns a null output. This shows that adp is better than lama-first in some domains and that we can help a system that does not preserve privacy (such as adp) to include some level of privacy through the first obfuscation method. However, it cannot benefit from stronger privacy preservation techniques, such as the ones used by mapr.

6.5.2 Comparison with privacy-preserving MAP Approaches

Finally, we compare mapr (the best version found so far: rest-achievable-min-goals) and cmap (all) against fmap [66], mafs [54], and madla [27], since they represent the current state of the art in suboptimal privacy-preserving MAP. We do not provide comparisons with other techniques because they are far behind in coverage (number of solved problems) or time score, or because they solve a different planning task (such as optimal approaches, or planning with self-interested agents). In the fmap paper, its authors showed that fmap outperforms by a great margin their previous approach, map-pop [68].Footnote 17 PlanningFirst [56] could only solve the first two problems in the Rovers domain and the first problem in the Satellite domain. Other approaches, such as the one presented by Jonsson and Rovatsos [42], also show running times well above the ones we present here for the satisficing version. They also provide results for the optimal case, but we do not deal with optimal planning in this paper; besides, they take a plan as input. We cannot compare with ma-a\(^*\) [55] either, since that approach performs optimal planning. In the Related work section, we provide a more comprehensive list of approaches and their differences with ours.

Domains We have used the same domains as in the previous sections, which are most of the ones used by the authors of fmap [66], and the same instances they used. Since in some domains they used more than 20 problems, we selected the first 20 problems of their benchmark, so that all domains have the same weight in the scores.Footnote 18 We could not compare in some domains, given that they have a different input definition than our approaches. For example, in Openstacks they used agents that are not in the IPC domain or problem definitions.

fmap takes as input a domain for each type of agent and a set of problem descriptions, one for each agent. So, when running fmap, we have used their agentification of the domain. On the other hand, approaches based on ma-strips, such as mafs and madla, fail to solve any problem in agentifications of domains where there is at least one action with no agent among its parameters (for instance, an agentification of the Logistics domain with airplanes as the only agents, or Driverlog with drivers as agents). Thus, in order to test different agentifications and their impact on the behavior of planners, we have used two alternative agentifications for some domains (see details in Appendix A.1). Domains with the suffix -f in their name indicate that the systems used the agentification proposed by fmap to solve the problems. The differences with our agentification affect only the private predicates in the Satellite, Rover, Driverlog, Transport and Zenotravel domains. In the Logistics, Depots and Elevators domains, the differences lie in who the agents are. madla uses untyped PDDL definitions; thus, we report its results only on the domains provided by its authors.

Other variables setup Goal assignment. We have used our best configuration for mapr (rest-achievable). Also, we have compared with the complete version: cmap-all. Ordering of agents. We have used the best configuration for mapr (min-goals). Ordering of agents is irrelevant when using cmap-all, since it uses all agents. Planners. We have used lama-first with unit costs for generating the plan for the first and consecutive agents. We also compared obfuscation by learning only one macro-operator (mapr-macros-oo) or learning several of them (mapr-macros). Time bound. We have used 1800 seconds as in the IPC. We used a different computer for this experiment: a 2.6 GHz Intel Core i7 with 4 GB of RAM running Mac OS X.Footnote 19 Metric. We compare here the approaches with respect to time and makespan.

Table 9 shows the results of the time score. It does not show the totals, since they could be unfair to systems that cannot handle all the different agentifications. So, the analysis has to be performed per domain or per agentification of a domain. We highlight in bold the best configuration per domain and agentification.

Table 9 Time score of fmap, mafs, madla, cmap-all and mapr-rest-achievable-min-goals with and without macros

As can be seen, mapr and cmap obtain a huge difference in time scores with respect to the rest of the state-of-the-art approaches. Among the rest, the best approach is mafs, followed by fmap and madla. The difference in score is higher in domains where agents alternate private and public actions in their plans, because mafs and fmap have to continuously communicate the changes in the states. These are usually tightly coupled domains; this can be seen, for instance, in Port, Depots-f, and Transport. As expected, some approaches get penalized under some agentifications: mapr in Depots-f, Logistics-f and Elevators-f; and mafs in Driverlog (both versions), Depots, Logistics, Elevators and Depots-robots. Others simply cannot solve problems in a given domain due to complexity: fmap in Transport-f; and madla in Elevators-f.

Differences among the mapr versions are small. Thus, even if using macro-operators yields slightly smaller scores, they improve privacy preservation, so we can balance privacy and efficiency depending on the desired levels for a particular domain. Also, the cmap and mapr scores are quite similar in the domains that both can handle. cmap, on the other hand, does not suffer from different agentifications, but it provides weaker privacy preservation.

Table 10 shows the results of the quality score measured as makespan. mafs and madla only report plan length values, so we take the length as the makespan of their plans; thus, their scores are low. In the case of mafs, it does not return a plan, so we could not run our parallelization algorithm. In the case of madla, the plans are returned obfuscated, so they would have to be deobfuscated first in order to be parallelized. Therefore, the only fair comparison is against fmap. The focus of our techniques has not yet been on improving quality. However, given that we solve many more problems, our quality score is better than those of the others. Also, even in the cases where fmap, mapr and cmap solve the same problem instance, manual inspection of the results shows that our quality is usually not far from that of fmap, and sometimes even better (fmap is suboptimal with respect to makespan). Only in the Elevators domain does fmap score higher than cmap.

Table 10 Quality score (makespan) of fmap, mafs, madla, cmap-all and mapr-rest-achievable-min-goals with and without macros

With respect to modeling the MAP domains, we use the IPC version of the domains, so we have very little modeling overhead: just what is needed to provide the values for the agent types (AT), private predicates (PP) and private types (PT). fmap uses a multi-agent-oriented representation of the domains, with most predicates represented as functions. These versions are semantically equivalent to the IPC ones, even if they include more information related to some multi-agent aspects. So, our approaches deal with a simplified version of privacy (slightly richer than that of ma-strips), while fmap can use richer semantics.

6.6 CoDMAP results

We participated in the First Competition of Distributed and Multiagent Planners (CoDMAP) with the following versions of our systems:

  • cmap-q: cmap algorithm with the subset goal assignment strategy and lama-2011 as the base planner (including its anytime behavior). It aims at optimizing plans’ quality and coverage.

  • cmap-t: cmap algorithm with the subset goal assignment strategy and lama-first as the base planner. It aims at optimizing planning time and coverage.

  • mapr-p: mapr algorithm with the min-goals scheme for sorting agents, lama-first as the base planner and learning only-one-macro. It aims at maximizing privacy among agents.

They placed as follows in the centralized track:Footnote 20

  • cmap-q: 1st place - IPC quality score, 7th place - coverage score, 12th place - IPC time agile score

  • cmap-t: 2nd place - IPC time agile score, 4th place - coverage score, 5th place - IPC quality score

  • mapr-p: 7th place - IPC time agile score, 11th place - IPC quality score, 12th place - coverage score

However, as the organizers stressed in the presentation of results, it could be considered more a comparison than a competition given that, among other issues, the privacy preservation behavior greatly varied among the competing planners. For instance, the first two planners in coverage were based on adp, so they did not preserve privacy. The third one, based on siw [48], had a very light privacy-related scheme, so agents had full access to other agents' states. So, cmap-t was the first planner that preserves a level of privacy equivalent to that of other MAP planners (such as the ones presented earlier).

6.7 Analysis of privacy preservation

As already stated, obfuscating agents' private information only ensures weak privacy. Guaranteeing strong privacy in MAP is hard to achieve. However, it is feasible to verify privacy experimentally by extracting the information exchanged among the agents in the planning process and then establishing whether an agent is able to infer private data of the other agents, as was done in [10] for some domains.

cmap assumes that either: the centralized agent can be trusted, and thus cmap would show strong privacy-preserving properties; or, otherwise, the obfuscated versions of the domains and problems can only ensure weak privacy.

Regarding mapr, various parameters influence privacy preservation: the goal assignment strategy, the agent ordering scheme, the use of macro-operators, the domain model, the elements considered private in the domain (defined by PT and PP), and the agentification (defined by AT). We study next the most relevant aspects of the relation between these parameters and privacy.

The goal assignment strategy determines the set of agents involved in the planning process. Hence, the excluded agents do not exchange any private information with the selected agents, since they do not participate in the planning process. Thus, goal assignment strategies that select fewer agents on average tend to provide stronger privacy preservation. On average, if we order the goal assignment strategies according to this aspect, from stronger to weaker privacy preservation, we would have: rest-achievable, best-cost, subset, load-balance, contract-net, all-achievable, and all. A second impact on privacy preservation relates to the number of goals assigned by each strategy to each selected agent. If a strategy assigns many goals to each agent, the agents' subplans will be longer on average and the amount of obfuscated private data exchanged among agents will be bigger than with the other strategies. According to this criterion, the weakest goal assignment strategies in relation to privacy preservation would be all-achievable and all.

The ordering of agents when they generate subplans also influences privacy. The first agent generates its partial plan without previous information, so it learns nothing from the rest of the agents unless it is invoked again (which happens very few times in practice). The second agent receives the augmented obfuscated solution from the first agent; thus, it can only infer private information about the first agent. The following agents receive all previous agents' solutions together, so they cannot distinguish which agent each piece of obfuscated data belongs to. This is precisely one of the main differences with other approaches that run agent planners in parallel, where each agent knows which other agent is sending the information when it changes something in the public part of the state. In our case, by contrast, each agent does not know which information belongs to which other agent.

Each individual agent can potentially infer private knowledge only from previous agents. Therefore, if we have an ordering of agents' priorities regarding privacy preservation, we can use it to order the agents. In that case, the agent with the fewest restrictions on privacy preservation would be ordered first. The second agent could be the one towards which the first agent does not mind losing some privacy. And the following agents can be ordered from the least restrictive to the most restrictive in terms of privacy preservation.

The relation between the goal assignment strategy and the ordering of agents is also relevant for preserving privacy. For instance, in our case, we use min-goals as the agent ordering scheme, so the agent with the fewest goals is ordered first. If we combine it with rest-achievable, that agent will most probably have few goals to achieve. Thus, the weaker link (between the first and second agents) would only have weak privacy preservation for a small number of actions on average (the ones that achieve those few goals).

Macro-operators remove information about intermediate states, increasing the privacy level. So, we have analyzed the information exchanged by mapr-all-achievable with macro-operators (when it learns only one) and without them, to determine the private data the second and following agents can infer from it. We have performed this analysis per domain, using some of the configurations studied in the experiments. Table 11 shows the results. The first columns display the private information the second agent could infer from the first agent when mapr uses macro-operators and when it does not. The last column displays the private information the following agents could reliably attribute to previous agents. The table displays the private predicates we used to define the privacy level of the IPC domains used in the experiments. An underlined predicate means that the displayed agent may not be able to infer the corresponding private information.

Table 11 Analysis of mapr privacy level in the IPC domains

Next, we explain the private data that may not be inferred per domain.

  • Rovers: the goals are that a rover communicates soil, rock and image data located in public locations. The private data are: (1) the rover's instruments, (2) the rover position, (3) the waypoints the rover can traverse, (4) the instruments, cameras and stores the rover has and their current state (full, empty, calibrated, available) and (5) the supported modes of its cameras and its calibration target. The rover's equipment for soil/rock analysis and imaging can be inferred only if its assigned goals include communicating soil, rock or image data. But the number of instruments, cameras and stores the rover has remains private. The actions for moving and calibrating a rover include only private predicates with a public object. The second level of obfuscation removes these public objects. Then, if a rover has to navigate through a maze of waypoints, the other rovers will only be able to infer the first and last waypoints the rover traversed, given that these will appear in the macro-operator (while the other traversed waypoints will not). Also, other agents will not be able to infer the rover's calibration target if it is an intermediate waypoint in the traversed path.

  • Satellite: the goals are that a satellite takes images of public directions with a public mode. The private data are: (1) the on-board instruments, the modes they support and their states (power-on and calibrated), (2) the calibration direction of the instruments and (3) the direction the satellite is pointing to. It is possible to infer that the satellite has an instrument that supports a mode, but the exact number of instruments and their states cannot be inferred. The action for calibrating a satellite includes only private predicates with a public object. The second level of obfuscation removes that public object from the action, and the macro-operator removes all information on intermediate propositions before each take-image action. So, other agents will not be able to infer the satellite's calibration direction.

  • Driverlog: the goals are that a package, a driver or a truck ends at a public position. The private data are the location of the driver, either at a public location or in a public truck (the driving predicate). The driving predicate encodes intermediate states that the macro-operators remove. Hence, the fact that a driver is driving a truck may remain private. The final position of the driver may also remain private, given that the driver can walk somewhere else after leaving the truck at its final destination, and the preconditions and effects of the walk action are private and will not appear in the macro-operator.

  • Transport, Zenotravel, Depots and Logistics: in these domains the agents are vehicles (trucks or airplanes) and the private data are their location, internal properties such as their capacity or fuel level, and the number of packages/persons they are transporting (the in and inside predicates). The second level of obfuscation removes the public capacity and fuel-level objects from the respective private predicates. So, the internal properties of the vehicles may not be inferred in most problems. In addition, mapr increases the privacy level if it uses macro-operators, since they also remove the intermediate states and compact several actions into one.

  • Depots-robot: the private data are the location of the robot and whether the robot is carrying a pod or is free. The public predicate empty-cell allows all private data to be inferred. Macro-operators can only hinder inferring the details of the robots' movements when they are loaded.

  • Port: this domain deals with hoists that load crates into ships. Both ships and hoists (the agents) are private, together with the information concerning them, i.e., (1) the surfaces, crates and hoists a ship contains and (2) the current state of a hoist (whether it is available and whether it is lifting a crate). Each hoist is associated with a ship, so mapr only maintains weak privacy, as would happen with any other MAP planner. Once a hoist picks up a crate (public, due to the position of the crate within the dock) and deposits it into its ship, the crate disappears (its position becomes private). Thus, any agent can infer that the crate is going to end up in the ship associated with the hoist.

  • Elevators: this domain represents a set of elevators that move persons up and down from one floor to another. The private data are the elevator's position, the floors the elevator can reach, the number of persons it can hold and the persons it transports. The actions for moving an elevator up and down contain only private or static predicates. The second level of obfuscation removes the public objects from the private predicates and the static predicates. Hence, it becomes more difficult for other agents to infer the reachable floors of an elevator. Also, if we use macro-operators, the current occupancy of an elevator and the number of persons it can hold would also be more difficult to infer.

6.8 Summary of results

As a summary, we can draw some conclusions:

Ordering of agents All the schemes behave similarly, with the min-goals scheme being slightly better than the rest. Therefore, the ordering of agents has little influence on planning efficiency.

Goal assignment The rest-achievable alternative is the fastest one, while all-achievable and contract-net are the slowest. We have also shown that, on average, rest-achievable uses far fewer agents. In the scalability study, rest-achievable continues to be the fastest one.

Centralized versus distributed algorithm On simple instances, given the same goal assignment strategy, mapr is slightly faster than cmap for almost all strategies. Also, when dealing with harder instances, mapr-rest-achievable scales better and outperforms all other configurations by a wide margin.

Comparison with lama-first There is a difference of more than 15 points between the time score of our best setting (mapr-rest-achievable) and the lama-first score (no privacy) on the simpler problems. The difference between mapr-rest-achievable and lama-first is even greater on the harder instances (around 31 points).

Quality of solutions Our algorithms do not significantly penalize plan quality. Even mapr-best-cost scores slightly better than lama-mk in plan length. As expected, there is room for improvement in quality, given that our configurations do not focus on it and we do not use all the available time to improve the first solution, as lama-2011 does.

Replanning versus planning from scratch in mapr Using lama-first (planning from scratch) is better than using lpg-adapt (a replanner). Therefore, there is room for improvement through better replanners. The differences between the scores of the configurations that use lpg-adapt are small, though. In these cases, the goal assignment strategy has little influence on planning efficiency.

Comparison with similar state-of-the-art planners We have shown that both cmap and mapr outperform state-of-the-art planners by a wide margin in the same setting we use: deterministic suboptimal privacy-preserving strips multi-agent planning. In the case of mafs, its behavior depends on the interaction between public and private actions while each agent is searching for a solution. If each agent has to interleave often between public and private actions, then mafs cannot benefit from the effects of tunneling. Thus, it has to frequently exchange states and actions with the rest of the agents, reducing its efficiency. This can be observed in domains such as Port.

Using macro-operators By sharing macro-operators among agents in mapr, we can obtain better privacy preservation at a slight cost in efficiency.

7 Related work

MAP has been approached from both the multi-agent and the automated planning communities [24]. In general, most works in the multi-agent community have not used standard automated planning techniques, focusing on related aspects such as self-interested agents, cooperation, negotiation or scheduling [8, 26, 46]. On the other hand, most previous works in the planning community defined agents as resources and used centralized planning to solve MAP tasks. In some works, each agent generates a separate plan, and then those plans are merged [28, 32, 50, 52]. These previous approaches differ from our work in that they focus on maximizing utility, or do not preserve privacy of goals or actions. Recently, the psm planner and its variants have implemented a new algorithm for merging plans from different agents in the MAP setting with privacy. The initial plans also contain projections of the public actions of other agents [69]. The main difference with respect to mapr and cmap is that it focuses on the merging algorithm and uses the ma-strips model.

Some of the ideas in mapr were already present at a high level in gpgp [46], particularly distributed problem solving, the exchange of partial plans and plan repair. However, a closer analysis reveals strong differences in the planning mechanisms used by the two. gpgp uses a goal hierarchy for planning, similar to the work on HTN planning and very different from the planning models used in PDDL. It also deals with scheduling and execution coordination, while we only deal with planning. Also, its joint intentions (goals) are only handled implicitly, so agents could solve their planning tasks without taking into account a known jointly shared goal. In mapr, all agents know about the public goals and cooperatively reason to achieve them.

Recently, there has been a renewed interest in developing MAP techniques for cooperative agents that explicitly consider the agents' private information in the suboptimal [56, 66, 68] and optimal settings [54, 55]. Among the recent deterministic distributed planning approaches, we find ones that use iterative plan refinement [42], distributed CSP [12, 56], distributed A\(^*\) [54, 55], SAT [25], or partial-order planning [66, 68]. There is also some interest in including mechanism design principles for self-interested agents [55].

Jonsson & Rovatsos described a MAP approach based on an iterative refinement process of successive single-agent planning episodes [42]. They start the planning iterations with an arbitrary initial plan in the joint-actions space. Then, they perform successive single-agent cost-optimal planning steps to obtain better plans. We perform a similar iterative single-agent process, but mapr differs in that it does not need an initial plan and it does not work in the joint-actions space; our plans are totally ordered, but we can later efficiently generate a parallel plan. Also, at each iteration, we perform suboptimal planning.

Another approach, \(\mu \)-SAT, uses SAT planning to generate individual agents' plans and combine the solutions [25]. It focuses on two-agent problems, while we are not restricted in the number of agents. In order to solve problems in domains such as Logistics, they extend the strips representation of actions with a new type of precondition, external preconditions. Agents do not use those preconditions when planning; they assume some other agent will make them true when needed. The focus of their paper is on optimal planning (minimizing makespan), even if in their experiments they only report on suboptimal planning. Also, in the experiments they assume each agent is able to apply all actions, and thus achieve all goals. Finally, they do not handle privacy.

Nissim et al. implemented ideas published in previous papers on using distributed CSPs to solve the planning task [56]. As we have discussed previously, the approach is theoretically sound, but in practice it is inefficient compared to other approaches. A later work compared this system with two other optimal MAP systems, corroborating its inefficiency in practice [27]. Some of its authors have recently developed a distributed algorithm [54] that can be configured to perform optimal planning, mad-a\(^*\), optimal planning for self-interested agents [55], or satisficing planning, mafs.

map-pop [68] is based on agents that share their public information when performing partial-order planning. They also used an iterative refinement process that is able to work on both loosely coupled and tightly coupled domains. The same authors have developed a variation that is complete, fmap [66].

A difference with respect to some of these previous approaches is that they generate either partial, parallel or joint-actions-space plans. In the case of mapr and cmap, plans are totally ordered (sequential). This is not a real drawback of our approach. Computing the optimal partially ordered plan (and then the corresponding parallel plan) from a totally ordered plan is NP-hard in general [19]. However, as we have discussed, there are efficient suboptimal algorithms that can be used [73]. Another difference is that some of the other approaches are able to share incomplete plans or states, while mapr in its current version is only able to share complete plans. Thus, mapr cannot solve problems under some agentifications of the domains, such as Logistics or Elevators, where agents cannot achieve goals just by themselves. As we have shown in the section on experiments, other approaches (such as those based on ma-strips, mafs, madla) face similar problems with other agentifications. As a solution, we have developed cmap, which can solve problems in those domains, given that it performs centralized planning. Finally, a key difference with respect to all these previous approaches is that we use state-of-the-art single-agent planners. Therefore, we automatically benefit from advances in deterministic strips planning, whereas they cannot automatically incorporate new developments in planning into their MAP approaches.
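As a rough illustration of the suboptimal direction (a simple greedy layering under an assumed action representation, not the algorithm of [73]), a totally ordered plan can be converted into a parallel plan by placing each action in the earliest layer after every earlier action it must follow:

```python
from typing import List, Set, Tuple

# An action is modeled as (name, preconditions, add effects, delete effects).
Action = Tuple[str, Set[str], Set[str], Set[str]]

def must_follow(earlier: Action, later: Action) -> bool:
    _, pre_e, add_e, del_e = earlier
    _, pre_l, add_l, del_l = later
    # Keep the sequential order if the earlier action supplies a precondition of
    # the later one, or if either action deletes a precondition/effect of the other.
    return bool(add_e & pre_l) or bool(del_e & (pre_l | add_l)) \
        or bool(del_l & (pre_e | add_e))

def parallelize(plan: List[Action]) -> List[List[Action]]:
    """Greedily place each action in the earliest layer consistent with its
    ordering constraints; each layer contains actions that can run in parallel."""
    layers: List[List[Action]] = []
    layer_of: List[int] = []
    for i, action in enumerate(plan):
        level = 0
        for j in range(i):
            if must_follow(plan[j], action):
                level = max(level, layer_of[j] + 1)
        layer_of.append(level)
        while len(layers) <= level:
            layers.append([])
        layers[level].append(action)
    return layers
```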

The main difference between our method for protecting privacy and other distributed approaches [54, 66] is that we share an obfuscated action model as well as the private initial state and goals. As we have discussed before, by using obfuscation by random substitution, removal of static predicates and macro-operators, we provide the same amount of information as mafs or fmap agents have.

In this paper, we have compared several task allocation strategies. Other decision-theoretic approaches use more complex strategies, such as creating agent coalitions or maximizing utility [60, 77]. In our work, we deal neither with coalitions nor with utility maximization. We do use contract-net [62], selecting the lowest-cost bid at each iteration, as a sealed-bid auction. In the context of decision theory, it is very common to use a Vickrey [74] auction [55, 71]. The first of these works considers a dynamic planning-auction mechanism, where each agent plans one goal at a time and auctions the goals for which it cannot find a plan. In relation to our approach, they perform dynamic goal allocation and use plan repair when they receive the next goal, but the plan repair only deals with each agent's plan. Thus, the agents do not consider potential positive or negative interactions with the plans of the rest.

Another key topic we have covered in this paper is privacy-preserving planning. Alternative approaches deal with the problem of MAP and the fact that agents do not know the other agents' knowledge as planning under partial observability [2, 76]. The second paper poses the planning task using epistemic logic, instead of using belief states to represent the uncertainty about other agents' knowledge. Recently, Kominis and Geffner [43] have shown that generating Bolander and Andersen's multi-agent plans can be compiled into classical planning tasks. They allow their agents to sense the beliefs of other agents, so again it is a different planning task from the one we are dealing with here.

A related question is how much privacy we lose by sharing obfuscated augmented plans among agents. One alternative would be to use the approach proposed by van der Krogt [70], which measures the loss of privacy using Shannon information theory, taking into account the number of plans that could potentially be generated and the number of plans that each agent observed. Others also use information theory to measure privacy loss in the context of CSP [76]. They also treat privacy-preserving problem solving as problem solving under partial observability, given that the private information of other agents can be dealt with as such. Another alternative for measuring privacy loss is to consider it in the context of game theory [72].

One of the reasons mapr is efficient comes from the decomposition of the problem into agent-based subproblems (this is not the case for cmap). There have been other decomposition techniques in the literature that have been applied to planning in both the deterministic [11] and non-deterministic settings [38]. We work in the deterministic setting, and the decomposition is induced by the definition of agents and their private knowledge, instead of using other structural criteria to decompose variables and/or actions. Early approaches provided a manual agent decomposition [45]. Recently, Crosby et al. proposed how to automatically detect agents in planning tasks and thus decompose the tasks into agent-oriented subtasks [23]. We believe this is an important step toward automatically determining agents and their properties to decompose planning tasks. However, this work focuses on one aspect of MAP, problem decomposition, and leaves out the other, preserving privacy, which is fundamental for some applications. In most real-world applications with privacy concerns, privacy does not emerge from the structure of the domain and problem, but from what the users define as private. Nevertheless, we have experimentally compared our approaches against their approach by providing it an initial obfuscated domain.

In related fields, such as path finding, there is also strong current interest in multi-agent research [59, 63]. Approaches in this field also use centralized and distributed schemes, and are also divided into optimal and suboptimal solutions. A similar approach to mapr is CA\(^*\) [61], where a path is computed for the first agent, and then the positions that the agent has to visit and the times when it has to visit them are reserved in a table. Then, the algorithm plans for the next agent, taking into account that it cannot visit the same positions at the same times as the ones reserved by the first agent. In the case of mapr, we allow the second agent to change the plan of the first agent, as long as its private and public goals are still achieved. The main differences between multi-agent path finding and our work are that they do not usually handle privacy, and that we deal with domain-independent planning.

Another interesting application of MAP is plot generation [16]. That work is based on the Continual Multi-Agent Planning work by Brenner & Nebel [17]. They explicitly handle sensing and communication as well as the physical actions of agents, which are modeled using the Multiagent Planning Language (MAPL). Their approach also allows explicit reasoning about agents' beliefs (their own and others').

Finally, while we focus on deterministic MAP, there has been plenty of work on solving non-deterministic MAP tasks [14, 58, 65]. The advantage of these works over mapr is that they can deal with uncertainty, but they usually do not scale up as well as deterministic MAP.

8 Conclusions

This paper presents two MAP approaches for cooperative agents with private information, mapr and cmap, as well as several methods to preserve privacy. In these two approaches, agents share public and relevant private information while preserving agents' privacy. Then, they perform distributed iterative planning (mapr) or centralized planning (cmap). Instead of performing parallel distributed searches, as most work in MAP does, mapr and cmap execute sequential problem solving. The second main difference with respect to previous work is the use of goal assignment in mapr and cmap. We have shown how different goal assignment strategies lead to very different performance of both planners in terms of solution time and quality. These strategies encode different ways of dividing labor among agents according to varying criteria. This is a key difference with respect to other MAP approaches that place a stronger focus on agents collaborating to achieve each goal.

In relation to the input description of MAP tasks, we hope that the MAP community will define a standard for MAP tasks in the future. In the meantime, other works have advocated specific extended languages, thus changing the PDDL descriptions [15, 44, 68], or defining privacy as emerging from the action descriptions [12]. Instead, we have preferred to receive the inputs in standard PDDL and let the user define privacy with three simple lists: agent types, private predicates and private types. From a knowledge engineering perspective, we believe these three lists are quite easy to define (it took us very little time for each analyzed domain).
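For illustration, such a specification could look as follows for the Rovers domain; the concrete syntax, and even the exact predicate names, are assumptions here and not the actual input format of mapr and cmap.

```python
# Hypothetical privacy specification for the Rovers domain (illustrative only):
# the three lists the user provides on top of the standard PDDL files.
rovers_privacy = {
    "agent_types": ["rover"],
    "private_predicates": [
        "at", "can_traverse", "on_board", "store_of",
        "calibrated", "supports", "calibration_target",
    ],
    "private_types": ["camera", "store"],
}
```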

mapr calls each agent with the plans, goals and state literals from previous agents in order to regenerate the previous (obfuscated) solution if needed, while at the same time reasoning about how to achieve the current agent's goals as well as the other agents' goals (private or public). We remove from the domain actions the parameters related to the agent types, so the search space shrinks. Actions in previous plans are added to the domain. So, there is a decrease in the number of instantiated actions that each agent has to deal with, proportional to the actions that include the agents as a parameter, and an increase equal to the number of actions in the previous agents' plans. In general, the search space of each agent is greatly reduced, since the number of new instantiated actions is usually much smaller than the number of removed ones.
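A hypothetical sketch of this transformation is shown below; the operator representation is simplified and the helper names are ours, not mapr's.

```python
from typing import List

class Operator:
    """Simplified lifted operator: a name plus typed parameters like '?r - rover'."""
    def __init__(self, name: str, params: List[str]):
        self.name, self.params = name, params

def transform_domain(operators: List[Operator], agent_type: str,
                     previous_plan_actions: List[Operator]) -> List[Operator]:
    transformed: List[Operator] = []
    for op in operators:
        # Drop the parameters typed with the planning agent's type: the agent is
        # implicit, so each operator grounds to far fewer instantiated actions.
        kept = [p for p in op.params if not p.endswith(f"- {agent_type}")]
        transformed.append(Operator(op.name, kept))
    # Add each action of the previous agents' plans as an extra (grounded)
    # operator, so the current agent can regenerate their obfuscated solutions.
    transformed.extend(previous_plan_actions)
    return transformed
```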

In relation to modeling (agentification), each system makes assumptions about agentification and solvability. MAP approaches based on ma-strips (mafs, madla) cannot solve any problem when there are actions none of whose parameters is an agent. Examples are Logistics (trucks), Depots (trucks) and Driverlog (driver). On the other hand, mapr can solve problems in some mapr-incomplete domains by changing the agentification, such as in the Logistics (airplanes) or Elevators (fast elevator) domains. In turn, cmap's completeness does not depend on the agentification.

Another key issue in MAP is the preservation of privacy. We have defined three main techniques for obfuscating agents' knowledge (random substitution, macro-operators, and generation of zero-arity predicates), and we have implemented them with some variations, such as the removal of static predicates or different ways to learn macro-operators. We have shown that the level of privacy preservation in mapr is similar to the one provided by other approaches (fmap or mafs). cmap, instead, needs a trusted central agent.
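A minimal sketch of how these techniques could be combined is shown below, assuming a simple literal representation; the actual encoding in mapr and cmap may differ.

```python
import secrets
from typing import Dict, Set, Tuple

# A literal is (predicate, arguments); e.g., ("at", ("rover1", "wp3")).
Literal = Tuple[str, Tuple[str, ...]]

def obfuscate(literals: Set[Literal],
              private_predicates: Set[str],
              static_predicates: Set[str]) -> Set[Literal]:
    fresh: Dict[str, str] = {}

    def rename(key: str) -> str:
        # Random substitution: each private item gets a fresh meaningless name.
        if key not in fresh:
            fresh[key] = "p_" + secrets.token_hex(3)
        return fresh[key]

    result: Set[Literal] = set()
    for pred, args in literals:
        if pred in static_predicates:
            continue  # static predicates are removed altogether
        if pred in private_predicates:
            # Zero-arity generation: the whole private literal becomes a fresh
            # proposition with no arguments, hiding any public objects in it.
            result.add((rename(f"{pred}({','.join(args)})"), ()))
        else:
            result.add((pred, args))
    return result
```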

In the experiments, we have compared seven different strategies for assigning public goals to agents, as well as four different schemes for ordering agents. If we consider the IPC problems, we have seen that some approaches have very similar performance in terms of efficiency (number of solved problems as well as time to solve), such as mapr and cmap with the rest-achievable strategy for assigning goals. Results also show that ordering agents has little impact on planning efficiency, with the min-goals scheme being slightly better than the rest. Experimental results also show that mapr and cmap greatly improve planning-time efficiency over state-of-the-art MAP approaches, while providing a similar level of privacy preservation (mapr).

We have also used two different replanning alternatives for mapr: a deterministic greedy best-first approach (lama-first) and a stochastic local search replanning algorithm (lpg-adapt). It is remarkable that many mapr and cmap configurations obtain higher performance than a centralized approach that does not preserve privacy (lama-first). In part, this is due to the fact that a secondary effect of distributing public goals among agents is that we reduce the number of agents used for planning (those to which mapr or cmap assign goals), effectively reducing the search space. Also, in the case of mapr, each subproblem only considers the search space of the corresponding agent plus a much smaller search space of previous agents: the one that corresponds to the actions that were needed to achieve the previous agents' goals. On the other hand, the lpg-adapt configurations perform worse than the ones that use lama-first. This does not imply that planning from scratch is better than replanning for this setting; in fact, other papers have shown the opposite. Further work remains to be done to show the benefits of using replanning with respect to planning from scratch in this setting.

Considering quality, the IPC score has a strong bias toward coverage. Therefore, the best configurations for quality are usually the ones that solve more problems, i.e., the ones using rest-achievable as the goal assignment strategy. While mapr generates better solutions in terms of plan length, cmap improves the makespan. In this paper, we only compute the first solution. If we were interested in obtaining good-quality solutions, we would have to run further searches to improve the solutions, as in lama-2011.

As we have discussed, mapr does not work in domains and agentifications where two agents have to collaborate to achieve a single goal. In those domains, there are at least two alternatives: changing the agentification, as we have shown in the paper, or using cmap (all). We are currently working on other ways to handle that problem. One possibility consists of computing goal regression from each goal and assigning each agent a set of subgoals that it should address apart from the initial goals of the problem.

In the future, we would like to provide a more in-depth analysis of the main theoretical properties related to privacy preservation. It is still an open issue in the community. Some MAP papers provide hints on how privacy can be preserved [10], but there is no formal framework yet that provides theoretical properties or methods to measure it. New developments in cryptography, such as the approach proposed by Gentry [35], describe fully homomorphic encryption schemes that allow an agent to manipulate data of another agent without actually being able to see the data itself. In our case, this scheme would allow each agent \(\phi _i\) to pass the other agents an encrypted function (or set of functions) that manipulates its private states and actions, without the rest of the agents knowing (or being able to infer) the private data of agent \(\phi _i\).

Another line of research is on self-interested agents, using the expected utility for each agent and goal as in our preliminary work [75]. Also, we would like to provide solutions for domains with joint (interacting) actions, where an action uses two or more agents, such as two robots moving a table, as has been recently studied by Brafman & Zoran [9]. Planning for multiple agents shares some aspects with previous work that analyzes symmetries among resources [49]. We would like to study the connection with that work and perhaps benefit from an analysis of the domain to detect unnecessary agents. Automatically inferring the agent-related information from the domains could also benefit modeling, as proposed by Crosby et al. [23]. Given that a planner (lama-first) is better than a replanning system (lpg-adapt) for the reuse phase of mapr, we intend to work on better replanning systems. Finally, we would like to define new domains that focus more specifically on MAP tasks.