Two approximation algorithms for probabilistic coalition structure generation with quality bound

How to form effective coalitions is an important issue in multi-agent systems. Coalition Structure Generation (CSG) is a fundamental problem whose formalization can encompass various applications related to multi-agent cooperation. CSG involves partitioning a set of agents into coalitions such that the social surplus (i.e., the sum of the values of all coalitions) is maximized. In traditional CSG, we are guaranteed that all coalitions will be successfully established, that is, the attendance rate of each agent for joining any coalition is assumed to be 1.0. Having the real world in mind, however, it is natural to consider the uncertainty of agents' availabilities, e.g., an agent might be available only two or three days a week because of his/her own schedule.
Probabilistic Coalition Structure Generation (PCSG) is an extension of CSG where the attendance type of each agent is considered. The aim of this problem is to find the optimal coalition structure which maximizes the sum of the expected values of all coalitions. In PCSG, since finding the optimal coalition structure easily becomes intractable, it is important to consider approximation algorithms, i.e., to consider a trade-off between the quality of the returned solution and tractability. In this paper, a formal framework for PCSG is introduced.
Approximation algorithms for PCSG called Bounded Approximation Algorithm based on Attendance Types (BAAAT) and Involved BAAAT (IBAAAT) are then presented. We prove a priori bounds on the quality of the solution returned by BAAAT and IBAAAT with respect to the optimum and perform experimental evaluations on a number of benchmarks.

CSG involves partitioning a set of agents into coalitions (where each coalition is a subset of the available set of agents) such that the social surplus is maximized. A partition is also called a coalition structure. In traditional CSG, the value of a coalition is assumed to be given by a black-box function called the characteristic function, and the value of a coalition structure is the sum of the values of all its coalitions. It is well known that the CSG problem is equivalent to the complete set partitioning problem [33]. Various sophisticated algorithms have been proposed for the CSG problem, for instance, an anytime algorithm with a worst-case guarantee [24], a dynamic programming based algorithm [33], search/dynamic programming based anytime algorithms [19,22,27], and a state-of-the-art hybrid algorithm called ODP-IP that combines dynamic programming and anytime algorithms [14].
Let us consider the following simple scenario. There is a service company dispatching interpreters with three employees, Ali, Bob and Chan. Now, this service company has received seven job requests for simultaneous interpretation, each requiring some specific language skills. Table 1 shows the rewards of each job request.
Assume that you are the manager of this service company and want to assign employees to jobs such that the sum of the rewards is maximized. Then, this problem can be represented as an instance of the CSG problem. If you assign the three employees Ali, Bob and Chan to job requests 1, 2 and 3 separately, the sum of the rewards obtained by the coalition structure {{Ali}, {Bob}, {Chan}} is $20 + $50 + $10 = $80. When you assign Ali to request 1, and Bob and Chan to request 6, the social surplus is maximized. In this case, the coalition structure is {{Ali}, {Bob, Chan}} and the service company earns $20 + $100 = $120.
In what follows, we are interested in the uncertainty of agents' attendances. In traditional CSG, one does not need to worry about whether all coalitions will be established or not, i.e., the probability of each agent to join any coalition he/she is assigned to is assumed to be 1.0. However, if we want to mimic the real world, it is natural to consider the uncertainty of agents' availabilities. For instance, it might happen that an agent is only available on some days of the week or cannot commit in advance with certainty.
In our example of a service company, it is natural that the manager asks the three employees about their schedules before assigning them to jobs. Also, setting an exact probability of each agent to join a coalition is not realistic, e.g., when Ali thinks that he can probably fit job request 1 into his schedule, he will report to the manager "I can probably work for request 1" but will not answer "I can work for it with probability 66%". Here, or more specifically in the experimental part, we assume that (1) each agent chooses in advance one of a small set of attendance types (each including a probability/attendance rate) according to his/her own schedule, and (2) the maker (e.g., the manager of the service company in our example) knows all the information about the agents' types. Such a simplification leads to a lack of generality in the sense that we allow the agents' availabilities to be chosen only from a small set of possible types, while the model allows for an infinite number of agent types. Still, it is arguably reasonable when considering applications, since allowing the agents/the maker to use the whole interval [0, 1] when setting the availability probabilities does not seem sensible, as already discussed above.
In this paper, the main focus is on the Probabilistic Coalition Structure Generation (PCSG) problem, which is an extension of CSG where the availability type of each agent is considered. First, a formal framework for PCSG which generalizes the one from [26] is introduced. The aim is to find a coalition structure that maximizes the sum of the expected values of all coalitions. Any such coalition structure will be called optimal. What exactly is meant by the term "expected value of a coalition" is an important issue here. In our framework, the expected value of a coalition is parameterized by the parameter \(k\). If up to \(k\) agents are missing from a coalition, the contribution of the remaining agents can be read off from the characteristic function. If the number of missing agents is higher than \(k\), we assume that the contribution of the remaining agents is 0, as their (sub)coalition differs too much from the coalition originally planned. Note that when \(k = n - 1\), any subset of the original coalition gives its contribution to the expected coalition value, and this corresponds to Flexible [26]. On the other hand, if \(k = 0\), the model corresponds to Cautious [26]. To illustrate the influence of \(k\), let us consider the expected value of the coalition {Ali, Bob, Chan} for \(k = 1\) (i.e., we allow one agent to be missing). The expected value is computed by taking into consideration the following cases: (1) Ali, Bob and Chan are available, (2) Ali and Bob are available, (3) Ali and Chan are available, and (4) Bob and Chan are available.
Furthermore, we present approximation algorithms for solving the PCSG problem called Bounded Approximation Algorithm based on Attendance Types (BAAAT) and Involved Bounded Approximation Algorithm based on Attendance Types (IBAAAT). In the PCSG problem, since finding the optimal coalition structure easily becomes intractable, it is important to consider fast but approximate algorithms. The basic idea of BAAAT is that for a given parameter \(p\), (1) agents whose attendance probability is at most \(p\) are placed into singleton coalitions, and (2) an optimal coalition structure of the relaxed problem with the remaining agents is computed. IBAAAT, on the other hand, has the following two steps: (1) agents are split into two sets such that one is the set of agents whose attendance probability is at most \(p\) and the other is the set of all the remaining agents, and (2) optimal coalition structures of the two relaxed problems are computed. We also prove an upper bound on the error of the solution returned by BAAAT and IBAAAT with respect to the optimum. The error bound is a theoretical worst-case bound that is obtained a priori, that is, before actually running the algorithm. Finally, the performances of BAAAT and IBAAAT are evaluated on a number of benchmarks.
The rest of the paper is organized as follows. In Sect. 2, the Coalition Structure Generation (CSG) problem is briefly described. Section 3 introduces a formal framework for the Probabilistic Coalition Structure Generation (PCSG) problem that will be used in the rest of the paper. In Sect. 4, the approximation algorithms BAAAT and IBAAAT for the PCSG problem are presented, together with their approximation guarantees. In Sect. 5, BAAAT and IBAAAT are evaluated on a number of benchmarks. Section 6 discusses related work, and, finally, Sect. 7 concludes the paper.

Coalition structure generation
We briefly describe the Coalition Structure Generation (CSG) problem [1,18,23]. CSG involves partitioning a set of agents into coalitions such that the social surplus (i.e., the sum of the values of all coalitions) is maximized. Let us start with some preliminary definitions.
Let \(A = \{a_1, a_2, \ldots, a_n\}\) be a finite set of agents. A coalition from \(A\), denoted by \(C\), is a non-empty subset of \(A\). A coalition structure on \(A\), denoted by \(CS\), is a partition of \(A\), that is, a jointly exhaustive set of pairwise disjoint coalitions from \(A\). More formally, a coalition structure on \(A\) is a finite set of coalitions \(CS = \{C_1, \ldots, C_m\}\) satisfying the following two conditions:
\[ \bigcup_{C_i \in CS} C_i = A, \qquad C_i \cap C_j = \emptyset \ \text{ for all } i \neq j. \]
In other words, each agent belongs to exactly one coalition. Note that some agents may be alone in their coalitions. In our running example of a service company with three employees, there exist seven possible coalitions, namely the non-empty subsets of {Ali, Bob, Chan}.
The Coalition Structure Generation problem description is defined as follows:
Definition 1 (CSG problem description) A coalition structure generation problem description is defined by a pair \(\langle A, v \rangle\) where \(A = \{a_1, a_2, \ldots, a_n\}\) is a set of agents and \(v : 2^A \to \mathbb{R}\) is a function called the characteristic function.
The value of a coalition \(C\), denoted by \(v(C)\), is given by the characteristic function \(v\). The value of a coalition structure \(CS\), denoted by \(V(CS)\), is the sum of the values of all its coalitions, i.e.,
\[ V(CS) = \sum_{C \in CS} v(C). \]
A coalition structure is said to be optimal, denoted by \(CS^*\), if it maximizes the social surplus, that is, if it satisfies the following condition:
\[ V(CS^*) \geq V(CS) \quad \text{for every coalition structure } CS \text{ on } A. \]
Consider the service company with three employees Ali, Bob and Chan introduced in the previous section. Assume that you are the manager of this service company and want to assign the employees to jobs such that the sum of the rewards from Table 1 is maximized. Then, this problem can be represented as an instance of the CSG problem: let \(\langle A, v \rangle\) be a CSG problem description with \(A\) = {Ali, Bob, Chan}, where the function \(v\) is given by the rewards in Table 1. Table 2 shows the rewards associated with all possible coalition structures. The optimal coalition structure in this example is \(CS^*\) = {{Ali}, {Bob, Chan}}, and the obtained reward is \(V(CS^*)\) = v({Ali}) + v({Bob, Chan}) = $20 + $100 = $120. As illustrated by this running example, the characteristic function \(v\) is supposed to be represented extensionally, as the set of pairs \(\{(C, v(C)) \mid C \subseteq A \text{ and } C \neq \emptyset\}\).
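The definitions above can be illustrated with a minimal brute-force sketch that enumerates every coalition structure of the running example and keeps the best one. Since Table 1 is not reproduced here, the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumptions (back-calculated from the contributions quoted later in the example); the function names `partitions`, `structure_value` and `solve_csg` are hypothetical helpers, not part of the paper.

```python
AGENTS = ("Ali", "Bob", "Chan")

# Characteristic function, stored extensionally as {coalition: value}.
V = {
    frozenset({"Ali"}): 20,
    frozenset({"Bob"}): 50,
    frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,    # assumed value (Table 1 not shown)
    frozenset({"Ali", "Chan"}): 60,   # assumed value (Table 1 not shown)
    frozenset(AGENTS): 110,           # assumed value (Table 1 not shown)
}


def partitions(agents):
    """Yield every partition of `agents` as a list of frozensets."""
    agents = list(agents)
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        # Either `first` forms a new singleton coalition ...
        yield [frozenset({first})] + part
        # ... or it joins one of the coalitions already in `part`.
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]


def structure_value(cs, v):
    """V(CS): sum of the values of all coalitions in the structure."""
    return sum(v[c] for c in cs)


def solve_csg(agents, v):
    """Exhaustive search; only sensible for a handful of agents."""
    return max(partitions(agents), key=lambda cs: structure_value(cs, v))


best = solve_csg(AGENTS, V)
print(sorted(sorted(c) for c in best), structure_value(best, V))
```

Exhaustive enumeration is used only because the example has three agents; the number of partitions grows too quickly for this to be practical in general.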

The CSG problem is defined as follows:
- Input: A CSG problem description \(\langle A, v \rangle\).
- Question: Find an optimal coalition structure \(CS^*\).
Let us now focus on the specific cases of the CSG problem when the characteristic function \(v\) is subadditive or superadditive. For a CSG problem description \(\langle A, v \rangle\), the characteristic function \(v\) is said to be subadditive if \(v(C_i \cup C_j) \leq v(C_i) + v(C_j)\) for any coalitions \(C_i\) and \(C_j\) with \(C_i \cap C_j = \emptyset\), and superadditive if \(v(C_i \cup C_j) \geq v(C_i) + v(C_j)\) for any such coalitions. It is well known that in case \(v\) is subadditive, the coalition structure formed by singleton coalitions is optimal, i.e., \(CS^* = \{\{a_i\} \mid a_i \in A\}\). For the superadditive case, the grand coalition is optimal, namely \(CS^* = \{A\}\) [24].
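As a small illustration of these two properties, the following sketch checks them by brute force over all pairs of disjoint coalitions. The helper names are hypothetical, and the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumed values, since Table 1 is not reproduced here.

```python
from itertools import combinations


def nonempty_subsets(agents):
    """All coalitions (non-empty subsets) of `agents`."""
    agents = list(agents)
    return [frozenset(c)
            for r in range(1, len(agents) + 1)
            for c in combinations(agents, r)]


def is_subadditive(agents, v):
    """v(Ci ∪ Cj) <= v(Ci) + v(Cj) for all disjoint Ci, Cj."""
    cs = nonempty_subsets(agents)
    return all(v[a | b] <= v[a] + v[b]
               for a in cs for b in cs if not (a & b))


def is_superadditive(agents, v):
    """v(Ci ∪ Cj) >= v(Ci) + v(Cj) for all disjoint Ci, Cj."""
    cs = nonempty_subsets(agents)
    return all(v[a | b] >= v[a] + v[b]
               for a in cs for b in cs if not (a & b))


AGENTS = ("Ali", "Bob", "Chan")
V = {
    frozenset({"Ali"}): 20, frozenset({"Bob"}): 50, frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,    # assumed value
    frozenset({"Ali", "Chan"}): 60,   # assumed value
    frozenset(AGENTS): 110,           # assumed value
}

# A purely illustrative superadditive function: v(C) = |C|^2.
V_SQUARE = {c: len(c) ** 2 for c in nonempty_subsets(AGENTS)}

print(is_subadditive(AGENTS, V), is_superadditive(AGENTS, V))
print(is_superadditive(AGENTS, V_SQUARE))
```

Under the assumed rewards, the running example is neither subadditive nor superadditive, which is consistent with its optimal coalition structure being neither the set of singletons nor the grand coalition.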

Probabilistic coalition structure generation
We now introduce the Probabilistic Coalition Structure Generation (PCSG) problem, which is the focus of this work. As opposed to traditional CSG, where we are guaranteed that all coalitions will be established, we want to have the real world in mind. So, it is natural to consider the uncertainty of agents' availabilities.
PCSG is an extension of CSG where the attendance type of each agent is considered. The aim is to find an optimal coalition structure that maximizes the sum of the expected values of all coalitions. Note that it is not a priori clear how one should arrive at the expected value of a coalition. Therefore, an important issue in PCSG is how exactly the expected value of a coalition is computed. In our framework, the computation of the expected value depends on the parameter \(k\). If up to \(k\) agents are missing from a coalition, the contribution of the remaining agents can be read off from the characteristic function. If the number of missing agents is higher than \(k\), we assume that the contribution of the remaining agents is 0, as their (sub)coalition differs too much from the coalition originally planned. When \(k = n - 1\), any subset of the original coalition gives its contribution to the expected coalition value, and this corresponds to Flexible [26]. On the other hand, if \(k = 0\), the model corresponds to Cautious [26].
In the experimental part, we additionally assume that each agent chooses in advance one of five attendance types (each including a probability/attendance rate) according to his/her own schedule, ranging from Type 1 (available, 90%) to Type 5 (not available, 10%).
The Probabilistic Coalition Structure Generation problem description is then defined as follows:
Definition 3 (PCSG problem description) A probabilistic coalition structure generation problem description is defined by a tuple \(\langle A, v, f, k \rangle\) where \(A = \{a_1, a_2, \ldots, a_n\}\) is a set of agents, \(v : 2^A \to \mathbb{R}\) is a characteristic function, \(f : A \to [0, 1]\) is a function that gives the probability/attendance rate of each agent, and \(k\) is an integer with \(0 \leq k \leq n - 1\).
Here, the participation of each agent a is a binary random variable that takes value 1 with probability f(a) and value 0 with the remaining probability. Furthermore, we assume these random variables to be mutually independent, that is, the participation of one agent has no influence on those of other agents.
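For a coalition \(C\), let \(\tilde{A} \subseteq C\) be the set of absent agents, where \(|\tilde{A}| \leq k\) and \(0 \leq k \leq n - 1\). For any such \(\tilde{A}\), the coalition that remains after removing \(\tilde{A}\) from \(C\) is denoted by \(C \setminus \tilde{A}\), and the value of this coalition is given by \(v(C \setminus \tilde{A})\). The contribution of this coalition to the expected value of coalition \(C\), denoted by \(v^k_e(C, \tilde{A})\), is
\[ v^k_e(C, \tilde{A}) = \prod_{a \in C \setminus \tilde{A}} f(a) \cdot \prod_{a \in \tilde{A}} (1 - f(a)) \cdot v(C \setminus \tilde{A}). \tag{2} \]
If \(|\tilde{A}| > k\), then we define \(v^k_e(C, \tilde{A}) = 0\). Now, the expected value of a coalition \(C\), denoted by \(v_{e,k}(C)\), is given by
\[ v_{e,k}(C) = \sum_{\tilde{A} \subseteq C,\ |\tilde{A}| \leq k} v^k_e(C, \tilde{A}). \tag{3} \]
Finally, the expected value of a coalition structure \(CS\), denoted by \(V_{e,k}(CS)\), is computed as the sum of the expected values of all coalitions, i.e.,
\[ V_{e,k}(CS) = \sum_{C \in CS} v_{e,k}(C). \]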
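The computation of the expected value of a coalition can be sketched as follows: enumerate every set of at most k absent agents, weight the value of the remaining coalition by the probability of exactly that attendance pattern, and sum up. This is a minimal sketch under stated assumptions: the function name `expected_value` is hypothetical, the empty set is taken to contribute 0, and the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumed values because Table 1 is not reproduced here. The attendance rates are those of the running example (Ali 0.9, Bob 0.5, Chan 0.3).

```python
from itertools import combinations
from math import prod

F = {"Ali": 0.9, "Bob": 0.5, "Chan": 0.3}   # attendance rates f(a)
V = {
    frozenset({"Ali"}): 20, frozenset({"Bob"}): 50, frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,               # assumed value
    frozenset({"Ali", "Chan"}): 60,              # assumed value
    frozenset({"Ali", "Bob", "Chan"}): 110,      # assumed value
}


def expected_value(coalition, v, f, k):
    """Sum the contributions of all absent sets of size at most k."""
    coalition = frozenset(coalition)
    total = 0.0
    for missing in range(min(k, len(coalition)) + 1):
        for absent in combinations(sorted(coalition), missing):
            present = coalition - set(absent)
            if not present:
                continue  # the empty coalition contributes 0
            # Probability that exactly `present` shows up and `absent` does not.
            prob = prod(f[a] for a in present) * prod(1 - f[a] for a in absent)
            total += prob * v[present]
    return total


grand = expected_value({"Ali", "Bob", "Chan"}, V, F, k=1)
ali = expected_value({"Ali"}, V, F, k=1)
print(round(grand, 2), round(ali, 2))
```

With these inputs the sketch agrees with the two values quoted in the text for k = 1, namely 18 for {Ali} and 46.5 for the grand coalition.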
A coalition structure is said to be optimal, denoted by \(CS^*_{e,k}\), if it maximizes the sum of the expected values of all coalitions, i.e., if the following condition holds:
\[ V_{e,k}(CS^*_{e,k}) \geq V_{e,k}(CS) \quad \text{for every coalition structure } CS \text{ on } A. \]
Example 1 (continued) Consider our running example of a service company with three employees. Assume that Ali reported Type 1 (i.e., available (90%)), Bob chose Type 3 (i.e., unsure (50%)), and Chan selected Type 4 (i.e., probably not available (30%)). Moreover, the manager sets the parameter \(k = 1\), that is, he/she wants to maximize the expected value of a coalition structure where at most one employee may be absent from a coalition in order for the remaining agents to still have a positive contribution to the expected coalition value. The expected value of each coalition is then computed from the rewards in Table 1 using Eqs. (2) and (3). Notice that when compared to Flexible [26], the parameter \(k = 1\) only has an influence on the expected value of the grand coalition in this example. All other coalitions have size at most 2, so that if more than one agent is missing, the set of the remaining agents is empty, and the contribution of the empty set is 0, independently of \(k\). Consider now the grand coalition (i.e., {Ali, Bob, Chan}). The positive contributions to the expected value of the grand coalition are given as follows:
- Ali, Bob and Chan are all available: 14.85
- Ali and Bob are available (Chan is absent): 22.05
- Ali and Chan are available (Bob is absent): 8.1
- Bob and Chan are available (Ali is absent): 1.5
The expected value of the grand coalition is then \(v_{e,1}\)({Ali, Bob, Chan}) = 14.85 + 22.05 + 8.1 + 1.5 = 46.5. Table 3 shows the expected values of all possible coalition structures for \(k = 1\). Compared to the optimal coalition structure \(CS^*\) = {{Ali}, {Bob, Chan}} of the CSG instance, the optimal coalition structure here is \(CS^*_{e,1}\) = {{Bob}, {Ali, Chan}}, and its expected value can be read off from Table 3.
The PCSG problem is defined as follows:
- Input: A PCSG problem description \(\langle A, v, f, k \rangle\).
- Question: Find an optimal coalition structure \(CS^*_{e,k}\).
The PCSG problem is a generalization of the CSG problem.
In case the attendance rate of each agent is 1.0, that is, it is guaranteed that every agent will join any coalition he/she is assigned to, the PCSG problem reduces to the standard CSG problem. This fact is independent of the choice of the parameter \(k\), i.e., it holds for any \(k\) with \(0 \leq k \leq n - 1\).
Let us now focus on the special case of the PCSG problem when the characteristic function \(v\) is subadditive and \(k = n - 1\). In that case, the analog of the result in the standard CSG framework also holds in the PCSG framework (which for \(k = n - 1\) corresponds to Flexible [26]). More precisely, if \(v\) is subadditive, the coalition structure formed by singleton coalitions is optimal (i.e., \(CS^*_{e,n-1} = \{\{a_i\} \mid a_i \in A\}\)) [26]. We now investigate the influence of the parameter \(k\) on the optimal solution of the PCSG problem when the characteristic function \(v\) is subadditive.

Proposition 1
Let \(\langle A, v, f, k \rangle\) be a probabilistic coalition structure generation problem description. If \(v\) is subadditive, then the coalition structure formed by singleton coalitions is optimal for any \(k \in [0, n - 1]\). More precisely, \(CS^*_{e,k} = \{\{a_i\} \mid a_i \in A\}\).

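Proof
Since \(v\) is subadditive, we know that for any \(C_1, C_2 \subseteq A\) with \(C_1 \cap C_2 = \emptyset\), \(v(C_1 \cup C_2) \leq v(C_1) + v(C_2)\). We want to compare \(v_{e,k}(C_1) + v_{e,k}(C_2)\) and \(v_{e,k}(C_1 \cup C_2)\). Namely, if we prove that \(v_{e,k}(C_1) + v_{e,k}(C_2) \geq v_{e,k}(C_1 \cup C_2)\), the claim will follow. We know that
\[ v_{e,k}(C_1 \cup C_2) = \sum_{\tilde{C} \subseteq C_1 \cup C_2,\ |\tilde{C}| \leq k} \ \prod_{a \in (C_1 \cup C_2) \setminus \tilde{C}} f(a) \cdot \prod_{a \in \tilde{C}} (1 - f(a)) \cdot v((C_1 \cup C_2) \setminus \tilde{C}). \tag{5} \]
Now, notice that for every \(\tilde{C} \subseteq (C_1 \cup C_2)\) with \(|\tilde{C}| \leq k\), there exist exactly one \(\tilde{C}_1 \subseteq C_1\) and exactly one \(\tilde{C}_2 \subseteq C_2\) such that \(\tilde{C} = \tilde{C}_1 \cup \tilde{C}_2\). Then, by subadditivity of \(v\),
\[ v((C_1 \cup C_2) \setminus \tilde{C}) = v((C_1 \setminus \tilde{C}_1) \cup (C_2 \setminus \tilde{C}_2)) \leq v(C_1 \setminus \tilde{C}_1) + v(C_2 \setminus \tilde{C}_2). \tag{6} \]
But now, since \(|\tilde{C}_1| \leq k\) and \(|\tilde{C}_2| \leq k\), the two summands in the last expression of Eq. (6) must also appear as summands in \(v_{e,k}(C_1)\) and \(v_{e,k}(C_2)\), respectively. This means that every summand in Eq. (5) is upper bounded by one summand of \(v_{e,k}(C_1)\) and one summand of \(v_{e,k}(C_2)\). Therefore, we can conclude that \(v_{e,k}(C_1) + v_{e,k}(C_2) \geq v_{e,k}(C_1 \cup C_2)\). ◻
When \(v\) is superadditive, on the other hand, we cannot come to an immediate conclusion that \(CS^*_{e,k} = \{A\}\). As a preliminary example, let us focus on the case when \(k = 0\), which corresponds to Cautious [26]. In this case, \(v_{e,0}(C_1 \cup C_2)\) can be strictly smaller than \(v_{e,0}(C_1) + v_{e,0}(C_2)\) even for a superadditive \(v\), meaning that superadditivity of \(v\) does not transfer to superadditivity of \(v_{e,0}\), and the grand coalition is not necessarily an optimal coalition structure. We prove a more general statement in Proposition 2.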
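To make the k = 0 case concrete, here is a minimal sketch on a hypothetical two-agent instance (not taken from the paper): both agents attend with probability 0.5, each singleton is worth 1 and the pair is worth 2, so v is superadditive. For k = 0 only the case in which nobody is absent contributes, so the expected value of a coalition is the product of its members' attendance rates times its value.

```python
from math import prod

# Hypothetical instance, chosen only for illustration.
f = {"a": 0.5, "b": 0.5}
v = {frozenset("a"): 1.0, frozenset("b"): 1.0, frozenset("ab"): 2.0}


def cautious_value(coalition):
    """Expected value for k = 0: all members must be present."""
    c = frozenset(coalition)
    return prod(f[x] for x in c) * v[c]


# v itself is superadditive on this instance ...
assert v[frozenset("ab")] >= v[frozenset("a")] + v[frozenset("b")]

# ... but the expected values for k = 0 are not.
grand = cautious_value("ab")                             # 0.25 * 2.0
singletons = cautious_value("a") + cautious_value("b")   # 0.5 + 0.5
print(grand, singletons)
```

On this instance the two singletons together have a higher expected value than the grand coalition, so the grand coalition is not optimal although v is superadditive.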
Proposition 2 For any \(k \in [0, n - 2]\), there exists an instance of the PCSG problem with problem description \(\langle A, v, f, k \rangle\) such that \(v\) is superadditive and the grand coalition is not an optimal solution.
where the strict inequality follows from \((1 - p)^{n-1} > 0\), \(\forall p > 0\). Since the coalition structure that consists of all singletons has a higher value than the grand coalition, we know that the grand coalition is not optimal, i.e., \(CS^*_{e,k} \neq \{A\}\). ◻
The case \(k = n - 1\) needs to be treated separately, as it yields a different result that is aligned with the one for superadditivity in the standard CSG framework.
Proposition 3 Let \(\langle A, v, f, n - 1 \rangle\) be a probabilistic coalition structure generation problem description. If \(v\) is superadditive, then the grand coalition is the optimal solution. More precisely, \(CS^*_{e,n-1} = \{A\}\).
Proof Let \(CS = \{C_1, \ldots, C_m\}\) be any coalition structure different from the grand coalition. The result follows by observing that for any realisation \(A' \subseteq A\) of the set of the available agents, because of superadditivity of \(v\), we know that \(v(A') \geq \sum_{i=1}^{m} v(C_i \cap A')\). ◻
Finally, let us note that an instance of the PCSG problem can be represented as a zero-one integer program in the same way as for the CSG problem, if we know the \(v_{e,k}\) values of all coalitions. For simplicity, we show the zero-one integer programming formulation of our running example for \(k = 1\). The objective function and the constraints are formalized as follows:
\[ \max \ 18\,a_1 + v_{e,1}(\{\text{Bob}\})\,a_2 + v_{e,1}(\{\text{Chan}\})\,a_3 + v_{e,1}(\{\text{Ali, Bob}\})\,a_{12} + v_{e,1}(\{\text{Ali, Chan}\})\,a_{13} + v_{e,1}(\{\text{Bob, Chan}\})\,a_{23} + 46.5\,a_{123} \tag{7} \]
\[ \text{s.t.} \quad a_1 + a_{12} + a_{13} + a_{123} = 1, \tag{8} \]
\[ a_2 + a_{12} + a_{23} + a_{123} = 1, \tag{9} \]
\[ a_3 + a_{13} + a_{23} + a_{123} = 1, \tag{10} \]
\[ a_1, a_2, a_3, a_{12}, a_{13}, a_{23}, a_{123} \in \{0, 1\}. \tag{11} \]
Variables \(a_1, a_2, \ldots, a_{123}\) represent all possible coalitions, e.g., \(a_1\) is the coalition {Ali} and \(a_{123}\) represents the grand coalition (i.e., {Ali, Bob, Chan}), and each variable can take the value 0 or 1 [Eq. (11)]. Equation (8) describes that Ali belongs to one of the coalitions {Ali}, {Ali, Bob}, {Ali, Chan}, {Ali, Bob, Chan}, and he cannot belong to more than one coalition simultaneously. Similarly, Eqs. (9) and (10) show the constraints for Bob and Chan. Equation (7) represents the objective function which maximizes the sum of the expected values of all coalitions, and each coefficient shows the expected value obtained by the corresponding coalition, e.g., 18 is the expected value of the coalition {Ali} and 46.5 is the expected value of the grand coalition.
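The zero-one program above is small enough to be checked by brute force. The sketch below enumerates all 0/1 assignments of the seven variables, keeps those satisfying the three partition constraints, and maximizes the objective. It is an illustration only, not the solver setup used in the paper. The coefficients 18 and 46.5 are quoted in the text; 25, 3, 34 and 43 follow from the rewards and attendance rates given in the example; 29.1 for {Ali, Chan} relies on an assumed reward of 60 for that coalition, since Table 1 is not reproduced here.

```python
from itertools import product

# Variable names follow the text: "13" stands for a_13 = {Ali, Chan}, etc.
COALITIONS = ["1", "2", "3", "12", "13", "23", "123"]

# Expected values v_{e,1}(C) used as objective coefficients.
COEF = {
    "1": 18.0, "2": 25.0, "3": 3.0,
    "12": 43.0,
    "13": 29.1,   # depends on an assumed reward for {Ali, Chan}
    "23": 34.0,
    "123": 46.5,
}


def solve_zero_one():
    """Enumerate all binary vectors and keep the best feasible one."""
    best_value, best_choice = None, None
    for bits in product((0, 1), repeat=len(COALITIONS)):
        x = dict(zip(COALITIONS, bits))
        # Constraints (8)-(10): each agent is in exactly one chosen coalition.
        feasible = all(
            sum(x[c] for c in COALITIONS if agent in c) == 1
            for agent in "123"
        )
        if not feasible:
            continue
        value = sum(COEF[c] * x[c] for c in COALITIONS)
        if best_value is None or value > best_value:
            best_value = value
            best_choice = {c for c in COALITIONS if x[c] == 1}
    return best_value, best_choice


best_value, best_choice = solve_zero_one()
print(best_choice, round(best_value, 2))
```

Under these coefficients the selected variables are a_2 and a_13, which matches the optimal coalition structure {{Bob}, {Ali, Chan}} stated in the example.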

Bounded approximation algorithms
In this section, we present two approximation algorithms for solving the PCSG problem called Bounded Approximation Algorithm based on Attendance Types (BAAAT) and Involved Bounded Approximation Algorithm based on Attendance Types (IBAAAT). Even though our algorithms can be seen as simple heuristics, we term them approximation algorithms, as they offer a theoretical bound on the quality of the returned solutions.
Let us for a moment assume that we can get the \(v_{e,k}\) function as input, instead of just the characteristic function \(v\) and the parameter \(k\). Even though this assumption saves a lot of the computation that would be necessary to arrive at \(v_{e,k}\), the input, i.e., the representation size, is still exponential in the number of agents. Thus, it is important to consider approximation algorithms, i.e., to consider a trade-off between the quality of the returned solution and tractability. To this end, we prove an a priori upper bound on the error of the solution returned by both BAAAT and IBAAAT, i.e., the error bound is obtained before actually running the algorithm.

Approximation algorithm BAAAT
BAAAT has the following two phases:
Phase 1: For a given parameter \(p \in \mathbb{R}\) and the attendance rates of all agents, form a singleton coalition for every agent \(a\) whose attendance rate satisfies \(f(a) \leq p\).
Phase 2: Find an optimal coalition structure of the relaxed problem with the remaining agents.
The basic idea of BAAAT is that for a given parameter \(p\), a singleton coalition is formed for each agent who chooses an attendance type whose probability/attendance rate is less than or equal to \(p\). Then, an optimal coalition structure of the relaxed problem with the remaining agents is computed. We denote by \(CS^-_{e,k}\) the coalition structure obtained in Phase 2 on the subset of agents that remain after the singletons are formed in Phase 1, and by \(CS^+_{e,k}\) the coalition structure returned by BAAAT, which is a coalition structure on the whole set of agents.
Let us further explain how BAAAT computes an approximate solution by using our running example. We are given the attendance type (which includes the attendance rate) of each agent, i.e., f(Ali) = 0.9 (Type 1: available), f(Bob) = 0.5 (Type 3: unsure), f(Chan) = 0.3 (Type 4: probably not available), the parameter \(k = 1\), and the expected values of all coalitions computed in Example 1. Let \(p = 0.3\), i.e., the manager does not count on agents who chose attendance Types 4 (i.e., probably not available (30%)) and 5 (i.e., not available (10%)). Since Chan reported attendance Type 4, i.e., f(Chan) = 0.3 = \(p\), the singleton coalition is formed for Chan in Phase 1 of BAAAT. Then, the relaxed problem with the remaining agents (i.e., Ali and Bob) is solved in Phase 2 of BAAAT. That is, the coalitions which include Chan can be ignored in the simplified problem, and only the coalitions {Ali}, {Bob} and {Ali, Bob} are still relevant. Since the expected value of the coalition formed by Ali and Bob is equal to the sum of the expected values of the singleton coalitions with Ali and Bob, that is, \(v_{e,1}\)({Ali, Bob}) = \(v_{e,1}\)({Ali}) + \(v_{e,1}\)({Bob}), both {{Ali, Bob}} and {{Ali}, {Bob}} are optimal coalition structures of the relaxed problem.
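The two phases of BAAAT can be sketched as below, assuming that the relaxed problem in Phase 2 is solved by exhaustive enumeration (the paper uses a complete solver instead). The helper names are hypothetical, the empty set is taken to contribute 0, and the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumed values because Table 1 is not reproduced here.

```python
from itertools import combinations
from math import prod

F = {"Ali": 0.9, "Bob": 0.5, "Chan": 0.3}
V = {
    frozenset({"Ali"}): 20, frozenset({"Bob"}): 50, frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,               # assumed value
    frozenset({"Ali", "Chan"}): 60,              # assumed value
    frozenset({"Ali", "Bob", "Chan"}): 110,      # assumed value
}


def expected_value(coalition, v, f, k):
    """v_{e,k}(C): sum over absent sets of size at most k."""
    coalition = frozenset(coalition)
    total = 0.0
    for missing in range(min(k, len(coalition)) + 1):
        for absent in combinations(sorted(coalition), missing):
            present = coalition - set(absent)
            if present:  # the empty coalition contributes 0
                total += (prod(f[a] for a in present)
                          * prod(1 - f[a] for a in absent) * v[present])
    return total


def partitions(agents):
    """Yield every partition of `agents` as a list of frozensets."""
    agents = list(agents)
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        yield [frozenset({first})] + part
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]


def structure_value(cs, v, f, k):
    return sum(expected_value(c, v, f, k) for c in cs)


def baaat(agents, v, f, k, p):
    """Phase 1: singletons for agents with f(a) <= p. Phase 2: solve the rest."""
    low = [a for a in agents if f[a] <= p]
    rest = [a for a in agents if f[a] > p]
    cs_minus = max(partitions(rest), key=lambda cs: structure_value(cs, v, f, k))
    cs_plus = [frozenset({a}) for a in low] + cs_minus
    return cs_minus, cs_plus


cs_minus, cs_plus = baaat(["Ali", "Bob", "Chan"], V, F, k=1, p=0.3)
total = structure_value(cs_plus, V, F, 1)
print(cs_plus, round(total, 2))
```

Which of the two tied structures on {Ali, Bob} is returned depends on enumeration order and floating-point rounding, so only the value of the returned structure should be relied upon here.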

Quality guarantee of BAAAT
We show that we can provide an upper bound on the error of the solution returned by BAAAT a priori, i.e., the error bound is obtained before actually running the algorithm. Let us denote by \(\mathcal{C}_a\) the set of all coalitions that contain agent \(a\) as a member, and let \(\tilde{A}\) be the set of agents whose attendance probability is lower than or equal to a given parameter \(p\), so that they form their own coalitions in Phase 1. Furthermore, let \(r^{max}_a = \max\{v_{e,k}(C) \mid C \in \mathcal{C}_a\}\), i.e., \(r^{max}_a\) is the maximal expected value of all coalitions which include agent \(a\).

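Lemma 1
Let \(\langle A, v, f, k \rangle\) be a probabilistic coalition structure generation problem description. For an optimal coalition structure \(CS^*_{e,k}\) and a coalition structure \(CS^-_{e,k}\) obtained by BAAAT in Phase 2, the following inequality holds:
\[ V_{e,k}(CS^*_{e,k}) - V_{e,k}(CS^-_{e,k}) \leq \sum_{\tilde{a} \in \tilde{A}} r^{max}_{\tilde{a}}. \tag{12} \]
Proof We prove the claim by induction on the size of \(\tilde{A}\). In the base case, \(\tilde{A} = \emptyset\). Then, since no agents are removed in Phase 1 of BAAAT, the whole instance is solved to optimality in Phase 2. This means that \(V_{e,k}(CS^*_{e,k}) = V_{e,k}(CS^-_{e,k})\), and (12) holds.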
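Let us now assume that inequality (12) holds for all \(\tilde{A}\) such that \(|\tilde{A}| = \ell\). More specifically, we assume that
\[ V_{e,k}(CS^*_{e,k}) - V_{e,k}(CS^-_{e,k}) \leq \sum_{i=1}^{\ell} r^{max}_{\tilde{a}_i}. \tag{13} \]
Next, we consider the case where \(\tilde{A} = \{\tilde{a}_1, \ldots, \tilde{a}_\ell, \tilde{a}_{\ell+1}\}\). We first observe the coalitions that form the expected optimal coalition structure \(CS^*_{e,k}\) and denote these by \(CS^*_{e,k} = \{C^{*1}_{e,k}, \ldots, C^{*m}_{e,k}\}\). We also know that \(\tilde{a}_1\) must be in one of these coalitions, and without loss of generality we can assume that \(C^{*1}_{e,k} = \{\tilde{a}_1, b_1, \ldots, b_q\}\). It might happen that \(b_i = \tilde{a}_j\) for some \(i \in [q]\), \(j \in [\ell + 1] \setminus \{1\}\), but this does not influence the following inequalities:
\[ V_{e,k}(CS^*_{e,k}) = v_{e,k}(C^{*1}_{e,k}) + \sum_{i=2}^{m} v_{e,k}(C^{*i}_{e,k}) \leq r^{max}_{\tilde{a}_1} + \sum_{i=1}^{q} v_{e,k}(\{b_i\}) + \sum_{i=2}^{m} v_{e,k}(C^{*i}_{e,k}) \leq r^{max}_{\tilde{a}_1} + V_{e,k}(CS^{-\tilde{a}_1}_{e,k}), \]
where \(CS^{-\tilde{a}_1}_{e,k}\) denotes an optimal coalition structure on the set of agents \(A \setminus \{\tilde{a}_1\}\). The last inequality follows from the fact that \(\{\{b_1\}, \ldots, \{b_q\}, C^{*2}_{e,k}, \ldots, C^{*m}_{e,k}\}\) is a coalition structure over the agents in \(A \setminus \{\tilde{a}_1\}\) and \(CS^{-\tilde{a}_1}_{e,k}\) is an optimal such coalition structure. Now, we still need to remove the agents \(\tilde{a}_2, \ldots, \tilde{a}_{\ell+1}\) to reach Phase 2 of BAAAT and to be able to compare \(V_{e,k}(CS^*_{e,k})\) with \(V_{e,k}(CS^-_{e,k})\). However, by the induction hypothesis (13), removing these \(\ell\) agents increases the bound by at most \(\sum_{i=2}^{\ell+1} r^{max}_{\tilde{a}_i}\), so inequality (12) also holds for \(|\tilde{A}| = \ell + 1\). ◻
Let us now focus on a special case of the PCSG problem in which we can give a better bound on the quality of the solution returned by BAAAT.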
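The a priori nature of the bound can be illustrated numerically on the running example. The sketch below assumes that the bound of Lemma 1 has the form "optimum minus Phase 2 value is at most the sum of the r-max values of the agents removed in Phase 1"; this reading, the helper names, and the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumptions, since the inequality and Table 1 are not reproduced in full here.

```python
from itertools import combinations
from math import prod

AGENTS = ["Ali", "Bob", "Chan"]
F = {"Ali": 0.9, "Bob": 0.5, "Chan": 0.3}
V = {
    frozenset({"Ali"}): 20, frozenset({"Bob"}): 50, frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,      # assumed value
    frozenset({"Ali", "Chan"}): 60,     # assumed value
    frozenset(AGENTS): 110,             # assumed value
}
K, P = 1, 0.3


def expected_value(coalition):
    """v_{e,K}(C) for the fixed instance above."""
    coalition = frozenset(coalition)
    total = 0.0
    for missing in range(min(K, len(coalition)) + 1):
        for absent in combinations(sorted(coalition), missing):
            present = coalition - set(absent)
            if present:  # the empty coalition contributes 0
                total += (prod(F[a] for a in present)
                          * prod(1 - F[a] for a in absent) * V[present])
    return total


def partitions(agents):
    agents = list(agents)
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        yield [frozenset({first})] + part
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]


def best_value(agents):
    """Optimal expected value of a coalition structure on `agents`."""
    return max(sum(expected_value(c) for c in cs) for cs in partitions(agents))


def r_max(agent):
    """Maximal expected value over all coalitions containing `agent`."""
    return max(expected_value(c) for c in V if agent in c)


removed = [a for a in AGENTS if F[a] <= P]          # agents set aside in Phase 1
remaining = [a for a in AGENTS if F[a] > P]
gap = best_value(AGENTS) - best_value(remaining)    # V(CS*) - V(CS^-)
bound = sum(r_max(a) for a in removed)              # computable before solving
print(round(gap, 2), round(bound, 2))
```

Note that `bound` only needs the expected values of coalitions, not the solution of any optimization problem, which is what makes it available before running the algorithm.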

Proposition 4
Let \(\langle A, v, f, k \rangle\) be a probabilistic coalition structure generation problem description. In case the expected values of all coalitions satisfy subadditivity, i.e., \(v_{e,k}\) is a subadditive function, the coalition structure \(CS^+_{e,k}\) obtained by BAAAT is optimal, i.e., it holds that \(V_{e,k}(CS^+_{e,k}) = V_{e,k}(CS^*_{e,k})\).
Proof In the case where \(v_{e,k}\) is subadditive, the optimal coalition structure \(CS^*_{e,k}\) is formed by singletons. So, we just need to show that \(CS^+_{e,k}\) has the form \(CS^+_{e,k} = \{\{a_i\} \mid a_i \in A\}\). In Phase 1 of BAAAT, only singleton coalitions are formed, independently of \(v_{e,k}\). In Phase 2, since \(v_{e,k}\) is subadditive, the optimal coalition structure \(CS^-_{e,k}\) of the relaxed problem is formed by singleton coalitions. Thus, the solution \(CS^+_{e,k}\) obtained by BAAAT is formed by singleton coalitions. ◻

Approximation algorithm IBAAAT
Let us now try to give a better approximation algorithm for PCSG by taking a more involved approach to agents that have a low attendance rate. Instead of immediately forming singleton coalitions, we will find an optimal coalition structure on the subset of such agents. We will do the same on the remaining set of agents (as in BAAAT), and the final solution will be the union of these two optima.
IBAAAT has the following two phases:
Phase 1: For a given parameter \(p \in \mathbb{R}\) and the attendance rates of all agents, split the agents into two sets such that every agent \(a\) whose attendance rate satisfies \(f(a) \leq p\) is in \(A_1\) and all other agents are in \(A_2\).
Phase 2: Find optimal coalition structures of the two relaxed problems on \(A_1\) and \(A_2\) and output their union.
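The two phases of IBAAAT can be sketched in the same style, again assuming exhaustive enumeration for the two relaxed problems. The threshold p = 0.5 used below is chosen only for illustration, because it places two agents (Bob and Chan) in the low-attendance set and thus shows a difference from simply forming singletons. Helper names are hypothetical, and the rewards for {Ali, Bob}, {Ali, Chan} and the grand coalition are assumed values.

```python
from itertools import combinations
from math import prod

F = {"Ali": 0.9, "Bob": 0.5, "Chan": 0.3}
V = {
    frozenset({"Ali"}): 20, frozenset({"Bob"}): 50, frozenset({"Chan"}): 10,
    frozenset({"Bob", "Chan"}): 100,
    frozenset({"Ali", "Bob"}): 70,               # assumed value
    frozenset({"Ali", "Chan"}): 60,              # assumed value
    frozenset({"Ali", "Bob", "Chan"}): 110,      # assumed value
}


def expected_value(coalition, k):
    """v_{e,k}(C) for the instance above."""
    coalition = frozenset(coalition)
    total = 0.0
    for missing in range(min(k, len(coalition)) + 1):
        for absent in combinations(sorted(coalition), missing):
            present = coalition - set(absent)
            if present:  # the empty coalition contributes 0
                total += (prod(F[a] for a in present)
                          * prod(1 - F[a] for a in absent) * V[present])
    return total


def partitions(agents):
    agents = list(agents)
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        yield [frozenset({first})] + part
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]


def value(cs, k):
    return sum(expected_value(c, k) for c in cs)


def best_structure(agents, k):
    return max(partitions(agents), key=lambda cs: value(cs, k))


def ibaaat(agents, k, p):
    """Phase 1: split by threshold. Phase 2: solve both parts and unite."""
    a1 = [a for a in agents if F[a] <= p]   # low attendance
    a2 = [a for a in agents if F[a] > p]    # all other agents
    return best_structure(a1, k) + best_structure(a2, k)


agents = ["Ali", "Bob", "Chan"]
cs_ibaaat = ibaaat(agents, k=1, p=0.5)
ibaaat_value = value(cs_ibaaat, 1)
# For comparison: singletons for the low-attendance agents, as in BAAAT.
baaat_value = value([frozenset({a}) for a in agents], 1)
print(cs_ibaaat, round(ibaaat_value, 2), round(baaat_value, 2))
```

In this illustrative setting, solving the low-attendance group jointly pairs Bob with Chan, which yields a higher expected value than leaving them as singletons.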
Going back to our running example, let us compare the solutions returned by BAAAT and IBAAAT. Recall that the attendance types are given by f(Ali) = 0.9 (Type 1: available), f(Bob) = 0.5 (Type 3: unsure), and f(Chan) = 0.3 (Type 4: probably not available). We still assume \(k = 1\), and the expected values of coalitions can be seen in Table 3. If we, as before, choose \(p = 0.3\), then only Chan is placed in \(A_1\). Previously we already found that the optimal solutions of the relaxed problem on \(A_2\) = {Ali, Bob} are {{Ali, Bob}} and {{Ali}, {Bob}}. Since \(A_1\) contains only Chan, the optimal coalition structure on \(A_1\) is {{Chan}}, and IBAAAT returns the same solution as BAAAT in this example.

Tightness of the quality guarantee
Next, we show that the bounds given by Theorems 1 and 2 are in fact tight, meaning that there is an instance for which (14) and (17) hold with equality.

Proposition 5
The quality guarantees for BAAAT and IBAAAT given by Theorems 1 and 2, respectively, are tight.
Proof Let us construct an instance where \(A = \{a_1, \ldots, a_n, b_1, \ldots, b_n\}\) and the characteristic function is such that all subsets of the agent set have value 0, other than \(C \in \{\{a_1, b_1\}, \ldots, \{a_n, b_n\}\}\), for which \(v(C) = 1\). Furthermore, let the attendance rate of any agent \(a_i\), \(i \in [n]\), be higher than \(p\) and the attendance rate of any agent \(b_i\), \(i \in [n]\), be lower than \(p\).
Then, both and will return a solution of value 0 for any k ∈ [n − 1] , while the optimum has value n. Additionally, the loss of the algorithms is exactly described by the term ∑̃a In summary,

Experimental evaluation
In the experiments, the two algorithms are evaluated on a number of benchmarks. We note that since this is the very first work that proposes non-complete algorithms for PCSG, a comparison to existing algorithms from the literature is not possible. The only possible comparison is the one to the complete algorithm that solves the problem optimally, and this is, indeed, one of the benchmarks used. In our experiments, both algorithms are implemented in Python and all experiments are carried out on a 6-core processor running at 3.3 GHz with 32 GB of RAM.
As preliminary experiments, we first investigate the influence of the value k, i.e., the number of agents who may be absent from a coalition such that the remaining agents still contribute to the expected coalition value, on the runtime of PCSG. We also look into the number of singleton coalitions in the optimal coalition structure, and into which agent types tend to form singleton coalitions. Then, the quality of the solutions returned by the two algorithms is evaluated with respect to the optimal solutions. In order to compute optimal coalition structures, we used the CPLEX solver. The attendance type of each agent was randomly chosen from Type 1 (available, 90%) to Type 5 (not available, 10%). For each setting, 50 problem instances were generated, each based on one of the several probability distributions for the characteristic function v that are commonly used in the literature:

Preliminary experiments
The influence of the value k is investigated. More precisely, we set p = 0 and compare the runtime for finding an optimal coalition structure by varying the value k. Figure 1 represents the average runtime of all distributions for different k (i.e., k = 0, 1, 2, 3, 4, 5). As one can see, we observed similar results for all distributions (1)-(6). When the number of agents is small, the influence of k on the runtime is not significant. However, the difference quickly becomes larger when the number of agents increases, e.g., in the case when the number of agents is 14 and the distribution is uniform, the average runtime is 35.4 s for k = 1 and 809 s for k = 5. The experimental results thus indeed confirmed what was expected, that is, the influence of k on the runtime becomes larger when the number of agents increases. This is the case because it is necessary to consider $\binom{|C|}{k}$ (which is equal to $\binom{n}{k}$ in the worst case) summands in Eq. (3) in order to compute the expected value for each coalition C. For instance, in our running example of a service company dispatching interpreters with three employees, when we set k = 2, it is required to consider $\binom{3}{0} + \binom{3}{1} + \binom{3}{2} = 7$ cases to compute the expected value for the coalition {Ali, Bob, Chan}. That is, we consider the cases where all employees are available, one of them might be absent, and two of them might be absent. However, in case the number of agents increases from 3 to 4, we need to consider $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} = 11$ possible cases for computing the expected value for k = 2.
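The case counts in the running example can be checked with a short Python sketch. The helper names are hypothetical, and the sketch only enumerates which sets of agents may be absent; it makes no claim about the exact form of Eq. (3).

```python
from itertools import combinations
from math import comb


def absence_cases(coalition, k):
    """List every set of at most k members that may be absent from `coalition`."""
    members = sorted(coalition)
    upper = min(k, len(members))
    return [set(absent)
            for i in range(upper + 1)
            for absent in combinations(members, i)]


def num_cases(size, k):
    """Number of absence cases for a coalition of `size` agents and tolerance k."""
    return sum(comb(size, i) for i in range(k + 1))


cases = absence_cases({"Ali", "Bob", "Chan"}, 2)
```

For three agents and k = 2 the list contains the empty set (everyone available), three single absences and three double absences; `num_cases(4, 2)` gives the count for four agents.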
Next, we look into the number of singleton coalitions in the optimal coalition structure. Table 4 shows the average number of singletons in the optimal coalition structure $CS^{*}_{e,k}$, where k ∈ [0, 5], for all considered distributions. We can see that the average number of singleton coalitions differs for different distributions. For the Normal case of Cautious problems, i.e., for k = 0, all of the agents form singleton coalitions in the optimum, while this is not the case with the other considered distributions. Additionally, the number of singletons formed drastically decreases when the parameter k changes from 0 to 1. For instance, for the Beta distribution it changes from 7.6 to 1.4 when the number of agents is 14. For all distributions we can see that there are few singleton coalitions in the optimal coalition structure when k ≠ 0.
Lastly, we analyze which agent types tend to form singleton coalitions in the optimal solution. More specifically, building upon the result of Table 4, we count the number of agents of each type that form a singleton coalition in the optimal coalition structure $CS^{*}_{e,0}$. Figure 2 shows the ratio of each type among the agents that formed singleton coalitions in the optimum, where the number of agents is 14 and k = 0. We observed that in most cases the agents with Type 5, that is, the agents whose attendance rate is lowest, tend to form singletons in the optimal coalition structure. A notable exception is the Normal case. Since the optimal coalition structure completely consists of singleton coalitions in the Normal case (see Table 4), one cannot see a difference among types.
Table 4 The average number of singletons in the optimal coalition structure $CS^{*}_{e,k}$. The smallest values in each row are written in bold.

Performance of the approximation algorithms
We set k = 1 and test the two algorithms with two different values of the parameter p, 0.3 and 0.7, to assess the required computation time and the quality of the returned solutions with respect to the optima. Table 5 reports four quantities: (a) and (b) are the observed approximation ratios of the bounded and the involved algorithm, respectively, i.e., the value of the optimal coalition structure divided by the value of the returned solution (for instance, $V_{e,1}(CS^{*}_{e,1}) / V_{e,1}(CS^{++}_{e,1})$ for the involved algorithm), and (c) and (d) are the corresponding a priori upper bounds given by Theorems 1 and 2, respectively. All of (a), (b), (c), (d) are always at least 1.0, and a value close to 1 is desirable. From the results for (a) and (b), we can see that when p = 0.3, both algorithms have an observed approximation ratio of less than 1.05 in most cases. For instance, when the number of agents is 16, the result of (a) with the uniform distribution is 1.034, and the result of (b) is 1.019. However, when we set p = 0.7, in most cases the values obtained by the bounded algorithm are more than 1.3, while those of the involved algorithm are less than 1.2. For instance, when the number of agents is 12, the results of (a) and (b) with the modified uniform distribution are 1.444 and 1.156, respectively. Regarding (c) and (d), we can see that the value (i.e., the a priori bound) increases gradually in all cases as the number of agents increases for both algorithms. For instance, with the uniform distribution the result of (c) for the parameter p = 0.3 increases from 2.013 to 2.806 when the number of agents increases from 6 to 16, and the result of (d) for the parameter p = 0.7 increases from 3.339 to 5.255.
Figure 3 represents the average runtime of the two algorithms for all distributions (1)-(6). The x-axis shows the number of agents and the y-axis represents the average runtime. As one can see, we observed similar results for all distributions. When the parameter p is 0.3 and the number of agents is small, the complete algorithm and both approximation algorithms can solve the problems very quickly. However, as the number of agents increases, the difference between the complete algorithm and the approximation algorithms becomes significant. For instance, when the number of agents is 16 and the distribution is uniform, the average runtime of the bounded and the involved algorithm is 1.8 s and 2.1 s, respectively, while it is 593.3 s for the complete algorithm. The average runtimes of the two approximation algorithms are very similar. However, when the parameter p is 0.7, the average runtime of the involved algorithm increases and that of the bounded algorithm decreases. For example, with the uniform distribution the average runtime of the involved algorithm increases from 2.1 to 74.8 s when the parameter p increases from 0.3 to 0.7, while the average runtime of the bounded algorithm decreases from 1.8 to 0.1 s.
In total, the experimental results are not very surprising; rather, they confirm and make precise the natural intuition about how changing the parameters of our model and the algorithms' parameter p affects performance. In summary, they reveal that (1) when the parameter p is set to 0.3, the approximation ratio of both algorithms is less than 1.05 in most cases, and (2) when the parameter p increases, the bounded algorithm can solve the problems faster than the involved algorithm, while the involved algorithm has a better solution quality than the bounded algorithm. Therefore, the user can decide how to set the parameter p according to their preference. If most of the agents have participation rates that are higher than p, both algorithms have a very similar computation time, but the solution returned by the involved algorithm is always at least as good as the solution returned by the bounded algorithm. Furthermore, if many agents have participation probabilities smaller than or equal to p, the user chooses which algorithm to use according to whether the priority is to get an approximately optimal solution fast or to obtain a solution quality that is as good as possible.

Related work
Our work is most closely related to the probabilistic model introduced in [26]. Our framework includes the two variants of probabilistic CSG from [26], namely the Cautious and the Flexible variant, as special cases for the parameter values k = 0 and k = n − 1, respectively. For the experiments, however, we restrict our attention to a limited number of attendance types, while in [26] there is no limitation placed on the number of types. In coalition formation, which also includes CSG, many works have been devoted to the uncertainty of forming a coalition. Chalkiadakis and Boutilier [3] focused on the uncertainty of the types (capabilities) of the agents, and proposed a Bayesian reinforcement learning framework for repeated coalition formation under type uncertainty. In this framework, the agents maintain and update beliefs about the types of others through the experience gained by repeated interaction, and through this process improve their ability to form useful coalitions. Compared to this work, we rather focus on the attendance type (i.e., the uncertainty of agents' attendances), which is different from capability uncertainty. Also, the coalition values depend on the capabilities and the actions in the aforementioned work, while we compute them with the payoffs given by the characteristic function and the attendance rate of the attendance type.
Kraus et al. [11] worked on coalition formation under coalitional value uncertainty. In this framework, a set of tasks is given, and each task is performed by a different agent. The agents do not know the value of a task of another agent or the cost of performing it, but they know the overall payoff associated with performing a set of tasks and the capabilities of other agents. Faye et al. [8] worked on dynamic coalition formation in dynamic uncertain environments. This work investigates dynamic, uncertain environments in which tasks may evolve during execution, and agents and resource availability may vary rapidly and unpredictably. None of those works actually considers the attendance rate of each agent. Also, how the coalition values are computed in PCSG makes our model quite different from previous work.
Moreover, related to our work is the Team Formation Problem [15,32], which is the problem of forming the best possible team to perform some tasks of interest, given limited resources. Nair and Tambe [15] worked on forming a team with the maximum expected value, under the constraint that it has all the required skills to accomplish the tasks of interest. To compare CSG with the team formation problem, it is useful to keep in mind that CSG (and PCSG) is similar to the complete set partition problem [33], while team formation is equivalent to the set cover problem [10]. Okimoto et al. [16,17] worked on the robustness issue in team formation problems. In these papers, a set of agents and a set of tasks are given, and the aim is to form a team which is robust, i.e., which can achieve the given tasks even if some agents break down. The parameter k in PCSG (i.e., the number of agents who may be absent from a coalition) has a somewhat similar role to the robustness considered in team formation.
In traditional CSG, since the value of each coalition is assumed to be given by a characteristic function, the representation size is exponential in the number of agents [24,25]. In order to solve this problem, several compact representation schemes for characteristic functions have been proposed [4,9,29,31], e.g., a concise representation scheme based on agent types [31], marginal contribution nets (MC-nets) [9] and synergy coalition groups (SCGs) [4]. The attendance type used in this paper is based on the idea introduced in [31], where the authors consider a situation where multiple agents have similar capabilities/skills, called recognizable types, and the number of possible types is small. In our work, we use this idea for the attendance rate of each agent.

Conclusion
How to form a coalition is a major issue for many applications related to multi-agent cooperation. Coalition Structure Generation (CSG) involves partitioning a set of agents into coalitions such that the social surplus is maximized. Probabilistic Coalition Structure Generation (PCSG) is an extension of CSG where the aim is to find an optimal coalition structure that maximizes the sum of the expected values of all coalitions. The contributions of this paper are as follows:
• A formal framework for Probabilistic Coalition Structure Generation (PCSG) is introduced where the attendance type of each agent is considered. This framework is a generalization of the framework introduced in [26], namely it includes the Cautious and Flexible variants treated in [26] as special cases.
• Approximation algorithms for solving the PCSG problem, called Bounded Approximation Algorithm based on Attendance Types and Involved Bounded Approximation Algorithm based on Attendance Types, are presented. The characteristics of these algorithms are as follows: (1) there is an upper bound on the error of the returned solution and this bound can be obtained a priori, and (2) one can use any complete algorithm for solving to optimality the relaxed problems in Phase 2 of both algorithms.
• The performance of the two algorithms is evaluated on a number of benchmarks. Our experimental results revealed that (1) both algorithms can solve the problems very quickly and provide high solution quality when the parameter p is low, and (2) there is a trade-off between these algorithms when varying the parameter p, i.e., the bounded algorithm can solve the problems faster than the involved algorithm for larger p, while the involved algorithm has the better solution quality.
Regarding future work, a potential direction is investigating concise representations of the characteristic function in PCSG. Since the number of coalitions is exponential in the number of agents, it is reasonable to try to reduce the representation size of the characteristic function which provides the value of each coalition. One could try to apply the existing concise representations of the characteristic function from [4,9,29,31] in our framework, and then try to solve large-scale problem instances.
Furthermore, it could be of interest to apply our framework to real-world problems such as the nurse scheduling problem (NSP) [2] and the distributed vehicle routing problem [25]. Forming effective working groups by considering the nurses' attendances amounts to solving the PCSG problem. Also, drivers' attendances in the vehicle routing problem can be represented in the PCSG framework. Lastly, our framework could potentially be extended to a dynamic setting in which the set of agents A may change over time. The objective here would be to apply such an extended framework to the distributed robot team reconfiguration problem [6].