Coalition structure generation in cooperative games with compact representations

This paper presents a new way of formalizing the coalition structure generation problem (CSG) so that we can apply constraint optimization techniques to it. Forming effective coalitions is a major research challenge in AI and multi-agent systems. CSG involves partitioning a set of agents into coalitions to maximize social surplus. Traditionally, the input of the CSG problem is a black-box function called a characteristic function, which takes a coalition as input and returns the value of the coalition. As a result, applying constraint optimization techniques to this problem has been infeasible. However, characteristic functions that appear in practice often can be represented concisely by a set of rules, rather than treating the function as a black box. Then we can solve the CSG problem more efficiently by directly applying constraint optimization techniques to this compact representation. We present new formalizations of the CSG problem by utilizing recently developed compact representation schemes for characteristic functions. We first characterize the complexity of CSG under these representation schemes. In this context, the complexity is driven more by the number of rules than by the number of agents. As an initial step toward developing efficient constraint optimization algorithms for solving the CSG problem, we also develop mixed integer programming formulations and show that an off-the-shelf optimization package can perform reasonably well.


Introduction
Coalition formation is an important capability in automated negotiations among self-interested agents. Coalition structure generation (CSG) involves partitioning a set of agents into coalitions to maximize social surplus. This problem has become a popular research topic in AI and multi-agent systems (MAS) [4]. Possible CSG applications include distributed vehicle routing [32] and multi-sensor networks [7]. The CSG problem is equivalent to a complete set partition problem [44], and various algorithms have been developed for solving it. Sandholm et al. [31] propose an anytime algorithm with worst-case guarantees. However, to obtain an optimal coalition structure, this algorithm must check all of the coalition structures. Thus, the worst-case time complexity is $O(n^n)$, where $n$ is the number of agents. On the other hand, dynamic programming (DP) based algorithms [21,25,44] are guaranteed to find an optimal solution in $O(3^n)$. The CSG problem can also be considered in partition function games (PFGs), where the value of a coalition depends on how the other agents are partitioned. Rahwan et al. [27] first considered the CSG problem in PFGs. We expand the discussion of related literature on CSG problems in the next subsection.
Such existing works on CSG assume that the characteristic function is represented implicitly, and we have only oracle access to the function, where the value of a coalition (or a coalition structure as a whole) can be obtained using some procedure. This is because representing an arbitrary characteristic function explicitly requires $\Theta(2^n)$ numbers, which is prohibitive for large $n$. When a characteristic function is represented by a black-box function, there is no room for applying constraint optimization techniques.
However, characteristic functions that appear in practice often display significant structure, and such characteristic functions can probably be represented much more concisely. Indeed, several new methods for representing characteristic functions have recently been developed [5,6,14,20]. These representation schemes, which capture the characteristics of the interactions among agents in a natural and concise manner, can significantly reduce the representation size. It is natural to assume that an organizer who wants to solve a CSG problem has knowledge of possible interactions among agents and can concisely represent her knowledge by a set of rules. For example, let us consider a situation where a professor is dividing students in her laboratory into several research groups. Each student has specific features, e.g., good/bad at programming, theory, writing, etc. The professor knows the synergies among these features, e.g., there is positive synergy between a student who is good at programming and another student who is good at theory. From the knowledge about students' features and the synergies, the professor can construct a set of rules. Surprisingly, to our knowledge, prior to our work these representation schemes had not yet been used for CSG, which is our goal in this paper. Using these compact representation schemes, a characteristic function is represented by a set of rules, rather than treating the function as a black box. The idea is to solve the CSG problem more efficiently by directly applying constraint optimization techniques to this compact representation.
Quite interestingly, we find that there exists some common structure among these cases; in essence, the problem is to find a subset of rules that maximizes the sum of rule values under certain constraints. For each case, we show that the CSG problem is NP-hard, and the size of a problem instance is naturally measured by the number of rules rather than the number of agents. Furthermore, as an initial step toward developing efficient constraint optimization algorithms for solving the CSG problem, we give a mixed integer programming (MIP) formulation that captures the above structure. We show that an off-the-shelf optimization package (CPLEX) can solve the resulting MIP problem instances reasonably well.

Related works
This subsection briefly explores related work. Traditional models of coalitional game theory often assume that the characteristic function is super-additive, so forming the grand coalition is guaranteed to be optimal, and the main research topic in economics is how to divide the gain of the grand coalition among agents. The traditional theory of coalitional games provides a number of solution concepts, such as the core [10], the Shapley value [34], and the nucleolus [33]. The main research topic in computer science is to analyze the computational complexity of problems related to these solution concepts. Since the seminal work by Megiddo [19], many works have been conducted, e.g., [8,9,11,12,17].
More recently, AI and MAS researchers have been considering the case where the characteristic function is not super-additive, i.e., where forming the grand coalition is not guaranteed to be optimal. Often, in such a case, agents should form a coalition structure to maximize the reward they can obtain. This is called the coalition structure generation (CSG) problem, which has been an active research topic in AI and MAS. Many algorithms for solving the CSG problem have been developed. For example, as we mentioned in the introduction, Sandholm et al. [31] develop an anytime algorithm with worst-case guarantees, and Rahwan et al. [29] develop another anytime algorithm called IP, while Rahwan and Jennings [25] develop a dynamic programming (DP) based algorithm, which runs in $O(3^n)$. Furthermore, Michalak et al. [21] develop an algorithm called ODP-IP by combining the anytime and the DP approaches. To the best of our knowledge, the state-of-the-art algorithm is ODP-IP, which takes only seconds to solve an instance with 25 agents.
The above works assume that a characteristic function is given as a black-box function. However, representing that function explicitly requires values for an exponential number of agent combinations. Thus, several concise representation schemes for a characteristic function have been proposed: marginal contribution nets (MC-nets) [14], synergy coalition groups (SCGs) [6], and SCGs in multi-issue domains (SCGs in MID) [5]. It is natural to believe that utilizing the structure of a concise representation scheme helps us develop more efficient CSG algorithms. Subsequent to our first conference paper on this topic [24], several papers discuss the CSG problem under some concise representation. For example, Ueda et al. [39] consider the CSG problem where the value of a coalition is calculated by solving a distributed constraint optimization problem [22]. Aziz and de Keijzer [1] and Ueda et al. [40] study the CSG problem under an agent-type representation where agents are partitioned into several types and agents with the same type are identical.
Mixed integer programming (MIP) is a useful technique for solving optimization problems like the CSG problem. As another work that utilizes MIP techniques besides ours, Tran-Thanh et al. [37] have proposed the coalitional skill vector model. In their model, there exists a set of skills and each agent has a skill vector that represents the agent's skill levels. They formalized the CSG problem as a MIP formulation. Since their representation is different from our three representations, their MIP formulation has a different structure: $2^n$ decision variables with an exponential number of constraints. To solve this MIP formulation, they formalized an LP relaxation problem and its dual problem. Then they developed a constraint generation based algorithm that solved instances with 500 agents in less than an hour.
While the value of a coalition depends only on its members in characteristic function games, in real-world applications it can also be affected by how the other agents are partitioned. Such a game, which is represented by a partition function, is called a partition function game (PFG) [36]. In economics, the above solution concepts are extended to handle such games with externalities. The representative solution concepts include the Myerson value [23], which is an extension of the Shapley value for PFGs. On the other hand, associated computational problems have been considered by AI and MAS researchers. Rahwan et al. [27] first considered the CSG problem in the restricted classes of PFGs where only positive or negative externalities exist. Michalak et al. [20] extend MC-nets to handle externalities, and their proposed representation is called the embedded MC-nets representation. Skibski et al. [35] propose another representation called Partition Decision Trees and develop efficient algorithms that compute the extensions of the Shapley value in polynomial time.
Another research line in AI and MAS considers games on graphs. In this line of research, the existence of an underlying graph is assumed, and the graph represents, for example, a communication network among agents. Voice et al. [42] introduced the independence of disconnected members (IDM) property, in which two agents do not affect each other if they are disconnected on the graph. They formalized a graph coalition structure generation (GCSG) problem where a characteristic function satisfies the IDM property and examined the computational complexity of GCSG. They also developed algorithms for various types of graphs. Furthermore, Voice et al. [43] proposed Coalition Formation with Sparse Synergies (CFSS), where a coalition is feasible if and only if there exists a connected subgraph of the given underlying graph. However, a CSG algorithm for CFSS will be inefficient since the search space grows exponentially with regard to the number of agents. To overcome this issue, Bistaffa et al. [2] proposed an anytime algorithm, which provides an anytime solution with quality guarantees.
Rahwan et al. [26] proposed a very general framework for constrained coalition formation (CCF) games. Even though SCG and CCF utilize an organizer's knowledge of the relations among the agents, they have different properties for representing games. Assume an organizer only knows (1) coalitions where the synergies exist, and (2) the values of these coalitions. If the synergies are sparse, the organizer can directly and concisely represent her knowledge using SCG. It is possible to represent her knowledge on (1) using CCF, where the synergies are represented as positive constraints. However, using CCF, the organizer must provide a characteristic function as well. In principle, a characteristic function takes any coalition as its argument and returns its value, regardless of whether the coalition has synergy or not. It is not obvious whether the organizer can concisely represent such a characteristic function. If we explicitly represent a characteristic function as a table, we require $O(2^n)$ space.
Liao et al. [18] proposed a CSG algorithm that utilizes a MaxSAT solver. However, their SAT encoding method is for characteristic function games and cannot handle partition function games in which externalities exist among coalitions. On the other hand, our method can handle a partition function game represented as embedded MC-nets. Iwasaki et al. [15] develop an empirically efficient algorithm for computing imputation in such situations. They further propose a new solution concept called weak $\varepsilon$-core$^+$.

Characteristic function games
Let $A$ be the set of all agents, where $|A| = n$. We assume a characteristic function game, i.e., the value of coalition $S$ is given by characteristic function $v$. Characteristic function $v : 2^A \to \mathbb{R}$ assigns a value to each set of agents (coalition) $S \subseteq A$. Without loss of generality, we assume $\forall S \subseteq A$, $v(S) \geq 0$ holds. As previously shown [31], even if some coalitions' values are negative, as long as each coalition's value is bounded (i.e., not infinitely negative), we can normalize the coalition values so that all the values are non-negative. This rescaled game is strategically equivalent to the original game.
Coalition structure generation (CSG) involves partitioning a set of agents into coalitions to maximize the social surplus. Coalition structure $CS$ is a partition of $A$ into disjoint, exhaustive coalitions. To be more precise, $CS = \{S_1, S_2, \ldots\}$ satisfies the following conditions: $\forall i, j\ (i \neq j),\ S_i \cap S_j = \emptyset$, and $\bigcup_{S_i \in CS} S_i = A$. In other words, in $CS$, each agent belongs to exactly one coalition, and some agents may be alone in their coalitions.
We denote by $\Pi(A)$ the space of all coalition structures over $A$. The value of coalition structure $CS$, denoted as $V(CS)$, is given by: $V(CS) = \sum_{S_i \in CS} v(S_i)$. Optimal coalition structure $CS^*$ satisfies the following condition: $\forall CS \in \Pi(A),\ V(CS^*) \geq V(CS)$. We say a characteristic function is super-additive if, for any disjoint sets $S_i, S_j$, $v(S_i \cup S_j) \geq v(S_i) + v(S_j)$ holds. If the characteristic function is super-additive, solving CSG becomes trivial, i.e., the grand coalition (the coalition of all agents) is optimal.
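To make these definitions concrete, the following is a minimal sketch of a subset-based dynamic program of the kind mentioned in the introduction, which finds an optimal coalition structure in $O(3^n)$ oracle calls. It is not the algorithm of any cited paper; the function name `optimal_coalition_structure` and the size-based characteristic function `SIZE_VALUE` are assumptions made purely for illustration.

```python
def optimal_coalition_structure(agents, v):
    """Return (value, coalitions) of an optimal partition of `agents`.

    `v` maps a frozenset of agents to a number. `best[mask]` stores the
    optimal value for the agent subset encoded by the bitmask `mask`.
    """
    agents = list(agents)
    n = len(agents)

    def members(mask):
        return frozenset(agents[i] for i in range(n) if (mask >> i) & 1)

    best = {0: 0}
    first_block = {}
    for mask in range(1, 1 << n):
        lowest = mask & -mask          # fix one agent to skip symmetric splits
        sub = mask
        while sub:                     # enumerate every non-empty submask
            if sub & lowest:
                candidate = v(members(sub)) + best[mask ^ sub]
                if mask not in best or candidate > best[mask]:
                    best[mask] = candidate
                    first_block[mask] = sub
            sub = (sub - 1) & mask

    # Walk back through the recorded choices to recover the partition.
    mask, coalitions = (1 << n) - 1, []
    while mask:
        coalitions.append(members(first_block[mask]))
        mask ^= first_block[mask]
    return best[(1 << n) - 1], coalitions


# Hypothetical, non-super-additive characteristic function (not from the paper):
# the value of a coalition depends only on its size.
SIZE_VALUE = {1: 1, 2: 3, 3: 4, 4: 4}
value, structure = optimal_coalition_structure(
    "abcd", lambda coalition: SIZE_VALUE[len(coalition)]
)
```

With this illustrative function the grand coalition is worth 4, whereas two pairs are worth 6, so the grand coalition is not optimal.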
Super-additivity means that any pair of coalitions is better off by merging into one. One might think that super-additivity holds in most cases since the agents in the composite coalition can work separately and perform at least as well as the case when they were in different coalitions. However, organizing a large coalition can be costly; e.g., there might be coordination overhead like communication costs or anti-trust penalties. Also, if time is limited, the agents might not have time to carry out the communications and computations required for effective coordination within the composite coalition, so component coalitions may be more advantageous. In any case, even if the characteristic function is super-additive for the simple reason that agents in a composite coalition can always choose to work separately in subteams of the coalition, this still leaves the problem of finding the optimal subteam structure, which is the same problem as the CSG problem we face here. That is, in this case probably the most natural representation of the characteristic function $v$ is a function $v'$ that gives the values of coalitions without considering that they can work in subteams, and we would have to solve the CSG problem with respect to $v'$. Thus, we assume a characteristic function can be non-super-additive.
Example 1 Assume four agents, a, b, c, and d. The characteristic function is given as follows: In this case, there exist multiple optimal CSs. For example, {{a, b, c}, {d}} and {{a, b, d}, {c}} are optimal CSs, and the value of these CSs is 10.

Compact representations
Let us briefly describe three existing compact representation schemes: marginal contribution nets, synergy coalition groups, and multi-issue domains. We first introduce a concise representation of a characteristic function called marginal contribution networks (MC-nets), developed by Ieong and Shoham [14].
Definition 1 (MC-nets) An MC-net consists of a set of rules $R$. Each rule $r \in R$ is of the following form: $(L_r) \to v_r$, where $L_r$ is the condition of this rule, which is a conjunction of literals over $A$, i.e., $a_1 \wedge \cdots \wedge a_k \wedge \neg a_{k+1} \wedge \cdots \wedge \neg a_m$. We call $P_r = \{a_1, \ldots, a_k\}$ positive literals and $N_r = \{a_{k+1}, \ldots, a_m\}$ negative literals. We say that rule $r$ is applicable to coalition $S$ if $L_r$ is true when the values of all Boolean variables that correspond to the agents in $S$ are set to true, and the values of all Boolean variables that correspond to agents in $A \setminus S$ are set to false, i.e., $\bigwedge_{a \in S} a \wedge \bigwedge_{b \in A \setminus S} \neg b \models L_r$ holds. For coalition $S$, $v(S)$ is given as $\sum_{r \in R_S} v_r$, where $R_S$ is the set of rules applicable to $S$. Thus, for coalition structure $CS$, $V(CS)$ is given as $\sum_{S \in CS} \sum_{r \in R_S} v_r$.
In MC-nets, the condition of a rule must be a conjunction of literals. We call such a rule basic, and we call a rule that has a more complicated condition non-basic. A non-basic rule must be transformed into multiple basic rules whose conditions are mutually disjoint. For example, a non-basic rule of the form $(a \vee b \vee c) \to v$ is transformed into three basic rules: $(a) \to v$, $(\neg a \wedge b) \to v$, and $(\neg a \wedge \neg b \wedge c) \to v$. Furthermore, without loss of generality, we assume each rule has at least one positive literal. For example, if a rule has form $\neg a_1 \to 1$ and there exist agents $a_1, a_2, \ldots, a_n$, we can create the following equivalent rules:
Example 2 Assume five agents, a, b, c, d, and e, and four rules: and $r_4 : (c \wedge \neg e) \to 1$. In this case, $r_1$ and $r_2$ are applicable to coalition {a, b, c, e}, but $r_3$ and $r_4$ are not. Thus, $v(\{a, b, c, e\})$ equals 3 + 2 = 5.
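The applicability test and the rewriting of a non-basic rule can be sketched as follows. This is only an illustration under stated assumptions: the `Rule` type, the helper names, and the value 5 attached to the rule $(a \vee b \vee c)$ are hypothetical and are not taken from the paper's examples.

```python
from itertools import combinations
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    positive: FrozenSet[str]   # agents that must be in the coalition
    negative: FrozenSet[str]   # agents that must be absent from it
    value: float


def rule(positive, negative, value):
    return Rule(frozenset(positive), frozenset(negative), value)


def applicable(r, coalition):
    """A basic rule applies iff all positive literals are present and no negative one is."""
    return r.positive <= coalition and not (r.negative & coalition)


def mcnet_value(rules, coalition):
    """v(S) is the sum of the values of all rules applicable to S."""
    return sum(r.value for r in rules if applicable(r, coalition))


# Hypothetical non-basic rule (a or b or c) -> 5, rewritten as three basic
# rules with mutually disjoint conditions: a, (not a and b), (not a and not b and c).
BASIC = [rule("a", "", 5), rule("b", "a", 5), rule("c", "ab", 5)]

AGENTS = "abcd"
ALL_COALITIONS = [
    frozenset(c) for k in range(len(AGENTS) + 1) for c in combinations(AGENTS, k)
]
```

Because the three conditions are mutually exclusive, at most one of the basic rules fires for any coalition, so the value 5 is never counted twice.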
Next we describe a concise representation of a characteristic function called a synergy coalition group (SCG), introduced by Conitzer and Sandholm [6]. The main idea is to explicitly represent the value of a coalition only when there exists some positive synergy.
Definition 2 (SCG) An SCG consists of a set of pairs of the following form: $(S, v(S))$. For any coalition $S$, the value of the characteristic function is $v(S) = \max_{p_S} \sum_{S_i \in p_S} v(S_i)$, where $p_S$ is a partition of $S$, i.e., all the $S_i$ are disjoint and $\bigcup_{S_i \in p_S} S_i = S$, and for all the $S_i$, $(S_i, v(S_i)) \in SCG$. To avoid senseless cases that have no feasible partitions, we require that $(\{a\}, 0) \in SCG$ whenever $\{a\}$ does not receive a value elsewhere in SCG.
Thus, if the value of coalition $S$ is not given explicitly in SCG, it is calculated from the possible partitions of $S$. Using this original definition, we can represent only super-additive characteristic functions, i.e., for any disjoint sets $S_i, S_j$, $v(S_i \cup S_j) \geq v(S_i) + v(S_j)$ holds. But, as mentioned in Sect. 2.1, if the characteristic function is super-additive, solving CSG becomes trivial: the grand coalition is optimal. To allow for characteristic functions that are not super-additive, we add the following requirement on partition $p_S$:
- For all possible subsets $p'_S$ of partition $p_S$ where $|p'_S| \geq 2$, $\left(\bigcup_{S_i \in p'_S} S_i,\ v\left(\bigcup_{S_i \in p'_S} S_i\right)\right) \notin SCG$.
This additional condition requires that if the value of a coalition is explicitly given in SCG, then we cannot further divide it into smaller subcoalitions to calculate the values. In this way, we can represent negative synergies.
The (modified) SCG can represent any characteristic function, including characteristic functions that are non-super-additive or even non-monotone. This is because in the worst case, we can explicitly give the value of every coalition. Due to the additional condition, only these explicit values can be used to calculate the characteristic function.
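A brute-force reading of the modified SCG definition can be sketched as below. The sketch enumerates every partition of $S$ whose blocks are listed in the SCG and discards partitions in which two or more blocks together form a listed coalition. The dictionary `SCG` is a small hypothetical instance chosen for illustration, and `set_partitions` and `scg_value` are assumed helper names; the enumeration is exponential and intended only for tiny inputs.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of lists."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def scg_value(scg, coalition):
    """Value of `coalition` under the modified SCG semantics (None if no valid partition)."""
    best = None
    for partition in set_partitions(sorted(coalition)):
        blocks = [frozenset(b) for b in partition]
        if any(b not in scg for b in blocks):
            continue                     # every block must be listed explicitly
        merged_is_listed = any(
            frozenset().union(*group) in scg
            for k in range(2, len(blocks) + 1)
            for group in combinations(blocks, k)
        )
        if merged_is_listed:
            continue                     # a listed coalition may not be split
        total = sum(scg[b] for b in blocks)
        if best is None or total > best:
            best = total
    return best


# Hypothetical SCG with a positive synergy on {a, b} and a negative one on {a, b, c}.
SCG = {
    frozenset("a"): 0,
    frozenset("b"): 0,
    frozenset("c"): 1,
    frozenset("ab"): 3,
    frozenset("abc"): 3,
}
```

Under the original definition the partition {{a, b}, {c}} would give {a, b, c} the value 3 + 1 = 4; the additional condition rules that partition out, so the explicit value 3 is used.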
In this case, there exists positive synergy between agents a and b, since $v(\{a, b\}) = 3 > v(\{a\}) + v(\{b\}) = 0 + 0 = 0$. On the other hand, there exists negative synergy among agents a, b, and c. This is because $v(\{a, b, c\}) = 3 < v(\{a, b\}) + v(\{c\}) = 3 + 1 = 4$, which does not satisfy super-additivity. Thus, we cannot divide {a, b, c} into any subcoalitions to calculate $v(\{a, b, c\})$. Furthermore, when we calculate the value of a coalition including agents a, b, and c, we use the value of {a, b, c}.
For S = {a, b, c, d, e}, the value of S is calculated by $v(\{a, b, c\})$
Finally, we introduce the concept of a multi-issue domain [5]. In a multi-issue domain, there are $k$ independent issues. The overall value of a coalition is the sum of the values of the coalition for individual issues. More specifically, we assume $k$ characteristic functions $v_1, v_2, \ldots, v_k$ such that for any $S \subseteq A$, $v(S) = \sum_{i=1}^{k} v_i(S)$. If each $v_i$ can be represented concisely, then this leads to a concise representation for $v$. In this paper, we assume that $v_i$ is represented by $SCG_i$.

Definition 3 (SCGs in multi-issue domains)
We represent the characteristic function by a vector of SCGs $(SCG_1, \ldots, SCG_k)$, where $v_i$ is calculated using $SCG_i$. Also, for coalition structure $CS$, we denote
Example 4 Assume four agents a, b, c, and d, and two SCGs

Partition function games
When externalities exist among coalitions, the value of a coalition depends on the coalition structure to which it belongs. An embedded coalition is a pair $(S, CS)$, where $S \in CS \in \Pi(A)$. Denote the set of all embedded coalitions as $M$, i.e., $M := \{(S, CS) : CS \in \Pi(A),\ S \in CS\}$.
In this case, there are 4 optimal CSs. For example, {{a}, {b}, {c, d}} is one optimal CS, and the value of this CS is 6.
The game defined in Example 5 has externalities. In particular, the value of {d} in {{a}, {b}, {c}, {d}} is 3, whereas in {{a, b}, {c}, {d}} it is 1. This means that the formation of coalition {a, b} induces a negative externality of 2 on {d}.

Michalak et al. [20] proposed a concise representation of a partition function called embedded MC-nets, which is an extension of MC-nets.
Definition 4 (Embedded MC-nets) An embedded MC-net consists of a set of embedded rules $ER$. Each embedded rule $er \in ER$ has the following form: $(L_1) \mid (L_2), \ldots, (L_l) \to v_{er}$, where each of $L_1, L_2, \ldots, L_l$ is a conjunction of literals over $A$. $L_1$, which we call the internal condition, is the condition that must be satisfied in the coalition that receives the value. $L_2, \ldots, L_l$, which we call the external conditions, must be satisfied in other coalitions. We say that embedded rule $er$ is applicable to coalition $S$ in $CS$ if $L_1$ is applicable to $S$ and each of $L_2, \ldots, L_l$ is applicable to some coalition $S' \in CS \setminus \{S\}$. For coalition $S$, $w(S, CS)$ is given as $\sum_{er \in ER_{(S,CS)}} v_{er}$, where $ER_{(S,CS)}$ is the set of embedded rules applicable to $S$ in $CS$.
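The applicability test for embedded rules can be sketched as follows. The two rules in `RULES` are assumptions chosen only so that coalition {d} loses 2 when {a, b} forms, mirroring the kind of negative externality described above; they are not the rules of any example in the paper, and the type and helper names are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Condition(NamedTuple):
    positive: FrozenSet[str]
    negative: FrozenSet[str]


class EmbeddedRule(NamedTuple):
    internal: Condition               # must hold in the coalition receiving the value
    external: Tuple[Condition, ...]   # each must hold in some other coalition
    value: float


def cond(positive="", negative=""):
    return Condition(frozenset(positive), frozenset(negative))


def holds(condition, coalition):
    return condition.positive <= coalition and not (condition.negative & coalition)


def rule_applicable(rule, coalition, structure):
    others = [t for t in structure if t != coalition]
    return holds(rule.internal, coalition) and all(
        any(holds(c, t) for t in others) for c in rule.external
    )


def partition_value(rules, coalition, structure):
    """w(S, CS): sum of the values of embedded rules applicable to S in CS."""
    return sum(r.value for r in rules if rule_applicable(r, coalition, structure))


# Hypothetical rules: (d) -> 3 and (d) | (a and b) -> -2.
RULES = [
    EmbeddedRule(cond("d"), (), 3),
    EmbeddedRule(cond("d"), (cond("ab"),), -2),
]
SINGLETONS = [frozenset(x) for x in "abcd"]
WITH_AB = [frozenset("ab"), frozenset("c"), frozenset("d")]
```

Evaluating {d} in `SINGLETONS` and in `WITH_AB` gives 3 and 1 respectively, because the second rule fires only when some other coalition contains both a and b.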
Note that for an embedded rule, there exists an implicit constraint such that external conditions must be satisfied in coalitions $CS \setminus \{S\}$. By adding each positive literal in internal condition $L_1$ to the negative literals of all external conditions $L_2, \ldots, L_l$, as well as by adding each positive literal in external conditions $L_2, \ldots, L_l$ to the negative literals of internal condition $L_1$, we can explicitly represent this implicit constraint. We say an embedded rule is in an explicit form if the above condition is satisfied. For example, if an original rule is
For simplicity, in the rest of this paper, we assume each embedded rule is in an explicit form.
Example 6 Assume the following rules. Here, $er_1$ is an embedded rule.

MIP formulations of coalition structure generation
In this section, we consider coalition structure generation problems, assuming that a characteristic function is given using one of the concise representations introduced in the previous section.For each concise representation, we develop MIP formulations to solve the CSG problem.We also analyze the computational complexity and show that finding an optimal coalition structure is NP-hard.

Marginal contribution nets
We consider CSG problems when a characteristic function is given using MC-nets representations.

Difficulty of handling negative rules
In MC-nets representations, a set of rules corresponds to a coalition structure if each rule in the set is applicable to some coalition in that coalition structure. Thus, if all rules are positive, we can solve CSG problems by solving a reward maximization problem among the rules. However, as the following example shows, handling negative value rules in CSG problems is challenging.
Example 7 Assume three agents, a, b, and c, and three rules: In this game, $R' = \{r_1, r_2\}$ maximizes $\sum_{r \in R'} v_r$ and gives 2 + 3 = 5. $R'$ is applicable to {{a}, {b, c}}, and there is no such coalition structure other than {{a}, {b, c}}. However, the correct value of this coalition structure is not 5 but 2 + 3 − 3 = 2 since $r_3$ is also applicable to coalition {a}. In this case, the correct optimal coalition structure is {{a, c}, {b}}, whose value is 3 and whose corresponding rule set is $\{r_2\}$.
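The pitfall can be reproduced by brute force on a tiny instance. The rules below are hypothetical: they were chosen to be consistent with the numbers quoted in the example (values 2, 3, and −3), but the example's own rule conditions are not reproduced here, so the concrete literals are assumptions, as are all helper names.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of lists."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


# Hypothetical rules: name -> (positive literals, negative literals, value).
RULES = {
    "r1": (frozenset("bc"), frozenset(), 2),      # (b and c) -> 2
    "r2": (frozenset("a"), frozenset("b"), 3),    # (a and not b) -> 3
    "r3": (frozenset("a"), frozenset("bc"), -3),  # (a and not b and not c) -> -3
}


def applicable(rule, coalition):
    positive, negative, _ = rule
    return positive <= coalition and not (negative & coalition)


def structure_value(structure, rules=None):
    """V(CS): sum over coalitions of the values of all applicable rules."""
    rules = list(RULES.values()) if rules is None else rules
    return sum(r[2] for s in structure for r in rules if applicable(r, s))


def positive_only_value(structure):
    """What a solver would report if it simply ignored negative value rules."""
    return structure_value(structure, [r for r in RULES.values() if r[2] > 0])


def best_structure(agents):
    candidates = ([frozenset(b) for b in p] for p in set_partitions(agents))
    return max(candidates, key=structure_value)


TEMPTING = [frozenset("a"), frozenset("bc")]
BEST = best_structure("abc")
```

Ignoring the negative rule makes `TEMPTING` look like it is worth 5, while its true value is 2; the exhaustive search instead returns {{a, c}, {b}} with value 3.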
In general, a negative reward in a reward maximization problem is troublesome. When all rules have positive values, choosing a rule never hurts. Thus, we can solve CSG problems by a solver that tries to choose as many rules as possible under some constraints, which only specify the conditions where rules cannot be selected at the same time. If we simply include a negative value rule, the solver just ignores this rule if it is allowed to do so, since choosing it hurts. We must describe the condition under which the solver is forced to choose this negative value rule as a result of choosing several other positive value rules. Since such a condition involves the interaction among multiple rules, it can be quite complicated and difficult to handle efficiently.
To handle negative value rules, we introduce a full transformation approach and a dummy rules approach. We show that the former approach is not scalable and that we can encode the problem as a MIP formulation with the latter approach.

Full transformation approach
One might think that handling negative value rules is unnecessary, since every characteristic function can be represented by only positive value rules as long as no coalition has a negative value. Thus, we introduce an algorithm that we call a full transformation algorithm. We assume that $R$ is divided into two groups: sets of positive value rules $R^+$ and negative value rules $R^-$.

Definition 5 (Full transformation algorithm)
The full transformation algorithm is defined as follows: Let us explain the basic ideas of this algorithm. Since we assume that $\forall S$, $v(S) \geq 0$ holds, if negative value rule $r_x : (L_x) \to -v_x$ is applicable to coalition $S$, there exists at least one positive value rule $r_i : (L_i) \to v_i$, which is also applicable to $S$. In other words, $r_i$ can partially eliminate the effect of $r_x$. We transform $r_x$ and $r_i$ into the following three rules: which is added in Step 6, and $r_3$ : It is obvious that the two original rules, $r_x$ and $r_i$, and these three rules are equivalent. Since $r_1$ and $r_3$ are non-basic, they must be transformed into multiple basic rules.
Using negative value rules can reduce the effort of describing a characteristic function. In fact, the representation size might significantly increase when we describe a characteristic function by only positive value rules. We here provide an upper bound of the number of transformed rules, although the details of results related to the naïve approach are explained in the Appendix. Consider $l$ agents that are involved in negative value rules in an MC-net. Since only $l - 1$ dummy rules are created to handle these agents, the number of rules does not increase exponentially. In contrast, the naïve transformation generates an exponential number of rules with respect to the number of original rules.
Before proceeding to the analysis, we restrict our attention to a set of rules that we consider a minimum rule set.
Definition 6 (Minimum rule set) Set of rules $M$ is a minimum rule set if
- the value of each coalition represented by $M$ is non-negative,
- $M$ has at least one negative value rule,
- if any positive value rule is excluded from $M$, the remaining set of rules is not a minimum rule set, and
- $M$ is not divided into multiple disjoint minimum rule sets.
From the definition, an arbitrary set of rules $M$ that is not minimum is divided into some disjoint minimum rule sets and some positive value rules (that are not minimum). We divide the disjoint minimum rule sets into a collection of rule sets, each of which is a minimum rule set, transform the negative value rules therein into positive value rules, and obtain a rule set with only positive value rules, in conjunction with non-minimum positive value rules, which is equivalent to $M$.
Let us first consider rule set $M$ with one positive value rule $r^+$ and one negative value rule $r^-$. Assume that there exist $n$ agents and define the rules as follows: We show that the upper bound of the number of transformed rules is $O(n)$. We classify the possible coalitions with $n$ agents into those to which each combination of the rules is applicable. In this case, we need to consider two kinds of coalitions: one to which $r^+$ and $r^-$ are applicable, and another to which only $r^+$ is applicable. Let $D_1$ denote the former set of coalitions and let $D_2$ denote the latter set. The conditions and values are described as It is clear that we require only a single positive value rule, which is applicable to each element of $D_1$. In contrast, since the condition for $D_2$, i.e., $(L^+ \wedge \neg L^-)$, includes the negation of $L^-$, it contains a disjunction of literals. We need to transform it into a set of mutually disjoint conjunctions. For example, assume that $L^-$ is $(a \wedge b \wedge \neg c)$. We then divide negation $\neg L^- = (\neg a \vee \neg b \vee c)$ into three disjoint conditions, $(\neg a)$, $(a \wedge \neg b)$, and $(a \wedge b \wedge c)$. How many conditions we require depends on the number of literals contained in $\neg L^-$. Since $\neg L^-$ involves all the agents in the worst case, it is divided into $n$ conditions. Thus, we require $O(n)$ rules to create a rule set with only positive value rules, which is equivalent to $M$.
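The splitting of a negated conjunction into mutually disjoint conjunctions can be sketched as follows. The representation of a literal as an `(agent, polarity)` pair and the function names are assumptions made for this illustration; the input reproduces the example condition $(a \wedge b \wedge \neg c)$.

```python
from itertools import product


def disjoint_negation(literals):
    """Split the negation of a conjunction into mutually disjoint conjunctions.

    `literals` is a sequence of (agent, polarity) pairs, polarity True for a
    positive literal. The i-th result keeps the first i literals unchanged and
    flips literal i, so a conjunction of m literals yields m conditions.
    """
    return [
        list(literals[:i]) + [(agent, not polarity)]
        for i, (agent, polarity) in enumerate(literals)
    ]


def satisfied(conjunction, assignment):
    return all(assignment[agent] == polarity for agent, polarity in conjunction)


# L^- = (a and b and not c), as in the text.
L_NEG = [("a", True), ("b", True), ("c", False)]
PARTS = disjoint_negation(L_NEG)
# PARTS encodes (not a), (a and not b), (a and b and c).

ASSIGNMENTS = [dict(zip("abc", bits)) for bits in product([False, True], repeat=3)]
```

For every truth assignment, exactly one element of `PARTS` holds when `L_NEG` is false and none holds when it is true, which is the disjointness the transformation relies on.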
Next, consider general rule set $M$ with $k$ positive value rules and $s$ negative value rules. Again assume that there exist $n$ agents and define the rules as follows: Next we show that the upper bound of the number of transformed rules is $O(2^{k+s} \cdot n^{k+s-1})$.
We classify the possible coalitions with $n$ agents into those to which each combination of the rules is applicable. In this case, the number of such combinations is $2^{k+s}$. However, since the value of any coalition is non-negative by assumption, we need to exclude the $2^s$ cases where no positive value rules are applied. Thus, we consider $2^{k+s} - 2^s$ kinds of coalitions: As in the case with one positive and one negative value rule, the number of required conditions depends on how many negations are involved in the conditions of $D_i$. Thus, we require $O(n^{k+s-1})$ rules to create a positive value rule set, which is equivalent to the conditions of $D_i$. For all $D_i$, we require $O(2^{k+s} \cdot n^{k+s-1})$ rules to create a positive value rule set, which is equivalent to $M$. Accordingly, this naive transformation generates an exponential number of rules with respect to $k$ and $s$ in the worst case. Our approach with dummy rules, which we introduce later, is computationally more tractable.

Dummy rules approach
Instead of transforming negative value rules into positive value rules, we introduce the idea of using dummy rules as another way of handling negative value rules that is more concise and efficient.Before introducing dummy rules, we define a feasible set to represent a coalition structure by a set of rules.

Definition 7 (Feasible rule set)
We say set of rules $R' \subseteq R$ is feasible if there exists a $CS$, where each rule $r \in R'$ is applicable to some $S \in CS$ and $\forall r^- \in R^- \setminus R'$, $r^-$ is not applicable to any $S \in CS$.
Clearly, for each coalition structure $CS$, there exists at least one feasible rule set $R' \subseteq R$ such that $R'$ is applicable to $CS$ and $V(CS) = \sum_{r \in R'} v_r$ holds. Thus, the problem of finding $CS^*$ is equivalent to finding feasible rule set $R'$ to maximize $\sum_{r \in R'} v_r$.
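The feasibility condition can be checked by exhaustive search on small instances, which also illustrates the equivalence between the best feasible rule set and the optimal coalition structure. This is a sketch under stated assumptions: the three rules are hypothetical (two positive rules and one negative rule over agents a, b, c), the helper names are invented for the illustration, and the enumeration is exponential.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of lists."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


# Hypothetical rules: name -> (positive literals, negative literals, value).
RULES = {
    "r1": (frozenset("bc"), frozenset(), 2),
    "r2": (frozenset("a"), frozenset("b"), 3),
    "r3": (frozenset("a"), frozenset("bc"), -3),
}


def applicable_somewhere(rule, structure):
    positive, negative, _ = rule
    return any(positive <= s and not (negative & s) for s in structure)


def is_feasible(chosen, agents):
    """True iff some coalition structure supports `chosen` and triggers no other negative rule."""
    for partition in set_partitions(agents):
        structure = [frozenset(b) for b in partition]
        covers = all(applicable_somewhere(RULES[name], structure) for name in chosen)
        no_missed_negative = all(
            not applicable_somewhere(rule, structure)
            for name, rule in RULES.items()
            if rule[2] < 0 and name not in chosen
        )
        if covers and no_missed_negative:
            return True
    return False


def best_feasible(agents):
    names = sorted(RULES)
    subsets = [set(c) for k in range(len(names) + 1) for c in combinations(names, k)]
    feasible = [s for s in subsets if is_feasible(s, agents)]
    return max(feasible, key=lambda s: sum(RULES[name][2] for name in s))
```

In this instance the set {r1, r2} is infeasible because the only structure supporting both rules also triggers r3, so the maximization over feasible sets selects {r2} with value 3.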

In Example 2, $\{r_2, r_4\}$ is feasible because it is applicable to $\{\{a, b, c\}, \{d, e\}\}$ and the value of each rule is not negative. $R' = \{r_1, r_2\}$ is also feasible because $R'$ is applicable to $\{\{a, b, c, e\}, \{d\}\}$. Since $R'$ maximizes $\sum_{r \in R'} v_r$, $\{\{a, b, c, e\}, \{d\}\}$ is the optimal coalition structure, whose value is 5. On the other hand, $\{r_1, r_2, r_4\}$ and $\{r_2, r_3\}$ are infeasible because there is no coalition structure to which all the rules in the set are applicable. Let us consider another example that contains negative value rules.
We add dummy rules to directly encode the problem as a MIP formulation as follows.
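Definition 8 (Dummy rules (for basic rules)) Assume there exists negative value rule $r_x : (L_x) \to -v_x$ ($v_x > 0$), where $L_x = \bigwedge_{a_i \in P_x} a_i \wedge \bigwedge_{a_j \in N_x} \neg a_j$, $P_x = \{a_1, a_2, \ldots, a_k\}$, $N_x = \{a_{k+1}, a_{k+2}, \ldots, a_m\}$. Dummy rules generated by this negative value rule are of the following two types:
- $(a_1 \wedge \neg a_i) \to 0$, where $2 \le i \le k$,
- $(a_1 \wedge a_j) \to 0$, where $k+1 \le j \le m$.
We denote $D(L_x)$ as the set of dummy rules created from $L_x$.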

Theorem 1 A negative value rule is applicable to a coalition in coalition structure $CS$ if and only if none of its dummy rules is applicable to any coalition in $CS$.
Proof The condition of a dummy rule can be either $a_1 \wedge \neg a_i$ or $a_1 \wedge a_j$. In either case, it is clear that when this dummy rule is applicable to a coalition in $CS$, the negative value rule is not applicable to any coalition in $CS$. Also, if all dummy rules are inapplicable to any coalition in $CS$, then $a_1, a_2, \ldots, a_k$ are in an identical coalition $S$, while $a_{k+1}, a_{k+2}, \ldots, a_m$ are not in $S$. Thus, the negative value rule is applicable to $S$.
With dummy rules, we can describe the condition under which the solver is forced to choose this negative value rule. In brief, we add a constraint that at least one of a negative value rule and the dummy rules created from that rule must be chosen. Note that, from Theorem 1, if no dummy rule created from a negative value rule is chosen, there must exist a coalition to which the negative value rule is applicable, and the solver must choose the rule.
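The construction of dummy rules and the statement of Theorem 1 can be checked exhaustively on a tiny instance. The sketch below is illustrative only: the two dummy-rule shapes follow the proof of Theorem 1, while the tuple encoding of rules, the helper names, and the brute-force enumeration of all coalition structures are assumptions of this example.

```python
from typing import FrozenSet, Iterator, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (positive, negative, value)


def dummy_rules(pos: List[str], neg: List[str]) -> List[Rule]:
    """Dummy rules for a negative value rule with positive literals
    pos = [a_1, ..., a_k] and negative literals neg; each dummy rule has value 0."""
    a1 = pos[0]
    first_type = [(frozenset([a1]), frozenset([ai]), 0.0) for ai in pos[1:]]  # a_1 AND NOT a_i
    second_type = [(frozenset([a1, aj]), frozenset(), 0.0) for aj in neg]     # a_1 AND a_j
    return first_type + second_type


def applicable(rule: Rule, coalition: FrozenSet[str]) -> bool:
    pos, neg, _ = rule
    return pos <= coalition and not (neg & coalition)


def partitions(agents: List[str]) -> Iterator[List[FrozenSet[str]]]:
    """Enumerate every coalition structure over agents (viable only for tiny inputs)."""
    if not agents:
        yield []
        return
    first, rest = agents[0], agents[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [frozenset([first])]


def theorem1_holds(agents: List[str], pos: List[str], neg: List[str]) -> bool:
    """Check: the rule applies somewhere in CS iff no dummy rule applies anywhere in CS."""
    rule = (frozenset(pos), frozenset(neg), -1.0)
    dummies = dummy_rules(pos, neg)
    for cs in partitions(agents):
        rule_applies = any(applicable(rule, s) for s in cs)
        dummy_applies = any(applicable(d, s) for d in dummies for s in cs)
        if rule_applies == dummy_applies:  # exactly one of the two must hold
            return False
    return True
```

A rule over $m$ agents yields $m - 1$ dummy rules in this construction, so a rule involving eight agents yields seven dummy rules.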
Then we classify the relations between rules to specify the conditions where they cannot be selected at the same time.

Definition 9 (Relation between rules)
The possible relations between two rules, $r$ and $r'$, can be classified into the following four non-overlapping and exhaustive cases:
- Compatible on the same coalition: $P_r \cap P_{r'} \neq \emptyset$ and $P_r \cap N_{r'} = P_{r'} \cap N_r = \emptyset$. For example, in Example 2, $r_1$ and $r_2$ are compatible on the same coalition; if $r_1$ and $r_2$ are applicable at the same time, there must be a coalition $S$ with $S \supseteq \{a, b, c, e\}$ and $d \notin S$.
- Incompatible: $P_r \cap P_{r'} \neq \emptyset$, and ($P_r \cap N_{r'} \neq \emptyset$ or $P_{r'} \cap N_r \neq \emptyset$). For example, $r_2$ and $r_3$ are incompatible; these two rules are not applicable at the same time.
- Compatible on different coalitions: $P_r \cap P_{r'} = \emptyset$, and ($P_r \cap N_{r'} \neq \emptyset$ or $P_{r'} \cap N_r \neq \emptyset$). For example, $r_1$ and $r_4$ are compatible on different coalitions; if $r_1$ and $r_4$ are applicable at the same time, there must be two different coalitions, $S_1$ and $S_2$, where $S_1 \supseteq \{b, e\}$ and $S_2 \supseteq \{c\}$.
- Independent: $P_r \cap P_{r'} = \emptyset$, and $P_r \cap N_{r'} = P_{r'} \cap N_r = \emptyset$. For example, $r_1$ and $r_3$ are independent. These two rules can be applied to the same coalition or to different coalitions.
Let us consider a graphical representation of an MC-net in which each vertex is a rule, and between any two vertices, there exists an edge whose type is one of the four cases described above: "compatible on the same coalition", "incompatible", "compatible on different coalitions", or "independent". Figure 1 shows the graphical representation of Example 2 ("independent" edges are not shown).
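The four-way classification depends only on two tests: whether the rules share a positive literal, and whether a positive literal of one rule is a negative literal of the other. A minimal sketch follows; the pair encoding (positive literals, negative literals) and the sample rules are hypothetical and are not the rules of Example 2.

```python
from typing import FrozenSet, Tuple

Literals = Tuple[FrozenSet[str], FrozenSet[str]]  # (P_r, N_r)

SAME = "compatible on the same coalition"
INCOMPATIBLE = "incompatible"
DIFFERENT = "compatible on different coalitions"
INDEPENDENT = "independent"


def relation(r: Literals, s: Literals) -> str:
    """Classify the edge between two rules into one of the four cases."""
    (p_r, n_r), (p_s, n_s) = r, s
    share_positive = bool(p_r & p_s)                 # P_r and P_r' intersect
    clash = bool(p_r & n_s) or bool(p_s & n_r)       # a positive literal is negated by the other rule
    if share_positive:
        return INCOMPATIBLE if clash else SAME
    return DIFFERENT if clash else INDEPENDENT


# Hypothetical rules used only to exercise each case.
r1 = (frozenset("ab"), frozenset("d"))
r2 = (frozenset("bc"), frozenset())
r3 = (frozenset("b"), frozenset("a"))
r4 = (frozenset("cd"), frozenset())
r5 = (frozenset("e"), frozenset())
```

Because the two tests are Boolean, exactly one of the four cases holds for every pair of rules, and the classification is symmetric in its arguments.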
Definition 10 (Consistent) Set of rules $R'$ is consistent if it satisfies the following conditions:
(a) $R'$ includes no pair of rules/vertices connected by an "incompatible" edge, and
(b) if two rules/vertices in $R'$ are connected by a "compatible on different coalitions" edge, then they are not reachable from each other via "compatible on the same coalition" edges within $R'$.
Consistency guarantees that set of rules $R'$ is applicable to some coalition structure. Let us consider the set of rules in Example 2. Set of rules $\{r_2, r_3\}$ does not satisfy (a) because $r_2$ and $r_3$ are connected by an "incompatible" edge. Then let us consider set of rules $\{r_1, r_2, r_4\}$. In this case, $r_1$ and $r_4$ are connected by a "compatible on different coalitions" edge, but they are reachable via $r_2$, since both $r_1$ and $r_4$ are connected to $r_2$ by "compatible on the same coalition" edges. Thus, $\{r_1, r_2, r_4\}$ does not satisfy (b). An example of a consistent set of rules is $\{r_1, r_3, r_4\}$, which is applicable to coalition structure $\{\{a, d\}, \{b, e\}, \{c\}\}$.
Definition 11 (Covering rule set) Set of rules $R'$ covers all the negative value rules if, $\forall r^- \in R^-$, $R'$ includes either $r^-$ or at least one dummy rule created from $r^-$.
By using the notions of consistency and a covering rule set, we can characterize feasible rule sets, and the following theorems hold.
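Conditions (a) and (b) of Definition 10 can be checked with a union-find structure over "compatible on the same coalition" edges. The sketch below is one possible implementation under assumed names and an assumed (positive, negative) encoding; when the set is consistent, it also returns the coalitions formed by taking the union of the positive literals within each connected group. Agents that appear in no positive literal are omitted from the result and can be placed in singleton coalitions.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple

Literals = Tuple[FrozenSet[str], FrozenSet[str]]  # (P_r, N_r)


def relation(r: Literals, s: Literals) -> str:
    (p_r, n_r), (p_s, n_s) = r, s
    clash = bool(p_r & n_s) or bool(p_s & n_r)
    if p_r & p_s:
        return "incompatible" if clash else "same"
    return "different" if clash else "independent"


def coalition_structure(rules: List[Literals]) -> Optional[List[FrozenSet[str]]]:
    """Return coalitions to which all rules are applicable, or None if inconsistent."""
    parent = list(range(len(rules)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = [(i, j, relation(rules[i], rules[j]))
             for i, j in combinations(range(len(rules)), 2)]
    if any(rel == "incompatible" for _, _, rel in pairs):      # condition (a)
        return None
    for i, j, rel in pairs:
        if rel == "same":
            parent[find(i)] = find(j)                           # merge the two groups
    if any(rel == "different" and find(i) == find(j)            # condition (b)
           for i, j, rel in pairs):
        return None
    groups = {}
    for i, (pos, _) in enumerate(rules):
        root = find(i)
        groups[root] = groups.get(root, frozenset()) | pos
    return list(groups.values())


# Hypothetical rules (not those of Example 2).
r1 = (frozenset("ab"), frozenset("d"))
r2 = (frozenset("bc"), frozenset())
r3 = (frozenset("b"), frozenset("a"))
r4 = (frozenset("cd"), frozenset())
```

Here `[r1, r2, r4]` violates condition (b): `r1` and `r4` must be on different coalitions, yet both are linked to `r2` by "compatible on the same coalition" edges.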

Theorem 2 Set of rules $R'$ is applicable to some coalition structure if and only if $R'$ is consistent.
Proof First, we prove the "if" part. From (a), there exists no "incompatible" edge within $R'$. From (b), $R'$ can be divided into groups $G_1, G_2, \ldots, G_k$ where the rules within $G_i$ are reachable from each other by "compatible on the same coalition" edges, there exists no "compatible on different coalitions" edge between the rules in $G_i$, and there exists no "compatible on the same coalition" edge between rules that belong to different groups.
Let us choose $CS = \{S_1, S_2, \ldots, S_k\}$ so that $S_i$ is the union of all positive literals of $r \in G_i$. Then, for $i \neq j$, $S_i \cap S_j = \emptyset$ holds. This is because $S_i \cap S_j \neq \emptyset$ implies that there exists at least one pair $r \in G_i$, $r' \in G_j$ for which $r$ and $r'$ are connected by a "compatible on the same coalition" edge (since there cannot be an "incompatible" edge between them). But this contradicts the way in which $G_1, \ldots, G_k$ are chosen. Thus, $\{S_1, \ldots, S_k\}$ is a valid coalition structure.
Next, we show that for any $r \in G_i$, $r$ is applicable to coalition $S_i$. Clearly, $S_i$ contains all the positive literals of $r$. It remains to be shown that $S_i$ does not contain any negative literal of $r$. For the sake of contradiction, assume $S_i$ contains agent $a$, where $a$ is a negative literal of $r$. Then there exists another rule $r' \in G_i$ for which $a$ is a positive literal. There must be a "compatible on different coalitions" or an "incompatible" edge between $r$ and $r'$. Either case leads to a contradiction. Hence, $R'$ is applicable to $CS$.
Next, we prove the "only if" part, i.e., if $R'$ does not satisfy the above conditions, then there exists no coalition structure to which $R'$ is applicable. Clearly, if (a) is not satisfied, i.e., some $r, r' \in R'$ are connected by an "incompatible" edge, then there exists no coalition structure where $r$ and $r'$ are applicable at the same time. Now, assume (b) is not satisfied, i.e., there exist $r_i, r_j \in R'$ such that $r_i$ and $r_j$ are connected by a "compatible on different coalitions" edge and are reachable from each other by "compatible on the same coalition" edges within $R'$. Assume $r_i$ is applicable to coalition $S_i$ and $r_j$ is applicable to coalition $S_j$. Since $r_i$ and $r_j$ are connected by a "compatible on different coalitions" edge, $S_i$ and $S_j$ must be different. However, $S_i$ must contain all of the positive literals of the rules reachable from $r_i$ via "compatible on the same coalition" edges; otherwise, some rule in $R'$ is not applicable. Similarly, $S_j$ must contain all the positive literals of the rules reachable from $r_j$ via "compatible on the same coalition" edges. Since $r_i$ and $r_j$ are reachable from each other via "compatible on the same coalition" edges, $S_i$ and $S_j$ must be the same; but this contradicts the fact that they must be different.

Theorem 3 Set of rules $R'$ is feasible if it is consistent and covers all the negative value rules. Furthermore, for any feasible rule set $R'$ (which does not cover all the negative value rules), there exists another rule set $R''$ ($\supseteq R'$) where $\sum_{r \in R'} v_r = \sum_{r \in R''} v_r$ and $R''$ is consistent and covers all the negative value rules.
Proof First, we prove that if $R'$ is consistent and covers all the negative value rules, it is feasible. Since it is consistent, from Theorem 2, there exists coalition structure $CS$ such that each rule $r \in R'$ is applicable to some $S \in CS$. Thus, to prove that $R'$ is feasible, it suffices to show that $\forall r^- \in R^- \setminus R'$, $r^-$ is not applicable to any $S \in CS$. Since $R'$ covers all the negative value rules, for each negative value rule $r^- \in R^- \setminus R'$, $R'$ contains at least one dummy rule created from $r^-$, and that rule is applicable to some $S \in CS$. Thus, from Theorem 1, $\forall r^- \in R^- \setminus R'$, $r^-$ is not applicable to any $S \in CS$.
Next, we prove that for any feasible rule set $R'$, there exists rule set $R''$ s.t. $R'' \supseteq R'$, $\sum_{r \in R'} v_r = \sum_{r \in R''} v_r$, and $R''$ is consistent and covers all the negative value rules. Since $R'$ is a feasible rule set, there exists $CS$ where each rule $r \in R'$ is applicable to some $S \in CS$ and, $\forall r^- \in R^- \setminus R'$, $r^-$ is not applicable to any $S \in CS$. Note that $R'$ is consistent. Now, for each negative value rule $r^- \in R^- \setminus R'$, we show that if $R'$ does not contain any dummy rule of $r^-$, we can add at least one dummy rule $r_d$ to $R'$ such that $r_d$ is applicable to some coalition in $CS$, and thus $R' \cup \{r_d\}$ is consistent. We prove this by contradiction, assuming that no dummy rule $r_d$ of $r^-$ is applicable to any coalition in $CS$. There exists $S' \in CS$ such that $a_1 \in S'$. From the way dummy rules are created, $S'$ contains all the positive literals of $r^-$. If this is not the case, i.e., $S'$ does not contain positive literal $a_i$, then the dummy rule $(a_1 \wedge \neg a_i) \to 0$ is applicable to $S'$. Also, $S'$ contains no negative literal of $r^-$. If this is not the case, i.e., $S'$ contains a negative literal $a_j$, then the dummy rule $(a_1 \wedge a_j) \to 0$ is applicable to $S'$. However, since $S'$ contains all of the positive literals of $r^-$ and no negative literals of $r^-$, $r^-$ is applicable to $S'$. This contradicts the assumption that $r^-$ is not applicable to any $S \in CS$. Thus, there exists at least one dummy rule $r_d$ such that $r_d$ is applicable to some coalition in $CS$, and thus $R' \cup \{r_d\}$ is consistent.
By continuing to add dummy rules to $R'$, we obtain rule set $R''$ that is consistent and covers all the negative value rules. It is clear that $\sum_{r \in R'} v_r = \sum_{r \in R''} v_r$ holds since the value of a dummy rule is 0.
Hence, for any feasible rule set $R'$, there exists rule set $R''$ s.t. $R'' \supseteq R'$, $\sum_{r \in R'} v_r = \sum_{r \in R''} v_r$, and $R''$ is consistent and covers all the negative value rules.
From Theorem 3, when considering feasible rule sets, we can restrict our attention to rule sets that are consistent and cover all negative value rules without loss of generality.
Theorem 4 When the characteristic function is represented as an MC-net, finding an optimal coalition structure is NP-hard. Moreover, unless P = NP, there exists no polynomial-time $O(|R|^{1-\epsilon})$ approximation algorithm for any $\epsilon > 0$, where $|R|$ is the number of rules.
Proof The maximum independent set problem is to choose $V' \subseteq V$ for a graph $G = (V, E)$ such that there exists no edge between vertices in $V'$, and $|V'|$ is maximized under this constraint. It is NP-hard, and unless P = NP, there exists no polynomial-time $O(|V|^{1-\epsilon})$ approximation algorithm for any $\epsilon > 0$ [13,45]. We reduce an arbitrary maximum independent set instance to a CSG problem instance as follows. For each $v \in V$, let there be agent $a_v$; also, for each $e \in E$, let there be agent $a_e$. For each $v \in V$, we create a rule $r_v : (\bigwedge_{a_i \in P_{r_v}} a_i \wedge \bigwedge_{a_j \in N_{r_v}} \neg a_j)$, where $P_{r_v} = \{a_v\} \cup \{a_e : v \in e\}$ and $N_{r_v} = \{a_w : (v, w) \in E\}$. Thus, rules are "incompatible" if they correspond to neighboring vertices and "independent" otherwise. It follows that feasible rule sets correspond exactly to the independent sets of vertices.
The reduction in Theorem 4 relies heavily on the "incompatibilities" between rules. If there are no "incompatibilities", then the problem is equivalent to the multi-cut problem [41], which is a generalization of the min-cut problem. Note that even without negative value rules, finding an optimal coalition structure using MC-nets is NP-hard.
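The reduction in the proof of Theorem 4 can be written down directly. In the sketch below, agents are encoded as tagged tuples, and every rule is given the same value 1; that value is an assumption of this example (the rule value is not shown in the text above), chosen so that maximizing the sum of values corresponds to maximizing the number of selected vertices.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple

Agent = Tuple[str, Hashable]
Rule = Tuple[FrozenSet[Agent], FrozenSet[Agent], int]  # (P_r, N_r, value)


def mis_to_rules(vertices: Iterable[int], edges: List[Tuple[int, int]],
                 rule_value: int = 1) -> Dict[int, Rule]:
    """One rule per vertex v: positive literals a_v and a_e for each incident edge e,
    negative literals a_w for each neighbor w. rule_value=1 is an assumption."""
    rules = {}
    for v in vertices:
        incident = [e for e in edges if v in e]
        pos = frozenset([("v", v)] + [("e", frozenset(e)) for e in incident])
        neg = frozenset(("v", w) for e in incident for w in e if w != v)
        rules[v] = (pos, neg, rule_value)
    return rules


def incompatible(r: Rule, s: Rule) -> bool:
    (p_r, n_r, _), (p_s, n_s, _) = r, s
    return bool(p_r & p_s) and (bool(p_r & n_s) or bool(p_s & n_r))


def independent(r: Rule, s: Rule) -> bool:
    (p_r, n_r, _), (p_s, n_s, _) = r, s
    return not (p_r & p_s) and not (p_r & n_s) and not (p_s & n_r)


# Path graph 0 - 1 - 2 (illustrative input).
edges = [(0, 1), (1, 2)]
rules = mis_to_rules(range(3), edges)
```

For the path graph, the rules of vertices 0 and 1 (and of 1 and 2) are "incompatible", while those of 0 and 2 are "independent", matching the independent set {0, 2}.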
A CSG using MC-nets can be modeled as finding a rule set that satisfies the condition in Theorem 3 and maximizes the sum of the values.
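To make the search space concrete, the following sketch enumerates every subset of rules (including dummy rules), keeps those that are consistent and cover all negative value rules, and returns the best total value. This exhaustive enumeration is only a stand-in for an optimization solver and is not the MIP formulation itself; the rule encoding, helper names, and the toy instance are assumptions of this example.

```python
from itertools import combinations


def relation(r, s):
    (p_r, n_r, _), (p_s, n_s, _) = r, s
    clash = bool(p_r & n_s) or bool(p_s & n_r)
    if p_r & p_s:
        return "incompatible" if clash else "same"
    return "different" if clash else "independent"


def consistent(rules):
    """Definition 10, checked with union-find over 'same coalition' edges."""
    parent = list(range(len(rules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = [(i, j, relation(rules[i], rules[j]))
             for i, j in combinations(range(len(rules)), 2)]
    if any(rel == "incompatible" for _, _, rel in pairs):
        return False
    for i, j, rel in pairs:
        if rel == "same":
            parent[find(i)] = find(j)
    return not any(rel == "different" and find(i) == find(j) for i, j, rel in pairs)


def dummy_rules(rule):
    pos, neg, _ = rule
    a1, *others = sorted(pos)
    return ([(frozenset([a1]), frozenset([ai]), 0) for ai in others]
            + [(frozenset([a1, aj]), frozenset(), 0) for aj in sorted(neg)])


def best_rule_set(rules):
    """Exhaustive search over consistent, covering rule sets (tiny inputs only)."""
    pool, covers = list(rules), []
    for i, rule in enumerate(rules):
        if rule[2] < 0:  # negative value rule: it or one of its dummies must be chosen
            dummies = dummy_rules(rule)
            covers.append([i] + list(range(len(pool), len(pool) + len(dummies))))
            pool.extend(dummies)
    best_value, best_subset = float("-inf"), None
    for mask in range(1 << len(pool)):
        chosen = [i for i in range(len(pool)) if (mask >> i) & 1]
        if not all(any(i in chosen for i in group) for group in covers):
            continue
        if not consistent([pool[i] for i in chosen]):
            continue
        total = sum(pool[i][2] for i in chosen)
        if total > best_value:
            best_value, best_subset = total, chosen
    return best_value, best_subset, pool


# Toy instance: a AND b -> 3, b AND c -> 2, a AND c -> -4.
toy = [(frozenset("ab"), frozenset(), 3),
       (frozenset("bc"), frozenset(), 2),
       (frozenset("ac"), frozenset(), -4)]
```

On the toy instance the best set consists of the first rule and the single dummy rule of the negative value rule, which corresponds to the coalition structure {{a, b}, {c}}.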

Definition 12 (MIP formulation of CSG for MC-nets)
The problem of finding feasible rule set $R'$ that maximizes $\sum_{r \in R'} v_r$ can be modeled as follows:
$\max \sum_{r} v_r \cdot x(r)$, where $x(r) \in \{0, 1\}$, subject to
- (i) $\forall e = (r, r')$, where $e$ is an "incompatible" edge, $x(r) + x(r') \le 1$,
- (ii) $\forall e = (r_i, r_j)$, where $e$ is a "compatible on different coalitions" edge and $i < j$, $dis(e, r_i) = 0$, $dis(e, r_j) \ge 1$,
- (iii), (iv) $\forall e' = (r_1, r_2)$, where $e'$ is a "compatible on the same coalition" edge, $dis(e, r_1) \le dis(e, r_2)$ …,
- (v) $\forall r^- \in R^-$, where $d_1, \ldots, d_k$ are the dummy rules of $r^-$, $x(r^-) + \sum_{i=1}^{k} x(d_i) \ge 1$.
$x(r) = 1$ means that rule $r$ is selected. Constraint (i) ensures that two rules connected by an "incompatible" edge will not be selected at the same time. Also, for each "compatible on different coalitions" edge $e = (r_i, r_j)$, we define a distance/potential for $e$, so that $dis(e, r_i) = 0$ and $dis(e, r_j) \ge 1$ (ii). Constraints (iii) and (iv) ensure that if both $r_1$ and $r_2$ are selected, where $r_1$ and $r_2$ are connected by a "compatible on the same coalition" edge, then the distance/potential of these two rules for the aforementioned $e$ must be equal. Then the facts that $dis(e, r_i) = 0$ and $dis(e, r_j) \ge 1$ ensure that $r_i$ and $r_j$ are not reachable from each other via "compatible on the same coalition" edges. Using such a distance/potential is a standard method for representing connectivity constraints in MIP formalization without enumerating possible paths. Constraint (v) ensures that negative value rule $r^-$ or at least one dummy rule created from $r^-$ is selected.
In this formulation, the number of binary variables equals the number of all the rules including the dummy rules. The number of constraints is $d_{in} + d_{cd}(2 d_{cs} + 1) + |R^-|$, where $d_{in}$, $d_{cd}$, $d_{cs}$, and $|R^-|$ are the number of "incompatible" edges, "compatible on different coalitions" edges, "compatible on the same coalition" edges, and negative value rules, respectively.

Embedded MC-nets
We extend our method to find an optimal coalition structure when a partition function is represented as an embedded MC-net.
Extending the MIP formulation in Definition 12 to handle embedded MC-nets is rather straightforward. First, we explain how to handle embedded rules whose values are non-negative. For an embedded rule that has the form $er : (L_1)|(L_2), \ldots, (L_l) \to v_{er}$, we create the following basic rules: $r_1 : (L_1) \to 0$, $r_2 : (L_2) \to 0$, …, $r_l : (L_l) \to 0$. Assume $x(er), x(r_1), \ldots, x(r_l)$ are 0/1 decision variables in the MIP formulation, i.e., when the value is 1, the rule is selected. The objective function is given by $\sum_{er} v_{er} \cdot x(er)$. Also, we add a constraint that $x(er)$ can be 1 only when all of $x(r_1), \ldots, x(r_l)$ are 1. Note that such a constraint is not linear. However, there exists a well-known encoding trick to represent such a non-linear constraint in a MIP formulation [3].
Next, we introduce dummy rules to handle negative value embedded rules. For a negative value embedded rule, we create dummy rules from each basic rule obtained from it.
Definition 13 (Dummy rules (for embedded rules)) Assume there exists negative value embedded rule $r_x : (L_1)|(L_2), \ldots, (L_l) \to -v_x$ ($v_x > 0$). Then the dummy rules for $r_x$ are $\bigcup_{L_i} D(L_i)$. Note that $D(L)$ is the set of dummy rules created from $L$.
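One common way to linearize "$x(er) = 1$ if and only if all $x(r_i) = 1$" over 0/1 variables uses the inequalities $x(er) \le x(r_i)$ for every $i$ together with $x(er) \ge \sum_i x(r_i) - (l - 1)$. Whether this is exactly the encoding meant by [3] is an assumption of this sketch; the code merely verifies by enumeration that these linear constraints admit exactly the logical AND.

```python
from itertools import product


def and_linearization_is_exact(l: int) -> bool:
    """Check, for every 0/1 assignment of x(r_1..r_l), that the only value of
    x(er) satisfying the linear constraints is the logical AND of the x(r_i)."""
    for xs in product((0, 1), repeat=l):
        feasible = [
            y for y in (0, 1)
            if all(y <= x for x in xs)        # x(er) <= x(r_i) for each i
            and y >= sum(xs) - (l - 1)        # x(er) >= sum_i x(r_i) - (l - 1)
        ]
        if feasible != [int(all(xs))]:
            return False
    return True
```

Dropping the second inequality leaves only the one-directional condition that $x(er)$ can be 1 only when all $x(r_i)$ are 1, which suffices when $v_{er}$ is non-negative and the objective is maximized.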

Theorem 5 A negative value embedded rule is applicable to a coalition in coalition structure $CS$ if and only if none of its dummy rules is applicable to any coalition in $CS$.
We omit the proof since it is basically identical to Theorem 1.
Finally, we obtain an extended MIP formulation from Definition 12.

Definition 14 (MIP formulation of CSG using embedded MC-nets)
The problem of finding an optimal coalition structure can be modeled as follows:
$\max \sum_{er} v_{er} \cdot x(er)$, where $x(\cdot) \in \{0, 1\}$, subject to
- (i) $\forall e = (r, r')$, where $e$ is an "incompatible" edge, $x(r) + x(r') \le 1$,
- (ii) $\forall e = (r_i, r_j)$, where $e$ is a "compatible on different coalitions" edge and $i < j$, $dis(e, r_i) = 0$, $dis(e, r_j) \ge 1$,
- (iii), (iv) $\forall e' = (r_1, r_2)$, where $e'$ is a "compatible on the same coalition" edge, $dis(e, r_1) \le dis(e, r_2)$ …,
- (vi) $\forall er$, where $r_1, r_2, \ldots, r_l$ are created from $er$, $x(r_1)$ …,
- (vii) $x(er) \le x(r_1)$, …, $x(er) \le x(r_l)$,
- (viii) $\forall er^-$, where $er^-$ has a negative value and $d_1, \ldots, d_k$ are the dummy rules of $er^-$, $x(er^-) + \sum_{i=1}^{k} x(d_i) \ge 1$.
In this MIP formulation, we add constraints (vi) and (vii) to the MIP formulation in Definition 12 and replace the dummy rule constraint (v) with (viii). Constraints (vi) and (vii) ensure that, for each embedded rule $er$, $er$ is selected if and only if all of the rules in it are selected. Constraint (viii) ensures that negative value embedded rule $er^-$ or at least one dummy rule created from the rules in $er^-$ is selected.
In this formulation, the number of binary variables equals the sum of the number of all rules including dummy rules and the number of embedded rules.The number of constraints is are the number of embedded rules and negative value rules, respectively.

Synergy coalition group
In this section, we develop a MIP formulation for finding an optimal coalition structure when a characteristic function is represented as an SCG. We show that when searching for $CS^*$, we need to consider only the coalitions that are explicitly described in the SCG.

Theorem 6 There exists coalition structure C S for which
Proof For the sake of contradiction, assume there exists some $CS^*$ such that $V(CS^*)$ is strictly larger than the value of any $CS$ that only consists of elements of the SCG. Let us examine coalition $S \in CS^*$ that is not an element of the SCG. From the definition of SCG, there exists a partition of $S$ (denoted as $p_S$) such that $v(S) = \sum_{S_i \in p_S} v(S_i)$, and each $S_i$ is an element of the SCG. Then, by replacing each such $S$ by $p_S$, we obtain a new coalition structure $CS$ that only consists of elements of the SCG, and $V(CS) = V(CS^*)$ holds, so we have the desired contradiction.
Due to Theorem 6, finding $CS^*$ is equivalent to a weighted set packing problem, or equivalently to the winner determination problem in combinatorial auctions [30], where each agent is an item and each coalition described in the SCG is a bid.

Theorem 7
When the characteristic function is represented as an SCG, finding an optimal coalition structure is NP-hard. Moreover, unless P = NP, there exists no polynomial-time $O(|SCG|^{1-\epsilon})$ approximation algorithm for any $\epsilon > 0$.
Proof This follows directly from the corresponding inapproximability results for the winner determination problem [30] and the maximum independent set problem [45].
Definition 15 (MIP formulation of CSG for SCG) The problem of finding $CS^*$ can be modeled as follows:
In this formulation (which corresponds to a standard winner determination formulation), the number of binary variables equals $|SCG|$, and the number of constraints equals the number of agents.
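The weighted set packing view can be illustrated with a small exhaustive search that selects pairwise disjoint coalitions of maximum total value. This is a sketch under assumptions: the dictionary encoding of the SCG, the function name, and the toy values are hypothetical, each agent may appear in at most one selected coalition, and a real instance would be handed to a MIP solver rather than enumerated.

```python
from typing import Dict, FrozenSet, List, Tuple


def best_packing(scg: Dict[FrozenSet[str], float]) -> Tuple[float, List[FrozenSet[str]]]:
    """Choose pairwise disjoint coalitions from scg that maximize the total value."""
    items = sorted(scg.items(), key=lambda kv: sorted(kv[0]))  # deterministic order

    def search(i: int, used: FrozenSet[str]) -> Tuple[float, List[FrozenSet[str]]]:
        if i == len(items):
            return 0, []
        best = search(i + 1, used)                 # branch 1: skip coalition i
        coalition, val = items[i]
        if not (coalition & used):                 # branch 2: select it if disjoint
            rest_value, rest = search(i + 1, used | coalition)
            if val + rest_value > best[0]:
                best = (val + rest_value, [coalition] + rest)
        return best

    return search(0, frozenset())


# Toy SCG (illustrative values only).
scg = {
    frozenset("ab"): 5,
    frozenset("bc"): 4,
    frozenset("a"): 1,
    frozenset("b"): 1,
    frozenset("c"): 2,
}
```

Each recursive branch corresponds to fixing one binary variable $x(S)$ to 0 or 1, and the disjointness test plays the role of the per-agent constraints.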

Multi-issue domain
When there are multiple issues, optimal coalition structure $CS^*$ may need to contain a coalition $S$ that is not explicitly described in any $SCG_i$. For example, assume that in issue $i$, $a$ and $b$ have a strong positive synergy. Also, in issue $j$, $b$ and $c$ have a strong positive synergy. Then coalition $\{a, b, c\}$ might need to be included in $CS^*$, even though $\{a, b, c\}$ appears in neither $SCG_i$ nor $SCG_j$.
Definition 16 (Value-producing subset) Given coalition structure $CS$, we say that $SCG'_i$ (where
In Example 4, $SCG'_1 = \{(\{a, b, c\}, 2), (\{d\}, 0)\}$ and $SCG'_2 = \{(\{a, b, c\}, 2), (\{d\}, 1)\}$ are value-producing subsets for $CS = \{\{a, b, c\}, \{d\}\}$. From this definition, value-producing subset $SCG'_i$ must contain all the agents, and the elements of $SCG'_i$ must be disjoint. We call a subset that satisfies these conditions a valid subset.

Theorem 8 Valid subset $SCG'_i \subseteq SCG_i$ is a value-producing subset of $SCG_i$ for $CS$ if and only if for each $S \in CS$, either one of the following conditions holds
We omit the proof since it is straightforward from the (modified) definition of SCG. Quite interestingly, we can define the possible relations between elements in SCGs in the same way as we did for MC-nets.
$x(p) = 1$ means element $p$ in $\bigcup_{i=1}^{k} SCG_i$ is selected. This formulation is basically the same as Definition 12, except for constraint (i). This constraint means that for hyper-edge $e$ that connects nodes $p_1, p_2, \ldots, p_l$, at least one element must be unreachable. The numbers of variables and constraints are basically the same as in the case of the MC-nets.
Theorem 10 When the characteristic function is represented as SCGs in a multi-issue domain, finding an optimal coalition structure is NP-hard. Moreover, unless P = NP, there exists no polynomial-time $O(m^{1-\epsilon})$ approximation algorithm for any $\epsilon > 0$, where $m$ is the number of elements in the SCGs.
Proof We can use the same proof as Theorem 4.

Settings
We experimentally evaluate our proposed methods. All of the tests were run on a Core i7-4790 processor with 32 GB of RAM running Windows 8.1 Pro. We used CPLEX 12.6.1 for solving the integer programming problem instances.
Michalak et al. [21] report that their ODP-IP algorithm can solve problem instances with 25 agents in less than 100 seconds. We cannot directly compare our results with these results since the CSG formalizations are different. Here, we are not comparing the efficiency of particular algorithms, but checking the scalability of different formalizations. Their algorithm inevitably evaluates all of the possible $O(2^n)$ coalitions. Thus, it is very unlikely that such approaches can scale up to $n = 100$. On the other hand, the advantage of these approaches is that they do not rely on particular representations.
Let us classify problem instances by how many agents are involved in each element of a compact representation, i.e., a rule in MC-nets or a coalition in an SCG. We concentrate on three simple and typical cases that are likely to be observed in practice: (i) each element tends to involve a small number of agents; (ii) each element tends to involve a large number of agents; and (iii) there is no bias in the number of agents in each element, that is, each element involves any number of agents with equal probability. Unfortunately, there exist no widely accepted standard benchmark instances for coalition structure generation problems. Thus, in a similar manner to Iwasaki et al. [15], we randomly generate instances using probability distributions, described as follows. To generate problem instances, we choose one of three distributions, decay, normal, and uniform, and determine the number of agents in each element based on the chosen distribution. The instances made by using the decay distribution capture case (i). The normal distribution corresponds to case (ii), and the uniform distribution corresponds to case (iii). These cases are quite likely to occur in practical situations and are useful for deepening the understanding of the features of our proposed technique, although we admit that this classification is rather coarse.
Let us explain how we construct the base elements for each case. For case (i), using the decay distribution, we create elements, e.g., the rules included in MC-nets or the coalitions included in an SCG. First, we create a coalition with one randomly chosen agent. Then we repeatedly add a new random agent with probability $\alpha$ until an agent is not added or the element includes all the agents, where $\alpha = 0.55$. For case (ii), the size of element $|S|$ is drawn from the normal distribution, and then we randomly add agents to the element so that the number of agents who belong to it equals $|S|$. For case (iii), we use a uniform distribution so that the size of each element $|S|$ is uniformly distributed over $[1, n]$. Notice that for any of the distributions, the value of each element is drawn from the uniform distribution over $(0, |S| \times 10]$.
For MC-nets, each of the elements with their values corresponds to a rule $r$ and its value $v_r$. We turn each element into a rule $(\bigwedge_{a \in S} a) \to v_r$ and modify each rule by randomly moving an agent from positive to negative literals with probability $p = 0.2$. For embedded MC-nets, we further repeatedly add a new condition to a rule $(L_1) \to v_{er}$ with probability $\beta = 0.15$ until no new condition is added. Here we create a new condition, i.e., a conjunction of literals over $A$, in the same way as we construct base elements from each probability distribution. Finally, for both MC-nets and embedded MC-nets, we pick a rule with probability $q = 0.2$ and convert its positive value drawn from $(0, |S| \times 10]$ to a negative value in $[-|S| \times 10, 0)$. Note that there is no problem instance such that some coalition has a negative value computed from the positive and negative value rules, i.e., in all of the generated problem instances, $v(S) > 0$ holds for all $S \subseteq A$.
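The decay-distribution generator for case (i) can be sketched as follows. The parameters $\alpha = 0.55$, $p = 0.2$, and $q = 0.2$ come from the description above; everything else is an assumption of this example, in particular the function names, the choice to move each agent to the negative literals independently, and the choice to always keep one randomly chosen positive literal. The sketch does not enforce the non-negativity of coalition values stated above.

```python
import random
from typing import FrozenSet, List, Tuple


def decay_element(agents: List[str], alpha: float = 0.55, rng=random) -> FrozenSet[str]:
    """Start from one random agent, then keep adding a random new agent with
    probability alpha until a draw fails or all agents are included."""
    pool = list(agents)
    rng.shuffle(pool)
    element = [pool.pop()]
    while pool and rng.random() < alpha:
        element.append(pool.pop())
    return frozenset(element)


def mc_net_rule(element: FrozenSet[str], p: float = 0.2, q: float = 0.2,
                rng=random) -> Tuple[FrozenSet[str], FrozenSet[str], float]:
    """Turn a base element into a rule (positive literals, negative literals, value)."""
    members = sorted(element)
    keeper = rng.choice(members)  # assumption: one literal always stays positive
    neg = frozenset(a for a in members if a != keeper and rng.random() < p)
    pos = element - neg
    # 1 - random() lies in (0, 1], so the magnitude lies in (0, 10 * |S|].
    magnitude = 10 * len(element) * (1.0 - rng.random())
    sign = -1.0 if rng.random() < q else 1.0   # negative value rule with probability q
    return pos, neg, sign * magnitude
```

Passing a seeded `random.Random` instance as `rng` makes the generated instances reproducible.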
For SCG, each of the elements with values corresponds to a coalition and its value. We turn each element into coalition $S$ with value $v(S)$. As we explained, we generate the sizes of the coalitions included in an SCG based on each probability distribution and draw the values from the uniform distribution over $(0, |S| \times 10]$. For SCGs in MID, we create a set of coalitions for each issue, where every issue has the identical number of elements. We also fix the number of issues at five.
In our experiments, we fix the number of all agents, which we refer to as #agents, and vary the number of elements in a compact representation. We refer to the number of rules as #rules and the number of coalitions as #coalitions. Because we fix #agents, the characteristic (or partition) functions have the same size across #rules or #coalitions. Thus, the difficulty of each instance is influenced by #rules or #coalitions.
Through the following experiments, we generate 100 problem instances for each combination of cases and representations and report the geometric average of the runtimes over these instances. Also, we set the time limit to $10^5$ msec; if the runtime of the solver (CPLEX) exceeds this time limit, we terminate the execution and exclude this problem instance when calculating the average runtime.
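The reporting scheme described above (a geometric average over the instances solved within the time limit, plus the ratio of solved instances) can be sketched as follows. The function name and the sample runtimes are hypothetical and are not measured results.

```python
import math
from typing import List, Tuple


def summarize(runtimes_ms: List[float], limit_ms: float = 10**5) -> Tuple[float, float]:
    """Return (geometric mean of runtimes within the limit, ratio of solved instances).
    Instances whose runtime exceeds the limit are excluded from the average."""
    solved = [t for t in runtimes_ms if t <= limit_ms]
    ratio = len(solved) / len(runtimes_ms)
    if not solved:
        return float("nan"), ratio
    gmean = math.exp(sum(math.log(t) for t in solved) / len(solved))
    return gmean, ratio


# Hypothetical runtimes in msec; the third instance exceeds the limit.
gmean, ratio = summarize([10.0, 1000.0, 2.0e5])
```

Because timed-out instances are dropped, the average runtime should be read together with the solved ratio; a low average with a low ratio does not indicate an easy setting.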

MC-nets and embedded MC-nets
This subsection explores the performance of the standard and embedded MC-nets. Figures 2 and 4 illustrate the average runtimes of 100 problem instances for each distribution on the y-axis. Figures 3 and 5 show the ratio of instances where the optimal coalition structure is obtained within a time limit of $10^5$ msec.
For case (i), where rules are generated from the decay distribution, we set #agents = 100 and vary the number of rules #rules in the (embedded) MC-nets from 50 to 150 (the x-axis). Figure 2 shows that the runtimes of MC-nets gradually increase and that we can handle instances with up to 150 rules. In particular, when #rules is less than 100, our MIP formulations provide optimal coalition structures in less than $10^4$ msec on average and solve every instance within the time limit, as shown in Fig. 3. In contrast, when #rules exceeds 100, some instances cannot be solved within the predetermined time limit. The number of unsolved instances increases with #rules. In fact, when #rules = 130, we can solve 80% (82/100) of …
Turning to the difference between the standard and embedded MC-nets, it is relatively small and grows as the number of rules increases. When #rules is up to 100, the differences are at most $3 \times 10^3$ msec, and when it is 150, they reach $10^4$ msec. The magnitude of the differences is affected by the number of constraints in Definition 14, which is essentially the number of embedded rules. Because we assume that a new embedded rule is added with probability $\beta = 0.15$, only 15 to 20 embedded rules are generated for 100 rules.
Let us examine cases (ii) and (iii) for the normal and uniform distributions, whose tendencies closely resemble each other. Figure 4 illustrates the runtimes and Fig. 5 shows the ratio of the instances solvable within the time limit. For those cases, we reduce #agents from 100 to 10 and vary #rules in (embedded) MC-nets from 10 to 50. We discuss later why the instance size is much smaller than in the case of the decay distribution. Note that we here use normal distribution $N(8, 1)$ with a mean of 8 and a variance of 1, from which we draw the number of agents involved in each rule.
Figure 4 shows that our MIP formulations for cases (ii) and (iii) take even more time to obtain the solutions than for case (i), although the instance sizes are much smaller. In fact, the upper limit of #rules with which they provide a solution within the time limit is only 50 for cases (ii) and (iii), while it reaches 150 for case (i). When #rules = 50, case (i) takes 71 msec to obtain the solutions, but case (ii) requires about 61309 msec. Furthermore, Fig. 5 reveals that many instances are not solvable in a reasonable amount of time. Even with #rules = 30, for the normal and uniform distributions, 34/100 and 5/100 instances, respectively, could not be solved within the time limit. For the embedded MC-nets in particular, we can solve no instances for the normal distribution when #rules = 50.
Let us briefly discuss why the normal or uniform distribution generates many more difficult instances than the decay distribution. One key reason is that the rules generated from the decay distribution tend to involve fewer agents than those from the normal or uniform distribution. An arbitrary pair of rules in an instance is less likely to share some agents for the decay distribution than for the normal or uniform distribution. For example, for ten agents, a rule involves approximately three agents for the decay distribution, while one involves approximately eight agents for the normal distribution. Thus, a pair of rules from the decay distribution shares fewer agents than one from the other distributions. Consider our MIP formulations with a set of MC-net rules as a graph. A graph of an instance from the decay distribution is sparser than one from the normal or uniform distribution. In particular, the latter is likely to form a complete graph with many constraints as edges. Therefore, the graphs are much less complicated for case (i) than for case (ii) or (iii). To solve cases (ii) or (iii), we need to explore a huge number of combinations of associated rules.
The other reason is the sharp increase in the number of dummy rules. To solve an instance with negative value rules, we must create constraints for each agent involved in each negative value rule. For example, if a rule involves eight agents, we require seven dummy rules for each negative value rule. Since a rule is likely to involve more agents for the normal or uniform distribution than for the decay distribution, the required number of dummy rules increases, and our MIP formulations face an increasing number of constraints.

SCG and SCGs in MID
This subsection evaluates the performance of SCG and SCGs in MID. First, we explain how well we perform on SCG in our settings. Figure 6 illustrates the average runtimes of 100 problem instances for each distribution on the y-axis. We set #agents = 1000 and vary the number of coalitions #coalitions from 1000 to 10000 (the x-axis). Note that we use normal distribution $N(900, 50^2)$ with a mean of 900 and a variance of $50^2$, from which we draw the number of agents involved in each element for case (ii).
Figure 6 shows that our MIP formalization can handle SCG instances with up to 10,000 coalitions. From this result, we can solve instances with more base elements by applying the MIP formalization based on SCG, compared with the MIP formalization based on MC-nets. For example, for the decay distribution (case (i)), SCG takes 908 msec to obtain the solution on average when it handles 10,000 coalitions (instances made from 10,000 base elements), while the standard MC-nets take 1568 msec with only 90 rules (instances made from 90 base elements). For the normal and uniform distributions (cases (ii) and (iii)), SCG is still easier to solve than MC-nets, although the runtimes are much longer than for case (i). For example, when #coalitions = 10000, cases (ii) and (iii) take 23574 and 26094 msec, respectively, while case (i) takes 908 msec. Also, there is only a slight difference between cases (ii) and (iii), which is at most approximately 3000 msec across the number of coalitions. Note that, in those cases, we can solve all the SCG instances within $10^5$ msec (the time limit).
These results show that the MIP formulation of SCG is more scalable than that of MC-nets with regard to the number of base elements. Note that this does not directly mean that SCG is better than MC-nets. When a game is represented by both MC-nets and SCG, MC-nets tends to be more concise. For example, let us consider a game represented by a set of MC-nets rules $R$, where each rule $r \in R$ consists of only positive literals. To transform $R$ into SCG, we first need $|R|$ pairs of a coalition and its value, each of which has the form $(P_r, v_r)$, where $P_r$ is the set of positive literals and $v_r$ is the value of rule $r$. Next, for each set of rules $R' \subseteq R$ that shares some agents in positive literals, we need a pair $(\bigcup_{r \in R'} P_r, \sum_{r \in R'} v_r)$. For example, assume four agents a, b, c, and d and three rules $r_1$: In this case, SCG contains 7 pairs, which is larger than $|R|$. If some rule in MC-nets contains negative literals, transforming MC-nets to SCG becomes more complicated.
Second, let us turn to SCGs in MID. Figure 7 illustrates the average runtimes, and Fig. 8 shows the ratio of instances where we obtain the optimal coalition structures within the time limit of $10^5$ msec. Recall that we fix the number of issues to five. We make the problem instances smaller than those for SCG. Precisely, we set #agent = 100 and vary the number of coalitions #coalitions from 50 to 150. The mean and variance for the normal distribution remain unchanged.
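The transformation from positive-literal MC-nets rules to SCG pairs described above can be sketched as follows. This is a minimal illustration rather than the procedure used in the experiments. It assumes that "shares some agents" means the rules are linked through overlapping positive literals, and it assigns each merged coalition the sum of every rule applicable to it. The rule set in the usage lines is hypothetical and is not the rule set of the example in the text.

```python
from itertools import combinations


def is_connected(agent_sets):
    """Return True if the sets are linked to one another through shared agents."""
    reached = set(agent_sets[0])
    remaining = list(agent_sets[1:])
    changed = True
    while changed and remaining:
        changed = False
        for s in list(remaining):
            if reached & s:  # s overlaps the part reached so far
                reached |= s
                remaining.remove(s)
                changed = True
    return not remaining


def mcnets_to_scg(rules):
    """Turn positive-literal MC-nets rules into SCG (coalition, value) pairs.

    rules: list of (frozenset of agents, value) pairs.
    Assumption: one pair is created for every subset of rules that is
    connected through shared agents.
    """
    scg = {}
    for size in range(1, len(rules) + 1):
        for subset in combinations(rules, size):
            agent_sets = [p for p, _ in subset]
            if not is_connected(agent_sets):
                continue
            coalition = frozenset().union(*agent_sets)
            # Value of the coalition: sum of all rules applicable to it.
            scg[coalition] = sum(v for p, v in rules if p <= coalition)
    return scg


# Hypothetical rules over agents a, b, c, d (all three share agent a).
rules = [(frozenset("ab"), 1), (frozenset("ac"), 2), (frozenset("ad"), 3)]
scg = mcnets_to_scg(rules)
```

With these hypothetical rules every subset of rules overlaps, so three rules expand into seven SCG pairs, which illustrates why MC-nets can be the more concise representation.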
Figure 7 reveals that SCGs in MID is easier to solve than MC-nets, but harder than SCG. While, for case (i), the runtimes never exceed $10^4$ msec across #coalitions, for cases (ii) and (iii), they take at least $10^4$ msec for any #coalitions. Also, for case (i), we can solve all the instances within the time limit. However, for the other two cases, we cannot solve some instances, particularly when #coalitions exceeds 90. In fact, for case (ii), when #coalitions is 120, we can solve only 16/100 instances, while for case (iii), when it is 110, we can solve only 29/100 instances. If we further increase the number of coalitions, we cannot solve any of the instances.

Conclusion
This paper provides MIP formulations for CSG problems by utilizing four compact representation schemes: MC-nets, SCG, and SCGs in MID for characteristic function games, and embedded MC-nets for partition function games. Though we proved that CSG problems under these representations are NP-hard and inapproximable, we could solve instances of significant size with off-the-shelf optimization packages, such as CPLEX and GUROBI. Our simulation reveals that our proposed methods with MC-nets or SCGs in MID solved problems with 150 rules or coalitions within $10^5$ msec, and those with SCG solved problems with up to 10000 coalitions within $10^5$ msec. Future work will develop algorithms that (i) can find an optimal solution more efficiently, (ii) can return a suboptimal solution at any time, and (iii) can find an approximate solution quickly by utilizing constraint optimization techniques.
Acknowledgements This research was partially supported by KAKENHI 24220003, 26280081, 15K16058, 17H00761, 16KK0003, 17H01787 and 18H03299. We wish to thank Takato Hasewaga, Naoyuki Hashimoto, and Ryo Ichimura for their research assistance. Original conference papers were partially supported by KAKENHI 20240015 and 20240003. Conitzer was supported by NSF award number IIS-0812113, a Research Fellowship from the Alfred P. Sloan Foundation, and a Yahoo! Faculty Research Grant.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix: Transformation algorithms for MC-nets with negative value rules
In the Appendix, we explain the details of the results related to the naïve approach. We can guarantee that the full transformation algorithm terminates, i.e., the following theorem holds.

Theorem 11
The full transformation algorithm terminates.
Proof By one iteration of this algorithm, negative value rule $r_x$ is eliminated if $L_x \wedge \neg L_i \models \bot$ and $v_i \geq v_x$. If $L_x \wedge \neg L_i \not\models \bot$, a set of negative value rules is added in Step 7, but the conditions of these rules, i.e., $L_x \wedge \neg L_i$, are more specific than $L_x$. Also, if $v_i < v_x$, a new negative value rule is added in Step 6, but its condition, i.e., $L_x \wedge L_i$, is more specific than $L_x$ and also disjoint from $L_x \wedge \neg L_i$. Furthermore, the value of this rule, i.e., $v_i - v_x$, is closer to 0 than the original value $-v_x$. Thus, by one iteration of this algorithm, the conditions of the negative value rules become more specific and/or the negative value becomes closer to 0. Therefore, this algorithm cannot be iterated infinitely and will eventually terminate.
With the full transformation algorithm, we can eliminate all of the negative value rules. However, this approach is not scalable. There exists an instance where the number of newly generated rules becomes $\Omega(n^2)$ using the full transformation algorithm.
Example 10 Consider the following rules: This rule set contains $k + 1$ positive value rules and one negative value rule, where the total number of agents is $2k + 1$. Figure 9 shows the number of newly generated rules from these rule sets by varying $k$. The number of newly generated rules becomes $\Omega(k^2)$, which is also $\Omega(n^2)$.
Can we reduce the number of required rules using a more clever encoding trick? No, because the following theorem holds.

Theorem 12
To represent the characteristic function in Example 10 using only positive value rules, we need $\Omega(n^2)$ rules.
Proof For all $1 \leq i < j \leq k$, we denote $\{p_0, p_1, \ldots, p_k, n_i, n_j\}$ as $S_{i,j}$. For $S_{i,j}$, since only rules $r_x$, $r_i$, and $r_j$ are applicable, $v(S_{i,j})$ equals 1. Assume that a set of positive value rules $R^+$ represents $v$. There must be at least one rule in $R^+$ that is applicable to $S_{i,j}$. Denote such a rule as $r_{i,j}$. Now, we show that $r_{i,j}$ is not applicable to any $S_{i',j'}$, where $1 \leq i' < j' \leq k$ and $i \neq i' \vee j \neq j'$. We derive a contradiction by assuming that $r_{i,j}$ is applicable to $S_{i',j'}$.
When $i = i'$ or $i = j'$, consider coalition $S = \{p_0, p_1, \ldots, p_k, n_i\}$. For $S$, since only rules $r_x$ and $r_i$ are applicable, $v(S)$ equals 0. However, we show that $r_{i,j}$ is applicable to $S$, and thus $v(S)$ cannot be 0. Rule $r_{i,j}$ is not applicable to $S$ if (i) its positive literals include agent $n_l$, where $l \neq i$, or (ii) its negative literals include at least one of $\{p_0, p_1, \ldots, p_k, n_i\}$. For (i), if $l = j$, $r_{i,j}$ is not applicable to $S_{i',j'}$. Also, if $l \neq j$, $r_{i,j}$ is not applicable to $S_{i,j}$. For (ii), $r_{i,j}$ is applicable to neither $S_{i,j}$ nor $S_{i',j'}$. This contradicts the assumption that $r_{i,j}$ is applicable to both $S_{i,j}$ and $S_{i',j'}$. We can use a similar argument for the cases where $j = i'$ or $j = j'$.