Concise integer linear programming formulation for clique partitioning problems

A Clique Partitioning Problem (CPP) finds an optimal partition of a given edge-weighted undirected graph, such that the sum of the weights is maximized. This general graph problem has a wide range of real-world applications, including correlation clustering, group technology, community detection, and coalition structure generation. Although a CPP is NP-hard, due to the recent advance of Integer Linear Programming (ILP) solvers, we can solve reasonably large problem instances by formulating a CPP as an ILP instance. The first ILP formulation was introduced by Grötschel and Wakabayashi (Mathematical Programming, 45(1-3), 59–96, 1989). Recently, Miyauchi et al. (2018) proposed a more concise ILP formulation that can significantly reduce transitivity constraints as compared to previously introduced models. In this paper, we introduce a series of concise ILP formulations that can reduce even more transitivity constraints. We theoretically evaluate the amount of reduction based on a simple model in which edge signs (positive/negative) are chosen independently. We show that the reduction can be up to 50% (dependent of the ratio of negative edges) and experimentally evaluate the amount of reduction and the performance of our proposed formulation using a variety of graph data sets. Experimental evaluations show that the reduction can exceed 50% (where edge signs can be correlated), and our formulation outperforms the existing state-of-the-art formulations both in terms of memory usage and computational time for most problem instances.


Introduction
A Clique Partitioning Problem (CPP) finds an optimal partition of a given edge-weighted undirected graph that maximizes the sum of the weights within clusters [9]. An edge weight can be positive or negative. As the popularity of analyzing complex networks has increased (e.g., the Internet and social networks), CPPs can be found in a variety of important AI applications: Correlation clustering [1]: For given objects (e.g., documents) with similarity scores, we are interested in finding clusters of similar objects (e.g., clusters of documents with the same topic). This problem can be formalized as a CPP where an object/document is a vertex and similarity scores among objects are edge weights.
Group technology [16,19]: Assume we are developing a manufacturing system that requires several parts and machines for processing. The goal is to find a suitable partition of the parts and machines for efficient production. This problem can be formalized as a CPP where each part/machine is a vertex. If part i is processed by machine j , edge (i, j ) has weight 1 (otherwise, its weight is −1). An edge between parts (or machines) has weight 0.
Community detection [2,4,8]: For a given network, we are interested in partitioning the vertices into communities. A community is a subset of vertices densely/positively connected within the same community and sparsely/negatively connected with other communities. Community detection in a network has a long research history (e.g., see the survey by Fortunato [8]). Community detection in bipartite networks, which can be formalized as a special case of a CPP called a modularity maximization problem, is proposed by Brandes et al. [4] and has gathered a significant amount of interest [2,8].
Coalition structure generation [17]: A pair of agents can have positive/negative synergy, e.g., friends who collaborate well or rivals who do not. The goal is to partition agents into groups (coalitions) to maximize the sum of their synergies. This problem can be formalized as a CPP where an agent is a vertex and the synergies among the agents are edge weights.
Although a CPP is NP-hard, due to the recent advance of Integer Linear Programming (ILP) solvers (e.g., CPLEX and Gurobi), we can solve reasonably large problem instances by formulating a CPP as an ILP instance. Grötschel and Wakabayashi [9] proposed the first IPL formulation of a CPP, where the goal was selecting a subset of edges such that the sum of their weights is maximized and satisfies transitivity constraints. The number of transitivity constraints is Θ(n 3 ), where n is the number of vertices. Thus, the growth of the number of transitivity constraints creates a bottleneck for solving larger CPP instances.
To address this scalability issue, several researchers have proposed more concise ILP formulations [3,5,6,11,14,15]. Dinh and Thai [7] proposed a concise ILP formulation for the modularity maximization problem. Inspired by this work, Miyauchi and Sukegawa [12] proposed an ILP formulation for a general CPP by identifying redundant transitivity constraints by focusing on the signs of edge weights.
In this paper, we propose further reductions of the transitivity constraints that utilize the characterization of transitivity based on reachability among vertices. 1 Our method can be used in conjunction with the state-of-the-art method proposed by Miyauchi et al. [13], whose formulation removed significantly more constraints, including non-redundant ones. As a result, the obtained ILP solution can be suboptimal. Thus, they proposed a method for obtaining an optimal solution by post-processing.
We theoretically evaluate the amount of reduction based on a model in which edge signs are chosen independently and show that the ratio of the number of constraints between our proposed and the existing formulations is 50% when the ratio of the non-negative edges approaches 0. Furthermore, we experimentally evaluate the amount of reduction and the performance of our proposed formulation using well-known, real-world graph datasets related to CPP applications. We show that the reduction can exceed 50% when the edge signs can be correlated, and our formulations outperform the existing state-of-the-art formulations in terms of both memory usage and computational time for most problem instances.

Model
A Clique Partitioning Problem (CPP) instance is a complete undirected edge-weighted graph G = (V , E, w) consisting of vertices {1, 2, . . . , n}, n(n − 1)/2 = |E| edges, and weight function w : E → R. For simplicity, we denote We call each V a cluster ( = 1, . . . , p). CPP's goal is to find clique partitioning A to maximize the sum of weights {i,j }∈A w i,j .

Existing concise ILP formulations
Let us introduce the ILP formulation based on Grötschel and Wakabayashi [9]. Decision variable x i,j is introduced for each edge {i, j } ∈ E where i < j. x i,j equals 1 if i and j are in the same cluster, and 0 otherwise.
Transitivity constraints are introduced for any i, j, k ∈ V . If i and j are in the same cluster and j and k are in the same cluster, then i and k must also be in the same cluster. This ILP formulation, called P(G), is defined: where This formulation has n 2 = Θ(n 2 ) variables and 3 n 3 = Θ(n 3 ) transitivity constraints. Miyauchi and Sukegawa [12] proposed a more concise ILP formulation called RP(G), which reduces the number of transitivity constraints. More specifically, we replace (1) in the above formulation with: A transitivity constraint, such that w i,j < 0 ∧ w j,k < 0 holds, is redundant and safely removed. RP(G) has O(nm ≥0 ) constraints, where m ≥0 indicates the number of non-negative weighted edges in G.

Overview of our main idea and contributions
We introduce several terms to explain our main idea. Let x * = (x * i,j ) 1≤i<j ≤n denote an arbitrary optimal solution of an ILP formulation, } denote a set of connected components of (V , E * ). For each ∈ {1, . . . , p}, for a pair of vertices i, j ∈ V (where i < j), they are directly reachable if {i, j } ∈ E * , and nonnegatively, directly reachable if {i, j } ∈ E * ∩ E ≥0 . They also are reachable if a path exists from i to j that consists of edges in E * , and non-negatively reachable if a path exists from i to j that consists of edges in E * ∩ E ≥0 .  Figure 1 illustrates the component. Here 1 and 2 are directly reachable but not non-negatively, directly reachable, and 3 and 4 are non-negatively, directly reachable. 3 and 5 are non-negatively reachable, and 1 and 5 are not non-negatively reachable.
The goal of transitivity constraints is to ensure that the following property holds.
If i and j are reachable, they are directly reachable, i.e., x i,j = 1 must hold.
As shown by Lemma 1, if i and j are reachable, then they are non-negatively reachable. The intuition of this lemma is simple. If i, j ∈ V are not non-negatively connected, by removing only negative weight edges, we can divide V into two parts, contradicting the fact that x * is an optimal solution. For example, in Fig. 1, since vertices 1 and 5 are not non-negatively reachable, the component is not optimal; we can divide V into two parts, i.e., {1} and {2, 3, 4, 5}, by removing only negative weight edge {1, 2}.
Thus, the above goal can be modified to guarantee the following property, which corresponds to Lemma 2: Transitivity constraints must be defined to ensure this property. With these terms, the transitivity constraints in [12] can be translated: For any i < k and for any j s.t. j = i, j = k, if at least one of the following conditions holds, then i and k must be directly reachable (i.e., x i,k = 1 must hold): (i) i and j are directly reachable, and j and k are non-negatively, directly reachable, or (ii) i and j are non-negatively, directly reachable, and j and k are directly reachable.
We argue that having only (i) instead of (i) and (ii) together is sufficient, i.e., (ii) is redundant if we have (i) to derive Lemma 2. Similarly, if we have (ii), (i) is redundant. This fact is verified easily by mathematical induction on the length of the non-negative path from i to k. Assume the length of the path is q, and divide it into two parts: a non-negative path from i to j with length q − 1, and a non-negative edge between j and k (for (i)); or a non-negative edge between i and j , and a non-negative path from j to k with length q − 1 (for (ii)).
In the rest of this section, we show a series of ILP formulations based on this idea, called CP(G), CP * (G), ACP(G), and ACP * (G). Intuitively, we can remove about half of the transitivity constraints. However, since (i) and (ii) can overlap, this intuition is not exact. In the rest of this section, we scrutinize the amount of reduction.

CP(G )
We first present our new concise ILP formulation called CP(G), which is based on RP(G). It is identical to RP(G) (or P(G)) except that we replace (1) with: Here the requirement of (i) is given as 1≤i<j ≤n be an arbitrary optimal solution of CP(G). We first present the following lemma where Proof We prove this lemma in the same manner as the proof of Lemma 1 [12]. It suffices to show that for any partition {S i , S j } of V , where i ∈ S i and j ∈ S j , there exists a non-negative edge in E * ∩ E ≥0 whose one endpoint is in S i and its other is in S j .
From the definition of V , at least one edge exists in E * between S i and S j . Suppose that none of these connecting edges belong to E ≥0 , i.e., their weights are strictly negative. We focus on a CP solution obtained by changing the values of the variables that correspond to these edges from 1 to 0 on x * . This operation resembles partitioning V into S i and S j . This solution is feasible for CP because each transitivity constraint that includes a removed edge is satisfied since at least two terms equal 0. Thus, the objective value of this solution is strictly greater than that of x * . This contradicts the optimality of x * . Lemma 1 argues that if i and j in V are reachable, then they are non-negatively reachable. Next we prove that if i and j are non-negatively reachable, then they are directly reachable. In the following, for notation simplicity, let x {i,j } denote x i,j when i < j, and otherwise x j,i . With this notation, the three inequalities in (1) are merged into one inequality Proof Let i and j be two vertices in V (i < j). According to Lemma 1, there exists path Assume the induction hypothesis where the lemma is true for an arbitrary value of q − 1. Next we must show that Now we are ready to present our main theorem.

Theorem 1 CP(G) and P(G) have the same set of optimal solutions.
Proof It is sufficient to show that any optimal solution x * of CP(G) is feasible for P(G). This is equivalent to the statement that (V , E * ) is complete for each ∈ {1, 2, . . . , p}. This is what Lemma 2 argues.
It is straightforward to show that Theorem 1 holds if we choose (ii) instead of (i) for transitivity constraints.
According to Theorem 1 [12], RP(G) and P(G) have the same set of optimal solutions. Thus, we obtain the following corollary.

Corollary 1 CP(G) and RP(G) have the same set of optimal solutions.
We examine the numbers of constraints in RP(G) and CP(G). Let a denote the probability that an edge has a non-negative weight and formally define it as follows: , the number of constraints in RP(G) and CP(G).

Theorem 2 When the probability in which the weight of each edge is non-negative is independent for each edge, the ratio of the number of constraints in CP(G) against that in RP(G)
can be estimated: Proof We first estimate the number of transitivity constraints in RP(G).
except when both w i,j and w j,k are negative. Thus, this probability is given as Then we estimate the number of transitivity constraints in CP(G).
Therefore, we obtain Furthermore, this ratio approaches 1/2 when a approaches 0, and it approaches 1 when a approaches 1.

CP * (G )
We present another concise ILP formulation called CP * (G), based on RP * (G) [13]. In a real-life network, most edges are commonly zero-weighted. By taking this fact into account, Miyauchi et al. [13] introduced RP * (G), which is identical to RP(G) (or P(G)) except that we replace (1) with: Since RP * (G) reduces non-redundant transitivity constraints, a connected component can be incomplete and the obtained partition can be infeasible for P(G). The following example presented by [13] shows the infeasibility.
Thus, Miyauchi et al. [13] proposed RP * (G)+pp, which first runs RP * (G) and then performs post-processing (pp) for the solution obtained by RP * (G) to recover the feasibility. We can modify RP * (G) using the same idea as CP(G) and obtain CP * (G). Furthermore, by applying the same post-processing procedure pp, we obtain CP * (G)+pp.
In CP * (G), we replace (1) with: The following lemma, which corresponds to Lemma 2 for CP(G), holds for CP * (G). Here vertices i and j are positively reachable if a path exists from i to j that consists of edges inĚ * >0 .
The proof is obtained from that of Lemma 2 by replacing 'non-negatively' with 'positively', E ≥0 with E >0 , and CP(G) with CP * (G).
Here we show that CP * (G)+pp returns an optimal solution to P(G). This follows the proof in Subsection 4.1 in Miyauchi et al. [13], which shows that RP * (G)+pp returns an optimal solution to P(G). For the proof, it suffices to show that the objective value of x * remains identical as that ofx * in P(G) and CP * (G) because x * is feasible for P(G) anď x * is optimal to CP * (G), which is a relaxation of P(G) (i.e., having fewer constraints). For convenience, we define E * in = {{i, j } ∈ E | x * i,j = 1} and E * out = E \ E * in . Note that graph (V , E * in ) consists of p cliques V 1 , V 2 , . . . , V p . We have the following lemmas.

Lemma 4 It holds that
Proof It suffices to show that for any ∈ {1, 2, . . . , p}, it holds thatx * i,j = 1 for each i, j ∈ V where i < j. Since V is weakly connected byĚ * >0 , a path exists onĚ * >0 that connects i and j . This implies that every edge on it has a positive weight; i and j are positively reachable. According to Lemma 3, we conclude that i and j are directly reachable;x * i,j = 1.

Lemma 5 It holds that
Proof For each {i, j } ∈ E * out , we have {i, j } ∈ E * >0 . If otherwise, then x * i,j = 1 and thus {i, j } ∈ E * in . Therefore, for each {i, j } ∈ E * out , we havex * i,j = 0 or w i,j ≤ 0, which proves the lemma.

By Lemmas 4 and 5, we have
Thus, the following theorem holds.

Theorem 3 Any 0-1 vector returned by CP * (G)+pp is optimal to P(G).
To estimate the ratio of |CP * (G)| to |RP * (G)|, we define b as the probability that an edge has a positive weight:

Theorem 4 When the probability in which the weight of each edge is positive is independent
for each edge, the ratio of the number of constraints in CP * (G) compared with that in RP * (G) can be estimated: The proof is almost identical to Theorem 2. The ratio approaches 1/2 when b approaches 0, and it approaches 1 when b approaches 1.

ACP(G )/ACP * (G )
As discussed in Section 4.1, either (i) or (ii) is required for transitivity constraints. We can adaptively switch between (i) or (ii) to minimize the number of transitivity constraints. Such a formulation, which introduces this idea to CP(G), is called Adaptive CP(G) (ACP(G)) and that for CP * (G) is called Adaptive CP * (G) (ACP * (G)).
Let nn(i) denote the number of vertices j ( = i) that satisfy w i,j ≥ 0. In ACP(G), we replace (1) with: ACP * (G) is defined in the same manner based on CP * (G).

Theorem 5 ACP(G) and P(G) have the same set of optimal solutions.
The proof resembles Theorem 1. By Theorem 1 [12] and this theorem, we obtain the following corollary.

Corollary 2 ACP(G) and RP(G) have the same set of optimal solutions.
Compare the number of transitivity constraints in ACP(G) and CP(G). For given i, k, s.t. i < k, the number of transitivity constraints related to i, k, i.e., the number of edges with non-negative weight between k and j , s.t. j = i, j = k, is given by binomial distribution B(n − 2, a). The probability that the number of such edges equals y is given: The expected number of transitivity constraints related to i, k for CP(G) is given as n−2 y=0 y · p(y) = a(n − 2). For ACP(G), we chose the smaller one from two values generated by binomial distribution B(n − 2, a). The probability that the smaller one equals y is given: Then the expected number of transitivity constraints related to i, k for ACP(G) is given as n−2 y=0 y · q(y). When n becomes large and the mean (which equals (n − 2)a) and the variance (which equals (n − 2) a(1 − a)) are sufficiently large, we can approximate binomial distribution B(n − 2, a) by normal distribution N ((n − 2)a, (n − 2)a(1 − a)). Then for each i, k, the difference between the expected number of transitivity constraints for CP(G) and ACP(G) is given as 1 (1 − a). Thus, |CP(G)| − |ACP(G)| is approximately: For ACP * (G), the following theorem holds.

Theorem 6 Any 0-1 vector returned by ACP * (G) + pp is optimal to P(G).
We can prove this theorem based on the same reason as for Theorem 3. We can also estimate the reduction of the transitivity constraints in ACP * (G) against CP * (G) by replacing a with b in (2) for ACP(G).

Experiments
We experimentally evaluated the performance of our new concise ILP formulations with Gurobi package version 9.0.3 with default parameters to solve all the ILP formulations. All problem instances were solved by a workstation equipped with an Intel Xeon Gold 6130 CPU @ 2.10GHz processor with 192GB RAM and Ubuntu 18.04.2 LTS. For each instance and formulation, we set the time limit to four hours.

Problem settings
We used for the experiments four types of problem settings, based on popular CPP application domains. The first three settings are based on real-world datasets from Miyauchi et al. [13]. The last is based on synthetic (randomly generated) graphs, which were used as a benchmark for coalition structure generation algorithms [20]. (S) Coalition structure generation: We adopted the setting of the benchmark problem used in an existing work on coalition structure generation algorithms [20]. For each pair of nodes, an edge exists with probability 95%. With probability 5%, the edge weight is positive. Otherwise (i.e., with probability 95%), it is negative. A positive weight is chosen randomly from a range [1, 100]; a negative weight is chosen randomly from a range [−100, −1]. We set the number of nodes n to 30, 60, 90, and 120 and computed the average over 100 instances for each setting. Table 1 shows the ratio of the non-negative or positive edges for each instance.

Experimental results
We compared our four new concise ILP formulations against two existing ILP formulations: RP(G) and RP * (G). Table 2 shows the number of transitivity constraints contained in the   Table 3 shows the number for RP * (G), CP * (G), and ACP * (G).
In the CP(G), ACP(G), CP * (G), and ACP * (G) columns, the pair of numbers in parentheses shows the ratio of the number of transitivity constraints. The first shows the actual   Table 1. When the difference between the actual ratio and the theoretical one exceeds 10%, the larger one is shown in bold. CP(G) contains slightly more transitivity constraints than the theoretical estimation for (G) as well as C4, C5 and C6. The ratio of CP(G) almost meets the theoretical estimation for We obtained our theoretical estimation under the assumption that the sign of each edge (positive or negative) is determined independently. For real-world datasets, some correlations among edge signs likely exist. Thus, the actual reduction rate and our estimation do not exactly coincide except for synthetic graphs. Table 4 shows the computation times in seconds. For the RP * (G), CP * (G), and ACP * (G) results, we included the computational times for running the post-processing pp. The pp time was negligible in our experiments. In most cases it was less than 0.05 seconds. The Table 4 Computational time in seconds of each formulation Bold numbers indicate the ILP formulation that obtained the best result. OM indicates that we failed to solve the instance due to insufficient memory, which occurs when the problem instance contains more than tens of millions of constraints. We show the relative gaps for problem instances that could be stored on the machine but failed to obtain an optimal solution within the time limit. More specifically, Gurobi returns an upper bound (UB) and a lower bound (LB) of the optimal value when terminated before reaching an optimal solution. We show (U B − LB)/LB. For each instance, the best computation time (or the relative gap) among the formulations is highlighted in bold.
ACP * (G) successfully solved the most instances, i.e., 24 out of 27 in the (P), (G), and (C) categories, among the six formulations; RP(G), CP(G) and ACP(G) solved 18 instances, and RP * (G) and CP * (G) solved 22. There were no instances that ACP * (G) could not solve that the other formulations could. Thus, ACP * (G) is the best formulation with respect to the number of solved instances.
For the computation time, RP * (G) works well on the (G) instances, except G33, even though CP * (G) and ACP * (G) have fewer transitivity constraints. On the other instance categories of (P),(C), and (S), ACP * (G) shows the best performance.
As shown in our example, although reducing redundant constraints effectively shortened for the computational times in general, in some cases (mostly in (G) instances), our new formulations required more computational time than the existing methods, even though our formulations have fewer constraints. Our current conjecture is that even though some constraints are redundant, they can still be useful for constraint propagation and reducing the computational times.

Conclusions
We investigated the ILP formulations for CPP. Based on the existing ILP formulations, we developed a series of new ILP formulations that can significantly reduce transitivity constraints. We characterized the transitivity constraints based on their reachablity among vertices and identified the redundant parts in the existing ILP formulations. All our ILP formulations are guaranteed to obtain optimal clique partitions. Experimental results showed that our new ILP formulations outperformed the existing ILP formulations for most instances. Transitivity constraints are ubiquitous in a variety of problems and their possible formulations. For CPP, we can use MaxSAT encoding instead of an ILP formulation. We would like to apply our idea to reduce transitivity constraints for MaxSAT encoding of CPP, as well as other discrete optimization problems, e.g., [10,18,21].

Conflict of Interests
The authors declare that they have no conflict of interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.