1 Introduction

A Clique Partitioning Problem (CPP) finds an optimal partition of a given edge-weighted undirected graph that maximizes the sum of the weights within clusters [9]. An edge weight can be positive or negative. As the popularity of analyzing complex networks has increased (e.g., the Internet and social networks), CPPs can be found in a variety of important AI applications:

Correlation clustering [1]: For given objects (e.g., documents) with similarity scores, we are interested in finding clusters of similar objects (e.g., clusters of documents with the same topic). This problem can be formalized as a CPP where an object/document is a vertex and similarity scores among objects are edge weights.

Group technology [16, 19]: Assume we are developing a manufacturing system that requires several parts and machines for processing. The goal is to find a suitable partition of the parts and machines for efficient production. This problem can be formalized as a CPP where each part/machine is a vertex. If part i is processed by machine j, edge (i, j) has weight 1 (otherwise, its weight is − 1). An edge between parts (or machines) has weight 0.

Community detection [2, 4, 8]: For a given network, we are interested in partitioning the vertices into communities. A community is a subset of vertices densely/positively connected within the same community and sparsely/negatively connected with other communities. Community detection in a network has a long research history (e.g., see the survey by Fortunato [8]). Community detection in bipartite networks, which can be formalized as a special case of a CPP called a modularity maximization problem, is proposed by Brandes et al. [4] and has gathered a significant amount of interest [2, 8].

Coalition structure generation [17]: A pair of agents can have positive/negative synergy, e.g., friends who collaborate well or rivals who do not. The goal is to partition agents into groups (coalitions) to maximize the sum of their synergies. This problem can be formalized as a CPP where an agent is a vertex and the synergies among the agents are edge weights.

Although a CPP is NP-hard, due to the recent advance of Integer Linear Programming (ILP) solvers (e.g., CPLEX and Gurobi), we can solve reasonably large problem instances by formulating a CPP as an ILP instance. Grötschel and Wakabayashi [9] proposed the first IPL formulation of a CPP, where the goal was selecting a subset of edges such that the sum of their weights is maximized and satisfies transitivity constraints. The number of transitivity constraints is Θ(n3), where n is the number of vertices. Thus, the growth of the number of transitivity constraints creates a bottleneck for solving larger CPP instances.

To address this scalability issue, several researchers have proposed more concise ILP formulations [3, 5, 6, 11, 14, 15]. Dinh and Thai [7] proposed a concise ILP formulation for the modularity maximization problem. Inspired by this work, Miyauchi and Sukegawa [12] proposed an ILP formulation for a general CPP by identifying redundant transitivity constraints by focusing on the signs of edge weights.

In this paper, we propose further reductions of the transitivity constraints that utilize the characterization of transitivity based on reachability among vertices.Footnote 1 Our method can be used in conjunction with the state-of-the-art method proposed by Miyauchi et al. [13], whose formulation removed significantly more constraints, including non-redundant ones. As a result, the obtained ILP solution can be suboptimal. Thus, they proposed a method for obtaining an optimal solution by post-processing.

We theoretically evaluate the amount of reduction based on a model in which edge signs are chosen independently and show that the ratio of the number of constraints between our proposed and the existing formulations is 50% when the ratio of the non-negative edges approaches 0. Furthermore, we experimentally evaluate the amount of reduction and the performance of our proposed formulation using well-known, real-world graph datasets related to CPP applications. We show that the reduction can exceed 50% when the edge signs can be correlated, and our formulations outperform the existing state-of-the-art formulations in terms of both memory usage and computational time for most problem instances.

2 Model

A Clique Partitioning Problem (CPP) instance is a complete undirected edge-weighted graph G = (V, E, w) consisting of vertices {1,2,…,n}, n(n − 1)/2 = |E| edges, and weight function \(w: E\rightarrow \mathbb {R}\). For simplicity, we denote wi, j = w({i, j}) for each {i, j}∈ E. Clearly, wi, j = wj, i. Set A, which is a subset of E, is called clique partitioning if partition \(\{V_{1}, V_{2}, \dots , V_{p}\}\) of V exists such that \(A = \bigcup _{\ell = 1}^{p}\{\{i,j\}\in E | i,j\in V_{\ell }\}\). We call each V a cluster (\(\ell =1,\dots ,p\)). CPP’s goal is to find clique partitioning A to maximize the sum of weights \({\sum }_{\{i,j\}\in A} w_{i,j}\).

3 Existing concise ILP formulations

Let us introduce the ILP formulation based on Grötschel and Wakabayashi [9]. Decision variable xi, j is introduced for each edge {i, j}∈ E where i < j. xi, j equals 1 if i and j are in the same cluster, and 0 otherwise.

Transitivity constraints are introduced for any i, j, kV. If i and j are in the same cluster and j and k are in the same cluster, then i and k must also be in the same cluster. This ILP formulation, called P(G), is defined:

$$ \max \sum\limits_{i=1}^{n-1} \sum\limits_{j=i+1}^{n} w_{i,j}x_{i,j} $$

s.t.

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1& \forall (i,j,k) \in T, \\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1& \forall (i,j,k) \in T, \\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in T, \\ &x_{i,j} \in \{0,1\} &1\leq i <j\leq n, \end{array} $$
(1)

where

$$ {T = \{ (i,j,k) \mid 1\leq i < j < k\leq n\}.} $$

This formulation has \(\left (\begin {array}{c} n \\ 2 \end {array} \right ) = {\varTheta }(n^{2})\) variables and \(3\left (\begin {array}{c} n \\ 3 \end {array} \right ) = {\varTheta }(n^{3})\) transitivity constraints.

Miyauchi and Sukegawa [12] proposed a more concise ILP formulation called RP(G), which reduces the number of transitivity constraints. More specifically, we replace (1) in the above formulation with:

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1 &\forall (i,j,k) \in T_{\ge 0}^{1},\\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1 & \forall (i,j,k) \in T_{\ge 0}^{2},\\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1& \forall (i,j,k) \in T_{\ge 0}^{3},\\ &x_{i,j} \in \{0,1\} &1\leq i <j\leq n, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} T_{\ge 0}^{1} = \{ (i,j,k)\in T: w_{i,j}\ge 0 \vee w_{j,k}\ge 0\},\\ T_{\ge 0}^{2} = \{ (i,j,k)\in T: w_{i,j}\ge 0 \vee w_{i,k}\ge 0\},\\ T_{\ge 0}^{3} = \{ (i,j,k)\in T: w_{j,k}\ge 0 \vee w_{i,k}\ge 0\}. \end{array} $$

A transitivity constraint, such that wi, j < 0 ∧ wj, k < 0 holds, is redundant and safely removed. RP(G) has O(nm≥ 0) constraints, where m≥ 0 indicates the number of non-negative weighted edges in G.

4 New concise ILP formulations

4.1 Overview of our main idea and contributions

We introduce several terms to explain our main idea. Let \(\boldsymbol {x}^{\ast } = (x_{i,j}^{\ast })_{1\leq i< j\leq n}\) denote an arbitrary optimal solution of an ILP formulation, E≥ 0 denote {{i, j}∈ Ewi, j ≥ 0}, and E denote \(\{\{i,j\}\in E \mid x_{{{i,j}}}^{\ast } = 1\}\). Furthermore, let \(\{(V_{1},E_{1}^{\ast }),(V_{2},E_{2}^{\ast }),\dots ,(V_{p},E_{p}^{\ast })\}\) denote a set of connected components of (V, E). For each ∈{1,…,p}, for a pair of vertices i, jV (where i < j), they are directly reachable if \(\{i, j\} \in E_{\ell }^{\ast }\), and non-negatively, directly reachable if \(\{i, j\} \in E_{\ell }^{\ast } \cap E_{\geq 0}\). They also are reachable if a path exists from i to j that consists of edges in \(E_{\ell }^{\ast }\), and non-negatively reachable if a path exists from i to j that consists of edges in \(E_{\ell }^{\ast }\cap E_{\geq 0}\).

Example 1

Consider one connected component \((V_{\ell },E^{\ast }_{\ell })\) where V = {1,2,3,4,5} and \(E^{\ast }_{\ell } = \{\{1,2\},\){2,3},{2,4},{3,4},{4,5}}. Weight function w on \(E^{\ast }_{\ell }\) is as follows: w1,2 = − 1, w2,3 = 0, w2,4 = 1, w3,4 = 0 and w4,5 = 2. Figure 1 illustrates the component. Here 1 and 2 are directly reachable but not non-negatively, directly reachable, and 3 and 4 are non-negatively, directly reachable. 3 and 5 are non-negatively reachable, and 1 and 5 are not non-negatively reachable.

Fig. 1
figure 1

Example of connected component

The goal of transitivity constraints is to ensure that the following property holds.

If i and j are reachable, they are directly reachable, i.e., xi, j = 1 must hold.

As shown by Lemma 1, if i and j are reachable, then they are non-negatively reachable. The intuition of this lemma is simple. If i, jV are not non-negatively connected, by removing only negative weight edges, we can divide V into two parts, contradicting the fact that x is an optimal solution. For example, in Fig. 1, since vertices 1 and 5 are not non-negatively reachable, the component is not optimal; we can divide V into two parts, i.e., {1} and {2,3,4,5}, by removing only negative weight edge {1,2}.

Thus, the above goal can be modified to guarantee the following property, which corresponds to Lemma 2:

If i and j are non-negatively reachable, they are directly reachable.

Transitivity constraints must be defined to ensure this property. With these terms, the transitivity constraints in [12] can be translated:

For any i < k and for any j s.t. ji, jk, if at least one of the following conditions holds, then i and k must be directly reachable (i.e., xi, k = 1 must hold):

(i):

i and j are directly reachable, and j and k are non-negatively, directly reachable, or

(ii):

i and j are non-negatively, directly reachable, and j and k are directly reachable.

We argue that having only (i) instead of (i) and (ii) together is sufficient, i.e., (ii) is redundant if we have (i) to derive Lemma 2. Similarly, if we have (ii), (i) is redundant. This fact is verified easily by mathematical induction on the length of the non-negative path from i to k. Assume the length of the path is q, and divide it into two parts: a non-negative path from i to j with length q − 1, and a non-negative edge between j and k (for (i)); or a non-negative edge between i and j, and a non-negative path from j to k with length q − 1 (for (ii)).

In the rest of this section, we show a series of ILP formulations based on this idea, called CP(G), CP(G), ACP(G), and ACP(G). Intuitively, we can remove about half of the transitivity constraints. However, since (i) and (ii) can overlap, this intuition is not exact. In the rest of this section, we scrutinize the amount of reduction.

4.2 CP(G)

We first present our new concise ILP formulation called CP(G), which is based on RP(G). It is identical to RP(G) (or P(G)) except that we replace (1) with:

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1 &\forall (i,j,k) \in C_{\ge 0}^{1},\\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in C_{\ge 0}^{2},\\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1& \forall (i,j,k) \in C_{\ge 0}^{3},\\ &x_{i,j} \in \{0,1\} &1\leq i <j\leq n, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} C_{\ge 0}^{1} = \{ (i,j,k)\in T: w_{j,k}\ge 0\},\\ C_{\ge 0}^{2} = \{ (i,j,k)\in T: w_{i,k}\ge 0\},\\ C_{\ge 0}^{3} = \{ (i,j,k)\in T: w_{j,k}\ge 0\}.\\ \end{array} $$

Here the requirement of (i) is given as \(C_{\geq 0}^{1}\cup C_{\geq 0}^{2} \cup C_{\geq 0}^{3}\).

Let \(\boldsymbol {x}^{\ast } = (x_{i,j}^{\ast })_{1\leq i< j\leq n}\) be an arbitrary optimal solution of CP(G). We first present the following lemma where \(\{(V_{1},E_{1}^{\ast }),(V_{2},E_{2}^{\ast }),\dots ,(V_{p},E_{p}^{\ast })\}\) denotes the connected components of (V, E) and \(E^{\ast } = \{\{i,j\}\in E \mid x_{{i,j}}^{\ast } = 1\}\).

Lemma 1

\((V_{\ell }, E_{\ell }^{\ast }\cap E_{\geq 0})\) is connected for each ∈{1,…,p}.

Proof

We prove this lemma in the same manner as the proof of Lemma 1 [12]. It suffices to show that for any partition {Si, Sj} of V, where iSi and jSj, there exists a non-negative edge in \(E^{\ast }_{\ell }\cap E_{\geq 0}\) whose one endpoint is in Si and its other is in Sj.

From the definition of V, at least one edge exists in \(E^{\ast }_{\ell }\) between Si and Sj. Suppose that none of these connecting edges belong to E≥ 0, i.e., their weights are strictly negative. We focus on a CP solution obtained by changing the values of the variables that correspond to these edges from 1 to 0 on x. This operation resembles partitioning V into Si and Sj. This solution is feasible for CP because each transitivity constraint that includes a removed edge is satisfied since at least two terms equal 0. Thus, the objective value of this solution is strictly greater than that of x. This contradicts the optimality of x. □

Lemma 1 argues that if i and j in V are reachable, then they are non-negatively reachable. Next we prove that if i and j are non-negatively reachable, then they are directly reachable. In the following, for notation simplicity, let x{i, j} denote xi, j when i < j, and otherwise xj, i. With this notation, the three inequalities in (1) are merged into one inequality x{i, j} + x{j, k}xi, k ≤ 1, where i < k.

Lemma 2

\((V_{\ell }, E_{\ell }^{\ast })\) is complete for each ∈{1,…,p}.

Proof

Let i and j be two vertices in V (i < j). According to Lemma 1, there exists path i = u0, u1, \(\dots ,u_{q} = j\) on \(E_{\ell }^{\ast }\cap E_{\geq 0}\). We prove \(x_{i,j}^{\ast } = 1\) by induction on the path’s length q. (Base case): \(x_{i,j}^{\ast } = x_{u_{0},u_{1}}^{\ast } = 1\) is clearly true. (Inductive step): Assume the induction hypothesis where the lemma is true for an arbitrary value of q − 1. Next we must show that \(x_{i,j}^{\ast } = 1\), that is, \(x^{\ast }_{u_{0},u_{q}} = 1\). According to the induction hypothesis, \(x_{\{u_{0},u_{q-1}\}}^{\ast } = 1\). Since u0 = i < j = uq and the edge between uq− 1 and uq is in E≥ 0, constraint

$$ x_{\{u_{0},u_{q-1}\}} + x_{\{u_{q-1},u_{q}\}} - x_{u_{0},u_{q}} \leq 1 $$

is contained in CP(G). Thus, using \(x_{\{u_{0},u_{q-1}\}}^{\ast } = 1\) and \(x_{\{u_{q-1},u_{q}\}}^{\ast } = 1\), we have \(x_{u_{0},u_{q}}^{\ast } = 1\). □

Now we are ready to present our main theorem.

Theorem 1

CP(G) and P(G) have the same set of optimal solutions.

Proof

It is sufficient to show that any optimal solution x of CP(G) is feasible for P(G). This is equivalent to the statement that \((V_{\ell },E_{\ell }^{\ast })\) is complete for each \(\ell \in \{1,2,\dots ,p\}\). This is what Lemma 2 argues. □

It is straightforward to show that Theorem 1 holds if we choose (ii) instead of (i) for transitivity constraints.

According to Theorem 1 [12], RP(G) and P(G) have the same set of optimal solutions. Thus, we obtain the following corollary.

Corollary 1

CP(G) and RP(G) have the same set of optimal solutions.

We examine the numbers of constraints in RP(G) and CP(G). Let a denote the probability that an edge has a non-negative weight and formally define it as follows:

$$ a = \frac{|\{ \{i,j\}\in E \mid i < j, w_{i,j} \geq 0\}|} {\left( \begin{array}{l} n \\ 2 \end{array} \right)}. $$

Let |RP(G)| and |CP(G)| denote \(|T^{1}_{\geq 0}\cup T^{2}_{\geq 0}\cup T^{3}_{\geq 0}|\) and \(|C^{1}_{\geq 0}\cup C^{2}_{\geq 0}\cup C^{3}_{\geq 0}|\), i.e., the number of constraints in RP(G) and CP(G).

Theorem 2

When the probability in which the weight of each edge is non-negative is independent for each edge, the ratio of the number of constraints in CP(G) against that in RP(G) can be estimated:

$$ \frac{|\mathrm{CP(}G \mathrm{)}|} {|\mathrm{RP(}G \mathrm{)}|}= \frac{1}{2-a}. $$

Proof

We first estimate the number of transitivity constraints in RP(G). |T| = |{(i, j, k)∣1 ≤ i < j < kn}| equals \(\left (\begin {array}{l} n \\ 3 \end {array} \right )\). Tuple (i, j, k) is included in \(T_{\geq 0}^{1}\) except when both wi, j and wj, k are negative. Thus, this probability is given as 1 − (1 − a)2= 2aa2. The same argument is applicable to \(T_{\geq 0}^{2}\) and \(T_{\geq 0}^{3}\). Then \(|T_{\geq 0}^{1}\cup T^{2}_{\geq 0}\cup T_{\geq 0}^{2}|\) can be estimated as \(3(2a-a^{2})\left (\begin {array}{l} n \\ 3 \end {array} \right )\).

Then we estimate the number of transitivity constraints in CP(G). Tuple (i, j, k) is included in \(C^{1}_{\geq 0}\) only when wj, k ≥ 0, which occurs with probability a. Thus, \(|C^{1}_{\geq 0}\cup C^{2}_{\geq 0}\cup C^{3}_{\geq 0}|\) is estimated: \(3a\left (\begin {array}{l} n \\ 3\end {array} \right )\).

Therefore, we obtain

$$ \frac{|\mathrm{CP(}G \mathrm{)}|}{|\mathrm{RP(}G \mathrm{)}|} = \frac{a}{2a-a^{2}} = \frac{1}{2-a}. $$

Since 0 ≤ a ≤ 1,

$$ \frac{1}{2}\leq \frac{|\mathrm{CP(}G \mathrm{)}|}{|\mathrm{RP(}G \mathrm{)}|} \leq 1. $$

Furthermore, this ratio approaches 1/2 when a approaches 0, and it approaches 1 when a approaches 1. □

4.3 CP(G)

We present another concise ILP formulation called CP(G), based on RP(G) [13]. In a real-life network, most edges are commonly zero-weighted. By taking this fact into account, Miyauchi et al. [13] introduced RP(G), which is identical to RP(G) (or P(G)) except that we replace (1) with:

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1 &\forall (i,j,k) \in T_{> 0}^{1},\\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in T_{> 0}^{2},\\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in T_{> 0}^{3},\\ &x_{i,j} \in \{0,1\}& 1\leq i <j\leq n, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} T_{> 0}^{1} = \{ (i,j,k)\in T: w_{i,j}> 0 \vee w_{j,k}> 0\},\\ T_{> 0}^{2}= \{ (i,j,k)\in T: w_{i,j}> 0 \vee w_{i,k}> 0\},\\ T_{> 0}^{3}= \{ (i,j,k)\in T: w_{j,k}> 0 \vee w_{j,k}> 0\}. \end{array} $$

Since RP(G) reduces non-redundant transitivity constraints, a connected component can be incomplete and the obtained partition can be infeasible for P(G). The following example presented by [13] shows the infeasibility.

Example 2

Let G = (V, E, w) with V = {1,2,3,4},w1,2 = 1,w1,3 = w2,3 = − 1, and w1,4 = w2,4 = w3,4 = 0. A 0-1 vector \({\overline {\boldsymbol {x}}}^{\ast } = (\overline {x}_{i,j})^{\ast }\) such that \({\overline {x}}_{1,2}^{\ast } ={\overline {x}}_{1,4}^{\ast } ={\overline {x}}_{2,4}^{\ast } ={\overline {x}}_{3,4}^{\ast } = 1\) and \({\overline {x}}_{1,3}^{\ast } ={\overline {x}}_{2,3}^{\ast } = 0\) is one of the optimal solutions to RP(G); however, transitivity constraint − x1,3 + x3,4 + x1,4 ≤ 1 in P(G) is violated.

Thus, Miyauchi et al. [13] proposed RP(G)+pp, which first runs RP(G) and then performs post-processing (pp) for the solution obtained by RP(G) to recover the feasibility. We can modify RP(G) using the same idea as CP(G) and obtain CP(G). Furthermore, by applying the same post-processing procedure pp, we obtain CP(G)+pp.

In CP(G), we replace (1) with:

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1 &\forall (i,j,k) \in C_{> 0}^{1},\\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in C_{> 0}^{2},\\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in C_{> 0}^{3},\\ &x_{i,j} \in \{0,1\}& 1\leq i <j\leq n, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} C_{> 0}^{1} = \{ (i,j,k)\in T: w_{j,k}> 0\},\\ C_{> 0}^{2} = \{ (i,j,k)\in T: w_{i,k}> 0\},\\ C_{> 0}^{3} = \{ (i,j,k)\in T: w_{j,k}> 0\}. \end{array} $$

The post-processing procedure pp is described as follows. Let \(\boldsymbol {\check {x}}^{\ast } = (\check {x}_{i,j}^{\ast })_{1\leq i<j\leq n}\) denote an optimal solution of CP(G). Let \(\check {E}^{\ast }\) denote \(\{\{i,j\}\in E | \check {x}_{{i,j}}^{\ast } = 1\},\) E> 0 denote {{i, j}∈ E|wi, j > 0}, and \(\check {E}_{>0}^{\ast }\) denote \(\check {E}^{\ast }\cap E_{>0}\). Now let us consider a new graph \(G_{new} = (V, \check {E}_{>0}^{\ast })\). We partition V into a set of weakly connected components \(\{(V_{1},\check {E}_{1}^{\ast }),(V_{2},\check {E}_{2}^{\ast }),\dots ,(V_{p},\check {E}_{p}^{\ast })\}\) of Gnew by the depth-first search and outputs 0-1 vector x that corresponds to partition \(\{V_{1},V_{2},\dots ,V_{p}\}\), i.e., x, such that \(x_{{i,j}}^{\ast } = 1\) if and only if i, jV for some \(\ell \in \{1,2,\dots ,p\}\). Note that this procedure runs in time linear with the size of G.

The following lemma, which corresponds to Lemma 2 for CP(G), holds for CP(G). Here vertices i and j are positively reachable if a path exists from i to j that consists of edges in \(\check {E}_{>0}^{\ast }\).

Lemma 3

\((V_{\ell }, \check {E}^{\ast }_{{\ell }})\) is complete for each ∈{1,…,p}.

The proof is obtained from that of Lemma 2 by replacing ‘non-negatively’ with ‘positively’, E≥ 0 with E> 0, and CP(G) with CP(G).

Here we show that CP(G)+pp returns an optimal solution to P(G). This follows the proof in Subsection 4.1 in Miyauchi et al. [13], which shows that RP(G)+pp returns an optimal solution to P(G). For the proof, it suffices to show that the objective value of x remains identical as that of \(\boldsymbol {\check {x}}^{\ast }\) in P(G) and CP(G) because x is feasible for P(G) and \(\boldsymbol {\check {x}}^{\ast }\) is optimal to CP(G), which is a relaxation of P(G) (i.e., having fewer constraints). For convenience, we define \(E_{in}^{\ast } = \{\{i,j\}\in E | x_{{i,j}}^{\ast }= 1\}\) and \(E_{out}^{\ast } = E\setminus E_{in}^{\ast }\). Note that graph \((V,E_{in}^{\ast })\) consists of p cliques \(V_{1},V_{2},\dots , V_{p}\).

We have the following lemmas.

Lemma 4

It holds that \(\sum \limits _{\{i,j\}\in {E^{\ast }}_{in}} w_{i,j}x_{{i,j}}^{\ast }= \sum \limits _{\{i,j\}\in {E^{\ast }}_{in}} w_{i,j}\check {x}_{{i,j}}^{\ast }\).

Proof

It suffices to show that for any \(\ell \in \{1,2,\dots ,p\}\), it holds that \(\check {x}^{\ast }_{i,j} = 1\) for each i, jV where i < j. Since V is weakly connected by \(\check {E}_{>0}^{\ast }\), a path exists on \(\check {E}_{>0}^{\ast }\) that connects i and j. This implies that every edge on it has a positive weight; i and j are positively reachable. According to Lemma 3, we conclude that i and j are directly reachable; \(\check {x}_{i,j}^{\ast } =~1\). □

Lemma 5

It holds that \(\sum \limits _{\{i,j\}\in {E^{\ast }}_{out}} w_{i,j} \check {x}_{{i,j}}^{\ast } \leq 0\).

Proof

For each \(\{i,j\}\in E^{\ast }_{out}\), we have \(\{i,j\}\not \in E^{\ast }_{>0}\). If otherwise, then \(x^{\ast }_{{i,j}} = 1\) and thus \(\{i,j\}\in E^{\ast }_{in}\). Therefore, for each \(\{i,j\}\in E^{\ast }_{out}\), we have \(\check {x}^{\ast }_{{i,j}} = 0\) or wi, j ≤ 0, which proves the lemma. □

By Lemmas 4 and 5, we have \({\sum }_{\{i,j\}\in E} w_{i,j}x^{\ast }_{{i,j}} = {\sum }_{\{i,j\}\in E^{\ast }_{in}} w_{i,j}x^{\ast }_{{i,j}} \geq {\sum }_{\{i,j\}\in E^{\ast }_{in}} w_{i,j}\check {x}^{\ast }_{{i,j}} + {\sum }_{\{i,j\}\in E^{\ast }_{out}} w_{i,j}\check {x}^{\ast }_{{i,j}} = {\sum }_{\{i,j\}\in E} w_{i,j}\check {x}^{\ast }_{{i,j}}\). Thus, the following theorem holds.

Theorem 3

Any 0-1 vector returned by CP(G)+pp is optimal to P(G).

To estimate the ratio of |CP(G)| to |RP(G)|, we define b as the probability that an edge has a positive weight:

$$ b = \frac{|\{\{i,j\} \in E \mid i < j, w_{i,j} > 0\}|} {\left( \begin{array}{l} n \\2 \end{array} \right)}. $$

Theorem 4

When the probability in which the weight of each edge is positive is independent for each edge, the ratio of the number of constraints in CP(G) compared with that in RP(G) can be estimated:

$$ \frac{|\text{CP}^{\ast}(G)|} {|\text{RP}^{\ast}(G) |}= \frac{1}{2-b}. $$

The proof is almost identical to Theorem 2. The ratio approaches 1/2 when b approaches 0, and it approaches 1 when b approaches 1.

4.4 ACP(G)/ACP(G)

As discussed in Section 4.1, either (i) or (ii) is required for transitivity constraints. We can adaptively switch between (i) or (ii) to minimize the number of transitivity constraints. Such a formulation, which introduces this idea to CP(G), is called Adaptive CP(G) (ACP(G)) and that for CP(G) is called Adaptive CP(G) (ACP(G)).

Let nn(i) denote the number of vertices j(≠i) that satisfy wi, j ≥ 0. In ACP(G), we replace (1) with:

$$ \begin{array}{@{}rcl@{}} &x_{i,j} + x_{j,k} - x_{i,k} \le 1 &\forall (i,j,k) \in F_{\geq 0}^{1}\cup F_{\geq 0}^{2},\\ &x_{i,j} - x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in F_{\geq 0}^{3}\cup F_{\geq 0}^{4},\\ &- x_{i,j} + x_{j,k} + x_{i,k} \le 1 &\forall (i,j,k) \in F_{\geq 0}^{5}\cup F_{\geq 0}^{6},\\ &x_{i,j} \in \{0,1\} &1\leq i <j\leq n, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} F_{\geq 0}^{1} = \{ (i,j,k)\in T : w_{j,k}\geq 0 \wedge nn(i) > nn(k)\},\\ F_{\geq 0}^{2} = \{ (i,j,k)\in T : w_{i,j}\geq 0 \wedge nn(i) \leq nn(k)\},\\ F_{\geq 0}^{3} = \{ (i,j,k)\in T : w_{i,k}\geq 0 \wedge nn(j) > nn(k)\},\\ F_{\geq 0}^{4} = \{ (i,j,k)\in T : w_{i,j}\geq 0 \wedge nn(j) \leq nn(k)\},\\ F_{\geq 0}^{5} = \{ (i,j,k)\in T : w_{j,k}\geq 0 \wedge nn(i) > nn(j)\},\\ F_{\geq 0}^{6} = \{ (i,j,k)\in T : w_{i,k}\geq 0 \wedge nn(i) \leq nn(j)\}. \end{array} $$

ACP(G) is defined in the same manner based on CP(G).

Theorem 5

ACP(G) and P(G) have the same set of optimal solutions.

The proof resembles Theorem 1. By Theorem 1 [12] and this theorem, we obtain the following corollary.

Corollary 2

ACP(G) and RP(G) have the same set of optimal solutions.

Compare the number of transitivity constraints in ACP(G) and CP(G). For given i, k, s.t. i < k, the number of transitivity constraints related to i, k, i.e., the number of edges with non-negative weight between k and j, s.t. ji, jk, is given by binomial distribution B(n − 2,a). The probability that the number of such edges equals y is given:

$$ p(y) = \left( \begin{array}{l} n-2 \\ y \end{array} \right)a^{y}(1-a)^{n-2-y}. $$

The expected number of transitivity constraints related to i, k for CP(G) is given as \({\sum }_{y=0}^{n-2} y\cdot p(y) = a(n-2)\). For ACP(G), we chose the smaller one from two values generated by binomial distribution B(n − 2,a). The probability that the smaller one equals y is given:

$$ q(y) = (p(y))^{2} + 2p(y)\sum\limits_{z = y+1}^{n-2}p(z). $$

Then the expected number of transitivity constraints related to i, k for ACP(G) is given as \({\sum }_{y=0}^{n-2} y\cdot q(y)\).

When n becomes large and the mean (which equals (n − 2)a) and the variance (which equals (n − 2)a(1 − a)) are sufficiently large, we can approximate binomial distribution B(n − 2,a) by normal distribution N((n − 2)a,(n − 2)a(1 − a)). Then for each i, k, the difference between the expected number of transitivity constraints for CP(G) and ACP(G) is given as \(\frac {1}{\sqrt {\pi }}\) of standard deviation \(\sqrt {(n-2)a(1 - a)}\).

Thus, |CP(G)| − |ACP(G)| is approximately:

$$ \frac{(n-1)n \sqrt{(n-2)a(1 - a)}}{2\sqrt{\pi}}. $$
(2)

For ACP(G), the following theorem holds.

Theorem 6

Any 0-1 vector returned by ACP(G) + pp is optimal to P(G).

We can prove this theorem based on the same reason as for Theorem 3. We can also estimate the reduction of the transitivity constraints in ACP(G) against CP(G) by replacing a with b in (2) for ACP(G).

5 Experiments

We experimentally evaluated the performance of our new concise ILP formulations with Gurobi package version 9.0.3 with default parameters to solve all the ILP formulations. All problem instances were solved by a workstation equipped with an Intel Xeon Gold 6130 CPU @ 2.10GHz processor with 192GB RAM and Ubuntu 18.04.2 LTS. For each instance and formulation, we set the time limit to four hours.

5.1 Problem settings

We used for the experiments four types of problem settings, based on popular CPP application domains. The first three settings are based on real-world datasets from Miyauchi et al. [13]. The last is based on synthetic (randomly generated) graphs, which were used as a benchmark for coalition structure generation algorithms [20].

(P) Correlation clustering:

For instance G = (V, E, w), E = E>E<E0 where E>, E< and E0 are mutually disjointed. For each {i, j}∈ E, we set wi, j = 1 if {i, j}∈ E>, wi, j = − 1 if {i, j}∈ E<, and wi, j = 0 if {i, j}∈ E0. These instances were generated from a protein sequence dataset, http://www.paccanarolab.org/scps. The number of variables always equals(n 2 ) = n(n− 1)/2.

(G) Group technology:

Instance G = (V, E, w) is an undirected bipartite graph. V is divided into two sets: p parts and q machines. For edge {i, j}∈ E between part i and machine j, wi, j = 1 if i is proceeded by machine j and wi, j = 0 otherwise. These instances are generated based on a manufacturing cell formation dataset, http://mauricio.resende.info/data.

(C) Community detection:

Instance G = (V, E, w) is based on a bipartite modularity network. V is divided into two sets, VR and VL, so that each edge has one end point in VL and another in VR. Edge {i, j}∈ E between iVL and jVR has weight Aij |E| −didj |E|2, where Aij is the (i, j) component of the adjacency matrix of G and di is the degree of iN.

(S) Coalition structure generation:

We adopted the setting of the benchmark problem used in an existing work on coalition structure generation algorithms [20]. For each pair of nodes, an edge exists with probability 95%. With probability 5%, the edge weight is positive. Otherwise (i.e., with probability 95%), it is negative. A positive weight is chosen randomly from a range [1,100]; a negative weight is chosen randomly from a range [− 100,− 1]. We set the number of nodes n to 30, 60, 90, and 120 and computed the average over 100 instances for each setting.

Table 1 shows the ratio of the non-negative or positive edges for each instance.

Table 1 Details of instances used in experiments

5.2 Experimental results

We compared our four new concise ILP formulations against two existing ILP formulations: RP(G) and RP(G). Table 2 shows the number of transitivity constraints contained in the ILP formulations for P(G), RP(G), CP(G), and ACP(G), and Table 3 shows the number for RP(G), CP(G), and ACP(G).

Table 2 Number of constraints: P(G), RP(G), CP(G), and ACP(G)
Table 3 Number of constraints: RP(G), CP(G), and ACP(G)

In the CP(G), ACP(G), CP(G), and ACP(G) columns, the pair of numbers in parentheses shows the ratio of the number of transitivity constraints. The first shows the actual ratio: |CP(G)|/|RP(G)| for the CP(G) column, |ACP(G)/|RP(G)| for the ACP(G) column, |CP(G)|/|RP(G)| for the CP(G) column, and |ACP(G)|/|RP(G)| for the ACP(G) column. The second shows the theoretical ratio estimated by Theorems 2 and 4 or the approximation method presented in Section 4.4. These estimations are based on the ratios in Table 1. When the difference between the actual ratio and the theoretical one exceeds 10%, the larger one is shown in bold.

CP(G) contains slightly more transitivity constraints than the theoretical estimation for (G) as well as C4, C5 and C6. The ratio of CP(G) almost meets the theoretical estimation for (P) and (S). On the other hand, the ratio of ACP(G) almost meets the theoretical estimation for all the instances. The maximum difference between the actual and theoretical ratios is five percent. Note that ACP(G) eliminates more transitivity constraints than CP(G) for all the instances.

CP(G) has almost the same or fewer constraints than the theoretical estimation for all the instances except for C2 and C3. ACP(G) significantly reduces the redundant transitivity constraints compared to RP(G) (by more than 60% in most instances). Similar to CP(G), the ratio of CP(G) is close to the theoretical estimation for synthetic graphs (S). Note that ACP(G) eliminates more transitivity constraints than CP(G) for all the instances.

We obtained our theoretical estimation under the assumption that the sign of each edge (positive or negative) is determined independently. For real-world datasets, some correlations among edge signs likely exist. Thus, the actual reduction rate and our estimation do not exactly coincide except for synthetic graphs.

Table 4 shows the computation times in seconds. For the RP(G), CP(G), and ACP(G) results, we included the computational times for running the post-processing pp. The pp time was negligible in our experiments. In most cases it was less than 0.05 seconds. The longest pp time was 0.15 seconds for C6, which has the most nodes (i.e., 2,549 nodes) in our experiments.

Table 4 Computational time in seconds of each formulation

Bold numbers indicate the ILP formulation that obtained the best result. OM indicates that we failed to solve the instance due to insufficient memory, which occurs when the problem instance contains more than tens of millions of constraints. We show the relative gaps for problem instances that could be stored on the machine but failed to obtain an optimal solution within the time limit. More specifically, Gurobi returns an upper bound (UB) and a lower bound (LB) of the optimal value when terminated before reaching an optimal solution. We show (UBLB)/LB. For each instance, the best computation time (or the relative gap) among the formulations is highlighted in bold.

ACP(G) successfully solved the most instances, i.e., 24 out of 27 in the (P), (G), and (C) categories, among the six formulations; RP(G), CP(G) and ACP(G) solved 18 instances, and RP(G) and CP(G) solved 22. There were no instances that ACP(G) could not solve that the other formulations could. Thus, ACP(G) is the best formulation with respect to the number of solved instances.

For the computation time, RP(G) works well on the (G) instances, except G33, even though CP(G) and ACP(G) have fewer transitivity constraints. On the other instance categories of (P),(C), and (S), ACP(G) shows the best performance.

As shown in our example, although reducing redundant constraints effectively shortened for the computational times in general, in some cases (mostly in (G) instances), our new formulations required more computational time than the existing methods, even though our formulations have fewer constraints. Our current conjecture is that even though some constraints are redundant, they can still be useful for constraint propagation and reducing the computational times.

6 Conclusions

We investigated the ILP formulations for CPP. Based on the existing ILP formulations, we developed a series of new ILP formulations that can significantly reduce transitivity constraints. We characterized the transitivity constraints based on their reachablity among vertices and identified the redundant parts in the existing ILP formulations. All our ILP formulations are guaranteed to obtain optimal clique partitions. Experimental results showed that our new ILP formulations outperformed the existing ILP formulations for most instances. Transitivity constraints are ubiquitous in a variety of problems and their possible formulations. For CPP, we can use MaxSAT encoding instead of an ILP formulation. We would like to apply our idea to reduce transitivity constraints for MaxSAT encoding of CPP, as well as other discrete optimization problems, e.g., [10, 18, 21].