1 Introduction

Graph partition problems have recently gained importance due to their applications in engineering and computer science, such as telecommunications [16] and parallel computing [12]. Solving a graph partition problem means partitioning the vertices of a graph G(V, E) into several groups subject to capacity or cardinality constraints on each group; an optimal solution is one with the smallest total weight of cut edges. This problem is NP-complete [8]. Previous studies worked on improving quadratic programming or linear programming formulations to reduce the computational expense with commercial solvers [6]. Relaxations are also used to approximate this problem. Garey et al. [8] were the first to use eigenvalue and eigenvector information to obtain relaxations for graph partitioning. Ghaddar et al. [9] have recently used a branch-and-cut algorithm based on SDP relaxations to compute globally optimal solutions of k-partition problems.

The k-equipartition problem, in which the vertex set V is partitioned into k groups of equal size so that the total weight of cut edges is minimized, is one of the most popular graph partition problems. Another problem that interests us is the graph partition problem under knapsack constraints (GPKC). In the GPKC, each vertex of the graph is assigned a weight, and a knapsack constraint has to be satisfied in each group.

Lisser and Rendl [16] compared various semidefinite programming (SDP) relaxations and linear programming (LP) relaxations for k-equipartition problems. They showed that the nonnegativity constraints become dominating in the SDP relaxation as k increases. However, the application of this formulation is limited by the computing power of SDP solvers: adding all sign constraints for a symmetric matrix of dimension n introduces \(O(n^2)\) new constraints and entails a huge computational burden, especially for large instances. Nguyen [19] proposed a tight LP relaxation for GPKC problems as well as a heuristic to build upper bounds. Semidefinite programming has shown advantages in generating tight lower bounds for quadratic problems with knapsack constraints [11] and for the k-equipartition problem, but so far there have been no attempts to apply SDP relaxations to the GPKC.

Algorithms for solving SDPs have been studied intensively in recent years. Malick et al. [18] designed the boundary point method to solve SDP problems with equality constraints. It can solve instances with a huge number of constraints that interior point methods (IPMs) fail to solve. This method falls into the class of alternating direction methods of multipliers (ADMM). The ADMM has been studied in convex optimization and proven to converge linearly when one of the objective terms is strongly convex [20]. More recently, studies have focused on generalizing this idea to convex optimization problems with more blocks of variables. Chen et al. [3], for example, proved convergence of the 3-block ADMM in certain scenarios, but whether the direct extension of the 3-block ADMM to SDP problems converges is still an open question.

There have also been various ideas for combining other approaches with the ADMM for solving SDP problems. De Santis et al. [4] added a dual factorization to the ADMM update scheme, while Sun et al. [23] combined the ADMM with Newton's method. Both attempts improved the performance of the algorithms.

1.1 Main results and outline

In this paper, we will introduce an extended ADMM algorithm and apply it to the tight SDP relaxations for graph partition problems with nonnegativity constraints. We will also introduce heuristics to obtain a feasible partition from the solution of the SDP relaxation.

This paper is structured as follows. In Sect. 2, we will introduce two graph partition problems, the k-equipartition problem and the graph partition problem under knapsack constraints (GPKC). We will discuss different SDP relaxations for both problems. In Sect. 3, we will design an extended ADMM and illustrate its advantages in solving large SDP problems with nonnegativity constraints. In Sect. 4, we will introduce two post-processing methods used to generate lower bounds from the output of the extended ADMM. In Sect. 5, we will design heuristics to build a tight upper bound from the SDP solution of the original problem. Numerical results of experiments carried out on graphs of different sizes and densities will be presented in Sect. 6. Section 7 concludes the paper.

1.2 Notation

We define by \(e_n\) the vector of all ones of length n, by \({\mathbf {0}}_n\) the vector of all zeros of length n and by \({\mathbf {0}}_{n\times n}\) the square matrix of all zeros of dimension n. We omit the subscript in case the dimension is clear from the context. The notation [n] stands for the set of integers \(\{1,\dots ,n\}\). Let \({{\mathcal {S}}}^n\) denote the set of all \(n\times n\) real symmetric matrices. We denote by \(M\succeq 0\) that the matrix M is positive semidefinite and let \({{\mathcal {S}}}_+^n\) be the set of all positive semidefinite matrices of order \(n\times n\). We denote by \({\langle \cdot ,\cdot \rangle }\) the trace inner product. That is, for any \(M, N \in {\mathbb {R}}^{n\times n}\), we define \({\langle M,N \rangle }:= \text {trace}(M^\top N )\). Its associated norm is the Frobenius norm, denoted by \(\Vert M\Vert _F := \sqrt{\text {trace}(M^\top M )}\). We denote by \(\text {diag}(M)\) the operation of extracting the diagonal entries of the matrix M as a vector. The projection onto the cone of positive semidefinite matrices is denoted by \({\mathcal {P}}_{\succeq 0}(\cdot )\). The projection onto the interval \([L, U]\) is denoted by \({\mathcal {P}}_{[L,U]}\). We denote by \(\lambda (\cdot )\) the eigenvalues; that is, for any \(M \in {\mathbb {R}}^{n\times n}\), \(\lambda (M)\) is the set of all eigenvalues of M. Moreover, \(\lambda _{\max }(\cdot )\) denotes the largest eigenvalue. We denote by \(x \sim U(0,1)\) a variable x drawn uniformly from the interval (0, 1). We define by \(\text {argmaxk}(\cdot ,s)\) the index set of the s largest elements.

2 Graph partition problems

2.1 k-equipartition problem

For a graph G(V, E), the k-equipartition problem is the problem of finding a partition of the vertices in V into k groups of equal size \(m = n/k\) that has the minimal total weight of edges cut by this partition. The problem can be described with binary variables,

$$\begin{aligned} \begin{aligned} \min&~ \frac{1}{2}\langle L,Y Y^\top \rangle \\ \text {s.t.}~&Y e_k = e_n, \\&Y^\top e_n = m e_k,\\&Y_{ij} \in \{ 0, 1 \}, \forall i \in [n], j\in [k], \end{aligned} \end{aligned}$$
(1)

where L is the Laplacian matrix of G, the variable \(Y \in {\mathbb {R}}^{n\times k}\) indicates which group each vertex is assigned to, \(m = n/k\) is the common group size, and \(e_n\) (resp. \(e_k\)) is the all-one vector of dimension n (resp. k).

This problem is NP-hard and Lisser and Rendl [16] proposed the SDP relaxation

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2}\langle L,X \rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X e = m e, \\&X \succeq 0, \end{aligned} \end{aligned}$$
(2)

where \(X \in {{\mathcal {S}}}^n\) takes the role of \(Y Y^\top \), and \(e \in {\mathbb {R}}^n\) is the all-one vector.

To tighten this SDP relaxation, we can add more inequalities to problem (2). Here, we introduce two common inequalities for SDP relaxations derived from 0/1 problems.

The process of relaxing \(Y Y^\top \) to X implies that X is a nonnegative matrix; hence, the first group of inequalities we consider is \(X \ge 0\), and the corresponding new SDP relaxation is

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2}\langle L,X \rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X e = me,\\&X \succeq 0,\\&X \ge 0. \end{aligned} \end{aligned}$$
(3)

This kind of SDP is also called a doubly nonnegative program (DNN), since the matrix variable is both positive semidefinite and elementwise nonnegative.

Another observation is the following. For any triple of vertices (i, j, k), if vertices i and j are in the same group, and vertices j and k are in the same group, then vertices i and k must be in the same group. This can be modeled by the transitivity constraints [16] given as follows

$$\begin{aligned} \text {MET}:= \{X=(X_{ij}) \mid X_{ij}+ X_{ik} \le 1 + X_{jk},\forall i,j,k \in [n]\}. \end{aligned}$$

The set formed by these inequalities is the so-called metric polytope. Adding the transitivity constraints to the SDP relaxation (3) gives

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2}\langle L, X \rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X e = m e,\\&X \succeq 0,\\&X \ge 0,\\&X \in \text {MET}. \end{aligned} \end{aligned}$$
(4)

2.2 Graph partition problem under knapsack constraints (GPKC)

Given a graph G(V, E) with nonnegative weights on the vertices and a capacity bound W, the GPKC asks to partition the vertices such that the total weight of cut edges is minimized and the total weight of vertices in each group does not exceed the capacity bound W.

A mathematical programming formulation is given as

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2} \langle L, YY^\top \rangle \\ \text {s.t.}~&Y e_n = e_n,\\&Y^\top a \le We_n,\\&Y_{ij} \in \{0,1\},~\forall i \in [n], j\in [n], \end{aligned} \end{aligned}$$
(5)

where \(Y\in {\mathbb {R}}^{n\times n}\), \(a\in {\mathbb {R}}^n\) is the vector of vertex weights and W is the capacity bound. We assume \(a_i \le W~\forall i \in [n]\), since otherwise the problem is infeasible. Again, we can derive the SDP relaxation

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2}\langle L,X\rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X a\le W e, \\&X \succeq 0. \end{aligned} \end{aligned}$$
(6)

Similarly to the k-equipartition problem, we can tighten the relaxation by imposing sign constraints, i.e.,

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2} \langle L,X\rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X a\le W e,\\&X \succeq 0,\\&X \ge 0, \end{aligned} \end{aligned}$$
(7)

and additionally by imposing the transitivity constraints which gives

$$\begin{aligned} \begin{aligned} \min ~&\frac{1}{2}\langle L,X\rangle \\ \text {s.t.}~&\text {diag}(X) = e,\\&X a\le W e,\\&X \succeq 0,\\&X \ge 0,\\&X \in \text {MET}. \end{aligned} \end{aligned}$$
(8)

3 Extended ADMM

The SDP relaxations introduced in Sect. 2 have a huge number of constraints, even for medium-sized graphs. The total number of sign constraints on X is \(O(n^2)\), and adding the constraint \(X \in \text {MET}\) to the SDP relaxations introduces \(3\left( {\begin{array}{c}n\\ 3\end{array}}\right) \) extra constraints. Therefore, solving these tight relaxations is out of reach for state-of-the-art algorithms like interior point methods (IPMs). However, the prospect of high-quality lower bounds from tight SDP relaxations for graph partition problems motivates us to develop an efficient algorithm that can deal with SDP problems with inequality and sign constraints on large-scale instances. Since the 2-block alternating direction method of multipliers (ADMM) has shown efficiency in solving large-scale instances that interior point methods fail to solve, we extend this algorithm to SDP problems with inequalities of the form

$$\begin{aligned} \begin{aligned} \min ~&\langle C, X \rangle \\ \text {s.t.}~&{\mathcal {A}}(X) = b, \\&{\mathcal {B}}(X) = s, \\&X \succeq 0,\\&L \le X \le U,\\&l\le s \le u, \end{aligned} \end{aligned}$$
(9)

where \(C \in {{\mathcal {S}}}^n\), \({\mathcal {A}}: {{\mathcal {S}}}^n\rightarrow {\mathbb {R}}^m\), \({\mathcal {B}}: {{\mathcal {S}}}^n\rightarrow {\mathbb {R}}^q\), \(b \in {\mathbb {R}}^m\), \(l, u \in {\mathbb {R}}^q\). The slack variable \(s \in {\mathbb {R}}^q\) turns the inequality constraints into equations; entries of l and u may be set to \(-\infty \) and \(+\infty \), respectively. Likewise, the entries of \(L \in {{\mathcal {S}}}^n\) and \(U \in {{\mathcal {S}}}^n\) may be \(-\infty \) and \(+\infty \). Hence, formulation (9) can represent SDP problems with arbitrary equality and inequality constraints. This formulation is inspired by the work of [23]. All semidefinite programs given above fit into this formulation. For example, in (3) the operator \({\mathcal {A}}\) comprises the diagonal constraint and the constraint \(Xe=me\), the operator \({\mathcal {B}}\) as well as the variable s are not present, L is the matrix of all zeros and U the matrix having \(+\infty \) everywhere.
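To illustrate how a concrete relaxation fits this template, the following MATLAB sketch (our illustration, with names of our choosing; the authors' code may differ) assembles the data of (9) for the DNN relaxation (3), treating the unknown as \(\text {vec}(X)\):

```matlab
% Sketch: casting the DNN relaxation (3) into the general form (9).
n = 12; k = 3; m = n/k;               % example sizes; n must be divisible by k
idx   = (0:n-1)*n + (1:n);            % positions of the entries X_ii in vec(X)
Adiag = sparse(1:n, idx, 1, n, n^2);  % rows encoding diag(X) = e
Arow  = kron(ones(1,n), speye(n));    % Arow*vec(X) = X*e, encoding X*e = m*e
A = [Adiag; Arow];                    % operator A of (9)
b = [ones(n,1); m*ones(n,1)];
% no B-block for (3); the bounds encode the sign constraints X >= 0:
L = zeros(n); U = Inf(n);
```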

Following the ideas for the 2-block ADMM [18], we form the update scheme in Algorithm 1 to solve the dual of problem (9).

Lemma 1

The dual problem for (9) is given as

$$\begin{aligned} \begin{aligned} \max ~&b^\top y + {\mathcal {F}}_{1}(S) + {\mathcal {F}}_{2}(v)\\ \text {s.t.}~&{\mathcal {A}}^{*}y + {\mathcal {B}}^{*}{\bar{y}} + S + Z = C,\\&{\bar{y}} = v,\\&Z \succeq 0, \end{aligned} \end{aligned}$$
(10)

where \({\mathcal {F}}_1(S) = \inf _{W} \{ \langle S,W \rangle \mid L \le W \le U \}\) and \({\mathcal {F}}_2(v) = \inf _{\omega } \{ \langle v,\omega \rangle \mid l \le \omega \le u\}\).

Proof

We derive this dual problem by rewriting the primal SDP problem (9) in a more explicit way, namely

$$\begin{aligned} \begin{aligned} \min ~&\langle C, X \rangle \\ \text {s.t.}~&{\mathcal {A}}(X) = b,\\&{\mathcal {B}}(X) -s = {\mathbf {0}}_q,\\&X \succeq 0,\\&X \ge L,\\&-X \ge -U,\\&s \ge l,\\&-s \ge -u.\\ \end{aligned} \end{aligned}$$
(11)

Then, the dual of (11) is

$$\begin{aligned} \begin{aligned} \max \qquad&b^\top y + {\mathbf {0}}_q^\top {\bar{y}} + \langle {\mathbf {0}}_{n\times n},Z \rangle + \langle L,S_L\rangle -\langle U, S_U\rangle + l^\top v_l - u^\top v_u \\ \text {s.t. } \qquad&{\mathcal {A}}^*y + {\mathcal {B}}^* {\bar{y}} + Z + S_L - S_U = C,\\&-{\bar{y}} + v_l - v_u = {\mathbf {0}}_q,\\&Z \succeq 0, \\&S_L, S_U , v_l, v_u \ge 0. \end{aligned} \end{aligned}$$
(12)

The following equivalences hold for each entry of the dual variables \(S_L\) and \(S_U\) in (12)

$$\begin{aligned} \begin{aligned}&X_{ij} = L_{ij} \iff S_{L,ij} \ne 0, S_{U,ij} = 0 ;\\&X_{ij} = U_{ij} \iff S_{U,ij} \ne 0, S_{L,ij} = 0 ;\\&L_{ij}< X_{ij} < U_{ij} \iff S_{L,ij} = S_{U,ij} = 0. \end{aligned} \end{aligned}$$

If we let \(S := S_L - S_U\), then for each entry of S

$$\begin{aligned}\begin{aligned}&\forall i \in [n], j \in [n], S_{ij} = {\left\{ \begin{array}{ll} S_{L,ij}, &{} \hbox { if}\ S_{L,ij} \ne 0,\\ -S_{U,ij}, &{} \text {otherwise}. \end{array}\right. } \end{aligned} \end{aligned}$$

Thus, in the dual objective function, we have

$$\begin{aligned} \begin{aligned} \langle L, S_L \rangle -\langle U, S_U \rangle = \sum _{ S_{L,ij} \ne 0 } L_{ij} S_{L,ij} - \sum _{ S_{L,ij} =0 } U_{ij} S_{U,ij}. \end{aligned} \end{aligned}$$
(13)

Expressing \(\inf _{W}\{ \langle S, W\rangle \mid L\le W \le U \}\) elementwise gives

$$\begin{aligned} \begin{aligned}&\inf _{W_{ij}} \{ S_{ij} W_{ij} \mid L_{ij} \le W_{ij} \le U_{ij} \} = {\left\{ \begin{array}{ll} L_{ij}S_{ij}, &{} \text {if}~ S_{ij} \ge 0,\\ U_{ij}S_{ij}, &{} \text {if}~ S_{ij} < 0. \end{array}\right. } \end{aligned} \end{aligned}$$
(14)

Combining the observations above, we end up with

$$\begin{aligned} \langle L, S_L \rangle - \langle U, S_U \rangle = \inf _{W}\{ \langle S, W\rangle \mid L\le W \le U \}. \end{aligned}$$
(15)

Similarly, letting \(v := v_l - v_u\), we have

$$\begin{aligned} \begin{aligned} l^\top v_l - u^\top v_u = \sum _{v_{l,k}\ne 0} l_{k} v_{l,k} - \sum _{v_{l,k} =0} u_{k}v_{u,k} = \inf _{\omega } \{ \langle v, \omega \rangle \mid l \le \omega \le u \} \end{aligned} \end{aligned}$$
(16)

Hence, problem (10) is equivalent to (12) and it is the dual of (9). \(\square\)

We now form the augmented Lagrangian function corresponding to (10).

$$\begin{aligned} \begin{aligned} {\mathcal {L}}(y, {\bar{y}}, Z, S,v; X, s) =~&b^\top y +{\mathcal {F}}_{1}(S) + {\mathcal {F}}_{2}(v) \\&- \langle {\mathcal {A}}^*y + {\mathcal {B}}^* {\bar{y}} + S + Z - C ,X \rangle - \langle -{\bar{y}} + v, s\rangle \\&- \frac{\sigma }{2} \Vert {\mathcal {A}}^*y + {\mathcal {B}}^* {\bar{y}} + S + Z - C\Vert ^2_F - \frac{\sigma }{2} \Vert -{\bar{y}} + v \Vert ^2. \end{aligned} \end{aligned}$$
(17)

The saddle point of this augmented Lagrangian function is

$$\begin{aligned} (y^*, {\bar{y}}^*, Z^*, S^*,v^*, X^*, s^*) := {{\,\mathrm{arg\,min}\,}}_{X,s}{{\,\mathrm{arg\,max}\,}}_{y,{\bar{y}},v,Z,S} {\mathcal {L}}(y, {\bar{y}}, Z, S,v; X, s), \end{aligned}$$
(18)

which is also an optimal solution for the primal and dual problems. If both the primal and the dual problem have strictly feasible points, then a point \((X, s, y, {\bar{y}}, Z, S, v)\) is optimal if and only if

$$\begin{aligned}&{\mathcal {A}}(X) = b,\quad {\mathcal {B}}(X) = s,\quad L \le X \le U,\quad l\le s \le u, \end{aligned}$$
(19a)
$$\begin{aligned}&{\mathcal {A}}^{*}y + {\mathcal {B}}^{*}{\bar{y}} + S + Z = C,\quad {\bar{y}} = v, \end{aligned}$$
(19b)
$$\begin{aligned}&X \succeq 0,\quad Z \succeq 0,\quad {\langle X,Z \rangle } = 0, \end{aligned}$$
(19c)
$$\begin{aligned}&(X_{ij} -L_{ij})(U_{ij} -X_{ij})S_{ij} = 0, ~ \forall i \in [n]~ \forall j \in [n],\quad L\le X \le U,\end{aligned}$$
(19d)
$$\begin{aligned}&(s_k-l_k)(u_k-s_k)v_k = 0, ~ \forall k \in [q],\quad l \le s \le u . \end{aligned}$$
(19e)

Remark 1

(19d) and (19e) are derived from the optimality conditions for (11), namely from

$$\begin{aligned} \begin{aligned}&(X_{ij}-L_{ij})S_{L,ij} = 0, S_{L,ij} \ge 0, X_{ij} \ge L_{ij}, \forall i \in [n] ~\forall j \in [n],\\&(U_{ij}-X_{ij})S_{U,ij} = 0, S_{U,ij} \ge 0,X_{ij} \le U_{ij}, \forall i \in [n] ~\forall j \in [n],\\&(s_{k}-l_{k})v_{l,k} = 0, v_{l,k} \ge 0, s_k \ge l_k, \forall k \in [q] ,\\&(u_{k}-s_{k})v_{u,k} = 0, v_{u,k} \ge 0, s_k \le u_k, \forall k \in [q] .\\ \end{aligned} \end{aligned}$$
(20)

With \(S = S_L - S_U\) and \(v = v_l - v_u\), we obtain (19d) and (19e).

We solve problem (18) coordinatewise, i.e., we optimize only over a block of variables at a time while keeping all other variables fixed. The procedure is outlined in Algorithm 1.

[Algorithm 1: update scheme of the extended ADMM for solving the dual of (9)]

In Step 1, the minimization over \((y, {\bar{y}})\), we enforce the first-order optimality conditions, i.e., we set the gradient with respect to \((y, {\bar{y}})\) to zero and thereby obtain the explicit expression

$$\begin{aligned} \begin{aligned} \begin{pmatrix} y^{k+1}\\ {\bar{y}}^{k+1} \end{pmatrix} =\begin{pmatrix} {\mathcal {A}}{\mathcal {A}}^* &{} {\mathcal {A}}{\mathcal {B}}^* \\ {\mathcal {B}}{\mathcal {A}}^* &{} {\mathcal {B}}{\mathcal {B}}^* + I \end{pmatrix}^{-1} \begin{pmatrix} \frac{b}{\sigma ^{k}} - {\mathcal {A}}(S^k+Z^k-C + \frac{1}{\sigma ^{k}}X^k) \\ -{\mathcal {B}}(S^k+Z^k-C+\frac{1}{\sigma ^{k}}X^k) + v^k +\frac{1}{\sigma ^{k}} s^k \end{pmatrix}. \end{aligned} \end{aligned}$$
(21)

Note that the size of \(y^k\) is the number of equality constraints and the size of \({\bar{y}}^k\) the number of inequality constraints. By abuse of notation we write \({\mathcal {A}}{\mathcal {A}}^*\) for the matrix product formed by the system matrix underlying the operator \({\mathcal {A}}(\cdot )\). Similarly for \({\mathcal {B}}{\mathcal {A}}^*\), \({\mathcal {B}}{\mathcal {B}}^*\).

In practice, we solve (21) in the following way. First, we apply the Cholesky decomposition

$$\begin{aligned} RR^\top = \begin{pmatrix} {\mathcal {A}}{\mathcal {A}}^* &{} {\mathcal {A}}{\mathcal {B}}^* \\ {\mathcal {B}}{\mathcal {A}}^* &{} {\mathcal {B}}{\mathcal {B}}^* + I \end{pmatrix} = : Q. \end{aligned}$$
(22)

Since the constraints defined by \({\mathcal {A}}\) and \({\mathcal {B}}\) are linearly independent, the matrix Q is positive definite and the Cholesky decomposition exists. Moreover, the Cholesky decomposition needs to be computed only once, since the matrix Q remains the same over all iterations. Then, we update \((y, {\bar{y}})\) as

$$\begin{aligned} \begin{aligned} RR^\top \begin{pmatrix} y^{k+1}\\ {\bar{y}}^{k+1} \end{pmatrix} = rhs \end{aligned} \end{aligned}$$
(23)

by solving two triangular systems in succession: first \(R {\mathbf {x}} = rhs\), then \(R^\top {\mathbf {y}} = {\mathbf {x}}\), which together solve \(RR^\top {\mathbf {y}} = rhs\).
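In MATLAB, this factor-once, solve-often pattern might look as follows (a sketch; Q and rhs stand for the quantities in (22) and (23)):

```matlab
% Factor Q once, then solve (23) by two triangular solves per iteration.
R = chol(Q, 'lower');       % Q = R*R'; exists since Q is positive definite
% ... inside the ADMM loop:
x   = R  \ rhs;             % forward substitution:  R*x = rhs
yyb = R' \ x;               % backward substitution: R'*yyb = x, so Q*yyb = rhs
```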

In Step 2, the minimization can be carried out entrywise; as the following derivation shows, it amounts to a projection onto the interval \([L_{ij},U_{ij}]\).

$$\begin{aligned} \begin{aligned} S^{k+1} =&~ {{\,\mathrm{arg\,min}\,}}_{S} -{\mathcal {F}}_1(S) + \langle X^k ,S\rangle + \frac{\sigma ^k}{2}\Vert {\mathcal {A}}^*y^{k+1} +{\mathcal {B}}^*{\bar{y}}^{k+1} +S +Z^k-C \Vert ^2_{F}, \\ S^{k+1}_{ij} =&~{\left\{ \begin{array}{ll} {{\,\mathrm{arg\,min}\,}}_{S_{ij}\ge 0} \Vert M^{k+1}_{ij}+S_{ij} -\frac{1}{\sigma ^k} L_{ij} \Vert ^2, S_{ij} \ge 0,\\ - {{\,\mathrm{arg\,min}\,}}_{S_{ij}\le 0} \Vert M^{k+1}_{ij} -S_{ij} + \frac{1}{\sigma ^k} U_{ij} \Vert ^2, S_{ij} \le 0, \end{array}\right. }\\ =&~{\left\{ \begin{array}{ll} {\mathcal {P}}_{\ge 0}(-M^{k+1}_{ij}+\frac{1}{\sigma ^k} L_{ij}), S_{ij} \ge 0,\\ - {\mathcal {P}}_{\le 0}(M^{k+1}_{ij} + \frac{1}{\sigma ^k} U_{ij} ), S_{ij} \le 0, &{}~\end{array}\right. } \\ =&~ {\left\{ \begin{array}{ll} \frac{1}{\sigma ^k}{\mathcal {P}}_{\ge L_{ij}} (\sigma ^k M^{k+1}_{ij}) -M^{k+1}_{ij},S_{ij} \ge 0,\\ \frac{1}{\sigma ^k}{\mathcal {P}}_{\le U_{ij}} (\sigma ^k M^{k+1}_{ij}) -M^{k+1}_{ij}, S_{ij} \le 0, \end{array}\right. }\\ =&\frac{1}{\sigma ^k}{\mathcal {P}}_{[L_{ij},U_{ij}]} (\sigma ^k M^{k+1}_{ij}) -M^{k+1}_{ij}, \end{aligned} \end{aligned}$$
(24)

where \(M^{k+1}:= {\mathcal {A}}^*y^{k+1} + {\mathcal {B}}^*{\bar{y}}^{k+1} + Z^k + \frac{1}{\sigma ^k} X^k -C\). Hence,

$$\begin{aligned} S^{k+1}= \frac{1}{\sigma ^k}{\mathcal {P}}_{[L,U]} (\sigma ^{k} M^{k+1}) -M^{k+1}. \end{aligned}$$
(25)

Similarly, in Step 3 for \(v^{k+1}\) we have

$$\begin{aligned} v^{k+1} = \frac{1}{\sigma ^k} {\mathcal {P}}_{[l,u]}(\sigma ^k{\bar{y}}^{k+1} - s^k) -({\bar{y}}^{k+1} - \frac{1}{\sigma ^k} s^{k}) \end{aligned}$$
(26)

In Step 3, for \(Z^{k+1}\succeq 0\), the minimizer is found via a projection onto the cone of positive semidefinite matrices

$$\begin{aligned} \begin{aligned} Z^{k+1} =&~ {{\,\mathrm{arg\,min}\,}}_{Z\succeq 0 } \langle X^k ,Z\rangle + \frac{\sigma ^k}{2}\Vert {\mathcal {A}}^*y^{k+1} +{\mathcal {B}}^*{\bar{y}}^{k+1} +S^{k+1} +Z-C \Vert ^2_{F}\\ =&~ {{\,\mathrm{arg\,min}\,}}_{Z\succeq 0 } \Vert {\mathcal {A}}^*y^{k+1} +{\mathcal {B}}^*{\bar{y}}^{k+1} +S^{k+1} +Z + \frac{1}{\sigma ^k} X^k-C \Vert ^2_{F}\\ =&~ - {\mathcal {P}}_{\preceq 0} (N^{k+1}), \end{aligned} \end{aligned}$$
(27)

where \(N^{k+1}:= {\mathcal {A}}^*y^{k+1} +{\mathcal {B}}^*{\bar{y}}^{k+1} +S^{k+1} + \frac{1}{\sigma ^k}X^k-C\).

Finally, by substituting \(Z^{k+1}\) and \(v^{k+1}\) into Step 4 we obtain

$$\begin{aligned} \begin{aligned} X^{k+1}&=\sigma ^k {\mathcal {P}}_{\succeq 0} (N^{k+1}),\\ s^{k+1}&= {\mathcal {P}}_{[l,u]}(\sigma ^k{\bar{y}}^{k+1} - s^k). \end{aligned} \end{aligned}$$
(28)

Remark 2

Throughout Algorithm 1, the complementary slackness condition for \((X^{k+1},Z^{k+1})\) holds. This is because \(\frac{1}{\sigma ^k}X^{k+1}\) is the projection of the matrix \(N^{k+1}\) onto the positive semidefinite cone, while \(-Z^{k+1}\) is the projection of the same matrix \(N^{k+1}\) onto the negative semidefinite cone.
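For illustration, here is a condensed MATLAB sketch of one iteration of Algorithm 1 (our own condensation; Amap/ATmap and Bmap/BTmap are assumed function handles for \({\mathcal {A}}, {\mathcal {A}}^*, {\mathcal {B}}, {\mathcal {B}}^*\), R is the Cholesky factor from (22), and mEq denotes the number of equality constraints):

```matlab
% One iteration of the extended ADMM, following (21) and (25)-(28).
rhs = [b/sigma - Amap(S + Z - C + X/sigma); ...
       -Bmap(S + Z - C + X/sigma) + v + s/sigma];
yyb = R' \ (R \ rhs);                             % Step 1, Eq. (21)
y = yyb(1:mEq); yb = yyb(mEq+1:end);
M = ATmap(y) + BTmap(yb) + Z + X/sigma - C;       % the matrix M^{k+1}
S = min(max(sigma*M, L), U)/sigma - M;            % Step 2, Eq. (25)
v = min(max(sigma*yb - s, l), u)/sigma - (yb - s/sigma);  % Eq. (26)
N = ATmap(y) + BTmap(yb) + S + X/sigma - C;       % the matrix N^{k+1}
[P, D] = eig((N + N')/2); d = diag(D);            % spectral decomposition of N
Z = -P(:, d < 0) * diag(d(d < 0)) * P(:, d < 0)'; % Step 3, Eq. (27)
X = sigma * P(:, d > 0) * diag(d(d > 0)) * P(:, d > 0)';  % Step 4, Eq. (28)
s = min(max(sigma*yb - s, l), u);                 % Eq. (28), uses the old s
```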

3.1 Stepsize adjustment

Previous numerical results have shown that the practical performance of an ADMM is strongly influenced by the stepsize \(\sigma \). The most common strategy is to adjust \(\sigma \) so as to balance the primal and dual infeasibilities: if \(\varepsilon _p/ \varepsilon _d < c\) for some constant c, then increase \(\sigma \); if \(\varepsilon _p/ \varepsilon _d > \frac{1}{c}\), then decrease \(\sigma \).

Lorenz and Tran-Dinh [17] derived an adaptive stepsize for the Douglas–Rachford splitting (DRS) scheme. In the setting of the 2-block ADMM, this translates to choosing \(\sigma \) as the ratio \(\frac{\Vert X_k\Vert }{\Vert Z_k\Vert }\) between the norms of the primal and dual variables in the k-th iteration. In general, this update rule yields better performance for the 2-block ADMM than the former one.

In this paper, we use either of these update rules, depending on the type of problem we solve. For SDP problems with equality and nonnegativity constraints only, we apply the adaptive stepsize of Lorenz and Tran-Dinh [17], since it works very well in practice. The situation is different, however, for SDP problems with inequalities other than nonnegativity constraints. In this case, we use the classic method and adjust the stepsize \(\sigma \) according to the ratio between the primal and dual infeasibilities.
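As a small MATLAB sketch, the two rules read as follows (c and tau are tuning constants of our choosing, and use_adaptive selects the rule):

```matlab
% Stepsize update, invoked every few iterations of Algorithm 1.
if use_adaptive                      % adaptive rule of Lorenz & Tran-Dinh [17]
    sigma = norm(X, 'fro') / norm(Z, 'fro');
else                                 % classic infeasibility balancing
    if eps_p/eps_d < c               % primal residual relatively small
        sigma = sigma * tau;         % increase sigma
    elseif eps_p/eps_d > 1/c         % dual residual relatively small
        sigma = sigma / tau;         % decrease sigma
    end
end
```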

4 Lower bound post-processing algorithms

We relax the original graph partition problem to an SDP problem, thereby generating a lower bound for the original problem. However, when solving the SDP by a first-order method, it is hard to reach a solution of high precision in reasonable computational time. Therefore, we stop the ADMM once a medium precision is reached. The solution obtained this way, however, is not necessarily a safe underestimate of the optimal value of the SDP problem. Hence, we need a post-processing algorithm that produces a safe underestimate for the SDP relaxation, which is then also a lower bound for the graph partition problem.

Theorem 1 leads to the first post-processing algorithm. Before stating it, we rewrite Lemma 3.1 from [13] in our context.

Lemma 2

Let \(X, Z \in {{\mathcal {S}}}^n\), and let every eigenvalue of X satisfy \(0 \le \lambda (X)\le {\bar{x}}\). Then

$$\begin{aligned} \langle X, Z \rangle \ge \sum _{\lambda (Z) < 0} {\bar{x}} \cdot \lambda (Z). \end{aligned}$$

Theorem 1

Given \(Z\in {{\mathcal {S}}}^n\), \({\tilde{y}} \in {\mathbb {R}}^m\), \({\tilde{v}} \in {\mathbb {R}}^q\) and \({\tilde{S}} \in {{\mathcal {S}}}^n\), let \(X \in {{\mathcal {S}}}^n_+\) be an optimal solution for (9) and \({\bar{x}} \ge \lambda _{\max }(X)\). Then we have a safe lower bound for the optimal value \(p^*\):

$$\begin{aligned} \text {lb}:= b^\top {\tilde{y}} + {\mathcal {F}}_1({\tilde{S}}) + {\mathcal {F}}_2({\tilde{v}}) + \sum _{\lambda (Z) <0} {\bar{x}} \lambda (Z). \end{aligned}$$
(29)

Proof

We recall the alternative formulation of (9),

$$\begin{aligned} \begin{aligned} \min ~&\langle C, X \rangle \\ \text {s.t.}~&{\mathcal {A}}(X) = b,\\&{\mathcal {B}}(X) -s = 0,\\&X \succeq 0,\\&X \ge L,\\&-X \ge -U,\\&s \ge l,\\&-s \ge -u,\\ \end{aligned} \qquad \quad \qquad \qquad{(11)} \end{aligned}$$

and the corresponding dual problem

$$\begin{aligned} \max \qquad&b^\top y +{\mathbf {0}}_q^\top {\bar{y}} +\langle {\mathbf {0}}_{n\times n}, Z \rangle + \langle L, S_L \rangle -\langle U,S_U \rangle + l^\top v_l - u^\top v_u \\ \text {s.t. } \qquad&{\mathcal {A}}^*y + {\mathcal {B}}^* {\bar{y}} + Z + S_L - S_U = C,\\&-{\bar{y}} + v_l - v_u = 0, \qquad \quad \qquad \qquad (12)\\&Z \succeq 0, \\&S_L, S_U , v_l, v_u \ge 0. \end{aligned} $$

Given an optimal solution \(X^*\) of (11), a free variable \({\tilde{y}}\) and nonnegative variables \(({\tilde{v}}_l,{\tilde{v}}_u,{\tilde{S}}_L, {\tilde{S}}_U )\), we define \(Z := C - {\mathcal {A}}^*{\tilde{y}}- {\mathcal {B}}^*{\tilde{v}}_l + {\mathcal {B}}^* {\tilde{v}}_u - {\tilde{S}}_L + {\tilde{S}}_U\) and have

$$\begin{aligned}&\langle C,X^*\rangle - (b^\top {\tilde{y}} + l^\top {\tilde{v}}_l - u^\top {\tilde{v}}_u+ \langle L, {\tilde{S}}_L \rangle -\langle U, {\tilde{S}}_U \rangle ), \end{aligned}$$
(30a)
$$\begin{aligned} =&~ \langle C,X^*\rangle - \langle {\mathcal {A}}^*{\tilde{y}},X^*\rangle - (l^\top {\tilde{v}}_l - u^\top {\tilde{v}}_u+ \langle L, {\tilde{S}}_L\rangle - \langle U,{\tilde{S}}_U\rangle ),\end{aligned}$$
(30b)
$$\begin{aligned} \geq&~ \langle C,X^*\rangle - \langle {\mathcal {A}}^*{\tilde{y}},X^*\rangle - \langle {\mathcal {B}}^*{\tilde{v}}_l,X^* \rangle + \langle {\mathcal {B}}^*{\tilde{v}}_u,X^* \rangle - \langle X^*, {\tilde{S}}_L \rangle + \langle X^*, {\tilde{S}}_U\rangle , \end{aligned}$$
(30c)
$$\begin{aligned} =&~ \langle C - {\mathcal {A}}^*{\tilde{y}}- {\mathcal {B}}^*{\tilde{v}}_l + {\mathcal {B}}^* {\tilde{v}}_u - {\tilde{S}}_L + {\tilde{S}}_U, X^*\rangle , \end{aligned}$$
(30d)
$$\begin{aligned} =&~ \langle Z, X^*\rangle , \end{aligned}$$
(30e)
$$\begin{aligned} \ge&\sum _{\lambda (Z) <0} {\bar{x}} \lambda (Z), \end{aligned}$$
(30f)

where inequality (30c) holds because \({\tilde{v}}_l,{\tilde{v}}_u,{\tilde{S}}_L\) and \({\tilde{S}}_U\) are nonnegative and \(X^*\) is feasible for (11). This gives us a lower bound for problem (11) as

$$\begin{aligned} \text {lb}:= b^\top {\tilde{y}} + l^\top {\tilde{v}}_l - u^\top {\tilde{v}}_u+ \langle L, {\tilde{S}}_L \rangle - \langle U, {\tilde{S}}_U \rangle + \sum _{\lambda (Z) <0} {\bar{x}} \lambda (Z). \end{aligned}$$
(31)

On substituting \(S:= S_L-S_U\) and \(v: = v_l- v_u\) into the objective function we have

$$\begin{aligned} \begin{aligned} \langle L, S_L \rangle -\langle U, S_U \rangle = \inf _{W}\{ \langle S, W\rangle \mid L\le W \le U \},\\ l^\top v_l - u^\top v_u = \inf _{\omega } \{ \langle v, \omega \rangle \mid l \le \omega \le u \}. \end{aligned} \end{aligned}$$
(32)

Consequently, we can rewrite (31) as

$$\begin{aligned} \text {lb}:= b^\top {\tilde{y}} + {\mathcal {F}}_1({\tilde{S}}) + {\mathcal {F}}_2({\tilde{v}}) + \sum _{\lambda (Z) <0} {\bar{x}} \lambda (Z). \qquad \quad \qquad \qquad{(29)} \end{aligned}$$

\(\square\)

For specifically structured SDP problems, a value of \({\bar{x}}\) might be known. Otherwise, lacking any information about an upper bound \({\bar{x}}\) in (29) on the maximal eigenvalue \(\lambda _{\max }(X)\), we approximate \({\bar{x}}\) by \(\lambda _{\max }({\tilde{X}})\), where \(({\tilde{X}},{\tilde{y}}, {\tilde{v}},{\tilde{S}})\) is the output of the extended ADMM. We then scale it by \(\mu > 1\), e.g., \(\mu = 1.1\), to obtain a safe bound \(\mu {\bar{x}}\). Note that this requires that the extended ADMM, i.e., Algorithm 1, is solved to reasonable accuracy, say \(\varepsilon = 10^{-5}\).

The complete post-processing algorithm is summarized in Algorithm 2.

[Algorithm 2: safe lower bound via Theorem 1]

For the k-equipartition problem (3), we have \(X \preceq m\cdot I\) for any feasible solution X. Hence, we set \({\bar{x}} = m\) when applying the post-processing Algorithm 2 to k-equipartition problems. For the GPKC, no such value \({\bar{x}}\) is at hand.
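For the k-equipartition DNN relaxation (3), the bound (29) then takes a particularly simple form, sketched below in MATLAB (our own sketch; ytld and Stld denote the ADMM output, and \({\mathcal {F}}_1({\tilde{S}}) = 0\) holds since \(L={\mathbf {0}}\), \(U=+\infty \) and the ADMM keeps \({\tilde{S}}\ge 0\)):

```matlab
% Sketch of the bound (29) for the DNN relaxation (3), with xbar = m.
Ztld = C - ATmap(ytld) - Stld;        % dual slack implied by (10); no B-block
d    = eig((Ztld + Ztld')/2);         % symmetrize against round-off
lb   = b'*ytld + m * sum(d(d < 0));   % safe lower bound by Theorem 1
```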

Another way to obtain a safe lower bound for (9) is to turn the output of the ADMM into a feasible solution of the dual problem (10). This is outlined in Algorithm 3. The idea is to build a feasible solution \((y_{new},v_{new},Z_{new},S_{new})\) from an approximate solution \(({\tilde{y}},{\tilde{v}},{\tilde{Z}},{\tilde{S}})\). To guarantee feasibility of \((y_{new},v_{new}\), \(Z_{new},S_{new})\), we first obtain \(Z_{new}\) by projecting \({\tilde{Z}}\) onto the cone of positive semidefinite matrices. We then keep \(Z_{new}\) fixed, which leaves a linear problem. The final step is to find an optimal solution of this linear program.

In Algorithm 1, the condition \(Z \succeq 0\) is guaranteed by the projection operation onto the cone of positive semidefinite matrices. Hence, we can skip Step 1 in Algorithm 3.

We remark that the linear program can be infeasible; the algorithm works well, however, when the input solution has good precision. Numerical comparisons of the two post-processing algorithms are given in Sect. 6.2.

[Algorithm 3: safe lower bound via a feasible dual solution]

5 Building upper bounds from the SDP solutions

Upper bounds for a minimization problem are typically computed by finding feasible solutions of the original problem via heuristics.

A k-equipartition problem can be transformed into a quadratic assignment problem (QAP), and feasible solutions of a QAP can be found by simulated annealing (SA), see, e.g., [22]. However, this method comes with high computational expense for large graphs. Moreover, it cannot be generalized to GPKC problems.

Here we consider building upper bounds from the optimizer of the SDP relaxations. We apply different rounding strategies to the solution X of the SDP relaxations presented in Sect. 2.

5.1 Randomized algorithm for k-equipartition

The first heuristic is a hyperplane rounding algorithm inspired by the Goemans–Williamson algorithm for the max-cut problem [10] and the improved randomized rounding algorithm of Frieze and Jerrum [7] for k-cut problems.

Note that the Goemans–Williamson algorithm as well as the Frieze–Jerrum algorithm are designed for cut problems modeled with variables in \(\{-1/(k-1),1\}^n\), while our graph partition problems are modeled over \(\{0,1\}^n\). Therefore, we need to transform the SDP solutions of problems (3) and (7) before applying the hyperplane rounding procedure. Our hyperplane rounding algorithm for k-equipartition is given in Algorithm 4.

[Algorithm 4: hyperplane rounding for k-equipartition]
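A simplified MATLAB sketch of this procedure (ours, not the exact Algorithm 4): the affine map \(X \mapsto (kX-ee^\top )/(k-1)\) sends the \(\{0,1\}\) Gram matrix to the \(\{-1/(k-1),1\}\) model; each vertex is then assigned to the best-aligned of k random Gaussian vectors, and the group sizes are repaired greedily:

```matlab
% Hyperplane rounding sketch for k-equipartition (group size m = n/k).
Xt = (k*X - ones(n)) / (k - 1);        % move to the {-1/(k-1),1} model
[P, D] = eig((Xt + Xt')/2);
d = max(diag(D), 0);                   % clip small negative eigenvalues
V = P * diag(sqrt(d));                 % rows v_i satisfy V*V' ~ Xt
scores = V * randn(n, k);              % v_i'*g_j for k random vectors g_j
[~, part] = max(scores, [], 2);        % assign vertex i to its best vector
for j = 1:k                            % greedy repair to equal group sizes
    idx = find(part == j);
    [~, ord] = sort(scores(idx, j));   % weakest members of group j first
    for t = reshape(idx(ord(1:max(0, numel(idx)-m))), 1, [])
        room = find(accumarray(part, 1, [k 1]) < m, 1);
        part(t) = room;                % move overflow to a group with room
    end
end
```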

5.2 Vector clustering algorithm for k-equipartition

We next propose a heuristic based on the idea of vector clustering. Given a feasible solution X of (3), we can compute \(V \in {\mathbb {R}}^{n\times n}\) with \(V V^\top =X\). Let \(v_i\) be the i-th row of V and associate it with vertex i of the graph. The problem of building a feasible solution from X can then be interpreted as the problem of clustering the vectors \(v_1, \dots , v_n\) into k groups. This can be done heuristically as follows.

  1. Form a new group with an unassigned vector.

  2. Select its \(m-1\) closest unassigned neighbors and add them to the same group.

  3. Update the status of those vectors as assigned.

This process is repeated \(k-1\) times until all vectors are assigned to a group, yielding a k-equipartition of the vertices in V. The details are given in Algorithm 5.

[Algorithm 5: vector clustering for k-equipartition]

5.2.1 Measure closeness between vertices

We explain in this section how we determine the closest neighbors of a vector. The idea of vector clustering is to put vectors with more similarities into the same group. In our setting, we need a measure that defines the similarity of two vectors based on the SDP solution.

For a pair of unit vectors \(v_i\), \(v_j\), using the relationship \(\cos \measuredangle (v_i,v_j) = v_i^\top v_j\) one can measure the angle between \(v_i\) and \(v_j\).

By the setting \(V V^\top = X\), we have for any \(i \in [n]\)

$$\begin{aligned} x_i = \begin{pmatrix} v_i^\top v_1 \\ \vdots \\ v_i^\top v_n \end{pmatrix}= \begin{pmatrix} \cos \measuredangle (v_i,v_1) \\ \vdots \\ \cos \measuredangle (v_i,v_n) \end{pmatrix}, \end{aligned}$$
(33)

where \(x_i\) is the i-th row vector in X.

Hence, \(x_i\) consists of the cosines of the angles between \(v_i\) and the other vectors. We define \(\text {sim}(v_i,v_j): = \sum _{k=1}^{n} \cos \measuredangle (v_i,v_k) \cos \measuredangle (v_j,v_k) = x_i^\top x_j\) and use this as the measure in Algorithm 5. In other words, we measure the closeness between \(v_i\) and \(v_j\) by their geometric relationships with all other vectors.

In Algorithm 5, we choose a vector as the center of its group and then find vectors surrounding it and assign them to this group.

In each iteration we randomly choose one vector to be the center.
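Since \(\text {sim}(v_i,v_j) = x_i^\top x_j\), the whole similarity matrix is simply \(XX^\top \). A MATLAB sketch of the clustering loop (our simplification of Algorithm 5):

```matlab
% Vector clustering sketch: grow k groups of size m around random centers.
Sim  = X * X';                         % sim(v_i,v_j) = x_i'*x_j
part = zeros(n, 1); g = 0;
while any(part == 0)
    g = g + 1;
    free = find(part == 0);
    c = free(randi(numel(free)));      % random unassigned center
    others = setdiff(free, c);
    [~, ord] = sort(Sim(c, others), 'descend');
    part(c) = g;                       % the center and its m-1 closest
    part(others(ord(1:m-1))) = g;      % unassigned neighbors form group g
end
```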

5.3 Vector clustering algorithms for GPKC

Using ideas similar to those in Algorithm 5, we construct a rounding algorithm for the GPKC (see Algorithm 6, and the sketch after it) as follows.

  1. In each iteration, randomly choose an unassigned vector \(v_i\) to start with.

  2. Add vectors to the group of \(v_i\) in the order given by \(\text {sim}(v_i,v_j)\), \(\forall j\ne i \in [n]\), until the capacity constraint would be violated.

  3. If no more vectors fit into the group, the group is completed and we start forming a new group.

[Algorithm 6: vector clustering for GPKC]
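A MATLAB sketch of this greedy knapsack filling (our own; a is the vertex-weight vector and W the capacity bound; we keep scanning past vertices that do not fit, which is one of several reasonable readings of step 2):

```matlab
% GPKC clustering sketch: fill each group by similarity up to capacity W.
Sim  = X * X';
part = zeros(n, 1); g = 0;
while any(part == 0)
    g = g + 1;
    free = find(part == 0);
    c = free(randi(numel(free)));       % random unassigned start vertex
    part(c) = g; cap = a(c);
    [~, ord] = sort(Sim(c, free), 'descend');
    for j = reshape(free(ord), 1, [])   % closest candidates first
        if part(j) == 0 && cap + a(j) <= W
            part(j) = g; cap = cap + a(j);   % add vertex j while it fits
        end
    end
end
```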

5.4 2-opt for graph partition problems

2-opt heuristics are used to boost solution quality for various combinatorial problems, e.g., the TSP [15]. We apply this method after running our rounding algorithms for the graph partition problems to improve the upper bounds. Depending on the rounding method, the hybrid strategies are named Hyperplane+2opt (short: Hyp+2opt) and Vc+2opt for Algorithms 4 and 5, respectively.

The 2-opt heuristic for bisection problems is outlined in Algorithm 7. Given a partition with more than two groups, we apply 2-opt to a pair of groups \((P_s,P_t)\), randomly chosen from all groups in the partition, and repeat with a different pair of groups until no further improvement is found.

For the GPKC, some adjustments are needed because of the capacity constraints: we only traverse swaps of vertices that retain feasibility and among those pick the swap that improves the objective function value the most.

[Algorithm 7: 2-opt on a pair of groups]
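A sketch of the swap search on one pair of groups, written with the classical Kernighan–Lin gain (our formulation, which we believe matches the 2-opt step; A denotes the weighted adjacency matrix): swapping \(u\in P_s\) with \(v\in P_t\) reduces the cut by \(D_u + D_v - 2A_{uv}\), where \(D_x\) is the external minus the internal weight of x with respect to the pair:

```matlab
% 2-opt sketch: best improving swap between groups Ps and Pt (index vectors).
best = 0; swap = [];
for u = reshape(Ps, 1, [])
    for v = reshape(Pt, 1, [])
        Du = sum(A(u, Pt)) - sum(A(u, Ps));  % external minus internal for u
        Dv = sum(A(v, Ps)) - sum(A(v, Pt));  % external minus internal for v
        gain = Du + Dv - 2*A(u, v);          % cut reduction of swapping u,v
        if gain > best, best = gain; swap = [u v]; end
    end
end
if best > 0                                  % apply the best improving swap
    Ps(Ps == swap(1)) = swap(2);
    Pt(Pt == swap(2)) = swap(1);
end
```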

6 Numerical results

We implemented all the algorithms in MATLAB and ran the numerical experiments on a ThinkPad X1 Carbon (6th generation) with an Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz (8 threads). The maximum number of iterations for the extended ADMM is set to 20,000 and the stopping tolerance \(\varepsilon _{tol}\) to \(10^{-5}\) by default.

The code can be downloaded from https://github.com/shudianzhao/ADMM-GP.

6.1 Instances

In order to evaluate the performance of our algorithms, we ran numerical experiments on several classes of instances. All instances can be downloaded from https://github.com/shudianzhao/ADMM-GP. The first set of instances for the k-equipartition problem is described in [16]; the construction is as follows.

  1. Choose edges of a complete graph randomly with probability \(20\%\), \(50\%\) and \(80\%\), respectively.

  2. The nonzero edge weights are integers in the interval (0, 100].

  3. Choose the partition numbers as divisors of the graph size n.

We name those three groups of instances rand20, rand50 and rand80, respectively.

Furthermore, we consider instances that have been used in [2]. These are constructed in the following way.

  • \(G_{|V |,|V |_p}\): Graphs G(VE), with \(|V | \in \{124, 250, 500, 1000\}\) and four individual edge probabilities p. These probabilities were chosen depending on |V|, so that the average expected degree of each node was approximately \(|V |_p = 2.5, 5, 10, 20\) [14].

  • \(U_{|V |,|V |_{\pi d^2}}\): For a graph G(V, E), first choose 2|V| independent numbers uniformly from the interval (0, 1) and view them as coordinates of |V| nodes on the unit square. Then, an edge is inserted between two vertices if and only if their Euclidean distance is less than or equal to some pre-specified value d [14]. Here \(|V | \in \{ 500, 1000\}\) and \(|V |_{\pi d^2} \in \{ 5, 10, 20, 40\}\).

  • mesh: Instances from finite element meshes; all edge weights are equal to one [5].

For GPKC we generate instances as described in [19]. This is done by the following steps.

  1. Generate a random graph with \(20\%\), \(50\%\) and \(80\%\) density and nonzero edge weights between 0 and 100; as vertex weights, choose integers from the interval (0, 1000].

  2. Determine a feasible solution of a k-equipartition problem on this instance by some heuristic method.

  3. Produce 1000 permutations of the vertices in this k-equipartition.

  4. Calculate the capacity bound for each instance and select the one such that only \(10\%\) of the instances are feasible.

We name those three groups of instances GPKCrand20, GPKCrand50 and GPKCrand80, respectively.

6.2 Comparison of Post-processing methods

Our first numerical comparisons evaluate the different post-processing methods used to produce safe lower bounds for the graph partition problems. Recall that in Sect. 4, we introduced Algorithms 2 and 3.

[Fig. 1: Lower bounds obtained with post-processing]

Figure 1 shows how the lower bounds from the post-processing methods evolve as the number of iterations of the extended ADMM increases. We used the DNN relaxation on an instance of the k-equipartition problem of size \(n=100\) and \(k=2\). There are three lines: EB_eADMM represents the lower bounds obtained by the rigorous error bound method given in Algorithm 2, LpB_eADMM represents the linear programming bound given in Algorithm 3 and dualOfv_eADMM displays the approximate dual objective function value obtained by our extended ADMM. Figure 1a shows that the rigorous error bound method gives tighter bounds in general, while the linear programming bound method is more stable and less affected by the quality of the dual objective function value. The other subfigures indicate that for small k, the rigorous error bound method gives tighter bounds (Fig. 1b), but as k increases, the linear programming bound method dominates (Fig. 1c, d).

Remark 3

We choose Algorithm 2 as post-processing for k-equipartition problems in all following experiments, since there we can set \({\bar{x}} = m\). For the GPKC we use Algorithm 3 for the post-processing, since we have no information on the eigenvalues of an optimal solution.

6.3 Results for k-equipartition

6.3.1 Comparison of the Lower Bounds using SDP, DNN and Transitivity Constraints

In this section we want to highlight the improvement of the bounds obtained from the relaxations introduced in Sect. 2. Note that the timings for computing these bounds are discussed later in Sect. 6.3.2.

In practice, adding all the transitivity constraints is computationally too expensive; instead, we run a DNN-based cutting loop. The idea is as follows (see also the sketch after this list).

  1. Solve the DNN relaxation (3) to obtain the solution \(X^{DNN}\).

  2. Add the \(m_{met}\) transitivity constraints most violated by \(X^{DNN}\) to the relaxation.

  3. Solve the resulting relaxation and repeat adding newly violated constraints until the maximum number of iterations is reached or no more violated constraints are found.
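A brute-force MATLAB sketch of step 2 (ours; \(m_{met}\) is the number of constraints to add, and the \(O(n^3)\) loop could of course be restricted to \(j < t\) or vectorized):

```matlab
% Collect the m_met transitivity constraints most violated by X.
viol = []; trip = [];
for i = 1:n
    for j = 1:n
        for t = 1:n
            vv = X(i,j) + X(i,t) - X(j,t) - 1;  % violation of X in MET
            if vv > 1e-6
                viol(end+1,1) = vv; trip(end+1,:) = [i j t]; %#ok<AGROW>
            end
        end
    end
end
[~, ord] = sort(viol, 'descend');
picked = trip(ord(1:min(m_met, numel(viol))), :);  % triples to add as cuts
```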

Tables 1, 2 and 3 compare the lower bounds obtained from the relaxations for the k-equipartition problem. The improvements are calculated as \((d_{DNN} -d_{SDP})/d_{SDP}\) and \((d_{DNN+MET} -d_{SDP})/d_{SDP}\), respectively; a '−' indicates that no transitivity constraints violated by the SDP solution of problem (3) have been found. In [21] it was observed that the violation of the transitivity constraints is small and that the nonnegativity constraints \(X \ge 0\) become more important than \(X \in \text {MET}\) as the partition number k increases. In our experiments we also observe that the improvement due to the nonnegativity constraints gets even better as k increases.

Table 1 k-equipartitioning lower bounds on rand80
Table 2 k-equipartitioning lower bounds on rand50
Table 3 k-equipartitioning lower bounds on rand20

6.3.2 Comparisons between extended ADMM and Interior Point Methods (IPMs) on k-equipartition

In this section we want to demonstrate the advantage of our extended ADMM over interior point methods. For our comparisons we use Mosek [1], one of the currently best performing interior point solvers.

Note that computing an equipartition for these graphs using commercial solvers is out of reach. For instance, for a graph with \(n=100\) vertices and \(k\in \{2,4,5,10,20,25\}\), Gurobi still has a gap of at least 80% after 120 s, whereas we obtain gaps of at most 7%.

We list the results for solving the SDP relaxation (2) using a 2-block ADMM, and the results for solving the DNN relaxation (3) by our extended ADMM and by Mosek. We ran the experiments on randomly generated graphs with 80% density; the results are given in Table 4.

Table 4 Computation times for k-equipartitioning problems

Table 4 shows that the convergence behavior of the extended ADMM is no worse than that of the 2-block ADMM for SDP problems, and that we get a tighter lower bound by enforcing the nonnegativity constraints in the model without higher computational expense.

The results for Mosek solving problem (3) clearly show that these problems are out of reach for interior point solvers. A “−” indicates that Mosek failed to solve this instance due to memory requirements.

6.3.3 Heuristics on k-equipartition Problems

We now compare the heuristics introduced in Sect. 5 to get upper bounds for the graph partition problems.

We use the solutions obtained from the DNN relaxation to build upper bounds for the k-equipartition problem, since the experimental results in Sect. 6.3.1 showed that the DNN relaxation offers a good tradeoff between quality of the bound and solution time.

We compare to the best known primal solutions given in [2]; we set the time limit for our heuristics to 5 s. The gaps between the upper bounds of Vc+2opt (resp. Hyp+2opt) and the best known solution are shown in Table 5. Primal bounds that are proved to be optimal are marked with "\(^*\)".

Table 5 Feasible solutions for the graphs from [2] (the time limit is 5 s, optimal solutions are indicated by a “\(^*\)”)

Table 5 shows that on small instances our heuristics find upper bounds that are no worse than the best known upper bounds. For large instances, Vc+2opt performs better than Hyp+2opt. The corresponding upper bounds are less than 10% away from the best known upper bounds, some of which took 5 h to compute.

We next compare the upper bounds for the instances rand80, rand50 and rand20. Figure 2 shows that, for small instances (i.e., \(n=100\)), our hybrid methods (i.e., Vc+2opt and Hyp+2opt) can find tight upper bounds quickly, while simulated annealing (SA) needs a longer cooling time to achieve an upper bound of good quality.

Figure 3 shows how the heuristics behave on large-scale instances (i.e., \(n= 1000\)). The time limit is set to 5 s. Compared to Fig. 2, Vc+2opt and Hyp+2opt take more time to generate their first upper bounds, but these bounds are much tighter than the ones found by SA. Also, when the time limit is reached, the upper bounds found by Vc+2opt and Hyp+2opt are much tighter than those from SA.

[Fig. 2: Upper bounds for k-equipartition problems on rand80 with \(n=100\)]

[Fig. 3: Upper bounds for k-equipartition problems on rand80 with \(n=1000\)]

Tables 6, 7 and 8 give a detailed comparison of the upper bounds for the instances rand80, rand50 and rand20, respectively. We display the gap between the lower bounds obtained from the DNN relaxation (3) and the upper bounds built by the various heuristics. The time limit for the heuristics is set to 1 s for \(n\in \{100,200\}\) and 3 s for \(n\in \{900,1000\}\) for rand80; for rand50 and rand20 we set the limit to 5 s. The best upper bounds are typeset in bold.

The numbers confirm that Vc+2opt and Hyp+2opt can build tighter upper bounds than SA, in particular for the dense graphs rand80. Overall, Vc+2opt has the best performance.

Comparing lower and upper bounds, the numerical results show that our methods perform very well on dense graphs; for rand80 the largest gap is less than 4%, and for rand50 it is less than 6%. As the randomly generated graphs get sparser, the gap between lower and upper bounds increases; for rand20 the gap is bounded by 12%.

Table 6 Feasible solutions for randomly generated graphs rand80 (for instances with \(n \in \{100,200\}\), the time limit is 1 s; for instances with \(n \in \{ 900, 1000\}\), the limit is 3 s)
Table 7 Feasible solutions for randomly generated graphs rand50 (time limit 5 s)
Table 8 Feasible solutions for randomly generated graphs rand20 (time limit 5 s)

Comparing Tables 6, 7 and 8, we observe that the gaps get larger as the graphs get sparser. We conjecture that this is due to the lower bounds being less tight, which is supported by the following experiment.

We consider the best upper bounds obtained from all three heuristics within an increased time limit of 10 s. In this way, we get an upper bound that approximates the optimal value well enough for all densities. As an example, Table 9 reports these lower and upper bounds for a graph on 100 vertices and three different densities. We clearly see that as the graph gets sparser, adding the nonnegativity constraints to the SDP relaxation (2) yields more improvement. However, the gap between lower and upper bound gets worse as the graph gets sparser.

Table 9 Feasible solutions for the randomly generated graphs rand80, rand50, rand20 (\(n=100\), \(k=5\), with an increased time limit for heuristics of 10 s)

6.4 Results for GPKC

6.4.1 Comparison of the Lower Bounds using SDP, DNN and Transitivity Constraints

We now turn our attention to the GPKC problem. We run experiments similar to those presented in Sect. 6.3.1, i.e., we solve DNN (7) to obtain the solution \(X^{DNN}\). Table 10 shows the lower bounds for GPKC problems on the randomly generated graphs rand80. The improvements are calculated in the same way as in the previous section. The experimental results on GPKCrand50 and GPKCrand20 are omitted since they have a similar behavior.

The lower bounds obtained from the different SDP relaxations show that when the capacity bound W decreases (and thus the number of groups increases), the improvement due to the nonnegativity constraints becomes more significant. This is in line with the results for k-equipartition. Also similar to k-equipartition, the improvement due to the transitivity constraints is only minor for the GPKC.

Table 10 GPKC lower bounds on GPKCrand80

6.4.2 Comparisons between extended ADMM and IPMs for GPKC

Table 11 compares the computation times when solving the DNN relaxation (7) of the GPKC by the extended ADMM and by Mosek, respectively. A "–" indicates for the extended ADMM that the maximum number of iterations was reached, and for Mosek that the instance could not be solved due to memory requirements. The results for the SDP relaxation (6) in Table 10 were computed using Mosek; hence we omit these timings in Table 11.

In the thesis [19], numerical results for the GPKC are presented using an LP relaxation. However, the method therein can compute bounds only for very sparse graphs with density at most 6% (up to 2000 vertices) or for graphs with up to 140 vertices and density at most 50%. We clearly outperform these results in terms of the density of the graphs that can be handled.

While for instances of size \(n=100\) the timings of the extended ADMM and Mosek are comparable, the picture changes rapidly as n increases. For \(n\ge 300\), Mosek cannot solve any instance, while the extended ADMM manages to obtain bounds for instances with \(n=500\) within 1 h.

Table 11 Computation times for GPKC problems

6.4.3 Heuristics on GPKC problems

As mentioned in Sect. 5, the simulated annealing heuristic for the QAP cannot be applied to the GPKC, because there is no equivalence between the GPKC and the QAP. Therefore, we compare the upper bounds for the GPKC from the heuristic introduced in Sect. 5.3 with the lower bounds given by the DNN relaxation (7). We set a time limit of 5 s.

Also, we set the maximum number of iterations to 50,000 for the sparse graphs GPKCrand20, while the maximum number of iterations for GPKCrand50 and GPKCrand80 is 20,000. In Tables 12, 13 and 14, a \(^*\) indicates that the extended ADMM reached the maximum number of iterations.

Table 12 shows that the gaps between the lower and upper bounds are less than 3% for GPKCrand80; they are less than 7% for GPKCrand50, see Table 13, and less than 15% for GPKCrand20, see Table 14. As for the k-equipartition problem, we note that computing the lower bounds on sparse instances is harder: the maximum number of iterations is reached much more often for GPKCrand20 than for GPKCrand80 or GPKCrand50.

Table 12 Feasible solutions for randomly generated graphs on GPKC problems GPKCrand80 (the maximum number of iterations for eADMM is 20 000, a \(^*\) indicates that the maximum number of iterations is reached)
Table 13 Feasible solutions for randomly generated graphs on GPKC problems GPKCrand50 (the maximum number of iterations for eADMM is 20 000, a \(^*\) indicates that the maximum number of iterations is reached)
Table 14 Feasible solutions for randomly generated graphs on GPKC problems GPKCrand20 (the maximum number of iterations for eADMM is 50 000, a \(^*\) indicates that the maximum number of iterations is reached)

7 Conclusions

In this paper we first introduce different SDP relaxations for k-equipartition problems and GPKC problems. Our tightest SDP relaxations, problems (4) and (8), contain all nonnegativity constraints and transitivity constraints, which amounts to \(O(n^3)\) constraints in total. Another kind of tight SDP relaxation, (3) and (7), has only nonnegativity constraints. While it is straightforward to handle the constraint \(X\ge 0\) in a 3-block ADMM, including all the transitivity constraints is impractical. Therefore, our strategy is to solve (3) and (7) first and then add violated transitivity constraints in loops to tighten both SDP relaxations.

In order to deal with SDP problems with inequality and bound constraints, we extend the classical 2-block ADMM, which handles only equations, to an extended ADMM for general SDP problems. This algorithm is designed to solve large instances that interior point methods fail to solve. We also introduce heuristics that build upper bounds from the solutions of the SDP relaxations. These heuristics consist of two parts: first we round the SDP solution to obtain a feasible solution of the original graph partition problem, then we apply 2-opt methods to improve this feasible solution locally. For rounding the SDP solutions, we introduce two algorithms, the vector clustering method and the generalized hyperplane rounding method. Both methods perform well in combination with the 2-opt method.

The extended ADMM can solve general SDP problems efficiently. For SDP problems with bound constraints, the extended ADMM handles the bounds separately from the inequality and equality constraints, and thereby solves these problems more efficiently. Mosek fails to solve the DNN relaxations of problems with \(n\ge 300\) due to memory requirements, while the extended ADMM solves the DNN relaxations for k-equipartition problems on large instances up to \(n =1000\) within 5 min, and for GPKC problems up to \(n=500\) within 1 h.

We ran numerical tests on instances from the literature and on randomly generated graphs of different densities. The results show that the SDP relaxations produce tighter bounds for dense graphs than for sparse graphs. In general, the nonnegativity constraints yield more improvement as k increases.

We compared our heuristics with a simulated annealing method for generating upper bounds for k-equipartition problems. Our heuristics obtain upper bounds of better quality within a short time limit, especially for large instances. Our methods perform better on dense graphs, where the final gaps are less than 4% for graphs with 80% density, while the gaps between lower and upper bounds for sparse graphs with 20% density are bounded by 12%. This is mainly due to the tighter lower bounds for dense graphs.