Several algorithmic alternatives to interior-point methods have been proposed in the literature to solve semidefinite programs [24,25,26]. The alternating direction method of multipliers (ADMM) has received considerable attention over the last decades due to its wide range of applications and its ability to solve large-scale problems [15, 27, 28]. ADMM solves convex optimization problems by breaking them into smaller pieces that are easier to handle, which is achieved by alternating optimization of the augmented Lagrangian function. The method addresses problems with two blocks of functions and variables:
$$\begin{aligned} \begin{aligned} \text {minimize } \quad&f_1(x_1) + f_2(x_2) \\ \text { subject to } \quad&A_1x_1 + A_2x_2 = b \\ \quad&x_1 \in \mathcal {X}_1, \ x_2 \in \mathcal {X}_2. \end{aligned} \end{aligned}$$
(4)
Let
$$\begin{aligned} \mathcal {L}_{\rho }(x_1,x_2,y) = f_1(x_1) + f_2(x_2) + y^T\left( A_1x_1 + A_2x_2 - b \right) + \frac{\rho }{2}\left\| A_1x_1 + A_2x_2 - b \right\| _2^2 \end{aligned}$$
be the augmented Lagrangian function of (4) with the Lagrange multiplier y and a penalty parameter \(\rho > 0\). Then ADMM consists of iterations
$$\begin{aligned} \begin{aligned} x_1^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{x_1 \in \mathcal {X}_1} \mathcal {L}_{\rho }(x_1,x_2^k, y^k)\\ x_2^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{x_2 \in \mathcal {X}_2} \mathcal {L}_{\rho }(x_1^{k+1},x_2, y^k)\\ y^{k+1}&= y^k + \rho \left( A_1x_1^{k+1} + A_2x_2^{k+1} - b \right) . \end{aligned} \end{aligned}$$
(5)
For details the reader is referred to [27] and references therein.
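To make the generic scheme (5) concrete, the following minimal sketch applies the iterations to a toy instance of (4) in which both subproblems have closed-form solutions; the problem data a, b, c and all variable names are hypothetical and serve only to illustrate the alternating structure.

```python
import numpy as np

# Toy instance of (4): f1(x1) = 0.5*||x1 - a||^2, f2(x2) = 0.5*||x2 - b||^2,
# constraint x1 + x2 = c (so A1 = A2 = I).  All data are hypothetical.
rng = np.random.default_rng(0)
n = 5
a, b, c = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)

rho = 1.0
x1, x2, y = np.zeros(n), np.zeros(n), np.zeros(n)

for k in range(200):
    # x1-update: minimize 0.5*||x1 - a||^2 + y^T(x1 + x2 - c) + rho/2*||x1 + x2 - c||^2
    x1 = (a - y + rho * (c - x2)) / (1.0 + rho)
    # x2-update: same structure, with x1 fixed at its new value
    x2 = (b - y + rho * (c - x1)) / (1.0 + rho)
    # dual update
    y = y + rho * (x1 + x2 - c)

# Analytic solution of the toy problem: x1* = (a - b + c)/2
print(np.linalg.norm(x1 - (a - b + c) / 2), np.linalg.norm(x1 + x2 - c))
```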
ADMM has already been applied successfully to solving semidefinite relaxations of combinatorial optimization problems. The Boundary Point Method [28] is an augmented Lagrangian method applied to the dual SDP in standard form; in its implementation, alternating optimization is used, since only one iteration is performed in the inner loop. The method is currently one of the best algorithms for computing the theta number of a graph [29]. However, its drawback is that it can only solve equality-constrained semidefinite programs. Wen et al. [15] present an ADMM for general semidefinite programs with equality and inequality constraints. Compared to our approach, the drawback of their method is that it has to solve a quadratic program over the nonnegative orthant when computing the dual variable corresponding to the inequality constraints. Furthermore, they use a multi-block ADMM for which no proof of convergence is given.
ADMM method for \((\text {MC}_{\scriptscriptstyle {\text {HYP}}})\)
By introducing the slack variable s, the problem \((\text {MC}_{\scriptscriptstyle {\text {HYP}}})\) is equivalent to
$$\begin{aligned} \begin{aligned} \text {maximize } \quad&\langle L, X \rangle \\ \text {subject to} \quad&{{\,\mathrm{diag}\,}}(X) = e \\ \quad&{\mathcal {B}}(X) +s = e\\ \quad&X \succeq 0, \ s \ge 0. \end{aligned} \end{aligned}$$
(6)
One can easily verify that its dual problem can be written as
$$\begin{aligned} \begin{aligned} \text {minimize } \quad&e^Ty + e^Tt \\ \text { subject to } \quad&L-{{\,\mathrm{Diag}\,}}(y) -{\mathcal {B}}^T(t) + Z = 0 \\ \quad&u-t = 0 \\ \quad&y, t \text { free}, \ Z \succeq 0, \ u \ge 0, \end{aligned} \end{aligned}$$
(7)
where y and t are the dual variables associated with the constraints \({{\,\mathrm{diag}\,}}(X) = e\) and \({\mathcal {B}}(X) + s = e\), respectively, whereas u and Z are the dual multipliers corresponding to the conic constraints.
Let m denote the number of hypermetric inequalities considered in \((\text {MC}_{\scriptscriptstyle {\text {HYP}}})\). For fixed \(\rho > 0\), consider the augmented Lagrangian \(L_{\rho }\) for (7):
$$\begin{aligned} L_{\rho }(y,t,Z,u;X,s)&= \ e^Ty + e^Tt + \langle L-{{\,\mathrm{Diag}\,}}(y)-{\mathcal {B}}^T(t)+Z, X \rangle \ \\&\quad +\ \frac{\rho }{2}\left\| L-{{\,\mathrm{Diag}\,}}(y)-{\mathcal {B}}^T(t)+Z\right\| ^2 _F+ \langle u-t, s \rangle + \frac{\rho }{2}\left\| u-t\right\| ^2 _2\\&= \ e^Ty + e^Tt + \frac{\rho }{2}\left\| L-{{\,\mathrm{Diag}\,}}(y)-{\mathcal {B}}^T(t)+Z + X/ \rho \right\| ^2 _F \ \\&\quad + \ \frac{\rho }{2}\left\| u-t + s/ \rho \right\| ^2 _2 - \frac{1}{2\rho }\left\| X \right\| _F^2 - \frac{1}{2\rho }\left\| s \right\| ^2_2. \end{aligned}$$
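The second equality follows by completing the square; for a symmetric matrix A (and analogously for the vector term with \(u-t\) and s) one uses the identity
$$\begin{aligned} \langle A, X \rangle + \frac{\rho }{2}\left\| A \right\| _F^2 = \frac{\rho }{2}\left\| A + X/\rho \right\| _F^2 - \frac{1}{2\rho }\left\| X \right\| _F^2, \end{aligned}$$
applied with \(A = L-{{\,\mathrm{Diag}\,}}(y)-{\mathcal {B}}^T(t)+Z\).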
The alternating direction method of multipliers for problem (7) consists of minimizing \(L_{\rho }\) with respect to one dual variable at a time, keeping the others fixed, to obtain y, t, Z, and u. The primal variables X and s are then updated using the following rules:
$$\begin{aligned} y^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{y \in {\mathbb {R}}^n} \ L_{\rho }(y,t^k,Z^k,u^k;X^k,s^k) \end{aligned}$$
(8a)
$$\begin{aligned} t^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{t \in {\mathbb {R}}^m} \ L_{\rho }(y^{k+1},t,Z^k,u^k;X^k,s^k)\end{aligned}$$
(8b)
$$\begin{aligned} Z^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{Z \succeq 0} \ L_{\rho }(y^{k+1},t^{k+1},Z,u^k;X^k,s^k) \end{aligned}$$
(8c)
$$\begin{aligned} u^{k+1}&= {{\,\mathrm{arg\,min}\,}}_{u \in {\mathbb {R}}^m_+} \ L_{\rho }(y^{k+1},t^{k+1},Z^{k+1},u;X^k,s^k)\end{aligned}$$
(8d)
$$\begin{aligned} X^{k+1}&= X^k + \rho \left( L-{{\,\mathrm{Diag}\,}}(y^{k+1})-{\mathcal {B}}^T(t^{k+1}) + Z^{k+1} \right) \end{aligned}$$
(8e)
$$\begin{aligned} s^{k+1}&= s^k + \rho \left( u^{k+1}-t^{k+1}\right) . \end{aligned}$$
(8f)
Let us look more closely at the subproblems (8a)–(8f). The first-order optimality condition for problem (8a) is
$$\begin{aligned} \nabla _y L_{\rho } = e +\rho y - \rho {{\,\mathrm{diag}\,}}(L-{\mathcal {B}}^T(t) + Z +X/ \rho ) = 0. \end{aligned}$$
Hence the computation of y is trivial
$$\begin{aligned} y = -\frac{e}{\rho } + {{\,\mathrm{diag}\,}}(L -{\mathcal {B}}^T(t)+Z+ X/\rho ). \end{aligned}$$
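For the code sketches that follow, we assume (this is an implementation choice, not prescribed by the paper) that the operator \({\mathcal {B}}\) is stored as an \(m \times n^2\) sparse matrix, here called B_op, whose i-th row is \(\mathrm{vec}(B_i)\), so that \({\mathcal {B}}(X) = \) B_op\(\,\mathrm{vec}(X)\) and \({\mathcal {B}}^T(t)\) is obtained by reshaping B_op\(^T t\). Under this convention, the y-update is a one-line computation; the function name update_y is hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def update_y(L, B_op, t, Z, X, rho):
    """y-update (8a): y = -e/rho + diag(L - B^T(t) + Z + X/rho).

    L, Z, X : (n, n) symmetric numpy arrays
    B_op    : (m, n*n) scipy sparse matrix, row i = vec(B_i)  (assumed layout)
    t       : (m,) numpy array, rho : positive float
    """
    n = L.shape[0]
    Bt = np.asarray(B_op.T @ t).reshape(n, n)   # B^T(t) as an n x n matrix
    return -np.ones(n) / rho + np.diag(L - Bt + Z + X / rho)
```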
Similarly for (8b), we compute the gradient with respect to t
$$\begin{aligned} \nabla _t L_{\rho } = e - s - \rho (u-t) - \rho {\mathcal {B}}(L-{{\,\mathrm{Diag}\,}}(y) - {\mathcal {B}}^T(t) + Z + X/\rho ) = 0. \end{aligned}$$
Therefore, t is the solution of the following linear system
$$\begin{aligned} ({\mathcal {B}}{\mathcal {B}}^T + I)t = {\mathcal {B}}(L-{{\,\mathrm{Diag}\,}}(y)+Z+X/\rho ) +u + \frac{s-e}{\rho }. \end{aligned}$$
Note that the matrix \({\mathcal {B}}{\mathcal {B}}^T + I\) is sparse and positive definite, hence the above linear system can be solved efficiently using a sparse Cholesky factorization.
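A sketch of this step under the B_op convention introduced above: the paper uses CHOLMOD's sparse Cholesky factorization, while the snippet below substitutes SciPy's generic sparse LU factorization (splu) as a stand-in; in both cases the factorization is computed once and reused in every iteration. The function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def factor_normal_matrix(B_op):
    """Factor B B^T + I once; the factorization is reused in all later iterations."""
    m = B_op.shape[0]
    BBt_I = (B_op @ B_op.T + sp.identity(m)).tocsc()
    return splu(BBt_I)          # stand-in for a sparse Cholesky factorization

def update_t(factor, L, B_op, y, Z, u, X, s, rho):
    """t-update (8b): solve (B B^T + I) t = B(L - Diag(y) + Z + X/rho) + u + (s - e)/rho."""
    m = B_op.shape[0]
    rhs_mat = L - np.diag(y) + Z + X / rho
    rhs = np.asarray(B_op @ rhs_mat.ravel()) + u + (s - np.ones(m)) / rho
    return factor.solve(rhs)
```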
By defining \(M = L - {{\,\mathrm{Diag}\,}}(y) - {\mathcal {B}}^T(t) + X/\rho \), the subproblem (8c) can be formulated as
$$\begin{aligned} {{\,\mathrm{arg\,min}\,}}_{Z \succeq 0} \left\| M+Z \right\| _F^2, \end{aligned}$$
where the solution
$$\begin{aligned} Z = (-M)_+ = -M_{-} \end{aligned}$$
(9)
is the projection of the matrix \(-M\) onto the positive semidefinite cone.
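A minimal sketch of this projection using a dense symmetric eigendecomposition; the paper computes only the relevant part of the spectrum with LAPACK's DSYEVR, whereas numpy's eigh below computes the full spectrum and is used purely for clarity.

```python
import numpy as np

def psd_projection(A):
    """Project a symmetric matrix A onto the positive semidefinite cone:
    keep the nonnegative eigenvalues and zero out the negative ones."""
    w, V = np.linalg.eigh(A)
    w_plus = np.maximum(w, 0.0)
    return (V * w_plus) @ V.T    # V diag(w_plus) V^T

# Z-update (8c): Z = (-M)_+, i.e. the projection of -M onto the PSD cone.
# Z = psd_projection(-M)
```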
The subproblem (8d) can be written as
$$\begin{aligned} {{\,\mathrm{arg\,min}\,}}_{u \ge 0} \left\| u - t + s/\rho \right\| _2^2. \end{aligned}$$
It asks for the nonnegative vector that is closest to \(v = t -s/\rho \). Hence the solution is
$$\begin{aligned} u = \max (0, v) = v_+, \end{aligned}$$
the nonnegative part of vector v.
After all variables of the dual problem (7) have been updated in an alternating manner, we compute the primal variables X and s using (8e) and (8f). As already observed in the Boundary Point Method [28], the update of X can be simplified and computed from the spectral decomposition of the matrix M. Using (8e), we get
$$\begin{aligned} X^{k+1}&= X^k + \rho \left( L - {{\,\mathrm{Diag}\,}}(y^{k+1}) - {\mathcal {B}}^T(t^{k+1}) + Z^{k+1}\right) \\&= \rho \left( L - {{\,\mathrm{Diag}\,}}(y^{k+1}) - {\mathcal {B}}^T(t^{k+1}) + X^k/\rho + Z^{k+1}\right) \\&= \rho (M - M_-) = \rho M_+. \end{aligned}$$
Similarly, we can simplify the formula for (8f)
$$\begin{aligned} s^{k+1}&= s^k + \rho (u^{k+1}-t^{k+1}) \\&= \rho \left( u^{k+1} - (t^{k+1} - s^k/\rho )\right) \\&= \rho (v_+ - v) = -\rho v_-. \end{aligned}$$
Hence by construction, the matrix X is positive semidefinite and the vector s is nonnegative.
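The practical consequence is that a single spectral decomposition of M yields both \(Z^{k+1}\) and \(X^{k+1}\), and a single projection of v yields both \(u^{k+1}\) and \(s^{k+1}\). A sketch of these combined updates (hypothetical function names):

```python
import numpy as np

def split_spectrum(M, rho):
    """From one eigendecomposition of M obtain Z = -M_- and X = rho * M_+."""
    w, V = np.linalg.eigh(M)
    M_plus = (V * np.maximum(w, 0.0)) @ V.T     # positive part of M
    M_minus = M - M_plus                        # negative part of M
    return -M_minus, rho * M_plus               # (Z, X)

def split_vector(v, rho):
    """From v = t - s/rho obtain u = v_+ and s = -rho * v_-."""
    v_plus = np.maximum(v, 0.0)
    return v_plus, rho * (v_plus - v)           # (u, s), since v_+ - v = -v_-
```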
The cost of one iteration of the method is dominated by solving a linear system with the matrix \({\mathcal {B}}{\mathcal {B}}^T + I\) and by computing a partial eigenvalue decomposition of M. In contrast to interior-point methods, where the coefficient matrix changes in each iteration, the matrix \({\mathcal {B}}{\mathcal {B}}^T + I\) remains constant throughout the algorithm, so its factorization can be computed once at the beginning and reused to solve the linear system efficiently in each iteration.
The difference between Algorithm 1 and the method proposed by Wen et al. [15] lies in the update rule for the dual variable t corresponding to the inequality constraints. The authors apply ADMM directly to the dual of \((\text {MC}_{\scriptscriptstyle {\text {HYP}}})\). In our approach, we introduce the slack variable s, which results in an unconstrained optimization problem for the variable t in (8b). This reduces the overall complexity of the algorithm: instead of solving a convex quadratic program of order m over the nonnegative orthant, which is particularly costly when many hypermetric inequalities are added to strengthen the bound, we solve a sparse system of linear equations and perform a projection onto the nonnegative orthant.
Implementation
The above update rules ensure that throughout the algorithm the vectors u and s remain nonnegative, the matrices X and Z satisfy the conic constraints, and the complementarity conditions \(u^Ts = 0\) and \(ZX = 0\) hold. Hence, once primal and dual feasibility are reached, the method has converged to an optimal solution. To measure the accuracy of primal and dual feasibility, we use
$$\begin{aligned} r_P&= \frac{\Vert {{\,\mathrm{diag}\,}}(X) - e\Vert _2 + \Vert \max \left( {\mathcal {B}}(X) - e, 0\right) \Vert _2}{1 + \sqrt{n}},\\[0.5em] r_D&= \frac{\Vert L - {{\,\mathrm{Diag}\,}}(y) - {\mathcal {B}}^T(t) + Z \Vert _F + \Vert u - t \Vert _2 }{1 + \Vert L \Vert _F}. \end{aligned}$$
We terminate our algorithm when \( \max \{r_P, r_D\} < \varepsilon , \) for prescribed tolerance \(\varepsilon > 0\).
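Under the same B_op convention as above, the stopping test can be evaluated as in the following sketch (not the paper's exact implementation; the function name residuals is hypothetical).

```python
import numpy as np

def residuals(L, B_op, X, y, t, Z, u):
    """Relative primal and dual residuals r_P and r_D used in the stopping test."""
    n = L.shape[0]
    e_n = np.ones(n)
    e_m = np.ones(B_op.shape[0])
    BX = np.asarray(B_op @ X.ravel())                       # B(X)
    r_P = (np.linalg.norm(np.diag(X) - e_n)
           + np.linalg.norm(np.maximum(BX - e_m, 0.0))) / (1.0 + np.sqrt(n))
    Bt = np.asarray(B_op.T @ t).reshape(n, n)               # B^T(t)
    r_D = (np.linalg.norm(L - np.diag(y) - Bt + Z, 'fro')
           + np.linalg.norm(u - t)) / (1.0 + np.linalg.norm(L, 'fro'))
    return r_P, r_D

# terminate when max(r_P, r_D) < eps
```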
The performance of the method depends on the choice of the penalty parameter \(\rho \). Numerical experiments show that, for the problems we consider, a starting value of \(\rho = 1\) or \(\rho = 1.6\) is a good choice; the value is then tuned dynamically during the algorithm to improve practical convergence. A simple strategy for adjusting \(\rho \) is to observe the residuals:
$$\begin{aligned} \begin{aligned} \rho ^{k+1} = {\left\{ \begin{array}{ll} \tau \rho ^k &{} \text { if } \log \left( \frac{r_D}{r_P}\right)> \mu \\ \rho ^k/\tau &{} \text { if } \log \left( \frac{r_P}{r_D}\right) > \mu \\ \rho ^k &{} \text { otherwise,} \end{array}\right. } \end{aligned} \end{aligned}$$
(10)
for some parameters \(\mu \) and \(\tau \). In our numerical tests, we use \(\mu = 0.5\) and \(\tau = 1.001\). The idea behind this update scheme is to keep the primal and dual residual norms within the same order of magnitude as they both converge to zero.
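The update rule (10) translates directly into code; the snippet below is a small sketch with the parameter values used in the paper's experiments as defaults.

```python
import numpy as np

def update_rho(rho, r_P, r_D, mu=0.5, tau=1.001):
    """Penalty update (10): nudge rho so that r_P and r_D stay balanced."""
    if np.log(r_D / r_P) > mu:
        return tau * rho
    if np.log(r_P / r_D) > mu:
        return rho / tau
    return rho
```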
The computational time of our ADMM method is essentially determined by the number of partial eigenvalue decompositions and the efficiency of the sparse Cholesky solver, since these are the most computationally expensive steps. For obtaining positive eigenvalues and corresponding eigenvectors, we use the LAPACK [30] routine DSYEVR. For factoring the \({\mathcal {B}}{\mathcal {B}}^T+I\) matrix and then performing backsolves to get t, we use the sparse direct solver from CHOLMOD [31], a high performance library for the sparse Cholesky factorization. CHOLMOD is part of the SuiteSparse linear algebra package [32].
In the following two subsections, we elaborate on two potential issues when using ADMM, and how to resolve them. These are obtaining a safe upper bound, which can be used within the B&B algorithm, and the convergence of multi-block ADMM.
Safe upper bound
To safely use the proposed upper bound within the B&B algorithm, we need a certificate that the value of the dual function is indeed a valid upper bound for the original problem (1). To achieve this, the quadruplet (y, t, Z, u) has to be dual feasible, i.e. by assigning \(t \leftarrow u\), the equation
$$\begin{aligned} L - {{\,\mathrm{Diag}\,}}(y) - {\mathcal {B}}^T(u) + Z = 0 \end{aligned}$$
has to be satisfied. However, since we solve the primal-dual pair of semidefinite programs only to some precision, dual feasibility is not necessarily reached when the algorithm terminates. Note that the variable y is unconstrained, whereas the conic conditions on u and Z are satisfied by construction. In the following, we describe the post-processing step we perform after each computation of the bound.
Proposition 1
Let y, t, u, and Z be the output variables computed with iteration scheme (8a) – (8f). Let \(\lambda _{\min }\) denote the smallest eigenvalue of matrix \({\hat{Z}}:= {{\,\mathrm{Diag}\,}}(y) + {\mathcal {B}}^T(u) - L\). If \(\lambda _{\min } \ge 0\), then the value \(e^Ty + e^Tu\) is a valid upper bound for problem (1). Otherwise the value \(e^T\hat{y} + e^Tu\), where \(\hat{y} = y -\lambda _{\min } e\), provides an upper bound on the largest cut. Furthermore, it always holds that the value \(e^T\hat{y} + e^Tu\) is larger than \(e^Ty + e^Tu\).
Proof
After the ADMM method terminates and we assign \(t \leftarrow u\), the quadruplet (y, t, Z, u) satisfies all constraints of (7) within machine accuracy; only the linear constraint
$$\begin{aligned} L - {{\,\mathrm{Diag}\,}}(y) - {\mathcal {B}}^T(u) +Z=0 \end{aligned}$$
may not be satisfied. By replacing Z with \({\hat{Z}}\), we satisfy this linear constraint, but the positive semidefiniteness of \({\hat{Z}}\) might be violated. If the smallest eigenvalue \(\lambda _{\min }\) of \({\hat{Z}}\) is nonnegative, then the new quadruplet \((y,t,{\hat{Z}},u)\) is feasible for (7) and the value \(e^Ty + e^Tu\) provides an upper bound on the size of the largest cut.
If \(\lambda _{\min }\) is negative, we adjust the matrix \({\hat{Z}}\) to be positive semidefinite by using
$$\begin{aligned} {\hat{Z}} \leftarrow {\hat{Z}} -\lambda _{\min } I. \end{aligned}$$
To maintain the dual equality constraint, we correct the unconstrained variable as \(\hat{y} = y - \lambda _{\min } e\). Since \(\lambda _{\min } < 0\), the value \(e^T\hat{y} + e^Tu = e^Ty + e^Tu - n\lambda _{\min }\) is larger than \(e^Ty + e^Tu\). \(\square \)
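A sketch of this post-processing step, again under the B_op convention; eigvalsh computes the full spectrum of \({\hat{Z}}\), whereas in practice only the smallest eigenvalue is needed. The function name safe_upper_bound is hypothetical.

```python
import numpy as np

def safe_upper_bound(L, B_op, y, u):
    """Post-processing of Proposition 1: return a valid upper bound on the max-cut value."""
    n = L.shape[0]
    Z_hat = np.diag(y) + np.asarray(B_op.T @ u).reshape(n, n) - L
    lam_min = np.linalg.eigvalsh(Z_hat)[0]      # smallest eigenvalue of Z_hat
    if lam_min >= 0.0:
        return np.sum(y) + np.sum(u)
    # shift y by lam_min so that Z_hat - lam_min*I becomes positive semidefinite
    return np.sum(y) + np.sum(u) - n * lam_min
```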
We summarize the ADMM-based algorithm for solving the SDP relaxation \((\text {MC}_{\scriptscriptstyle {\text {HYP}}})\) in Algorithm 1.
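For illustration, a compact sketch of the whole iteration (without the safe-bound post-processing) is given below, assuming the B_op convention introduced earlier; all names and default parameter values are assumptions for this sketch, not the paper's implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def admm_mc_hyp(L, B_op, rho=1.0, eps=1e-5, max_iter=5000, mu=0.5, tau=1.001):
    """Sketch of the ADMM iteration for (MC_HYP); row i of B_op is vec(B_i)."""
    n, m = L.shape[0], B_op.shape[0]
    e_n, e_m = np.ones(n), np.ones(m)
    X, s = np.zeros((n, n)), np.zeros(m)
    Z, u, t = np.zeros((n, n)), np.zeros(m), np.zeros(m)
    factor = splu((B_op @ B_op.T + sp.identity(m)).tocsc())   # factor once, reuse

    for _ in range(max_iter):
        # (8a) y-update
        Bt = np.asarray(B_op.T @ t).reshape(n, n)
        y = -e_n / rho + np.diag(L - Bt + Z + X / rho)
        # (8b) t-update with the cached factorization of B B^T + I
        rhs = np.asarray(B_op @ (L - np.diag(y) + Z + X / rho).ravel()) + u + (s - e_m) / rho
        t = factor.solve(rhs)
        # (8c)+(8e): one eigendecomposition of M gives Z = -M_- and X = rho*M_+
        M = L - np.diag(y) - np.asarray(B_op.T @ t).reshape(n, n) + X / rho
        w, V = np.linalg.eigh(M)
        M_plus = (V * np.maximum(w, 0.0)) @ V.T
        Z, X = M_plus - M, rho * M_plus
        # (8d)+(8f): projection of v gives u = v_+ and s = -rho*v_-
        v = t - s / rho
        u = np.maximum(v, 0.0)
        s = rho * (u - v)
        # residuals and stopping test
        r_P = (np.linalg.norm(np.diag(X) - e_n)
               + np.linalg.norm(np.maximum(np.asarray(B_op @ X.ravel()) - e_m, 0.0))) / (1 + np.sqrt(n))
        r_D = (np.linalg.norm(L - np.diag(y) - np.asarray(B_op.T @ t).reshape(n, n) + Z, 'fro')
               + np.linalg.norm(u - t)) / (1 + np.linalg.norm(L, 'fro'))
        if max(r_P, r_D) < eps:
            break
        # penalty update (10)
        if np.log(r_D / r_P) > mu:
            rho *= tau
        elif np.log(r_P / r_D) > mu:
            rho /= tau
    return y, t, Z, u, X, s
```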
Convergence of the method
It has recently been shown [33] that multi-block ADMM is not necessarily convergent. In the theorem below, we show that, due to the special structure of the operators arising in the semidefinite relaxation of Max-Cut, our scheme reduces to a 2-block method and is therefore convergent. For the sake of completeness, we include the proof of convergence. We also note that Chen et al. [33] prove the same result in a more general setting.
Theorem 1
The sequence \(\left\{ \left( X^k, s^k, y^k, t^k, Z^k, u^k \right) \right\} \) generated by Algorithm 1 from any starting point \(\left( X^0, s^0, y^0, t^0, Z^0, u^0 \right) \) converges to solutions \(\left( X^*, s^*\right) \), \(\left( y^*, t^*, Z^*, u^* \right) \) of the primal-dual pair of semidefinite programs (6) and (7).
Proof
The convergence of our multi-block ADMM is guaranteed due to the orthogonality relations of the operators \({{\,\mathrm{diag}\,}}\) and \({\mathcal {B}}\) and their adjoints:
$$\begin{aligned} {\mathcal {B}}\left( {{\,\mathrm{Diag}\,}}(y)\right) = 0, \ \ {{\,\mathrm{diag}\,}}\left( {\mathcal {B}}^T(t)\right) = 0, \ \text { for any vectors } y \in {\mathbb {R}}^n \text { and } t \in {\mathbb {R}}^m. \end{aligned}$$
(11)
In this case the multi-block ADMM reduces to a special case of the original method (5). To see this, note that the orthogonality relations (11) imply that the first-order optimality conditions for the variables y and t in (8a) and (8b) reduce to
$$\begin{aligned} \nabla _y L_{\rho }&= e +\rho y - \rho {{\,\mathrm{diag}\,}}(L+ Z +X/ \rho ) = 0\\ \nabla _t L_{\rho }&= e - s - \rho (u-t) - \rho {\mathcal {B}}(L - {\mathcal {B}}^T(t) + Z + X/\rho ) = 0, \end{aligned}$$
meaning that the updates of y and t are independent and can be obtained by jointly minimizing over (y, t) regarded as one variable. Similarly, the update rules for the variables Z and u, as well as for the primal pair X and s, show that they can also be minimized jointly. Regarding (Z, u) and (X, s) each as one variable, the iteration scheme (8a)–(8f) can be written as:
$$\begin{aligned} \left( y^{k+1},t^{k+1}\right)&= {{\,\mathrm{arg\,min}\,}}_{(y,t) \in {\mathbb {R}}^n \times {\mathbb {R}}^m} \ L_{\rho } \left( (y,t), (Z^k,u^k);(X^k,s^k) \right) \\[0.5em] \left( Z^{k+1}, u^{k+1} \right)&= {{\,\mathrm{arg\,min}\,}}_{(Z,u) \in \mathcal {S}_n^+ \times {\mathbb {R}}^m_+} \ L_{\rho }\left( (y^{k+1},t^{k+1}),(Z,u);(X^k,s^k)\right) \\[0.5em] \left( X^{k+1}, s^{k+1} \right)&= \left( X^{k}, s^{k} \right) + \rho \left( L-{{\,\mathrm{Diag}\,}}(y^{k+1})-{\mathcal {B}}^T(t^{k+1}) + Z^{k+1}, u^{k+1}-t^{k+1} \right) . \end{aligned}$$
Thus the convergence of Algorithm 1 follows from the analysis in [15], which views 2-block ADMM as a fixed-point method. \(\square \)