MADAM: A parallel exact solver for Max-Cut based on semidefinite programming and ADMM

We present MADAM, a parallel semidefinite based exact solver for Max-Cut, a problem of finding the cut with maximum weight in a given graph. The algorithm uses branch and bound paradigm that applies alternating direction method of multipliers as the bounding routine to solve the basic semidefinite relaxation strengthened by a subset of hypermetric inequalities. The benefit of the new approach is less computationally expensive update rule for the dual variable with respect to the inequality constraints. We provide theoretical convergence of the algorithm, as well as extensive computational experiments with this method, to show that our algorithm outperformes current state-of-the-art approaches. Furthermore, by combining algorithmic ingredients from the serial algorithm we develop an efficient distributed parallel solver based on MPI.


Motivation
Max-Cut problem is a classical NP-hard optimization problem [15,22] on graphs with quadratic objective function and unconstrained binary variables. During the last decades it has attracted the interest of many researchers, from its theoretical and algorithmic perspective to its applicability in different fields, for instance mathematics, physics, and computer science [24,27,33]. Several exact algorithmic approaches have been proposed in the literature, including the solvers BiqMac [33] and BiqCrunch [25], which are among the best solvers and are based on semidefinite programming.
Lassere [26] has proved that Max-Cut problem can be considered as a canonical model of linearly constrained linear and quadratic 0/1 programs. The transformation is based on the exact penalty approach and was further explored and advanced in a new paper by Gusmeroli and Wiegele [18]. Even for instances of moderate size, it is considered a computational challenge to solve the Max-Cut to optimality. In practice we typically solve such problems only approximately by using a heuristic or an approximation algorithm [12,16]. However, to compare these algorithms and evaluate their performance, we still require the optimum solutions. Considering all this, solving increasingly large instances of Max-Cut to optimality on parallel computers is highly needed in scientific computing.

Problem formulation and notations
The central problem that we consider is the Max-Cut problem, which can be defined as follows. For a given undirected graph G = (V, E) on n = |V | vertices and with edge weights w e for e ∈ E, the Max-Cut problem asks to find a bipartition of the vertices such that the sum of the weights of the edges across the bipartition is maximum. Let A = (a ij ) denote the weighted adjacency matrix with a ij = a ji = w e for edge e = {i, j} ∈ E and zero otherwise. Encoding the partitions by vectors z = {0, 1} n , we obtain the following unconstrained binary quadratic optimization problem formulation for Max-Cut: where L 0 is the Laplacian matrix of the graph defined by L 0 = diag(Ae) − A. By e we denoted the vector of all ones. Note that due to the symmetry of the problem we can fix the last element of vector z to zero and thus removing the last row and column of L 0 , obtaining the matrix L 0 ∈ R (n−1)×(n−1) . The inner product on the space of symmetric matrices is given by X•Y = X, Y = tr(XY ) = i,j X ij Y ij and the associated Frobenious norm is defined by X F = X, X = i,j X 2 ij . Using the property of inner product z T L 0 z = L 0 , zz T we can reformulate problem (1) as: We increased the dimension of the problem by one, since the last column of the solution matrix of a semidefinite relaxation will be used for determining the next branching variable.
In the following, we denote by Diag(v) the diagonal matrix having v on its diagonal and diag(X) denotes the vector obtained by extracting the main diagonal from the matrix X. For given symmetric matrices A i , i = 1, . . . , m, let A : S n → R m denote the linear operator mapping n × n symmetric matrices to R m with A(X) i = A i , X . Its adjoint is well known to be A T (y) = i y i A i . We denote the projection of some symmetric matrix X onto the positive semidefinite cone by X + and its projection onto the negative semidefinite cone by X − . It is well known that both projections can be computed from the spectral decomposition of X = SΛS T , see, for instance [21]. Let S = (S + , S − ) be a partition of the eigenvectors in S according to positive and negative eigenvalues Λ = (Λ + , Λ − ). Then X = S + Λ + S T + + S − Λ − S T − = X + + X − .

Related work and our contribution
Over the decades many methods have been proposed for finding exact solutions of Max-Cut. Some of them are linear programming based methods [10,27], which work particularly well when the underlying graph is sparse. Other algorithms combine semidefinite programming with polyhedral approach [19] to strengthen the basic SDP relaxation with cutting planes. Two of them, the BiqMac [33] and BiqCrunch [25] solvers, turned out in the last decade to be the best performing solvers for Max-Cut, also compared to commercial solvers. Both solvers utilize branch and bound (B&B) paradigm, however, the distinction is in using different algorithms for solving the underlying SDP relaxation. Furthermore, they do not use any parallelization. For other recent computational approaches the reader is refered to [31]. Recently, we have developed with other authors a solver BiqBin [17] for the class of binary quadratic problems with linear constraints, which includes Max-Cut problem. This solver uses exact penalty approach to reformulate every instance of binary quadratic problem into an instance of Max-Cut problem and then solve it using enhanced version of BiqMac, where the underlying SDP relaxations are tight due to inclusion of hypermetric inequalities. Additionally, the B&B part of BiqBin was improved compared to BiqMac and the solver was fully parallelized. Extensive numerical evidence shows that BiqBin outperforms BiqMac, BiqCrunch, GUROBI and SCIP on the Max-Cut instances and also on some classes of binary quadratic problems.
The main contribution of this paper is introduction of the parallel exact solver MADAM for Max-Cut, based on the alternating direction method of multipliers (ADMM) which was introduced in [37]. More precisely, we • Adapt ADMM to solve SDP relaxations of Max-Cut with a subset of hypermetric inequalities included and provide a convergence proof. More specifically, we strengthen the basic SDP relaxation with triangle, pentagonal and heptagonal inequalities. The key idea in solving the underlying relaxation is to introduce a slack variable in order to eliminate solving a quadratic program over the nonnegative orthant when computing the dual variable corresponding to inequality constraints. Instead a sparse system of linear equations is solved and projection onto nonnegative orthant is performed.
• We propose a rounding procedure which rounds the (nearly feasible) solution of the dual problem obtained by ADMM to feasible solutions and therefore provides a valid upper bound for the optimum value of Max-Cut. The reason is that our adaption of ADMM has a better balance between the time needed to solve the SDP relaxation and the quality of the solution, compared to other solvers.
The main difference between MADAM, BiqMac and BiqBin is that MADAM applies ADMM to solve the underlying SDP relaxations, while BiqMac and BiqBin use the bundle method. Biq-Mac strengthens the basic SDP relaxation with triangle inequalities, whereas BiqBin and MADAM include also a subset of pentagonal and heptagonal inequalities, which make these relaxations harder to solve. Using ADMM in MADAM implies that in each B&B node we have a (nearly) feasible solution for the SDP relaxation available, which is not the case with the bundle method applied to the partial Lagrangian dual. This gives us more information on how to generate feasible solutions and how to branch. Furthermore, the obtained SDP bounds are tighter, and consequently the size of the explored B&B tree is smaller compared to other solvers. Parallelization of B&B in MADAM is very similar to parallelization of B&B in BiqBin, since both are based on load coordinator-workers paradigm and use MPI library for communication between the processes.
The rest of this paper is organized as follows. Section 2 deals with semidefinite relaxations of Max-Cut used in this paper. We also describe the class of cutting planes that we use. Description of other solution approaches that are based on semidefinite programming is given in Section 3. In Section 4 we outline our new ADMM algorithm, discuss implementation details and provide proof of convergence. Details on how proposed ADMM method is used within B&B framework are given in Section 5, while Section 6 presents numerical results of the serial algorithm. In Section 7 we give insights on how parallelization improves perfomance of MADAM solver. We conclude with computational experience of our method and present scalability results.
and therefore Observe that for any x = {−1, 1} n−1 the matrix xx T is positive semidefinite and its diagonal is equal to vector of all ones. Using the above transformation this allows us to write Max-Cut problem in −1/1 variables as where By dropping the rank-one constraint we obtain the basic SDP relaxation Interior-point methods (IPM) turned out to be the most prominent algorithms for solving SDPs. There exist several numerical packages to solve semidefinite programs, e.g., CSDP [2], MOSEK [30], SeDuMi [35] and SDPT3 [36]. Many of them are based on some variant of the primal-dual path following interior-point method, see for example [20].
Despite their expressive power we are limited to solving small and middle size SDPs in practice. As the size of problems and number of constraints grow, interior-point methods show poor scalability for large-scale problems. However, for special cases like the relaxation (3), interior-point method tailored for this problem scales well, since the matrix determining the Newton direction can be efficiently constructed [20].
The bound from (3) is not strong enough to be successfully used within a branch and bound framework in order to solve larger Max-Cut problems to optimality. By adding additional equality or inequality constraints, known as cutting planes, which are valid for all feasible solutions of (MC), we strengthen the upper bound. One such class of cutting planes, called triangle inequalities, is obtained as follows. Observe that for an arbitrary triangle with vertices i < j < k in the graph, any partition of vertices cuts either 0 or 2 of its edges. Moving to {−1, 1} model this leads to (see [19] and [33]). The inequalities are collected as B TRI (X) ≤ e and we end up with the following SDP relaxation: To exploit a stronger relaxation of (MCSDP), the bound is further strengthened by using pentagonal and heptagonal inequalities, which belong to the family of hypermetric inequalities valid for any matrix X from the convex hull of rank-one matrices xx T , for x ∈ {−1, 1} n . For every integer vector b such that e T b is odd, the inequality |x T b| ≥ 1 holds for all x ∈ {−1, 1} n . Therefore we have bb T , xx T ≥ 1. Hypermetric inequality specified by the vector b is the inequality bb T , X ≥ 1.
We consider the subset of hypermetric inequalities generated by choosing b with b i ∈ {−1, 0, 1} and by fixing the number of non-zero entries to 3, 5, or 7. In this cases, we obtain triangle, pentagonal, and heptagonal inequalities, respectively, and collect them as B(X) ≤ e. We obtain the following strengthening of (MCSDP): We now describe the routine for separating the proposed cutting planes. There are 4 n 3 triangle inequalities, but for given X we can enumerate all of them and identify the most violated ones. Due to a large number of higher order hypermetric inequalities, the separation of pentagonal and heptagonal inequalities is done heuristically. We use the idea proposed in BiqBin, where the separation problem is reformulated as a quadratic assignment problem of the form where Π is the set of all n × n permutation matrices and H determines the type of pentagonal or heptagonal inequality specified by the vector b. The problem is then approximately solved by using simulated annealing to obtain inequality with potentially large violation. The reader is referred to [17] for more details.
In this section we summarise other approaches for solving Max-Cut problem to optimality, especially the BiqMac [33], BiqBin [17] and BiqCrunch [25] solvers, which are widely considered as the most powerful solvers for unconstrained binary quadratic problems.

BiqMac and BiqBin
The starting point of BiqMac is the stengthened SDP relaxation (MCSDP). By dualizing only the triangle inequality constraints, the authors obtain the convex nonsmooth partial dual function where γ is the nonnegative dual variable. Evaluating the dual function and computing a subgradient amounts to solving an SDP of the form (3), which can be efficiently computed using an interior-point method tailored for this problem. The approximate minimizer of the dual problem is then computed using the bundle method [23]. In BiqBin the bound is further strengthened by also dualizing pentagonal and heptagonal inequalities, i.e. the bundle method is used to minimize the partial dual function of (MCHYP).

BiqCrunch
Due to diagonal constraint in (MCSDP) it can easily be proven that all feasible matrices of this problem satisfy the condition n 2 − X 2 F ≥ 0, in which the equality holds if and only if the matrix has rank 1. The idea of BiqCrunch is to add a multiple of this quadratic regularization term to the objective function. They obtain the following regularized problem: By using quadratic regularization the simplified dual problem subject to y free, z ≥ 0 produces a family of upper bounds dependent on the penalty parameter α. Note that the objective function is convex and differentiable, see [24]. The function value and gradient are evaluated by computing partial spectral decomposition. This enables that for fixed α the dual function is minimized using L-BFGS-B algorithm [5] from the family of quasi-Newton methods.
Reducing the penalty parameter α increses the tightness of the upper bound. This is due to decreased impact of regularization term in the primal objective function of (4).
Several algorithmic alternatives to interior-point methods have been proposed in the literature to solve semidefinite programs [4,13,29]. Alternating direction method of multipliers (ADMM) has been studied over the last decades due to its wide range of applications in a number of areas and capability of solving large-scale problems [3,32,37]. ADMM is an algorithm that solves convex optimization problems by breaking them into smaller pieces which are easier to handle. This is achieved by alternating optimization of augmented Lagrangian function. The method deals with the following problems with only two blocks of functions and variables: Let be the augmented Lagrangian function of (5) with the Lagrange multiplier y and a penalty paramter ρ > 0. Then ADMM consists of iterations For details the reader is refered to [3] and references therein. ADMM has already been successfully applied for solving semidefinite relaxations of combinatorial optimization problems. The Boundary Point Method [32] is an augmented Lagrangian method applied to the dual SDP in standard form. In the implementation the alternating optimization is used since in the inner loop only one iteration is performed. The method is currently one of the best algorithms for computing the theta number of a graph [28]. However, the drawback of the method is that it can only solve equality constrained semidefinite programs. Wen et al. [37] present ADMM for general semidefinite programs with equality and inequality constraints. Compared to our approach the drawback of their method is in solving the quadratic program over nonnegative orthant when computing the dual variable corresponding to inequality constraints. Furthermore, they use multi-block ADMM for which the proof of convergence is not presented.

ADMM method for (MCHYP)
By introducing the slack variable s the problem (MCHYP) is equivalent to One can easily verify that its dual problem can be written as where y and t are dual variables associated respectively with the equality constraints, whereas u and Z are dual multipliers to the conic constraints. Let m denote the number of hypermetric inequalities. For fixed ρ > 0 consider the augmented Lagrangian L ρ for (8): Alternating direction method of multipliers for problem (8) consists of alternating minimization of L ρ with respect to one variable while keeping others fixed to get y, Z, t and u. Then the primal variables X and s are updated using the following rules: Let us closely look at the subproblems (9a)-(9f). The first order optimality condition for problem (9a) is Hence the computation of y is trivial Similarly for (9b), we compute the gradient with respect to t Therefore t is the solution of the following linear system Note that the matrix BB T + I is sparse and positive definite. Hence the above linear system can be efficiently solved using sparse Cholesky factorization. By defining M = L − Diag(y) − B T (t) + X/ρ, the subproblem (9c) can be formulated as is the projection of the matrix −M onto the positive semidefinite cone. The subproblem (9d) can be written as It asks for the nonnegative vector that is closest to v = t − s/ρ. Hence the solution is the nonnegative part of vector v. After all variables for the dual problem (8) have been updated in alternating manner, we compute primal variables X and s using the expressions (9e) and (9f). As already observed in the Boundary Point Method [32], the update rule for X can be simplified and also computed from the spectral decomposition of matrix M . Using (9e) we get Similarly, we can simplify the formula for (9f) Hence by construction the matrix X is positive semidefinite and the vector s is nonnegative.
The overall complexity of one iteration of the method is solving a linear system with matrix BB T + I and computing the partial eigenvalue decompostion of M . Compared to interior-point methods where coefficient matrix changes in each iteration, matrix BB T + I remains constant throughout the algorithm and its factorization can be cached at the beginning to efficiently solve linear system in each iteration.
Difference between our algorithm, described in Algorithm 1, and the method proposed by Wenn et al. [37] is in the update rule of the dual variable t corresponding to inequality constraints. The authors directly apply ADMM on the dual of (MCHYP). In our approach we introduced slack variable s resulting in an unconstrained optimization problem for variable t in (9b). This reduced the overall complexity of the algorithm by eliminating the need to solve a convex quadratic program of order m over nonnegative orthant, especially since multiple hypermetric inequalites are added to strengthen to bound. Instead a sparse system of linear equations is solved and projection onto nonnegative orthant is used.

Implementation
The above update rules ensure that during the algorithm nonnegativity of vectors u and s, and conic constraints for matrices X and Z are maintained, as well as complementarity conditions u T s = 0 and ZX = 0. Hence, once the primal and dual feasibility are reached, the method converged to the optimal solution. To measure the accuracy of primal and dual feasibility we use We terminate our algorithm when max{r P , r D } < ε, for prescribed tolerance ε > 0.
The performance of the method is dependent on the choice of the penalty parameter ρ. Numerical experiments show that for the problems we considered the starting value of ρ = 1 or ρ = 1.6 is a good choice and the value is dynamically tuned during the algorithm in order to improve the practical convegence. Simple strategy to adjust the value of ρ is observing the residuals: for some parameters µ and τ . In our numerical tests we have used µ = 0.5 and τ = 1.001. The idea behind this penalty parameter update scheme is to try to keep the primal and dual residual norms in the same order of magnitude as they both converge to zero.
The computational time of our ADMM method is essentially determined by the number of partial eigenvalue decompositions and efficiency of sparse Cholesky solver, since these are the most computationally expensive steps. For obtaining positive eigenvalues and corresponding eigenvectors we use LAPACK [1] routine DSYEVR. For factoring the matrix BB T + I and then performing backsolves to get t we use sparse direct solver from CHOLMOD [7], a high performance library for sparse Cholesky factorization. CHOLMOD is part of the SuiteSparse linear algebra package [9].
In the following two subsections we elaborate on two potential issues when using ADMM and how to resolve them. These are obtaining safe upper bound which can be used within B&B algorithm and the convergence of multi-block ADMM.

Safe upper bound
To safely use the proposed upper bound within B&B algorithm we need a certificate that the value of the dual function is indeed a valid upper bound for the original problem (1). To achieve this the quadruplet (y, t, Z, u) has to be dual feasible, i.e. by assigning t ← u, the equation has to be satisfied. However, since we only approximately solve primal-dual pair of semidefinite programs to some precision, the dual feasibility is not necessarily reached when the algorithm terminates. Note that the variable y is unconstrained, whereas the conic conditions on u and Z are satisfied by construction. In the following we describe the post-processing step we do after each computation of the bound, provided the pruning condition is satisfied. Proposition 1. Let y, t, u and Z be the output variables computed with iteration scheme (9a) -(9f). Let λ min denote the smallest eigenvalue of matrixẐ := Diag(y) + B T (u) − L. If λ min ≥ 0, then the value e T y + e T u is a valid upper bound for problem (1). Otherwise the value e Tŷ + e T u, whereŷ = y − λ min e, provides an upper bound on the largest cut. Furthermore, it always holds that the value e Tŷ + e T u is larger than e T y + e T u.
Proof. After ADMM method terminates and by assigning t ← u, the quadruplet (y, t, Z, u) satisfies all the constraints of (8) within machine accuracy, only the linear constraint may not be satisfied. By replacing Z withẐ we satisfy this linear constraint, but the positive semidefiniteness ofẐ might be violated. If the smallest eigenvalue λ min ofẐ is nonnegative, then new quadruplet (y, t,Ẑ, u) is feasible for (8) and the value e T y + e T u provides an upper bound on the size of the largest cut.
If λ min is negative, we adjust the matrixẐ to be positive semidefinite by usinĝ To maintain dual equality constraints, we correct the unconstrained variable asŷ = y − λ min e. It is obvious that the value e Tŷ + e T u = e T y + e T u − nλ min is larger than e T y + e T u.
The last statement of Proposition 1 guarantees that the computation of correction vector y is needed, only when we can prune the B&B node with the bound e T y + e T u, because the post-processing step in the case of a negative eigenvalue produces the bound which is always larger.
We now summarize the ADMM-based algorithm for solving SDP relaxation (MCHYP) of Max-Cut. In the view of the previous observations, the algorithm is described in Algorithm 1.

Convergence of the method
It has been recently shown [6] that multi-block ADMM is not necessarily convergent. In the following theorem we prove that due to the special structure of operators in the case of semidefinite relaxation of Max-Cut we can achive convergent sheme by reducing it to a 2-block method. Theorem 1. The sequence X k , s k , y k , t k , Z k , u k generated by Algorithm 1 from any starting point X 0 , s 0 , y 0 , t 0 , Z 0 , u 0 converges to a solutions (X * , s * ), (y * , t * , Z * , u * ) of primal-dual pair of semidefinite programs (7) and (8).
Proof. The convergence of our multi-block ADMM is guaranteed due to the orthogonality relations of the operators diag and B and their adjoints: B (Diag(y)) = 0, diag B T (t) = 0, for any vectors y ∈ R n and t ∈ R m .
In this case the multi-block ADMM reduces to a special case of the original method (6). To see this, note that orthogonality in (12) implies that the first order optimality conditions for variables y and t in (9a) and (9b) reduce to ∇ y L ρ = e + ρy − ρ diag(L + Z + X/ρ) = 0 meaning that y and t are independent and are jointly minimized by regarding (y, t) as one variable. Similarly, update rules for variables Z and u, as well as for primal pair X and s,

Algorithm 1: ADMM method for solving SDP relaxation (MCHYP) of Max-Cut
Output: Upper bound on the largest cut of a graph. Select ρ > 0, ε > 0; k = 0, u k = 0, s k = 0, X k = 0, Z k = 0; Compute sparse Cholesky factorization of matrix BB T + I; for k = 0, 1, . . . do Compute y k+1 = − e ρ + diag(L + Z k + X k /ρ) ; Solve for t k+1 by using cached factorization: Update penalty parameter ρ according to (11). Post-processing: else return e T y + e T u − nλ min (Ẑ); ensure that they can also be joinly minimized. By separately regarding (Z, u) and (X, s) as one variable, the iterate scheme (9a) -(9f) can be written as: Thus the convergence of Algorithm 1 is implied by the analysis in [37], that looks at 2-block ADMM as a fixed point method.

Branch and bound
The prominent bounds obtained by using ADMM method for solving (MCHYP) motivated us to use this method within a branch and bound framework. We call the outcome of this work MADAM solver -Max-Cut Alternating Direction Augmented Lagrangian Method.
The Branch and Bound algorithm (B&B) is one of the standard enumerative methods for computing the global optimum of a NP-hard problem. It consists of the following ingredients: • the bounding procedure, which provides for each instance of a problem an upper bound for the optimum value; • the branching procedure, which splits the current problem into more problems of smaller dimensions by fixing some variables; • a heuristic for generating feasible solution providing a lower bound.
Bounding routine. The bounding routine of MADAM applies the ADMM algorithm, described in Algorithm 1, which for a given set of triangle, pentagonal and heptagonal inequalities minimizes the dual function (8). In order to obtain tight upper bound we use the cutting-plane aproach where multiple hypermetric inequalities are iteratively added and purged after each computation of upper bound. First, the optimal solution of the basic semidefinite relaxation (3) is computed using interior-point method. Then we separate the initial set of triangle inequalities. After the minimizer of problem (8) is obtained, we purge all inactive constraints and new violated triangles are added. The problem with updated set of inequalities is solved and the process is iterated as long as the decrease of the upper bound is sufficiently large. After separating the most violated triangle inequalities, we add pentagonal and heptagonal inequalitites in order to further decrease the upper bound. We monitor the maximum violation of triangle inequalities r TRI and as soon as the number is sufficiently small, we use the heuristic from Section 2 to add some strongly violated pentagonal inequalities to the relaxation. Similarly, as the maximum violation of pentagonal inequalities r PENT drops bellow some threshold, new heptagonal inequalities are separated and added to the relaxation. In our numerical tests, we have used the thresholds r TRI < 0.2 and r PENT < 0.4 and each time separated at most 10 · n violated triangle inequalities. For pentagonal and heptagonal inequalities different strategy is used. The initial number of these cutting planes is small, but each time the node is not pruned, this number is increased by 200. A description of our bounding routine is in Algorithm 2.
Algorithm 2: Semidefinite bounding procedure of MADAM solve the basic sdp relaxation (3); separate initial set of triangle inequalities; while upper bound decreases significantly do use ADMM method to obtain approximate maximizer X of (MCHYP); remove hypermetric inequalities that are not active; add new hypermetric inequalities that are violated; In order to improve the performance of MADAM we can stop the bounding routine, when we detect that we will not be able to prune the current B&B node. BiqCrunch terminates the bounding routine if the difference between consecutive bounds is less than some prescribed parameter and if the gap between upper and lower bound is still large. In MADAM we borrow an idea from BiqMac. After some cutting plane iterations, we make a linear forecast to decide whether it is worth doing more iterations. If the gap cannot be closed, we terminate the bounding routine, branch the current node, and start evaluating new subproblems.
Branching rules. In MADAM we use two branching strategies which are based on the approximate solution matrix X obtained from Algorithm 1. Once the ADMM method terminates, the last column of X, with exception of the last element, is extracted. Due to the diagonal constraint and positive semidefinitness of feasible matrices of (MCHYP) all the entries lie in the inverval [−1, 1]. By using the tranformation z = 1 2 (x + e), a vector with entries in [0, 1] is obtained. We can follow different strategies based on the information we get from z. Similarly to BiqCrunch and BiqBin, the decision on which variable z i to branch on is based on the following two strategies: i most-fractional: we branch on the vertex i for which the variable z i is closest to 0.5; ii least-fractional: we branch on the vertex i for which the variable z i is furthest from 0.5; With this we create two new subproblems, one where z i is fixed to 0 and the other where it is fixed to 1. The subproblems correspond to nodes in the B&B tree and are added to the priority queue of unexplored problems. Priority is based on the upper bound computed by the ADMM method. When selecting the next subproblem, a node with the worst upper bound is evaluated first.
Generating feasible solutions. For generating high quality feasible solutions of Max-Cut problem we use the optimum solution X of (MCHYP) in Goemans-Williamson rounding hyperplane technique [16]. By using factorization X = V ⊤ V with column vectors v i of V and selecting some random vector r, one side of the cut is obtained as {i | v ⊤ i r ≥ 0}. The cut vector x obtained from this heuristic is then further improved by flipping the vertices. Note that in our case the factorization of X is already available from the bounding routine, which computes an eigenvalue decomposition of matrix M . To summarize, for generating good cuts we use the following scheme: i Use the Goemans-Williamson rounding hyperplane technique to generate cut vector x from minimizer of (MCHYP).
ii The cut x is locally improved by checking all possible moves of a single vertex to the opposite partition block.
The process is repeated several times with different random vectors v, due to its low computational cost (in practice we repeat it n times, where n is the size of the input graph). Numerical experiments show that by using the proposed heuristic the optimum solution is usually found already in the root node.
Strategy for faster enumeration of the B&B tree. To obtain tight upper bounds on maximum cuts we strengthen the basic SDP relaxation wih a subset of hypermetric inequalities. Adding these cutting planes to the model and solving the relaxation in each B&B node is computationally expensive and not necessary, especially if one can not prune the node. We use the strategy proposed in BiqBin, where we check (before including the hypermetric inequalities) whether we will be able to prune the current node or whether it is better to branch. Firstly, in the root node we cache the bound OP T SDP of the basic SDP relaxation (3) and compute the bound OP T HYP by iteratively addding violated hypermetric inequalities. Let dif f = OP T SDP − OP T HYP denote the difference between optimal values of both relaxations and let lb denote the current lower bound. Secondly, at all other nodes, we only compute the basic SDP relaxation. If the condition is satisfied, meaning we are close enough to the lower bound, we estimate that by adding the cutting planes to compute the tighter bound OP T HYP we will prune the node. This idea helps efficiently traverse the B&B tree and we only invest time into the bounding routine when it is really needed. Numerical experiments show that overall this strategy produces more B&B nodes than necessary, but the performance of the algorithm is immensly improved, since in the first few levels of the B&B tree only the basic SDP relaxation needs to be computed.
We compare our serial version of MADAM with three state-of-the-art semidefinite based exact solvers for Max-Cut, BiqMac [33], BiqCrunch [25] and BiqBin [17]. All solvers are written in C programming language. Several graphs available in the BiqMac library [39], with 100 vertices were used to test the algorithms. For BiqMac and BiqCrunch we used their default settings, except each instance is solved with both branching rules that the solvers provide. The one that produced faster time of execution is used in the tables presenting numerical results. The most-fractional rule is set as the default strategy for BiqBin and MADAM solver. We have run the algorithm with different penalty parameter values ρ found in the literature [37]. The initial value of penalty parameter is set to 1.6, while for dense graphs from families w05, w09, pw05 and pw09 this value was decreased to 1, since we found this works well in our code. In each table we report the needed number of B&B nodes and running time to solve the problem to optimality.
All the computations were done on a cluster consisting of 24-core nodes featuring two Intel Xeon E5-2680V3 running at 2.5 GHz with 64 GB of RAM. The code is compiled against OpenBLAS and LAPACK. Observe that our solver completely outperformes other approaches. For many instances we significantly reduce the running time of the current best solver. The reason for efficiency is simplicity and speed of the proposed bounding routine.

Parallel algorithm
In this section we describe how the algorithmic ingredients from the serial B&B algorithm are combined into a parallel solver which utilizes distributive memory parallelism. OpenMP (Open Multi-Processing) and MPI (Message Passing Interface) are two widely used paradigms available for parallel computing [8,14]. Algorithms based on B&B can be parallelized on different levels. For example, shared memory parallelization using OpenMP can be applied to speed up the computation of bounds, and secondly, exploration of different branches of B&B tree can be done in parallel using distributive scheme. In this case several workers, each with its own memory, are evaluating B&B nodes concurrently and using MPI to share important informations via messages. This is the approach we have taken.
Load coordinator-worker paradigm with distributed work pools is applied, in which one process becomes the master process carefully managing the status of each worker, while remaining processes concurrently explore branches of the B&B tree. Each worker has its own local priority queue of subproblems and work is shared when one of them becomes idle. The master process keeps track of the status of each worker and acts as a load coordinator receiving messages and,  based on their content, acts in appropriate manner. In order to distinguish between different workers, each process is given a fundamental identifier, called rank of the worker. At the start of the algorithm the master process reads and broadcasts the graph instance to the workers. It is important that every process has the knowledge of the original graph, since construction of subproblems via branching or via received MPI messages is done based on this information. All the data about the B&B nodes is encoded as MPI structure, which is used in communication between different workers in order to efficiently exchange and construct the subproblems.
Next, the load coordinator evaluates the root node and distributes the current best lower bound. After the bounding step two new subproblems are generated, which are then sent to the first two idle processes. Afterwards, load coordinator monitors the status of workers (idle or busy), counts the number of B&B nodes, sends the ranks of free workers and distributes the best solution found so far.
After initialization phase the load coordinator waits for different types of messages sent by the workers. Firstly, if the worker's local queue is empty, the message is sent informing the master that the process is idle and it can receive further work. Secondly, during the algorithm working processes generate multiple candidates for optimal solutions. The master node is keeping the track of current best optimal value and solution. When a new solution is received, the value is compared, updated if necessary and distributed back during communication phase. And thirdly, the master process helps the computing processes with work sharing by sending messages containing the ranks of idle workers. After a worker computes lower and upper bound, it compares these values to see if this branch of B&B tree can be safely pruned or further branching is needed and construction of new subproblems takes place. In the latter case, request message is sent to the load coordinator, asking for idle processes to share with one of the new generated subproblems and/or subproblems left in the queue from previous branching processes. If no idle worker is available, the generated subproblems remain in the worker's queue and the work continues locally. Otherwise, subproblems are encoded and sent   to available idle workers. This is also where exchange of the current best lower bound happens. When all the workers become idle (all local queues are empty), master process sends message to finish and the algorithm terminates.
Lastly, we elaborate on how effficient distribution of the subproblems to the workers is achieved and how the idle time is reduced. If the number of available workers is large, we need to reach a certain depth of the B&B tree in order for the processes to receive some work. Until this happens in the algorithm, workers are idle and resources are wasted. To fully exploit all the available resources we need a strategy for the worker processes to start evaluating the nodes of B&B tree as soon as possible. This is again where the strategy using the variable dif f from Section 5 benefits the algorithm. After computation of the upper bound in the root node, the load coordinator distributes the value dif f to the workers. When the first two idle processes evaluate the generated subproblems, typically the value of the basic SDP relaxation is such that condition (13) is not satisfied. Thus, on the first few levels of the B&B tree, the workers compute only the basic SDP bound to faster evaluate the node and idle processes quickly receive the generated subproblems.

Numerical results -parallel algorithm
We present numerical results comparing serial with parallel version of MADAM solver. As a Message Passing Interface library we use Open MPI to exchange the data between the computing cores. Among the graphs that are used to test the sequential algorithm, we consider only those for which the solver needs more than 200 seconds to compute the optimal solution. We solve the same problem but with increasing number of cores and compare times against the serial algorithm. The results are reported in Table 5.
For the hardest instance w09_100.1 for the serial solver we also present scalability plot to demonstrate the efficiency of parallel version. Due to increased number of workers, the computational time drops, however, larger number of workers also increases communication costs, since multiple request are sent to the load coordinator. This is shown in Figure 1 as the curve diverges from ideal scaling curve.  Next, we use rudy [34], a machine independent graph generator, to construct new Max-Cut instances with 150 vertices, similarly to the ones in BiqMac library. The parameters for generating the graphs can be found in [38] in Appendix B. Table 6 presents numerical results obtained by applying parallel solver using 240 cores for all instances. We report the maximum cut value and the number of B&B nodes, as well as the time needed to solve the problem to optimality. Furthermore, the number in parantheses next to the nodes specifies the maximum number of used workers. This number is the upper bound on the number of cores we should reserve in order to fully use available resources. Any number of cores beyond makes some of the workers idle and the resources are wasted.

Conclusions and future work
In this paper we present an efficient exact solver MADAM for Max-Cut problem that applies alternating direction method of multipliers to efficiently compute quality SDP based upper bounds. This is due to the small computational complexity per iteration of the proposed method, since it essentially consists of solving one sparse system of linear equations and projection onto nonnegative orthant and positive semidefinite cone. Furthermore, tight upper bounds are obtained due to inclusion of a subset of hypermetric inequalities. Numerical results show that MADAM solver outperforms current state-of-the approaches. By combining the elements of the serial algorithm and the new strategy for faster exploration of B&B tree we develop parallel solver based on MPI that utilizes load coordinator-worker paradigm. We show that MADAM solver scales very well and that we can greatly reduce the time needed to solve the Max-Cut problem to optimality and increase the size of instances that can be solved in a routine way.
In the future we intent to increase the performance of the algorithm by using one-sided MPI communication, thus enabling to completely remove the master process from the algorithm and more efficient exchange of the messages.