1 Introduction

The Lasserre hierarchy (Lasserre 2001) is a powerful relaxation for polynomial optimization problems (POPs). Its power comes from the fact that the optimal value of the hierarchy typically converges faster than theoretical results indicate: in practice, the first and second levels often provide a good solution of the original POP. Combined with the efficiency of interior point methods for semidefinite programming problems, the Lasserre hierarchy can, in theory, solve POPs in polynomial time. However, not even the second hierarchy level can be solved by interior point methods for medium or large problems. This paper explores the idea of partially strengthening the first level relaxation with a subset of the second level relaxation's constraints for sparse instances of the classical Max-Cut problem. This idea was originally used for the Optimal Power Flow problem in Josz and Molzahn (2018), in what the authors called a multi-order relaxation. Similarly, Chen et al. (2020) strengthen the first order of the sparse Lasserre hierarchy to calculate upper bounds of the Lipschitz constant for neural networks by only adding second order information for constraints that do not destroy the sparsity pattern of the POP, and Pál and Vértesi (2009) use intermediate levels of a noncommutative generalisation of the Lasserre hierarchy to find upper limits for the maximum quantum violation of the Bell inequalities. In this paper, in the context of the Max-Cut problem, we use the basic multi-order approach (which is equivalent to what we call the partial relaxation), and propose an augmented version as well as different heuristics to improve the bounds of the first order level of the original Lasserre hierarchy.
As the integrality constraints of the Max-Cut problem do not affect the sparsity pattern of the POP, the sparse hierarchy depends exclusively on the sparsity of the graph, and the strengthening of the first order relaxation is done by restricting the size and/or selecting a subset of the maximal cliques used in the second order relaxation. After we pre-printed this paper, Chen et al. (2022) (i) generalized the ideas proposed in this paper and in Josz and Molzahn (2018), and (ii) created what they call the sublevel hierarchy (which includes the heuristics used in Chen et al. (2020) as a particular case). This hierarchy provides intermediate levels of the SDP relaxations for any arbitrary level d. The authors use the sublevel hierarchy as well as some of the heuristics proposed in this paper, and provide numerical results showing how these techniques can also be useful outside the context of the Max-Cut problem.

Given a graph \(G = (V,E)\) with nodes \(V=\{v_1,\dots , v_n\}\), a set of edges \(E=\{(i,j): 1\le i,j \le n, \text { if }i\text { is connected to }j\}\) and a symmetric matrix \(W \in S^n\) with value \(w_{i,j} \ne 0\) in position \((i,j)\) if \((i,j) \in E\) and 0 otherwise, the Max-Cut problem can be written as the integer program:

$$\begin{aligned} \begin{aligned} f^\star =&\max _{x \in {\mathbb {R}}^n} \; \sum _{(i,j) \in E} w_{i,j}\left( \frac{1-x_ix_j}{2}\right) \\&\;\;\; \text {s.t. } x_i \in \{-1,1\}, \;i=1,2,\dots ,n. \end{aligned} \end{aligned}$$
(1)
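To make the objective of Problem (1) concrete, the following minimal sketch evaluates the cut value for a given assignment and finds \(f^\star \) by exhaustive enumeration. It is an illustration only and is not the method used in this paper: the function names, the 0-based node labels and the 4-cycle instance are hypothetical, and each undirected edge is assumed to appear once in the weight dictionary.

```python
from itertools import product


def cut_value(weights, x):
    """Objective of Problem (1): sum of w_ij * (1 - x_i * x_j) / 2 over the edges."""
    return sum(w * (1 - x[i] * x[j]) / 2 for (i, j), w in weights.items())


def brute_force_max_cut(n, weights):
    """Enumerate all x in {-1, 1}^n; only viable for very small n."""
    best_value, best_x = float("-inf"), None
    for x in product((-1, 1), repeat=n):
        value = cut_value(weights, x)
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Hypothetical instance: a 4-cycle with unit weights (0-based node labels,
# each undirected edge listed once).
weights = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1}
f_star, x_star = brute_force_max_cut(4, weights)
print(f_star, x_star)
```

The enumeration visits \(2^n\) assignments, which is why the relaxations discussed in the rest of the paper are needed for graphs of realistic size.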

Approaches for rigorously computing upper bounds for the Max-Cut problem include linear, semidefinite programming (SDP), convex quadratic and second order cone relaxations. These relaxations may be strengthened with cuts, e.g. triangle or cycle inequalities, or a branch-and-bound algorithm (Barahona et al. 1989; Barahona and Ladanyi 2006; Billionnet and Elloumi 2007; Fischer et al. 2006; Kim and Kojima 2001). Codes like Biq Mac (Rendl et al. 2010) and BiqCrunch (Krislock et al. 2014, 2017), which implement the SDP relaxation together with triangle inequalities, successfully solve difficult Max-Cut instances. The SDP relaxation corresponds to the first Lasserre hierarchy level (Lasserre 2001). Alternative Lasserre hierarchy versions are relevant for sparse graphs (Lasserre 2006; Waki et al. 2006). These alternatives only add a subset of the dense relaxation and thereby reduce the SDP size considerably at every hierarchy level.

In our computational experience, the sparse SDP relaxation of reasonably-sized Max-Cut instances is still too big to be solved at the second hierarchy level or higher. So, our numerical experiments explore different heuristics to add second order information to the standard sparse SDP relaxation of Waki et al. (2006). We show the advantages and limitations of this partial relaxation by comparing with (i) state-of-the-art Max-Cut solver BiqCrunch and (ii) CS-TSSOS (Wang et al. 2020b), a recently-developed sparse POP hierarchy. The results indicate that, for sufficiently sparse problems, there is rich information to be added to strengthen the first order relaxation without using the entire second hierarchy level. The partial Lasserre relaxation is particularly useful if the maximal cliques of a chordal extension of the graph’s correlative sparsity matrix are small.

Previous work considered block-diagonal relaxations between the first and second order relaxations (Gvozdenović and Laurent 2008; Gvozdenović et al. 2009). Specific to Max-Cut, Wiegele (2006) proposes a submatrix of the second order relaxation constraint to solve Max-Cut problems for dense graphs. Similarly, works like Adams et al. (2015) and Ghaddar et al. (2016) construct hierarchies by strengthening the first level of the Lasserre hierarchy. But our paper is, to the best of our knowledge, the first to computationally study sparse versions of the Lasserre hierarchy to form intermediate relaxations between levels. Our work also resembles prior work approximating semidefinite relaxations with linear cutting planes (Baltean-Lugojan et al. 2018; Qualizza et al. 2012; Saxena et al. 2011; Sherali et al. 2012) or nonlinear cutting surfaces (Dong 2016). These works developing cutting surfaces typically approximate the first order of the Lasserre hierarchy, while we relax the second order (Baltean-Lugojan et al. 2018; Qualizza et al. 2012; Saxena et al. 2011; Sherali et al. 2012). Like Baltean-Lugojan et al. (2018), we select submatrices of a semidefinite relaxation. Unlike Baltean-Lugojan et al. (2018), we use clique patterns in the Max-Cut graph to select submatrices (rather than data-driven methods).

It is important to note that there are efficient linear methods to solve sparse Max-Cut problems. For example, the linear branch-and-cut algorithm of Liers et al. (2004) is faster than Biq Mac when solving the Max-Cut for sparse toroidal grid graphs (Rendl et al. 2010). This paper uses Max-Cut because it is a well-studied problem, but the ideas explored here apply to any sparse POP. Finally, the ideas studied in this paper could also be used in other sparse semidefinite relaxations like CS-TSSOS (Wang et al. 2020b).

The rest of the paper is structured as follows. Section 2 provides the notation. Section 3 introduces the dense and sparse hierarchies. Sections 4 and 5 contain our numerical experiments and the conclusions, respectively.

2 Notation

The monomial \(x_1^{\alpha _1} x_2^{\alpha _2} \dots x_n^{\alpha _n}\) will be denoted by \({\mathbf {x}}^{{\varvec{\alpha }}}\), where \({\varvec{\alpha }}= [\alpha _1,\alpha _2,\dots ,\alpha _n] \in {\mathbb {N}}^n\). If \(\phi \subset \{1,2,\dots ,n\}\) and d a positive integer, then \({\mathbb {A}}^{\phi }_d = \{{\varvec{\alpha }}\in {\mathbb {N}}^n:\alpha _i = 0 \text { if } i \notin \phi , \sum _i \alpha _i \le d\}\), and \(u_{d}(x,\phi )\) is the vector containing all the monomials \({\mathbf {x}}^{\varvec{\alpha }}\) such that \({\varvec{\alpha }}\in {\mathbb {A}}^{\phi }_d\), e.g. if \(\phi = \{2,4\}\), \(n=4\) and \(d=2\), \({\mathbb {A}}^{\phi }_d=\{[0,0,0,0],[0,1,0,0],[0,0,0,1],[0,2,0,0],[0,1,0,1],[0,0,0,2]\}\), \(u_{d}(x,\phi ) = [1,x_2,x_4,x_2^2,x_2x_4,x_4^2]^\top \).
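The notation can be checked with a small enumeration. The sketch below builds \({\mathbb {A}}^{\phi }_d\) and a string version of \(u_{d}(x,\phi )\) for the example in the text; the function names and the ordering rule (by total degree, then earlier variables first) are assumptions chosen to reproduce the order shown above.

```python
from itertools import product


def exponent_set(phi, n, d):
    """Return A_d^phi: exponents alpha in N^n supported on phi with sum <= d."""
    support = sorted(phi)
    alphas = []
    for powers in product(range(d + 1), repeat=len(support)):
        if sum(powers) <= d:
            alpha = [0] * n
            for index, power in zip(support, powers):
                alpha[index - 1] = power  # variables are 1-based in the text
            alphas.append(tuple(alpha))
    # Assumed ordering: total degree first, then earlier variables first.
    return sorted(alphas, key=lambda a: (sum(a), tuple(-p for p in a)))


def monomial(alpha):
    """Render x^alpha as a string, e.g. (0, 1, 0, 1) -> 'x2*x4'."""
    factors = [
        f"x{i + 1}" + (f"^{p}" if p > 1 else "")
        for i, p in enumerate(alpha)
        if p > 0
    ]
    return "*".join(factors) if factors else "1"


A = exponent_set({2, 4}, n=4, d=2)
u = [monomial(alpha) for alpha in A]
print(A)
print(u)
```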

If \(A \in {\mathbb {R}}^{m \times n}\) is a matrix, then the element in position \((i,j)\) will be denoted by \(A_{i,j}\) (if \(m=1\) or \(n=1\), the \(i^{th}\) element of the vector will be denoted by \(A_i\)). Likewise, if \(A, B \in {\mathbb {R}}^{m \times n}\), we will use the Frobenius inner product \(\left\langle A,B \right\rangle = \sum _{1\le i \le m}\sum _{1\le j \le n} A_{i,j} B_{i,j}\) and its induced norm \(\Vert A \Vert ^2 = \left\langle A,A \right\rangle \). \(\text {Diag}(x_1,x_2,\dots ,x_n)\) is the function returning a diagonal matrix of dimensions \(n \times n\) with \(x_i\) in the entry \((i,i)\) for \(i=1,2,\dots ,n\). The set of symmetric matrices will be denoted by \({\mathcal {S}}\), and for any matrix \(X \in {\mathcal {S}}\), \(X \succeq 0\) (\(\succ 0\)) means that X is positive semidefinite (resp., definite). \(e \in {\mathbb {R}}^n\) will denote the vector of ones, and if \(i \in {\mathbb {N}}\), \(e_i \in {\mathbb {R}}^n\) is a vector with one in the \(i^{th}\) position and zeros everywhere else. Finally, the cardinality of any set \(\phi \) will be denoted by \(|\phi |\), the set \(\{1,2,\dots ,n\}\) will be written as [n], and \(\left\lceil {x} \right\rceil \) is the smallest integer such that \(\left\lceil {x} \right\rceil \ge x\) for any \(x \in {\mathbb {R}}\).

3 Lasserre dense, sparse and partial relaxations

The Lasserre hierarchy can be obtained by lifting the monomials of the POP. Replace the objective of Problem (1) by \(\frac{1}{4}\left\langle L,[x_1,x_2,\dots ,x_n]^\top [x_1,x_2,\dots ,x_n] \right\rangle \), where L is the Laplacian matrix \(L = \text {Diag}(We)- W\); and the integer constraint \(x_i \in \{-1,1\}\) with the equivalent equation \(x_i^2 -1 = 0\). Problem (1) can be written:

$$\begin{aligned} \begin{aligned}&\max _{x \in {\mathbb {R}}^n} \; \frac{1}{4}\left\langle L,[x_1,x_2,\dots ,x_n]^\top [x_1,x_2,\dots ,x_n] \right\rangle \\&\;\;\; \text {s.t. } x_i^2-1=0, \;i=1,2,\dots ,n. \end{aligned} \end{aligned}$$
(2)
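A short derivation, included only as a sanity check of this reformulation, shows why the objectives of Problems (1) and (2) agree on feasible points. It assumes that W is symmetric and that each undirected edge is counted once in the sum over E in Problem (1). Writing \(x = [x_1,x_2,\dots ,x_n]^\top \) and using \((We)_i = \sum _{j} w_{i,j}\),

$$\begin{aligned} \begin{aligned} x^\top L x&= \sum _{i=1}^n (We)_i\, x_i^2 - \sum _{i=1}^n\sum _{j=1}^n w_{i,j}\, x_i x_j = \sum _{i=1}^n\sum _{j=1}^n w_{i,j}\left( x_i^2 - x_i x_j\right) \\&= \sum _{i=1}^n\sum _{j=1}^n w_{i,j}\left( 1 - x_i x_j\right) = 2\sum _{i<j} w_{i,j}\left( 1 - x_i x_j\right) , \end{aligned} \end{aligned}$$

where the third equality uses \(x_i^2 = 1\), and the last one uses the symmetry of W together with the fact that the diagonal terms vanish because \(1 - x_i^2 = 0\). Dividing by 4 gives \(\frac{1}{4}x^\top L x = \sum _{i<j} w_{i,j}\left( \frac{1-x_ix_j}{2}\right) \), which is the objective of Problem (1) since \(w_{i,j} = 0\) for pairs that are not edges.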

Using the fact that \(zz^\top \succeq 0\) for any vector z, we can add the redundant constraints \(M_d^{[n]}(x) =u_{d}(x,[n])u_{d}(x,[n])^\top \succeq 0\) to obtain:

$$\begin{aligned} \begin{aligned}&\max _{x \in {\mathbb {R}}^n} \; \frac{1}{4}\left\langle L,[x_1,x_2,\dots ,x_n]^\top [x_1,x_2,\dots ,x_n] \right\rangle \\&\;\;\; \text {s.t. } x_i^2-1=0, \;i=1,2,\dots ,n,\\&\;\;\;\;\;\;\;\;\; M_d^{[n]}(x) \succeq 0, \end{aligned} \end{aligned}$$
(3)

where \(d \ge 1\) is a positive integer. Let \({\widehat{M}}_d^{[n]}(x)\) be the matrix obtained after replacing all the occurrences of the monomial \(x_i^2\) by 1 (\(i \in [n]\)) in the matrix \(M_d^{[n]}(x)\), and then deleting the rth column and row if there is \(p<r\) such that the pth column is identical to the rth column. Lifting all the variables by replacing the monomial \({\mathbf {x}}^{\varvec{\alpha }}\) by the real variable \(y_{\varvec{\alpha }}\), and deleting the integrality constraints we obtain the following SDP,

$$\begin{aligned} \begin{aligned} Q_d =&\max _{y} \; \frac{1}{4}\left\langle {\hat{L}},{\widehat{M}}_1^{[n]}(y) \right\rangle \\&\;\; \text {s.t. } {\widehat{M}}_d^{[n]}(y) \succeq 0, \end{aligned} \end{aligned}$$
(4)

where y is a real vector indexed by the set \({\mathbb {A}}^{[n]}_{2d}\), and \({\hat{L}} = \begin{bmatrix} {\mathbf {0}} &{} {\mathbf {0}}\\ {\mathbf {0}} &{} L\end{bmatrix}\). Matrix \({\widehat{M}}_d^{[n]}(y)\) is the moment matrix of order d. Lasserre (2002) proves that the optimal relaxation value converges to the optimal value of the original POP, and that the convergence is finite: if \(d\ge n\) then \(Q_d = f^\star \). In practice, the optimal value \(f^\star \) is usually found for d considerably smaller than n, e.g. small Max-Cut instances have been solved using only the second relaxation (Campos et al. 2019; Lasserre 2002).

Although, in theory, the SDP relaxation of Max-Cut can be solved using interior point methods, the relaxation size grows exponentially as d increases and using interior point methods is no longer possible. Hence, the relaxation is only useful for small graphs, or for small values of d. Our computational experience on modern desktops is that, for graphs larger than 30 nodes, typically only the first order relaxation can be solved.
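The growth of the relaxation can be quantified with a short count. The sketch below assumes that, after replacing \(x_i^2\) by 1 and deleting duplicated rows and columns as described above, the rows of \({\widehat{M}}_d^{[n]}\) are in one-to-one correspondence with the square-free monomials of degree at most d. This counting argument and the function name are ours and are meant only as an illustration.

```python
from math import comb


def dense_moment_matrix_order(n, d):
    """Number of rows of the reduced dense moment matrix of order d.

    Assumes one row per square-free monomial of degree <= d, i.e. per
    subset of the n variables with at most d elements.
    """
    return sum(comb(n, k) for k in range(d + 1))


for d in (1, 2, 3):
    print(d, dense_moment_matrix_order(30, d))
```

Under this assumption, a graph with 30 nodes leads to a semidefinite constraint of order 31 for \(d=1\), 466 for \(d=2\) and 4526 for \(d=3\), which is consistent with the observation that only low orders are tractable.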

For sparse graphs, Waki et al. (2006) developed a sparse version of the Lasserre hierarchy that reduces the size of the dense SDP. Using a similar approach as the one for the dense relaxation, we obtain the relaxation:

$$\begin{aligned} \begin{aligned} Q_{d}^{s} =&\max _{y} \; \frac{1}{4}\sum _{k=1}^m\left\langle {\hat{L}}_k,{\widehat{M}}_1^{\phi _k}(y) \right\rangle \\&\;\; \text {s.t. } {\widehat{M}}_d^{\phi _k}(y) \succeq 0,\; k=1,2,\dots ,m, \end{aligned} \end{aligned}$$
(5)

where y is a vector indexed by the set \(\cup _{k=1}^m {\mathbb {A}}^{\phi _k}_{2d}\), \(\{\phi _k\}_{k=1}^m\) correspond to the maximal cliques of a chordal extension of the graph \(G=(V,E)\), and the matrices \({\hat{L}}_k\) are such that \(\sum _{k=1}^m\left\langle {\hat{L}}_k,{\widehat{M}}_1^{\phi _k}(y) \right\rangle = \left\langle {\hat{L}},{\widehat{M}}_1^{[n]}(y) \right\rangle \). For more information about chordal extensions and algorithms to find maximal cliques, see Blair and Peyton (1993), Bomze et al. (1999), Golumbic (2004). Lasserre (2006) proves that if the sets \(\{\phi _k\}_{k=1}^m\) satisfy the running intersection property, which is the case for the maximal cliques used in Waki et al. (2006), and some redundant constraints are added to SDP (5), the optimal relaxation value converges to the optimal value of the original polynomial optimization problem as the value of d tends to infinity.
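The running intersection property mentioned above can be tested directly for an ordered list of index sets. The sketch below uses the usual statement of the property (for every \(k \ge 2\), the intersection of \(\phi _k\) with the union of the previous sets is contained in a single previous set); the function name and the example sets are hypothetical.

```python
def satisfies_rip(cliques):
    """Check the running intersection property for cliques in the given order."""
    seen = set()
    for k, clique in enumerate(cliques):
        clique = set(clique)
        if k > 0:
            overlap = clique & seen
            # The overlap with all earlier cliques must fit inside one of them.
            if not any(overlap <= set(earlier) for earlier in cliques[:k]):
                return False
        seen |= clique
    return True


print(satisfies_rip([{1, 2, 3}, {2, 3, 4}, {4, 5}]))
print(satisfies_rip([{1, 2}, {3, 4}, {2, 3}]))
print(satisfies_rip([{1, 2}, {2, 3}, {3, 4}]))
```

The last two calls use the same sets in different orders, which illustrates that the property depends on the ordering of the sets and not only on the sets themselves.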

For sparse graphs, i.e. graphs with small |E|, Relaxation (5) considerably reduces the size of Problem (4). However, even for sparse graphs, the sparse relaxation of Max-Cut is too large and cannot be solved using interior point methods for \(d>1\). For example, we generated graphs with 300 nodes using the package SparsePOP (see Sect. 4.1) and tried unsuccessfully to solve the Sparse Relaxation (5) for \(d=2\) for many graphs, even with a \(2\%\) sparsity, i.e. \(2|E|/(n^2)\approx 0.02\) (we used Mosek version 8.1 on an Intel Core i7-6700 CPU @ 3.40 GHz Ubuntu 16.04 workstation with 16 GB of RAM).

3.1 A partial and partial augmented second order sparse relaxation

Not all the moment matrices of the sparse second order relaxation can be included, so we propose including a subset. “Partial” Second Order Relaxation (6) relaxes maximal cliques with more than (fewer than or equal to) r nodes using the first (second) level of the Lasserre hierarchy.

$$\begin{aligned} \begin{aligned} Q^{P}_{r} =&\max _{y} \; \frac{1}{4}\sum _{k=1}^m\left\langle {\hat{L}}_k,{\widehat{M}}_1^{\phi _k}(y) \right\rangle \\&\;\;\text {s.t. } {\widehat{M}}_1^{\phi _k}(y) \succeq 0,\; k \notin \varGamma _r, \\&\;\;\;\;\;\;\;\; {\widehat{M}}_2^{\phi _k}(y) \succeq 0,\; k \in \varGamma _r, \\ \end{aligned} \end{aligned}$$
(6)

where \(\varGamma _r = \{k \in [m]: \left| \phi _k\right| \le r\}\) and the real vector y is indexed by the set \(\{\cup _{k\in \varGamma _r} {\mathbb {A}}^{\phi _k}_{4}\} \cup \{\cup _{k \notin \varGamma _r} {\mathbb {A}}^{\phi _k}_{2}\}\); note that this set is contained in the set \(\cup _{k=1}^m {\mathbb {A}}^{\phi _k}_{4}\) corresponding to the space of the second order sparse relaxation. Partial Relaxation (6) reduces the number of constraints and variables by including only a subset of the second order semidefinite constraints of the second level of the Lasserre hierarchy.
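The structure of Partial Relaxation (6) can be sketched by assigning an order to each maximal clique. The block orders below assume the same square-free reduction as in the dense case (one row per subset of the clique with at most d elements); the function names and the example cliques are hypothetical.

```python
from math import comb


def reduced_block_order(clique_size, d):
    """Assumed order of the reduced moment matrix of order d for one clique."""
    return sum(comb(clique_size, k) for k in range(d + 1))


def partial_relaxation_blocks(cliques, r):
    """Map each clique index k to (relaxation order, PSD block order).

    Cliques with at most r nodes (the set Gamma_r) get second order
    constraints; the remaining cliques get first order constraints.
    """
    blocks = {}
    for k, clique in enumerate(cliques):
        order = 2 if len(clique) <= r else 1
        blocks[k] = (order, reduced_block_order(len(clique), order))
    return blocks


cliques = [{1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7, 8, 9, 10}]
blocks = partial_relaxation_blocks(cliques, r=4)
print(blocks)
```

In this example the two smallest cliques receive second order blocks of order 7 and 11, while the clique with six nodes keeps a first order block of order 7 instead of a second order block of order 22.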

We consider also an “augmented” version of the previous problem, by also including second order constraints for subsets of the maximal cliques with sizes larger than r.

$$\begin{aligned} \begin{aligned} Q^{a}_{r,p,H} =&\max _{y} \; \frac{1}{4}\sum _{k=1}^m\left\langle {\hat{L}}_k,{\widehat{M}}_1^{\phi _k}(y) \right\rangle \\&\;\;\text {s.t. } {\widehat{M}}_1^{\phi _k}(y) \succeq 0,\; k \notin \varGamma _r, \\&\;\;\;\;\;\;\;\; {\widehat{M}}_2^{\phi _k}(y) \succeq 0,\; k \in \varGamma _r, \\&\;\;\;\;\;\;\;\; {\widehat{M}}_2^{\phi _{k,i}}(y) \succeq 0,\; \phi _{k,i} \subset \phi _k,\; \left| \phi _{k,i}\right| = r,\; i = 1,\dots ,p, k \notin \varGamma _r,\\ \end{aligned} \end{aligned}$$
(7)

where p denotes the number of second order constraints added for each maximal clique \(\phi _k\) such that \(\left| \phi _k\right| > r\), while H denotes the heuristic that selects the sets \(\phi _{k,i}\). We construct these subsets \(\phi _{k,i}\) sequentially, starting with the smallest maximal clique (if there is more than one smallest maximal clique, we select one at random), and making sure that we do not add repeated subsets. For fixed r, Augmented Partial Relaxation (7) increases the number of constraints over Partial Relaxation (6) by including the semidefinite constraints corresponding to the second order moment matrices of the subsets \(\phi _{k,i} \subset \phi _k,\; \left| \phi _{k,i}\right| = r, \left| \phi _{k}\right| > r\).

We use five heuristics to select the sets \(\{\phi _{k,i}\}\). If the total number of subsets of size r of \(\phi _k\) is \(q_k\) and \(I = \{i_1,i_2,\dots ,i_p\} \subseteq [q_k]\) is a set of sub-indices, then we select the subsets \(\{\phi _{k,i_1},\phi _{k,i_2},\dots ,\phi _{k,i_p}\}\) where:

  1. (H1) I is selected randomly and uniformly from the set \([q_k]\).

  2. (H2) I is such that \(\Vert L_{\phi _{k,i_1}}\Vert \ge \Vert L_{\phi _{k,i_2}}\Vert \ge \dots \ge \Vert L_{\phi _{k,i_{q_k}}}\Vert \), where \(L_{\phi }\) is the sub-matrix created by deleting columns and rows corresponding to indices not contained in \(\phi \) from the Laplacian matrix. H2 selects variable subsets with large absolute value weights in the graph.

  3. (H3) I is such that \(\left| \varOmega _{k,i_1}\right| \ge \left| \varOmega _{k,i_2}\right| \ge \dots \ge | \varOmega _{k,i_{q_k}}|\), where for every set \(\phi _{k,i} \subset \phi _k\) of size r, \(\varOmega _{k,i} = \{l: \phi _{k,i} \subseteq \phi _l,l \in [m], l \ne k\}\). H3 selects subsets contained in many maximal cliques.

  4. (H4) I is such that \(\left| \varOmega _{k,i_1}\right| \le \left| \varOmega _{k,i_2}\right| \le \dots \le | \varOmega _{k,i_{q_k}}|\). H4 selects subsets contained in few maximal cliques.

  5. (H5) I combines H2 and H4: we select subsets that are not repeated in other maximal cliques and contain variables with large weights in absolute value in the graph. Specifically, let \(\{i_1, i_2,\dots , i_z\}\) be the indices of the subsets \(\{\phi _{k,i_j}\}\) such that \(\left| \varOmega _{k,i_j}\right| = 0\) and \(\Vert L_{\phi _{k,i_j}}\Vert \ge \Vert L_{\phi _{k,i_{j+1}}}\Vert \) for \(1\le j \le z\). Then: if \(p \le z\) we select the subsets \(\{\phi _{k,i_1},\dots ,\phi _{k,i_p}\}\), if \(z<p\) we select the subsets \(\{\phi _{k,i_1},\dots ,\phi _{k,i_z}\}\), and if \(z=0\) we apply the heuristic H2.
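The selection rules above can be sketched as follows. This is a minimal illustration and not the C++ implementation used in the experiments: the function names, the dictionary representation of the weights (each undirected edge stored once) and the example data are hypothetical, the random heuristic H1 is omitted, and ties are broken by the order in which subsets are enumerated.

```python
from itertools import combinations
from math import sqrt


def laplacian_norm(weights, subset):
    """Frobenius norm of the Laplacian sub-matrix L_phi indexed by `subset`."""
    nodes = sorted(subset)
    degree = {i: 0.0 for i in nodes}  # diagonal entries (We)_i
    for (i, j), w in weights.items():
        if i in degree:
            degree[i] += w
        if j in degree:
            degree[j] += w
    total = sum(value ** 2 for value in degree.values())
    for i, j in combinations(nodes, 2):
        w = weights.get((i, j), weights.get((j, i), 0.0))
        total += 2 * w ** 2  # entries (i, j) and (j, i)
    return sqrt(total)


def omega_size(subset, cliques, k):
    """|Omega_{k,i}|: number of other maximal cliques containing `subset`."""
    return sum(
        1 for l, clique in enumerate(cliques)
        if l != k and set(subset) <= set(clique)
    )


def select_subsets(cliques, k, r, p, weights, heuristic):
    """Pick at most p subsets of size r of clique k using H2, H3, H4 or H5."""
    candidates = [frozenset(c) for c in combinations(sorted(cliques[k]), r)]

    def norm(s):
        return laplacian_norm(weights, s)

    def omega(s):
        return omega_size(s, cliques, k)

    if heuristic == "H2":
        ranked = sorted(candidates, key=norm, reverse=True)
    elif heuristic == "H3":
        ranked = sorted(candidates, key=omega, reverse=True)
    elif heuristic == "H4":
        ranked = sorted(candidates, key=omega)
    elif heuristic == "H5":
        unshared = [s for s in candidates if omega(s) == 0]
        if not unshared:  # z = 0: fall back to H2
            return select_subsets(cliques, k, r, p, weights, "H2")
        ranked = sorted(unshared, key=norm, reverse=True)
    else:
        raise ValueError(f"unknown heuristic: {heuristic}")
    return ranked[:p]


# Hypothetical data: two maximal cliques sharing the pair {3, 4}.
cliques = [{1, 2, 3, 4}, {3, 4, 5}]
weights = {(1, 2): 5, (1, 3): 1, (1, 4): 1, (2, 3): 1,
           (2, 4): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1}
for h in ("H2", "H3", "H4", "H5"):
    print(h, select_subsets(cliques, 0, r=2, p=1, weights=weights, heuristic=h))
```

In this example H3 picks the shared pair \(\{3,4\}\), whereas H2 and H5 pick the pair \(\{1,2\}\) joined by the heaviest edge.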

The next section explores the power of Relaxations (6) and (7) for sparse graphs.

4 Numerical experiments

This section solves relaxations of the Max-Cut problem using Partial Relaxation (6) and Augmented Partial Relaxation (7). All the experiments were run on an Intel Core i7-6700 CPU @ 3.40 GHz Ubuntu 18.04 workstation with 16 GB RAM. MATLAB version 2018a generated the set of maximal cliques using the code genClique.m contained in the package SparsePOP version 3.01 (Waki et al. 2008) (https://sparsepop.sourceforge.io/), an implementation of the sparse relaxation developed in Waki et al. (2006). We created and solved SDPs (6) and (7) with C++ and Fusion-API Mosek version 8.1. All the times are elapsed real times in seconds. We limit the time of each interior point run to 5 hours.

4.1 Randomly generated problems

The first set of problems is created randomly using the objective sparsity structure obtained by SparsePOP’s randomUnconst.m. This function takes as arguments the number of polynomial variables (the graph size n), a lower and an upper integer bound (l and u, respectively), and a maximal degree (2). With these parameters, randomUnconst.m first randomly constructs a set of maximal cliques \(\{\omega _k\}_{k=1}^m\) such that \(l\le \left| \omega _k \right| \le u\) for all k, and then generates a quadratic objective function f(x) (see Section 6.1 in Waki et al. (2006)). For a given set of parameters \((n,l,u)\), and by setting the maximal degree equal to 2, we construct a graph by assigning a weight \(w_{ij} \ne 0\) to the edge \(\{i,j\}\) if the monomial \(x_ix_j\) has a non-zero coefficient in the function f(x). We select the weight \(w_{ij}\) randomly and uniformly from a discrete set \({\mathcal {W}}\). The graph sparsity depends on parameter u. Note that we only use the sparsity structure of the function f(x) to construct the graph, i.e. the specific coefficient values are irrelevant; what matters is which coefficients are non-zero. Also note that the resulting graph may not be chordal, and therefore the sizes of the maximal cliques \(\{\phi _k\}_k\) of the chordal extension of the graph (which are the ones used in the actual SDP relaxation) are not necessarily bounded by the parameters l and u.

Taking \(n=300,500\), \(l = 2\) and \(u = 4,6,8,10\), we generated 3 different graphs for every combination of the previous parameters by setting \({\mathcal {W}}\) equal to \(\{-1,1\}\), \(\{1,2, \dots ,10\}\) and \(\{-10,-9,\dots ,-1,1,2,\dots ,10\}\). In this set-up, the sparsity structure of the 3 graphs is identical but the non-zero weights are different. We repeat this procedure 10 times to obtain a total of 240 graphs. Table 1 summarizes the mean of the sparsity of the graphs for the different parameters and presents the mean of different size measures of the maximal cliques found after using the SparsePOP heuristic. Note that, while the parameter u controls the graph sparsity by limiting the maximal clique size, this maximal clique size is not bounded by the parameter u.

Table 1 Summary statistics of the random generated graphs

4.1.1 Strength of the partial relaxation

We solved Partial Relaxation (6) for every graph using \(r = 3,4,\dots ,20\). Notice: (i) given the computational limitations it is not always possible to solve the partial relaxation for all r and (ii) a graph might not contain maximal cliques of a certain size and therefore there is no need to solve the problem for that size, e.g. if a graph does not have maximal cliques of size 7 the set \(\varGamma _6\) is equal to \(\varGamma _7\) and the relaxation does not change from \(r=6\) to \(r=7\).
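Point (ii) can be turned into a small filter that keeps only the values of r for which \(\varGamma _r\) actually changes. The function name and the clique sizes below are hypothetical.

```python
def distinct_levels(clique_sizes, r_values):
    """Return the values of r at which the set Gamma_r changes.

    Gamma_r is the set of cliques with at most r nodes; the comparison
    starts from the empty set, which corresponds to the first order relaxation.
    """
    levels, previous = [], frozenset()
    for r in sorted(r_values):
        gamma = frozenset(k for k, size in enumerate(clique_sizes) if size <= r)
        if gamma != previous:
            levels.append(r)
            previous = gamma
    return levels


# Hypothetical maximal clique sizes of one graph.
print(distinct_levels([3, 5, 6, 6, 9], range(3, 21)))
```

For these sizes only \(r = 3, 5, 6, 9\) lead to different relaxations, so the remaining values of r do not need to be solved.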

We denote \({\hat{r}} = \text {argmin}_{r} \{Q^{P}_{r}\}_{r \in \{3,4,\dots ,20\}}\) where \(Q^{P}_{r}\) is the solution of Partial Relaxation (6) for each graph. In theory, the best solution is obtained for \(r=20\), but (i) we may not be computationally able to solve SDP (6) for such a large r or (ii) we may find the solution for some \(r < 20\) (in this case we set \({\hat{r}}\) as the minimum integer r such that \(Q^{P}_{r}\) solves the Max-Cut).

Let gap\(_{r}=\frac{Q^{P}_{r}}{f^\star } - 1\), where \(f^\star \) is the optimal solution of Max-Cut Problem (1). BiqCrunch calculated the optimal solution \(f^\star \) for each graph. We limited the total BiqCrunch time to five hours and used default parameters (BiqCrunch\({\backslash }\)problems\({\backslash }\) max-cut\({\backslash }\)biq_crunch.param). We set \(f^\star \) equal to the best feasible solution found by BiqCrunch (we did not obtain a certificate of optimality for all problems). Table 2 groups the results by dimension and the parameter u that controls the graph sparsity. The third column contains the mean of the gaps for \(r=0\) and \(r={\hat{r}}\), i.e. the gap for the first order relaxation and the best solution found using the partial relaxation respectively (\({\textbf {gap}}_0\) is missing information from 7 problems that Mosek could not solve), while the fourth column shows the mean of the differences between the size of the largest maximal clique and \({\hat{r}}\). The fifth column presents the total number of problems with a gap smaller than \(1 \times 10^{-7}\) (30 problems in total). Additionally, given that all the weights of the graphs are integer numbers, the last column presents how many of the partial relaxation solutions are at most a unit away from the Max-Cut solutions, i.e. \(Q^{P}_{{{\hat{r}}}} -f^\star \le 1\).
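The quantities \({\hat{r}}\) and gap\(_r\) can be computed from the bounds that were successfully solved, as in the following sketch. The function names, the tolerance and the numerical values are hypothetical and serve only to illustrate the definitions.

```python
def best_partial_level(bounds, tol=1e-7):
    """Smallest r whose bound Q_r^P attains the minimum over the solved levels."""
    best = min(bounds.values())
    return min(r for r, q in bounds.items() if q - best <= tol)


def relative_gap(bound, f_star):
    """gap_r = Q_r^P / f_star - 1, as defined in the text."""
    return bound / f_star - 1


# Hypothetical upper bounds indexed by r (r = 0 is the first order relaxation).
bounds = {0: 110.0, 3: 104.0, 4: 100.0, 5: 100.0}
f_star = 100.0
r_hat = best_partial_level(bounds)
print(r_hat, relative_gap(bounds[0], f_star), relative_gap(bounds[r_hat], f_star))
```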

The partial relaxation reaches a gap smaller than \(1\%\) on average for the sparser problems corresponding to \(u=4,6\) (on average, these problems have maximal cliques with size smaller than 7, see Table 1), and solves most of the problems for \(u=4\). Also notice that these results are achieved without using all the possible maximal cliques, with an average difference between \({\hat{r}}\) and the size of the largest maximal clique ranging from 4 to 40. When the graph density increases, the partial relaxation starts to lose its effectiveness. Table 3 additionally groups results by the weights generating the graph and shows that the average gap for the graphs with weights in the set \(\{1,2,\dots ,10\}\) is smaller than for the other two types of weights.

Table 2 Optimality results using the best solution found (\(f^\star \)) with the partial relaxation (\(Q^{P}_{{{\hat{r}}}}\)) applied to the random instances grouped by dimension and u
Table 3 Gaps using the best solution found with the partial relaxation (\(Q^{P}_{{{\hat{r}}}}\)) applied to the random instances grouped by dimension, u and type of weight of the graph

4.1.2 Comparison with triangle inequalities

When using SDP relaxations for the Max-Cut problem, one of the most common and efficient approaches combines the triangle inequalities with the first level of the dense Lasserre hierarchy. State-of-the-art codes Biq Mac (Rendl et al. 2010; Wiegele 2006) and BiqCrunch (Krislock et al. 2014) (approximately) solve this relaxation at every branch-and-bound node. Given \(r< s < t \le n\), the triangle inequalities are:

$$\begin{aligned} \begin{aligned}&y_{e_r + e_s}+y_{e_r + e_t}+y_{e_s + e_t} \ge -1,\\&y_{e_r + e_s}-y_{e_r + e_t}-y_{e_s + e_t} \ge -1,\\&-y_{e_r + e_s}+y_{e_r + e_t}-y_{e_s + e_t} \ge -1,\\&-y_{e_r + e_s}-y_{e_r + e_t}+y_{e_s + e_t} \ge -1.\\ \end{aligned} \end{aligned}$$
(8)
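Inequalities (8) are easy to check for a given symmetric matrix of second order moments. In the sketch below, Y[r][s] plays the role of \(y_{e_r + e_s}\); the function name, the 0-based indexing and the tolerance are assumptions. The example matrix has unit diagonal and all off-diagonal entries equal to \(-0.5\): it is positive semidefinite (its eigenvalues are 0, 1.5 and 1.5), yet it violates the first triangle inequality.

```python
from itertools import combinations, product

# Sign patterns of the four inequalities in (8), in the same order.
SIGNS = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))


def triangle_violations(Y, tol=1e-9):
    """List the violated triangle inequalities as (r, s, t, signs)."""
    violated = []
    for r, s, t in combinations(range(len(Y)), 3):
        a, b, c = Y[r][s], Y[r][t], Y[s][t]
        for sa, sb, sc in SIGNS:
            if sa * a + sb * b + sc * c < -1 - tol:
                violated.append((r, s, t, (sa, sb, sc)))
    return violated


# Every cut matrix x x^T with x in {-1, 1}^4 satisfies all the inequalities.
cuts_ok = all(
    not triangle_violations([[xi * xj for xj in x] for xi in x])
    for x in product((-1, 1), repeat=4)
)
print(cuts_ok)

# A positive semidefinite matrix with unit diagonal that is cut off by (8).
Y = [[1.0, -0.5, -0.5], [-0.5, 1.0, -0.5], [-0.5, -0.5, 1.0]]
print(triangle_violations(Y))
```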

The metric polytope (MET) is the space defined by the vectors satisfying the set of Inequalities (8) for all \(r< s < t \le n\). The vertices of this convex set contain all the possible cuts of a graph with n nodes, and MET is therefore a relaxation for the Max-Cut problem by itself. Laurent (1996) further discusses the MET vertices. Laurent and Poljak (1995) also characterize the vertices of the dense Lasserre relaxation of order 1 and prove that they correspond exactly to the cuts of the graph. Furthermore, the second order dense relaxation is contained in the metric polytope (see Anjos and Wolkowicz (2002) and Footnote 1), which means that the second level of the Lasserre dense relaxation is contained in the intersection of the metric polytope and the first level of the hierarchy (we will refer to this intersection as MET\(^{1st}_+\)).

For every solution \(Q^P_{{{\hat{r}}}}\), we calculated the time that BiqCrunch takes to find a solution as good as \(Q^P_{{{\hat{r}}}}\) (software download: https://biqcrunch.lipn.univ-paris13.fr/). For each instance, we ran four implementations of BiqCrunch:

  1. Standard branch-and-bound using defaults: BiqCrunch\({\backslash }\)problems\({\backslash }\)max-cut\({\backslash }\)biq_crunch.param,

  2. Root node solve only with default parameter change: \(\texttt {minAlpha} = 1 \times 10^{-12}\),

  3. Root node solve only with default parameter change: \(\texttt {minTol} = 1 \times 10^{-12}\), or

  4. Root node solve only with default parameter change: \(\texttt {minAlpha} = 1 \times 10^{-12}\) and \(\texttt {minTol} = 1 \times 10^{-12}\).

For each Max-Cut instance, our comparisons are with respect to the fastest time of these four implementations that reached the solution \(Q^P_{{\hat{r}}}\). The parameter changes improve the accuracy of the root node solution to prevent an early stop of the algorithm (see Section 4.1 of Krislock et al. (2014) and Section 2.1 of the BiqCrunch manual (Krislock et al. 2016) for more information). Finally, we set a limit of 5 hours for BiqCrunch. Given that the weights of all the graphs are integers, the solution found using BiqCrunch is not strictly better than \(Q^P_{{{\hat{r}}}}\); in particular, if \(BC^\star \) is the BiqCrunch solution, then \(BC^\star < \left\lceil Q^P_{{{\hat{r}}}} \right\rceil \).

Table 4 compares the times of the partial relaxation (t) and BiqCrunch (\(t_{BC}\)), grouping the instances by the type of weight in the graph, the parameter u (see Sect. 4.1), and the size of the graph (10 instances in total for a fixed type of weight, value of u and size of the graph). Table 4 shows the average of the ratio of the times (\(t_{BC}/t\)), number of problems where the partial relaxation was faster than BiqCrunch out of the 10 instances (\(t_{BC} > t\)), and the average time taken by the partial relaxation (t). The partial relaxation is faster than BiqCrunch for the problems with the smallest maximal cliques (\(u=4\)) independent of the size (one particular instance of size 500 could not be solved by BiqCrunch for any of the 4 implementations), but loses its efficiency as the size of the maximal cliques increases. For instances generated using \(u=8,10,\) BiqCrunch is always faster. As the size of the graphs increases from 300 to 500, the number of problems solved faster by the partial relaxation increases for some cases, and for those where it does not (remaining at zero) the average ratio of the times has a small increase. With respect to the type of weight, BiqCrunch tends to perform better for weights in \(\{-10,\dots ,10\}\).

Table 4 Comparison between the time used by BiqCrunch to reach a solution as good as \(Q^P_{{{\hat{r}}}}\) for all the random instances

The feasible space of the partial relaxation depends on the parameter r. If \(r = 0\), this space is equivalent to the feasible space of Relaxation (5) with \(d=1\), and MET\(^{1st}_+\) is then a tighter relaxation. Recall that, for graphs of size 3 or 4, the vertices of the metric polytope correspond exactly to the cuts of the graph (Laurent 1996). Therefore, if \(r\le 4\), the space defined by MET\(^{1st}_+\) is at least as tight as the feasible space of Partial Relaxation (6). However, it is not difficult to find a point y belonging to MET\(^{1st}_+\) that does not satisfy \({\widehat{M}}_2^{\phi _k}(y) \succeq 0\) for \(\left| \phi _k \right| \ge 5 \) (see Footnote 2). Given that the feasible space MET\(^{1st}_+\) is not contained in the feasible space of the partial relaxation if \(r \ge 5\), these two feasible spaces are not equal for \(r \ge 5\). Although the feasible spaces are not the same, Tables 2, 3, and 4 show that, for sparse graphs, the partial relaxation can provide competitive solutions compared to the standard MET plus first order Lasserre relaxation.

4.1.3 Augmented relaxation

This section concentrates on Augmented Partial Relaxation (7). First we compare heuristics H1–H5 and then compare the partial and the augmented partial relaxations. In the implementation of the heuristics H1 to H5 for the augmented formulation, we did not consider all the possible subsets of \(\phi _k\) as this number can be very large. Instead, we select randomly 20 subsets and apply the heuristics to those sets. For example, if \(r=5\) and the maximal clique k contains 10 elements (\(\left| \phi _k \right| = 10\)), the total number of possible subsets of size 5 is \(q_k = 252\). Rather than considering all 252 subsets when applying any particular heuristic, we select randomly 20 subsets and assume those are all the possible subsets of size 5 for the maximal clique k and then apply the heuristic. If \(p>q_k\), we include all the subsets (without adding repeated subsets).
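The candidate sampling described above can be sketched as follows. This is an illustration and not the implementation used in the experiments: the function name, the fixed seed and the rejection-style sampling of distinct subsets are assumptions.

```python
import random
from itertools import combinations
from math import comb


def sample_candidate_subsets(clique, r, max_candidates=20, seed=0):
    """Return up to `max_candidates` distinct subsets of size r of a clique."""
    clique = sorted(clique)
    q_k = comb(len(clique), r)  # total number of subsets of size r
    if q_k <= max_candidates:
        return [frozenset(c) for c in combinations(clique, r)]
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < max_candidates:
        chosen.add(frozenset(rng.sample(clique, r)))  # duplicates are ignored
    return sorted(chosen, key=sorted)


clique = set(range(1, 11))  # a maximal clique with 10 nodes
candidates = sample_candidate_subsets(clique, r=5)
print(comb(10, 5), len(candidates))
```

The heuristics are then applied to the 20 sampled subsets as if they were the only subsets of size 5 of the clique.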

We solved every Augmented Partial Relaxation (7) using \(r = 3,4,\dots ,20\), \(p = 1,2,3\), and the 5 types of heuristics, i.e. for a fixed r and p we have 5 different solutions. For each value of r and p, we ranked the optimal solutions found by the different heuristics from best (rank 1) to worst (rank 5): the best heuristic produces the augmented partial relaxation with the smallest optimal objective value. If two or more heuristics produce the same solution, we assign them the same rank. Table 5 presents the performance of each heuristic (the results do not change drastically with the type of weight used in the graph or the number of subsets p). For example, the third row and second column indicates that, in 428 out of the 8905 solutions (\(4.8 \%\)), the random heuristic H1 found the best possible solution out of all the heuristics. Notice that the total number of problems is not the same for the 5 heuristics. This is because we exclude the solutions of the augmented partial relaxations that do not improve the objective value, i.e. if \(Q^a_{{r},p,H}\) is already the solution of the original Max-Cut problem we do not include \(Q^a_{{r+j},p,H}\) for \(j>0\). Therefore, heuristics that perform better will have fewer total problems. The results indicate that the best heuristic, with more than \(90\%\) of its solutions ranked first or second, is H5, which consists of selecting the subsets that do not repeat in other maximal cliques and for which the norm of the weights between the variables of the subset is as large as possible. Interestingly, the random heuristic H1 performed better than H3, which tries to select the most repeated subsets.
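The ranking rule for a fixed r and p can be illustrated with a short sketch. The text does not specify how ranks continue after a tie, so the dense ranking used here (the next distinct value gets the next integer) is an assumption, as are the numerical tolerance `tol`, the function name, and the objective values in the example, which are invented for illustration.

```python
def rank_heuristics(objective, tol=1e-6):
    """Map heuristic name -> rank, where rank 1 is the smallest objective value.

    Assumptions of this sketch: values within `tol` of the first value of a
    group count as the same solution, and ties use dense ranking.
    """
    ordered = sorted(objective.items(), key=lambda item: item[1])
    ranks = {}
    rank = 0
    group_value = None
    for name, value in ordered:
        if group_value is None or value - group_value > tol:
            # Strictly worse than the current group: start a new rank.
            rank += 1
            group_value = value
        ranks[name] = rank
    return ranks


# Hypothetical upper bounds for one (r, p) pair; smaller is better.
example = {"H1": 105.2, "H2": 104.9, "H3": 105.6, "H4": 104.1, "H5": 104.1}
ranks = rank_heuristics(example)
```

In this invented example H4 and H5 share rank 1, H2 gets rank 2, H1 rank 3, and H3 rank 4.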

Table 5 Comparison of the objective value of Augmented Partial Relaxation (7) using heuristics H1–H5

For fixed p and H, let \(r_a\) be the minimum value such that the solution \(Q^a_{{r_a},p,H}\) of the augmented partial relaxation satisfies \(Q^a_{{r_a},p,H} \le Q^P_{{{\hat{r}}}}\). We found \(Q^a_{{r_a},p,H} \) for the 240 random graphs using H5 as heuristic, \(p=1,2,3\), and calculated: the time t and \(t_a\) needed to obtain \(Q^P_{{{\hat{r}}}}\) and \(Q^a_{{r_a},p,H} \) respectively, and \(\varDelta r = {\hat{r}} - r_a\). Recall that \({\hat{r}}\) and \(r_a\) will typically have different values, so a large value of \(\varDelta r\) indicates that Augmented Partial Relaxation (7) is working particularly well. Table 6 compares the two relaxations grouped by the size of the graph (n), the upper bound parameter (u) used to construct the random graph, and three different values of p, i.e. for fixed n, u and p, Table 6 represents 30 problems. For the three values of p, Table 6 shows the mean of \(\varDelta r\), the mean of the ratios of the time (\(t/t_a\)), and how many problems out of 30 were solved faster using the augmented partial relaxation (\(t>t_a\)). For Table 6, the times t and \(t_a\) include: time to create the maximal cliques and the subsets, time to formulate the relaxation in Mosek, and the relaxation solution time used by the interior point method. As expected, increasing the number of included subsets (p) reduces the largest clique size (smaller \(r_a\), larger \(\varDelta r\)). Table 6 shows that the augmented partial relaxation may often be more efficient, but there is no clear pattern.
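The quantities reported per problem can be written out as a small sketch. Everything here is illustrative: the function names, the dictionaries keyed by r, the tolerance, and the numbers in the example are assumptions, and \({\hat{r}}\) and \(Q^P_{{{\hat{r}}}}\) are taken as already computed.

```python
def smallest_matching_r(augmented_bounds, q_partial, tol=1e-9):
    """Smallest r with Q^a_r <= Q^P_rhat (up to `tol`), or None if no r qualifies."""
    for r in sorted(augmented_bounds):
        if augmented_bounds[r] <= q_partial + tol:
            return r
    return None


def compare_relaxations(r_hat, q_partial, t, augmented_bounds, augmented_times):
    """Return r_a, Delta r = r_hat - r_a, the ratio t / t_a, and whether t > t_a."""
    r_a = smallest_matching_r(augmented_bounds, q_partial)
    if r_a is None:
        return None
    t_a = augmented_times[r_a]
    return {
        "r_a": r_a,
        "delta_r": r_hat - r_a,
        "time_ratio": t / t_a,
        "augmented_faster": t > t_a,
    }


# Hypothetical data for one graph: bounds and times of the augmented
# relaxation indexed by r, and the partial relaxation solved with r_hat = 8.
bounds = {3: 110.0, 4: 107.5, 5: 106.0, 6: 106.0}
times = {3: 1.0, 4: 2.0, 5: 4.0, 6: 7.0}
summary = compare_relaxations(r_hat=8, q_partial=106.0, t=12.0,
                              augmented_bounds=bounds, augmented_times=times)
```

For these invented numbers the sketch gives \(r_a = 5\), \(\varDelta r = 3\), and a time ratio \(t/t_a = 3\).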

Our code for creating the maximal cliques and the subsets (which is written in MATLAB) can be improved considerably, since our aim at this stage was not to produce highly efficient code for these heuristics. Table 7 is equivalent to Table 6 except that Table 7 excludes from t and \(t_a\) the time spent constructing the maximal cliques and the subsets (the values of \(\varDelta r\) do not change and are not included again). In Table 7, a clearer pattern appears: larger values of p and sparser problems (smaller values of u) make the augmented partial relaxation faster than the partial one (the only exception is for \(n=300\) and \(u=4\)). Finally, there were no substantial differences when the results were also grouped by the type of weight of the graph.

Table 6 Comparison between the time used to find the best solution using Partial Relaxation (6) and the time used by the smallest Augmented Partial Relaxation (7) with a solution as good as the partial (using heuristic H5)
Table 7 Comparison between the time used to find the best solution using Partial Relaxation (6) and the time used by the smallest Augmented Partial Relaxation (7) with a solution as good as the partial (using heuristic H5)

4.2 Max-Cut instances from applications in statistical physics

Now that we have established the usefulness of the SDP relaxations (6) and (7) for random instances, we apply the partial and augmented partial relaxations to toroidal grid graphs (Liers 2004; Liers et al. 2004). These problems are in the Biq Mac library (http://biqmac.aau.at/biqmaclib.html). There are 6 different graph structures and, for each structure, there are 3 different sets of weights, i.e. the same set of edges but with different weights. Table 8 summarizes the 6 graph structures and the maximal cliques obtained using the SparsePOP heuristics. Note that the t3 instances have larger maximal cliques on average (around 13) compared with the t2 instances (approximately 8).

We found (if possible) the smallest \(r \in \{3,4,\dots ,20\}\) such that \(\frac{Q^a_{{r},p,H}}{f^\star } -1 \le 1 \times 10^{-7}\), for \(p = 0,1,2,3\) and \(H= H5\). Note that \(p=0\) corresponds to Partial Relaxation (6) (heuristic H does not apply in this case), and \(p=1,2,3\) to Augmented Partial Relaxation (7). For each p value, we calculated the time BiqCrunch needs to find a solution at least as good as \(Q^a_{{r_a},p,H}\). We used the same 4 parameter settings used in Sect. 4.1.2, and additionally we repeated the 4 experiments using the default parameters provided by BiqCrunch for Ising problems (BiqCrunch\({\backslash }\) problems\({\backslash }\)max-cut\({\backslash }\)biq_crunch.param.ising). Since the graphs only contain integer weights, we again consider a solution \(BC^\star \) from BiqCrunch as good as \(Q^a_{{r},p,H} \) if \(BC^\star < \left\lceil {Q^a_{{r},p,H} } \right\rceil \). We selected the best BiqCrunch time from the 8 parameter settings (once more, we limit each run to 5 hours; see Footnote 3).
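The two acceptance tests used in this experiment are simple enough to state as code. The sketch below is illustrative only: the function names are hypothetical, \(f^\star \) is assumed to be positive, and the numbers in the example are invented.

```python
import math


def relative_gap(q, f_star):
    """Relative gap Q / f_star - 1 of an upper bound Q (assumes f_star > 0)."""
    return q / f_star - 1.0


def closes_gap(q, f_star, tol=1e-7):
    """True if the relaxation bound Q meets the 1e-7 relative gap criterion."""
    return relative_gap(q, f_star) <= tol


def biqcrunch_as_good(bc_bound, q):
    """True if the BiqCrunch value BC* satisfies BC* < ceil(Q).

    With integer weights every cut value is an integer, so bounds are
    compared after rounding Q up.
    """
    return bc_bound < math.ceil(q)


# Hypothetical values: optimal cut 106, relaxation bound 106.3.
gap = relative_gap(106.3, 106.0)
accepted = biqcrunch_as_good(106.9, 106.3)   # 106.9 < 107
rejected = biqcrunch_as_good(107.2, 106.3)   # 107.2 >= 107
```

Because the comparison rounds \(Q^a_{{r},p,H}\) up, a bound that exceeds an integer only through floating-point noise would be rounded to the next integer; an implementation may want to subtract a small tolerance before applying the ceiling.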

Once more, t and \(t_a\) correspond to the time (including creation of maximal cliques and subsets) to solve the partial relaxation (\(p=0\)) and the augmented one (\(p=1,2,3\)), and \(t_{BC}\) is the time used by BiqCrunch. Table 9 presents the minimum value r, t, \(t_a\), and gap, as well as the time ratios \(t_{BC}/t\) and \(t_{BC}/t_a\), for each instance for the different values of p. Partial Relaxation (6) finds the solution for all the t2 instances at least 6 times faster than BiqCrunch, except for \(t2g20\_6666\), where even using \(r=19\) is only enough to reduce the gap to \(4 ^{-4}\). For the t3 instances, which have larger maximal cliques, the partial relaxation cannot find the solution for all the problems and is slower than BiqCrunch. Increasing the value of p and using the heuristic H5, Augmented Partial Relaxation (7) improves the gap for all the problems (when \(p=2\) all the gaps are smaller than \(1 \times 10^{-7}\) except one), and for some of the t3 problems the augmented partial relaxation is faster than BiqCrunch. The augmented partial relaxation is slower than the partial one for many problems. Most of this difference is explained by the time spent selecting the subsets in the augmented partial relaxation.

Table 8 Summary of the graph structure and maximal cliques for the toroidal grid graphs (Liers 2004; Liers et al. 2004)
Table 9 Results for the toroidal grid graphs problems

4.3 Comparison with CS-TSSOS

This section compares the partial relaxation with the moment-SOS hierarchy CS-TSSOS (Wang et al. 2020b). This two-level hierarchy exploits sparsity by combining the correlative sparsity used in Waki et al. (2006) (which is the same one used for constructing the maximal cliques in our partial and augmented partial relaxations) with term sparsity (Wang et al. 2019a, b, 2020a). Wang et al. (2020b) use large-scale, sparse randomly generated graphs to show the efficiency of CS-TSSOS as a relaxation of the Max-Cut problem. For the instances generated, the second order sparse Relaxation (5), i.e. the second level of the Waki et al. (2006) relaxation, is too large to be solved, while the second level of CS-TSSOS is not only solvable but also provides better bounds than the first order of the sparse Relaxation (5).

We solved the 10 instances in Wang et al. (2020b) using Partial Relaxation (6) for values of \(r = 4,5,\dots ,10\), and using the CS-TSSOS second level relaxation. Both the instances and the code to generate and solve the CS-TSSOS relaxations (which also uses Mosek to solve the SDP) can be found at https://wangjie212.github.io/jiewang/code.html (accessed 25/06/2020). We did not use Augmented Partial Relaxation (7) because generating the subsets of the maximal cliques with any of the heuristics takes considerable time compared to the time of solving the resulting relaxation. These instances are larger than the graphs studied so far (the largest graph has 5005 nodes), have maximal cliques with average size no larger than 8, and the largest maximal clique has 16 elements (see Table 10).

Table 10 Summary of the graph structure and maximal cliques for the instances used in Wang et al. (2020b)

Table 11 shows the results. Because our code is specific to constructing the partial relaxation for the Max-Cut problem while the CS-TSSOS code is general for any POP, we report the time used by the interior point method to solve the SDP relaxation (Time IPM) in addition to the total time used by each method (Total time), which includes the time to create the relaxation and solve it. In general, Partial Relaxation (6) achieves better solutions faster (both in terms of the total time and the interior point time) than the CS-TSSOS relaxation after adding maximal cliques of size 7 or smaller (\(r \le 7\)). In particular, for the largest instances, \(r=4\) was enough to improve the bound found by CS-TSSOS. These solutions can be improved using larger values of r while still obtaining competitive times compared to CS-TSSOS.

Table 11 Comparison of the solutions of the CS-TSSOS and Partial Relaxation (6) for the Wang et al. (2020b) large-scale instances

5 Conclusions

We explored the idea of using information from the second order of the Lasserre hierarchy to strengthen the standard relaxation used to solve Max-Cut in the case of sparse graphs. We used two basic formulations: (i) limiting the size of the maximal cliques added to the relaxation and (ii) including subsets of the maximal cliques that were too large.

We tested these ideas on randomly generated graphs of different densities and sizes, as well as on sparse graphs coming from applications of statistical physics. Our results showed that Partial Relaxation (6) and Augmented Partial Relaxation (7) can be very effective for solving Max-Cut on sparse graphs with small maximal cliques but lose power as the density and/or the size of the maximal cliques increase. The results also showed that this idea provides strong bounds when compared with CS-TSSOS.

Although the partial relaxation showed potential, there is still the question of how to select the parameters r and p in the partial relaxation. This is an interesting question for future work. In particular, multilevel techniques like the ones used in Campos and Parpas (2018), Campos et al. (2019), Ma et al. (2019) can be useful in this context.