1 Introduction

In a semidefinite program (SDP) one seeks a positive semidefinite (and hence symmetric) matrix such that constraints that are linear in the entries of the matrix are fulfilled and a linear objective function is minimized. If the matrix is also required to be entrywise nonnegative, the problem is called a doubly nonnegative program (DNN). Since interior point methods fail (in terms of time and memory required) when the SDP is large-scale, augmented Lagrangian approaches have become increasingly popular for solving this class of programs. Wen et al. (2010) as well as Malick et al. (2009) and De Santis et al. (2018) considered alternating direction methods of multipliers (ADMMs) to solve SDPs. One can directly apply these ADMMs to solve DNNs, too, by introducing nonnegative slack variables for the nonnegativity constraints in order to obtain equality constraints only. However, this increases the size of the problem significantly.

In this paper, we first present two ADMMs already proposed in the literature (namely ConicADMM3c by Sun et al. (2015) and ADAL+ by Wen et al. (2010)) to specifically solve DNNs. Then we introduce two new methods: DADMM3c, which employs a factorization of the dual matrix to avoid spectral decompositions, and DADAL+, which takes advantage of the practical benefits of DADAL (De Santis et al. 2018). Note that there are examples for which a 3-block ADMM (like DADAL+) diverges. However, the question of convergence of 3-block ADMMs for SDP relaxations arising from combinatorial optimization problems is still open.

In case the DNN is used as a relaxation of some combinatorial optimization problem, one is interested in dual bounds, i.e. bounds that are the dual objective function value of a dual feasible solution. In case of a minimization problem this is a lower bound, in case of a maximization problem an upper bound. Having bounds is particularly important if one intends to use the relaxation within a branch-and-bound algorithm. This, however, means that one needs to solve the DNN to high precision, so that the dual solution is feasible and hence the dual objective function value is a reliable bound. Typically, first order methods can compute solutions of moderate precision in reasonable time, whereas progressing to higher precision can become expensive. To overcome this drawback, we present two methods to compute a dual bound from a solution obtained by the ADMMs within a post-processing phase.

In the following subsection we fix our notation and introduce the formulation of standard primal-dual SDPs and DNNs. In Sect. 2 we go through the two existing ADMMs for DNNs mentioned above, and in Sect. 3 we introduce the tool of dual matrix factorization used in the new ADMMs DADAL+ and DADMM3c presented later in the same section. In Sect. 4 we present two methods for obtaining dual bounds from a solution of a DNN that satisfies the optimality criteria to moderate precision only. Section 5 shows numerical results for instances of DNN relaxations of the stable set problem. We evaluate the impact of the dual factorization within the methods as well as the two post-processing schemes for obtaining dual bounds. Section 6 concludes the paper.

1.1 Problem formulation and notations

Let \({\mathscr {S}}_n\) be the set of n-by-n symmetric matrices, \({\mathscr {S}}_n^{+}~\subset ~{\mathscr {S}}_n\) be the set of positive semidefinite matrices and \({\mathscr {S}}_n^{-}~\subset ~{\mathscr {S}}_n\) be the set of negative semidefinite matrices. Denoting by \(\left\langle X,Y\right\rangle = \mathrm{trace}(XY)\) the standard inner product in \({\mathscr {S}}_n\), we write the standard primal-dual pair of SDPs as

$$\begin{aligned} \begin{array}{ll} \min \, &{}\quad \left\langle C,X\right\rangle \\ \text{ s.t. } &{}\quad {\mathscr {A}}X = b \\ &{}\quad X\in {\mathscr {S}}_n^{+}\end{array} \end{aligned}$$
(1)

and

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad b^Ty\\ \text{ s.t. } &{}\quad {\mathscr {A}}^{\top }y +Z = C \\ &{}\quad Z\in {\mathscr {S}}_n^{+}, \end{array} \end{aligned}$$
(2)

where \(C\in {\mathscr {S}}_n\), \(b\in \mathbb {R}^m\), \({\mathscr {A}}: {\mathscr {S}}_n\rightarrow \mathbb {R}^m\) is the linear operator \(({\mathscr {A}}X)_i = \left\langle A_i,X\right\rangle \) with \(A_i\in {\mathscr {S}}_n\), \(i=1,\ldots ,m\) and \({\mathscr {A}}^{\top }: \mathbb {R}^m \rightarrow {\mathscr {S}}_n\) is its adjoint operator, so \({\mathscr {A}}^{\top }y = \sum _i y_i A_i\) for \(y \in \mathbb {R}^m\).
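For illustration, the operator \({\mathscr {A}}\) and its adjoint can be realized directly from the constraint matrices \(A_i\). The following Python/NumPy sketch (the names opA, opAt and sym are ours; the implementation discussed later in the paper is in MATLAB) mirrors the two definitions above and checks the adjoint relation \(\left\langle {\mathscr {A}}X,y\right\rangle = \left\langle X,{\mathscr {A}}^{\top }y\right\rangle \).

```python
import numpy as np

def sym(B):
    """Symmetrize a square matrix."""
    return (B + B.T) / 2

def opA(A_mats, X):
    """(A X)_i = <A_i, X> = trace(A_i X) for symmetric A_i and X."""
    return np.array([np.sum(Ai * X) for Ai in A_mats])

def opAt(A_mats, y):
    """Adjoint operator: A^T y = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A_mats))

# adjointness check on random symmetric data: <A X, y> = <X, A^T y>
rng = np.random.default_rng(0)
n, m = 4, 3
A_mats = [sym(rng.standard_normal((n, n))) for _ in range(m)]
X, y = sym(rng.standard_normal((n, n))), rng.standard_normal(m)
assert np.isclose(opA(A_mats, X) @ y, np.sum(X * opAt(A_mats, y)))
```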

When in the primal SDP (1) the elements of X are constrained to be nonnegative, then the SDP is called a doubly nonnegative program (DNN). To be more precise the primal DNN is given as

$$\begin{aligned} \begin{array}{ll} \min \, &{}\quad \left\langle C,X\right\rangle \\ \text{ s.t. } &{}\quad {\mathscr {A}}X = b \\ &{}\quad X\in {\mathscr {S}}_n^{+}, \quad X\ge 0. \end{array} \end{aligned}$$
(3)

Introducing S as the dual variable related to the nonnegativity constraint \(X\ge 0\), we write the dual of the DNN (3) as

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad b^Ty\\ \text{ s.t. } &{}\quad {\mathscr {A}}^{\top }y + Z + S = C \\ &{}\quad Z\in {\mathscr {S}}_n^{+}, \quad S \in {\mathscr {S}}_n, \quad S\ge 0. \end{array} \end{aligned}$$
(4)

We assume that both the primal DNN (3) and the dual DNN (4) have strictly feasible points (i.e. Slater's condition is satisfied), so strong duality holds. Under this assumption, \((y,S,Z,X)\) is optimal for (3) and (4) if and only if

$$\begin{aligned} \begin{aligned} {\mathscr {A}}X&= b, \quad&{\mathscr {A}}^{\top }y +Z + S&= C,\quad&ZX&= 0, \\ X&\in {\mathscr {S}}_n^{+}, \quad&Z&\in {\mathscr {S}}_n^{+}, \quad&\left\langle S,X\right\rangle&= 0, \\ X&\ge 0, \quad&S \in {\mathscr {S}}_n, \quad S&\ge 0,\\ \end{aligned} \end{aligned}$$
(5)

hold. We further assume that the constraints formed through the operator \({\mathscr {A}}\) are linearly independent.

Let \(v\in \mathbb {R}^n\) and \(M\in \mathbb {R}^{m\times n}\). In the following, M(i,:) is defined as the ith row of M and M(:,j) as the jth column of M. Further we denote by \(\mathrm{Diag}{(v)}\) the diagonal matrix having v on the main diagonal. The vector \(e_i\) is defined as the ith vector of the standard basis in \(\mathbb {R}^n\). Whenever a norm is used, we consider the Frobenius norm in case of matrices and the Euclidean norm in case of vectors. Let \(S \in {\mathscr {S}}_n\). We denote the projection of S onto the positive semidefinite and negative semidefinite cone by \((S)_{+}\) and \((S)_{-}\), respectively. The projection of S onto the nonnegative orthant is denoted by \((S)_{\ge 0}\). Moreover we denote by \(\lambda (S)\) the vector of the eigenvalues of S and by \(\lambda _{\min }(S)\) and \(\lambda _{\max }(S)\) the smallest and largest eigenvalue of S, respectively.

2 ADMMs for doubly nonnegative programs

In this section, we present two different ADMMs for solving DNNs. Let \(X\in {\mathscr {S}}_n\) be the Lagrange multiplier for the dual equation \({\mathscr {A}}^{\top }y + Z + S -C = 0\) and \(\sigma >0\) be fixed. Then the augmented Lagrangian of the dual DNN (4) is defined as

$$\begin{aligned} L_\sigma (y,S,Z; X) = b^Ty - \langle {\mathscr {A}}^{\top }y + Z + S - C, X\rangle - \frac{\sigma }{2}\Vert {\mathscr {A}}^{\top }y + Z + S - C\Vert ^2. \end{aligned}$$

In the classical augmented Lagrangian method applied to the dual DNN (4) the problem

$$\begin{aligned} \begin{array}{cc} \max \, &{}\quad L_\sigma (y,S,Z; X) \\ \text{ s.t. } &{}\quad y\in \mathbb {R}^m, \quad S \in {\mathscr {S}}_n, \quad S\ge 0, \quad Z\in {\mathscr {S}}_n^{+}, \end{array} \end{aligned}$$
(6)

where X is fixed and \(\sigma >0\) is a penalty parameter, is addressed at every iteration.

Once Problem (6) is (approximately) solved, the multiplier X is updated by the first order rule

$$\begin{aligned} X = X + \sigma ({\mathscr {A}}^{\top }y + Z +S - C) \end{aligned}$$
(7)

and the process is iterated until convergence, i.e., until the optimality conditions (5) are satisfied within a certain tolerance (see Bertsekas 1982, Chapter 2 for further details).

If the augmented Lagrangian \(L_\sigma (y,S,Z; X)\) is maximized with respect to y, S and Z not simultaneously but one after the other, this yields the well-known alternating direction method of multipliers (ADMM). The number of blocks of an ADMM is the number of blocks of variables over which Problem (6) is maximized separately; here we consider a 3-block ADMM. Such an ADMM has been specialized and used by Wen et al. (2010) to address DNNs and in the following we refer to this method as ADAL+. Details will be given in Sect. 2.1. Even though in all our numerical tests this algorithm reaches the desired precision of our stopping criteria, it has been recently shown in Chen et al. (2016) that an ADMM with more than two blocks may diverge.

In order to overcome this theoretical issue, Sun et al. (2015) proposed to update the third block twice per iteration, or, in other words, to maximize \(L_\sigma (y,S,Z; X)\) with respect to the variable y two times in one iteration. Their algorithm, named ConicADMM3c and detailed in Sect. 2.2, is the first theoretically convergent 3-block ADMM proposed in the context of conic programming.

2.1 ADAL+

In the following, we refer to the ADMM presented by Wen, Goldfarb and Yin in Wen et al. (2010) and applied to the dual DNN (4) as ADAL+. As already mentioned, ADAL+ iterates the maximization of the augmented Lagrangian with respect to each block of dual variables. To be more precise the new point \((y^{k+1},S^{k+1},Z^{k+1},X^{k+1})\) is computed by the following steps:

$$\begin{aligned} y^{k+1}&= {{\,\mathrm{arg\,max}\,}}_{y\in \mathbb {R}^m} L_{\sigma ^k}(y,S^k,Z^{k}; X^{k}), \end{aligned}$$
(8)
$$\begin{aligned} S^{k+1}&= {{\,\mathrm{arg\,max}\,}}_{S \in {\mathscr {S}}_n, S\ge 0} L_{\sigma ^k}(y^{k+1},S,Z^k; X^{k}), \end{aligned}$$
(9)
$$\begin{aligned} Z^{k+1}&= {{\,\mathrm{arg\,max}\,}}_{Z\in {\mathscr {S}}_n^{+}} L_{\sigma ^k}(y^{k+1},S^{k+1},Z; X^{k}), \end{aligned}$$
(10)
$$\begin{aligned} X^{k+1}&= X^{k} + \sigma ^k ( {\mathscr {A}}^{\top }y^{k+1} + Z^{k+1} +S^{k+1} - C). \end{aligned}$$
(11)

The update of y in (8) is derived from the first-order optimality condition of the problem on the right-hand side of (8), so \(y^{k+1}\) is the unique solution of

$$\begin{aligned} \nabla _y L_{\sigma ^k} (y,S^{k},Z^{k}; X^{k}) = b - {\mathscr {A}}( X^k + \sigma ^k ({\mathscr {A}}^{\top }y + Z^k +S^k - C)) = 0, \end{aligned}$$

that is

$$\begin{aligned} y^{k+1}= ({\mathscr {A}}{\mathscr {A}}^{\top })^{-1}\left( \frac{1}{\sigma ^k} b - {\mathscr {A}}\left( \frac{1}{\sigma ^k} X^k + Z^k +S^k - C\right) \right) . \end{aligned}$$

As shown in Wen et al. (2010), the update of S according to (9) is equivalent to

$$\begin{aligned} \min _{S\in {\mathscr {S}}_n, S\ge 0} \Vert S - U^{k+1}\Vert ^2, \end{aligned}$$

where \(U^{k+1} = C - {\mathscr {A}}^{\top }y^{k+1} -Z^k -\frac{1}{\sigma ^k} X^k\). Hence, \(S^{k+1}\) is obtained as the projection of \(U^{k+1}\) onto the nonnegative orthant, namely

$$\begin{aligned} S^{k+1}= \big (U^{k+1}\big )_{\ge 0} = \left( C - {\mathscr {A}}^{\top }y^{k+1} -Z^k -\frac{1}{\sigma ^k} X^k\right) _{\ge 0}. \end{aligned}$$

Then, the update of Z in (10) is conducted by considering the equivalent problem

$$\begin{aligned} \min _{Z\in {\mathscr {S}}_n^{+}} \Vert Z + W^{k+1}\Vert ^2, \end{aligned}$$
(12)

with \(W^{k+1} = (\frac{1}{\sigma ^k}X^k - C + {\mathscr {A}}^{\top }y^{k+1} +S^{k+1})\), or, in other words, by projecting \(W^{k+1}\in {\mathscr {S}}_n\) onto the (closed convex) cone \({\mathscr {S}}_n^{-}\) and taking its additive inverse (see Algorithm 1). Such a projection is computed via the spectral decomposition of the matrix \(W^{k+1}\).

Finally, it is easy to see that the update of X in (11) can be performed considering the projection of \(W^{k+1}\in {\mathscr {S}}_n\) onto \({\mathscr {S}}_n^{+}\) multiplied by \(\sigma ^k\), namely

$$\begin{aligned} X^{k+1}&= X^k + \sigma ^k ({\mathscr {A}}^{\top }y^{k+1} + Z^{k+1} +S^{k+1} - C ) \\&= \sigma ^k\big (X^k/\sigma ^k - C + {\mathscr {A}}^{\top }y^{k+1} +S^{k+1} - (X^k/\sigma ^k - C + {\mathscr {A}}^{\top }y^{k+1} +S^{k+1})_-\big ) \\&= \sigma ^k\big (X^k/\sigma ^k - C + {\mathscr {A}}^{\top }y^{k+1} +S^{k+1}\big )_+. \end{aligned}$$

We report in Algorithm 1 the scheme of ADAL+.

[Algorithm 1: ADAL+]
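As an illustration of the update formulas (8)-(11), the following Python/NumPy sketch implements one ADAL+ iteration. It assumes a dense representation of \({\mathscr {A}}\) as an \(m \times n^2\) matrix whose rows are the vectorized constraint matrices; this representation, the function names and the use of NumPy are our illustrative choices (the paper's implementation is in MATLAB).

```python
import numpy as np

def proj_psd(W):
    """Project a symmetric matrix onto the PSD cone via its spectral decomposition."""
    lam, Q = np.linalg.eigh(W)
    return (Q * np.maximum(lam, 0.0)) @ Q.T

def adal_plus_iteration(A, b, C, X, y, S, Z, sigma):
    """One iteration of ADAL+ (updates (8)-(11)).

    A : (m, n*n) array whose i-th row is vec(A_i); C, X, S, Z : (n, n) symmetric arrays.
    """
    n = C.shape[0]
    opA  = lambda M: A @ M.ravel()                 # (A M)_i = <A_i, M>
    opAt = lambda v: (A.T @ v).reshape(n, n)       # A^T v = sum_i v_i A_i

    # (8) closed-form y update: solve (A A^T) y = b/sigma - A(X/sigma + Z + S - C)
    rhs = b / sigma - opA(X / sigma + Z + S - C)
    y = np.linalg.solve(A @ A.T, rhs)

    # (9) S update: projection of U onto the nonnegative orthant
    U = C - opAt(y) - Z - X / sigma
    S = np.maximum(U, 0.0)

    # (10)-(11) Z and X updates from the spectral decomposition of W
    W = X / sigma - C + opAt(y) + S
    X = sigma * proj_psd(W)          # X^{k+1} = sigma * (W)_+
    Z = proj_psd(-W)                 # Z^{k+1} = -(W)_-

    return X, y, S, Z
```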

The stopping criterion of ADAL+ considers the following errors

$$\begin{aligned} r_P= & {} \frac{\Vert {\mathscr {A}}X - b\Vert }{1+\Vert b\Vert }, \quad r_D= \frac{\Vert {\mathscr {A}}^{\top }y + Z + S - C\Vert }{1+ \Vert C\Vert }, \\ r_{PP}= & {} \frac{\Vert X - (X)_{\ge 0}\Vert }{1+\Vert X\Vert }, \quad r_{CS} = \frac{|\left\langle S,X\right\rangle |}{1+\Vert X\Vert +\Vert S\Vert }, \end{aligned}$$

related to primal feasibility (\({\mathscr {A}}X = b\), \(X\ge 0\)), dual feasibility (\({\mathscr {A}}^{\top }y + Z + S = C\)) and the complementarity condition (\(\left\langle S,X\right\rangle = 0\)). More precisely, the algorithm stops as soon as the quantity

$$\begin{aligned} \delta = \max \{ r_P, r_D, r_{PP}, r_{CS}\} \end{aligned}$$

is less than a fixed precision \(\varepsilon >0\).
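A sketch of how the residuals and the stopping measure \(\delta \) could be computed, using the same vectorized representation of \({\mathscr {A}}\) as in the iteration sketch above (names are ours):

```python
import numpy as np

def stopping_measure(A, b, C, X, y, S, Z):
    """Relative residuals r_P, r_D, r_PP, r_CS and the overall measure delta."""
    n = C.shape[0]
    opA  = lambda M: A @ M.ravel()
    opAt = lambda v: (A.T @ v).reshape(n, n)
    r_P  = np.linalg.norm(opA(X) - b) / (1 + np.linalg.norm(b))
    r_D  = np.linalg.norm(opAt(y) + Z + S - C) / (1 + np.linalg.norm(C))
    r_PP = np.linalg.norm(X - np.maximum(X, 0.0)) / (1 + np.linalg.norm(X))
    r_CS = abs(np.sum(S * X)) / (1 + np.linalg.norm(X) + np.linalg.norm(S))
    return max(r_P, r_D, r_PP, r_CS)
```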

The other optimality conditions (namely \(X\in {\mathscr {S}}_n^{+}\), \(Z\in {\mathscr {S}}_n^{+}\), \(S \in {\mathscr {S}}_n\), \(S\ge 0\), \(ZX=0\)) are satisfied up to machine accuracy throughout the algorithm thanks to the projections employed in ADAL+.

2.2 ConicADMM3c

A major drawback of ADAL+ is that it is not necessarily convergent. By considering two updates of the variable y within one iteration, Sun, Toh and Yang are able to prove that the algorithm ConicADMM3c proposed in Sun et al. (2015) and detailed in Algorithm 2 is a 3-block convergent ADMM: Under certain assumptions, they show that the sequence \(\{(y^k, S^k, Z^k; X^k)\}\) produced by ConicADMM3c converges to a KKT point of the primal DNN (3) and the dual DNN (4). Note that also the order of the updates on the blocks of variables is different with respect to ADAL+. The convergence analysis is based on the fact that ConicADMM3c is equivalent to a semi-proximal ADMM.

Compared to ADAL+, ConicADMM3c has the drawback that fewer optimality conditions are satisfied up to machine accuracy throughout the algorithm. In addition to \(r_P, r_D, r_{PP}\) and \(r_{CS}\), the stopping criterion of ConicADMM3c has to take into account the errors

$$\begin{aligned} r_{PD} = \frac{\Vert (-X)_+\Vert }{1+\Vert X\Vert } \quad \text { and } \quad r_{CZ} = \frac{|\left\langle Z,X\right\rangle |}{1+\Vert X\Vert +\Vert Z\Vert }, \end{aligned}$$

related to the primal feasibility \(X\in {\mathscr {S}}_n^{+}\) and the complementarity condition \(ZX = 0\). In fact, as the second update of y is performed after the update of Z, the spectral decomposition of \(W^{k+1}\) cannot be used to update X as in ADAL+, so neither the complementarity condition \(ZX=0\) nor the positive semidefiniteness of X is satisfied by construction. (We will give a summary of the conditions satisfied throughout the algorithms in Table 1 in a subsequent section.) From a computational point of view this slows down the convergence of the scheme, as will be confirmed by our computational evaluation in Sect. 5.

[Algorithm 2: ConicADMM3c]

3 Dual matrix factorization

In this section, we present our new variants of ADAL+ and ConicADMM3c, namely DADAL+ and DADMM3c, where a factorization of the dual variable Z is employed. We adapt the method introduced by De Santis et al. (2018). In particular, we look at the augmented Lagrangian problem where the positive semidefinite constraint on the dual matrix Z is eliminated by considering the factorization \(Z=VV^\top \). To be more precise, in each iteration of the ADMMs for fixed X, we focus on the problem

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad L_\sigma (y,S,V; X)\\ \text{ s.t. } &{}\quad y\in \mathbb {R}^m, \quad S \in {\mathscr {S}}_n, \quad S\ge 0, \quad V\in \mathbb {R}^{n \times r}, \end{array} \end{aligned}$$
(13)

where

$$\begin{aligned} L_\sigma (y,S,V; X) = b^Ty - \langle {\mathscr {A}}^{\top }y + VV^\top + S - C, X\rangle - \frac{\sigma }{2}\Vert {\mathscr {A}}^{\top }y + VV^\top +S - C \Vert ^2. \end{aligned}$$

Compared to (6) the constraint \(Z \in {\mathscr {S}}_n^{+}\) is replaced by \(Z=VV^\top \) for some \(V\in \mathbb {R}^{n \times r}\), so \(Z \in {\mathscr {S}}_n^{+}\) is fulfilled automatically. Note that the number of columns r of the matrix V represents the rank of Z.

The use of the factorization of the dual variable in ADAL+ should improve the numerical performance of the algorithm when dealing with structured DNNs, as observed in the comparison of DADAL with ADAL on structured SDPs in De Santis et al. (2018). As for ConicADMM3c, we will see in Sect. 3.2 that using the factorization of the dual variable makes it possible to avoid any spectral decomposition along the iterations of the algorithm, without compromising the theoretical convergence of the method.

Note that Problem (13) is unconstrained with respect to the variables y and V. In particular, the following holds.

Proposition 1

Let \((y^*, S^*, V^*)\in \mathbb {R}^m \times {\mathscr {S}}_n\times \mathbb {R}^{n \times r}\) be a stationary point of (13). Then

$$\begin{aligned} \nabla _y L_\sigma (y^*,S^*,V^*; X)= & {} b - {\mathscr {A}}( X + \sigma ({\mathscr {A}}^{\top }y^* + {V^*V^*}^\top + S^* -C)) = 0 \,\mathrm{and }\nonumber \\ \nabla _V L_\sigma (y^*,S^*,V^*; X)= & {} -2 (X + \sigma ({\mathscr {A}}^{\top }y^* + {V^*V^*}^\top + S^*-C))V^* = 0. \end{aligned}$$
(14)

Proposition 1 implies that fulfilling the necessary optimality conditions with respect to y is equivalent to solving one system of linear equations.

As in De Santis et al. (2018), we consider Algorithm 3 in order to update y and V (and hence Z) for fixed S and X. In particular, in Algorithm 3, starting from \((y,S,V; X)\), we move V along an ascent direction \(D_V\in \mathbb {R}^{n \times r}\) with a stepsize \(\alpha \). While doing this, we update y in such a way that its optimality condition for (13) remains satisfied, so \(\nabla _y L_\sigma (y,S,V+\alpha D_V; X) = 0\) holds for the updated y (see De Santis et al. 2018, Proposition 2). We stop as soon as the necessary optimality condition with respect to V (see Proposition 1) is fulfilled to a certain precision.

As in the algorithm DADAL presented in De Santis et al. (2018), in our implementation we set \(D_V\) either to the gradient of \(L_\sigma (y,S,V; X)\) or to the gradient scaled with the inverse of the diagonal of the Hessian of \(L_\sigma (y,S,V; X)\). In order to determine a stepsize \(\alpha \), at Step 4 in Algorithm 3 we could perform an exact linesearch to maximize \(L_\sigma (y(V + \alpha D_{V}),S, V + \alpha D_{V}; X)\) with respect to \(\alpha \). This is a polynomial of degree 4 in \(\alpha \), so we can interpolate it from five different points in order to get its analytical expression and thereby determine the maximizer explicitly. In practice we evaluate \(L_\sigma (y(V + \alpha D_{V}),S, V + \alpha D_{V}; X)\) for 1000 different values of \(\alpha \in (0,10)\) and take the \(\alpha \) corresponding to the maximum value of \(L_\sigma \).

[Algorithm 3: Update of y and V for fixed S and X]
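The following Python/NumPy sketch illustrates one pass of Algorithm 3 with the plain gradient direction and a crude grid search over candidate stepsizes; the scaled gradient direction and the interpolation of the degree-4 polynomial mentioned above are omitted, and all names are our own (the paper's implementation is in MATLAB).

```python
import numpy as np

def solve_y(A, b, C, X, S, VVt, sigma):
    """y(V): the unique solution of nabla_y L_sigma = 0 for fixed S, V and X."""
    rhs = b / sigma - A @ (X / sigma + VVt + S - C).ravel()
    return np.linalg.solve(A @ A.T, rhs)

def aug_lagrangian(A, b, C, X, S, V, y, sigma):
    """Value of L_sigma(y, S, V; X)."""
    n = C.shape[0]
    M = (A.T @ y).reshape(n, n) + V @ V.T + S - C
    return b @ y - np.sum(M * X) - 0.5 * sigma * np.linalg.norm(M) ** 2

def update_y_V(A, b, C, X, S, V, sigma, alphas=np.linspace(1e-2, 10.0, 1000)):
    """One ascent step on V with re-solved y (cf. Algorithm 3)."""
    n = C.shape[0]
    y = solve_y(A, b, C, X, S, V @ V.T, sigma)

    # gradient of L_sigma with respect to V, see (14)
    R = X + sigma * ((A.T @ y).reshape(n, n) + V @ V.T + S - C)
    D_V = -2.0 * R @ V

    def value(alpha):
        V_a = V + alpha * D_V
        y_a = solve_y(A, b, C, X, S, V_a @ V_a.T, sigma)
        return aug_lagrangian(A, b, C, X, S, V_a, y_a, sigma)

    alpha = max(alphas, key=value)      # crude surrogate for the exact linesearch
    V = V + alpha * D_V
    y = solve_y(A, b, C, X, S, V @ V.T, sigma)
    return y, V
```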

As output of Algorithm 3, we get y and V (and therefore also \(Z = VV^\top \)) that have been updated through the maximization of the augmented Lagrangian (13) with respect to V. This leads to a new point \((y,S,V; X)\).

This update can be used within ADAL+ and ConicADMM3c as detailed in the following.

3.1 DADAL+

First we consider DADAL+, our version of ADAL+ in which the use of the factorization of the dual variable Z leads to a double update of Z. As a further enhancement of the algorithm ADAL+ devised in Wen et al. (2010), we also propose to perform a double update of the dual variable y.

To be more precise, in DADAL+ we replace the first update of y in ADAL+ with an update of y and V via Algorithm 3. Furthermore, in DADAL+ we update y not only before, but also a second time after the computation of S. This second update is performed by applying the closed-form solution of the maximization problem in (8). Note that the second update of y is performed before the update of Z, so that by computing the spectral decomposition of \(W = X/\sigma -C +{\mathscr {A}}^{\top }y + S\) we can simultaneously update Z and X; hence both the complementarity condition \(ZX=0\) and the positive semidefiniteness of X are satisfied up to machine accuracy throughout the algorithm, just as in ADAL+. The scheme of DADAL+ is detailed in Algorithm 4.

[Algorithm 4: DADAL+]

3.2 DADMM3c

We now investigate the use of the dual factorization within the algorithm ConicADMM3c and call the modified algorithm DADMM3c. In ConicADMM3c, the effort spent to compute the spectral decomposition of \(W=X/\sigma -C +{\mathscr {A}}^{\top }y + S\) is not fully exploited, as it is used to update only the dual matrix Z but not the primal matrix X. Hence in DADMM3c we update Z and y by employing the factorization \(Z = VV^\top \) and performing Algorithm 3, instead of updating them by a spectral decomposition and a closed formula as is done in ConicADMM3c. Note that Algorithm 3 is able to compute stationary points of Problem (13), which are not necessarily global optima. However, assuming that the update of y and V at Step 5 in Algorithm 5 is done such that Problem (13) is solved to optimality, the theoretical convergence of the method is maintained. Note that in this way the computation of any spectral decomposition is avoided. The scheme of the algorithm DADMM3c is detailed in Algorithm 5.

[Algorithm 5: DADMM3c]

A limitation of DADMM3c is that the rank of Z is not updated throughout the iterations. This means that the maximization of \(L_\sigma (y,S,V; X)\) with respect to V is performed keeping r fixed at its initial value, which in our implementation is n. How to update the rank of Z in a beneficial way is still an open question.

On the other hand, note that in DADAL+ the rank of Z is determined at every iteration through the eigenvalue decomposition in the second update of Z.

As already mentioned in Sect. 2, some of the optimality conditions are satisfied throughout the algorithms ADAL+/DADAL+ and ConicADMM3c/DADMM3c. A summary is presented in Table 1.

Table 1 Optimality conditions (5) satisfied through the algorithms by construction are indicated by a checkmark, all others by an x-mark

4 Computation of dual bounds

When solving combinatorial optimization problems, DNN relaxations very often yield high quality bounds. These bounds can then be used within a branch-and-bound framework in order to get an exact solution method. In this section we want to discuss how we can obtain lower bounds on the optimal objective function value of the primal DNN (3) from a dual solution of moderate precision only.

Thanks to weak and strong duality results, the objective function value of every feasible solution of the dual DNN (4) is a lower bound on the optimal objective function value of the primal DNN (3) and the optimal values of the primal and the dual DNN coincide. Therefore every dual feasible solution and in particular the optimal dual solution give rise to a dual bound.

Note that the dual objective function value serves as a dual bound only if the DNN relaxation is solved to high precision. If the DNN is solved to moderate precision, the dual objective function value might not be a bound as the dual solution might be infeasible. However, solving the DNN to high precision comes with enormous computational costs.

So unfortunately ADAL+, DADAL+, ConicADMM3c and DADMM3c are not suitable for producing a bound fast. Running an ADMM typically gives approximate optimal solutions rather quickly, while progressing to optimal solutions of high precision can be very time consuming. As the dual constraint \({\mathscr {A}}^{\top }y + Z + S -C = 0\) does not necessarily hold in every iteration of the four algorithms (see Table 1), obtaining a dual feasible solution of sufficiently high precision with ADMMs may take an extremely long time.

To save time, but still ensure that we obtain a dual bound, we will stop the four methods at a certain precision. After that we will use one of two procedures in a post-processing phase in order to obtain a bound. In Sect. 4.1 we will describe how to obtain a bound with a method already presented in the literature. In Sect. 4.2 we present a new procedure for obtaining a dual feasible solution and hence a bound from an approximate optimal solution.

4.1 Dual bounds through error bounds

In this section we present the method introduced by Jansson et al. (2008) to obtain lower bounds on the primal optimal value of an SDP of the form (1). We adapt this method for DNNs in order to use it in a post-processing phase of the four ADMMs presented above. We start with the following lemma from (Jansson et al. 2008, Lemma 3.1).

Lemma 1

Let Z, X be symmetric matrices of dimension n that satisfy

$$\begin{aligned} \underline{z} \le \lambda _{\min }(Z), \quad 0 \le \lambda _{\min }(X), \quad \lambda _{\max }(X) \le \bar{x} \end{aligned}$$
(15)

for some \(\underline{z}\), \(\bar{x} \in {\mathbb R}\). Then the inequality

$$\begin{aligned} \left\langle Z,X\right\rangle \ge \bar{x}\sum _{k : \lambda _k(Z) <0}\lambda _k(Z) \ge n\bar{x}\min \{0,\underline{z}\} \end{aligned}$$

holds.

Proof

Let \(Z= Q\Lambda Q^\top \) be an eigenvalue decomposition of Z with \(QQ^\top =I\) for some \(Q \in {\mathbb R}^{n \times n}\) and \(\Lambda =\mathrm{Diag}(\lambda (Z)).\) Then

$$\begin{aligned} \left\langle Z,X\right\rangle&= \mathrm{trace}(Q\Lambda Q^\top X) = \mathrm{trace}(\Lambda Q^\top X Q) \\&= \sum _{k=1}^n \lambda _k(Z) Q(:,k)^\top X Q(:,k). \end{aligned}$$

Because of (15), we have \(0 \le Q(:,k)^\top X Q(:,k) \le \bar{x}.\) Therefore

$$\begin{aligned} \left\langle Z,X\right\rangle \ge \bar{x}\sum _{k : \lambda _k(Z) <0}\lambda _k(Z) \ge n\bar{x}\min \{0,\underline{z}\}. \end{aligned}$$

\(\square \)

At this point we can present the following theorem of (Jansson et al. 2008, Theorem 3.2) adapted for DNNs.

Theorem 1

Consider the primal DNN (3), let \(X^*\) be an optimal solution and let \(p^*\) be its optimal value. Given \(y \in \mathbb {R}^m\) and \(S \in {\mathscr {S}}_n\) with \(S \ge 0\), set

$$\begin{aligned} \tilde{Z} = C - {\mathscr {A}}^{\top }y - S \end{aligned}$$
(16)

and suppose that \(\underline{z} \le \lambda _{\min }(\tilde{Z}).\) Assume \(\bar{x} \in {\mathbb R}\) such that \(\bar{x} \ge \lambda _{\max }(X^*)\) is known. Then the inequality

$$\begin{aligned} p^* \ge b^\top y + \bar{x}\sum _{k: \lambda _k(\tilde{Z})<0} \lambda _k(\tilde{Z}) \ge b^\top y + n\bar{x}\min \{0, \underline{z}\} \end{aligned}$$
(17)

holds.

Proof

Let \(X^*\) be optimal for the primal DNN (3). Then

$$\begin{aligned} \left\langle C,X^*\right\rangle - b^\top y&= \left\langle C,X^*\right\rangle - \left\langle {\mathscr {A}}X^*, y\right\rangle = \left\langle C -{\mathscr {A}}^{\top }y,X^*\right\rangle \\&= \left\langle \tilde{Z}+ S,X^*\right\rangle = \left\langle \tilde{Z},X^*\right\rangle + \left\langle S,X^*\right\rangle . \end{aligned}$$

Since \(S \ge 0\) and \(X^* \ge 0\), the inequality

$$\begin{aligned} \left\langle C,X^*\right\rangle \ge b^\top y + \left\langle \tilde{Z},X^*\right\rangle \end{aligned}$$

is satisfied and Lemma 1 implies

$$\begin{aligned} p^* = \left\langle C,X^*\right\rangle \ge b^\top y + \left\langle \tilde{Z},X^*\right\rangle \ge b^\top y + \bar{x}\sum _{k: \lambda _k(\tilde{Z}) <0} \lambda _k(\tilde{Z}) \ge b^\top y + n \bar{x}\min \{0, \underline{z}\}, \end{aligned}$$

which proves (17). \(\square \)

Theorem 1 justifies computing dual bounds via Algorithm 6. If the matrix \(\tilde{Z}\) defined in (16) is positive semidefinite, then \((y,\tilde{Z},S)\) is a dual feasible solution and \(b^\top y\) is already a bound. Otherwise, we decrease the dual objective function value \(b^\top y\) of the infeasible point \((y,\tilde{Z},S)\) by adding the negative term \(\bar{x}\sum \limits _{k: \lambda _k(\tilde{Z}) <0} \lambda _k(\tilde{Z})\) to it. In this way, we obtain a bound (EB in Algorithm 6) as proved by Theorem 1.

Note that for the computation of the bound of Theorem 1 it is not necessary to have a primal optimal solution \(X^*\) at hand, only an upper bound on the maximum eigenvalue of an optimal solution is needed. Such an upper bound is known for example if there is an upper bound on the maximum eigenvalue of any feasible solution.

[Algorithm 6: Computation of the error bound EB]
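A sketch of the error bound computation of Algorithm 6 in Python/NumPy, again assuming the vectorized \(m \times n^2\) representation of \({\mathscr {A}}\) (names are ours):

```python
import numpy as np

def error_bound(A, b, C, y, S, x_bar):
    """Lower bound (17) on the primal optimal value from a possibly infeasible (y, S)."""
    n = C.shape[0]
    Z_tilde = C - (A.T @ y).reshape(n, n) - S          # Z tilde as in (16)
    lam = np.linalg.eigvalsh(Z_tilde)
    return b @ y + x_bar * lam[lam < 0].sum()          # EB in Algorithm 6
```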

4.2 Dual bounds through the Nightjet procedure

Next we will present a new procedure to obtain bounds. In contrast to the procedure described in the previous section, this approach will also provide a dual feasible solution. The key ingredient to obtain such dual feasible solutions will be the following lemma.

Lemma 2

We consider the primal DNN (3) and the dual DNN (4). Let \(\tilde{Z} \in {\mathscr {S}}_n^{+}\). If

$$\begin{aligned} \max _{y \in {\mathbb R}^m}\{ b^\top y \mid {\mathscr {A}}^{\top }y \le C - \tilde{Z} \} \end{aligned}$$
(18)

has an optimal solution \(\tilde{y}\), let \(\tilde{S}= C -\tilde{Z} - {\mathscr {A}}^{\top }\tilde{y}\). Then \((\tilde{y}, \tilde{S}, \tilde{Z})\) is a dual feasible solution. If (18) is unbounded, then also (4) is unbounded. If (18) is infeasible, then there is no dual feasible solution with \(\tilde{Z}\).

Proof

If (18) has an optimal solution \(\tilde{y}\), then it is easy to see that \(\tilde{S} \ge 0\) by construction. Furthermore \(\tilde{S} \in {\mathscr {S}}_n\) because C, \(\tilde{Z}\), \({\mathscr {A}}^{\top }y \in {\mathscr {S}}_n\). Therefore \((\tilde{y}, \tilde{S}, \tilde{Z})\) is a dual feasible solution. If (18) is unbounded, then the same values of y that make the objective function value of (18) arbitrarily large can be used to make the objective function value of (4) arbitrarily large, hence also (4) is unbounded. Furthermore it is easy to see that (18) is feasible if there is a dual feasible solution with \(\tilde{Z}\). Hence if (18) is infeasible, then there is no dual feasible solution with \(\tilde{Z}\). \(\square \)

Let \((y,S,Z,X)\) be any solution (not necessarily feasible) to the primal DNN (3) and the dual DNN (4). In the back of our minds we think of them as the solutions we obtained by ADAL+, DADAL+, ConicADMM3c or DADMM3c, so they are close to optimal solutions but not necessarily dual or primal feasible. We want to obtain \(\tilde{y}\), \(\tilde{S}\) and \(\tilde{Z}\) satisfying dual feasibility

$$\begin{aligned} {\mathscr {A}}^{\top }\tilde{y} +\tilde{Z}+\tilde{S}= C, \quad \tilde{Z} \in {\mathscr {S}}_n^{+}, \quad \tilde{S} \in {\mathscr {S}}_n, \quad \tilde{S} \ge 0. \end{aligned}$$
(19)

We use Lemma 2 within the Nightjet procedure for obtaining such solutions in the following way. From the given Z we obtain the new positive semidefinite matrix \(\tilde{Z}\) by projecting Z onto the positive semidefinite cone. Then we solve the linear program (18).

If (18) is infeasible, then we are neither able to construct a feasible dual solution nor to construct a dual bound. If (18) is unbounded, then also the dual DNN (4) is unbounded and hence the primal DNN (3) is not feasible. If (18) has an optimal solution \(\tilde{y}\), then we obtain a dual feasible solution \((\tilde{y}, \tilde{S}, \tilde{Z})\) with the help of Lemma 2. Furthermore the dual objective function value \(b^\top \tilde{y}\) is a bound in this case, so we can return a dual feasible solution and a bound. The Nightjet procedure is detailed in Algorithm 7.

[Algorithm 7: The Nightjet procedure]
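A sketch of the Nightjet procedure in Python, where the linear program (18) is solved with scipy.optimize.linprog as a stand-in for whatever LP solver one prefers (the names and the vectorized \(m \times n^2\) representation of \({\mathscr {A}}\) are ours):

```python
import numpy as np
from scipy.optimize import linprog

def proj_psd(W):
    """Projection onto the positive semidefinite cone."""
    lam, Q = np.linalg.eigh(W)
    return (Q * np.maximum(lam, 0.0)) @ Q.T

def nightjet(A, b, C, Z):
    """Nightjet procedure (Algorithm 7): dual feasible point and bound from Z.

    A : (m, n*n) array whose rows are the vectorized constraint matrices A_i.
    Returns (bound, y, S, Z_tilde), or None if the LP (18) has no optimal solution.
    """
    n = C.shape[0]
    Z_tilde = proj_psd(Z)                               # project Z onto the PSD cone
    # LP (18): max b^T y  s.t.  (A^T y)_{jk} <= (C - Z_tilde)_{jk} entrywise
    res = linprog(c=-b, A_ub=A.T, b_ub=(C - Z_tilde).ravel(),
                  bounds=[(None, None)] * len(b), method="highs")
    if res.status != 0:
        return None                                     # infeasible or unbounded, see Lemma 2
    y = res.x
    S = C - Z_tilde - (A.T @ y).reshape(n, n)           # S tilde from Lemma 2
    return b @ y, y, S, Z_tilde
```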

To summarize, we have presented two different approaches to determine dual bounds for the primal DNN (3) from given y, S and Z.

Note that the approaches are in the following sense complementary to each other: In the first approach from Jansson, Chaykin and Keil we fix y and S and obtain the bound from a newly computed \(\tilde{Z}\), but we do not obtain a dual feasible solution. In our second approach, the Nightjet procedure, we fix \(\tilde{Z}\) to be the projection of Z onto the positive semidefinite cone and then construct a feasible \(\tilde{y}\) and \(\tilde{S}\) from that.

Furthermore note that in the approach of Jansson, Chaykin and Keil the obtained bound is always less than or equal to the dual objective function value of y, because a negative term is added to \(b^\top y\), the dual objective function value using y. In contrast, it can happen in the Nightjet procedure that the bound is larger and hence better than \(b^\top y\). However, the Nightjet procedure comes with the drawback that it might be unable to produce a feasible solution. In this case one should continue running the ADMM to a higher precision and apply the procedure to the improved point.

5 Numerical experiments

In this section we present a comparison of the four ADMMs using the two procedures presented in Sect. 4 as post-processing phase. Towards that end we consider instances of one fundamental problem from combinatorial optimization, the stable set problem.

5.1 The stable set problem and an SDP relaxation

Given a graph G, let V(G) be its set of vertices and E(G) its set of edges. A subset of V(G) is called stable if no two of its vertices are adjacent. The stability number \(\alpha (G)\) is the largest possible cardinality of a stable set. It is NP-hard to compute the stability number Karp (1972) and it is even hard to approximate it Håstad (1999), therefore upper bounds on the stability number are of interest. One possible upper bound is the Lovász theta function \(\vartheta (G)\), see for example Rendl (2012). The Lovász theta function is defined as the optimal value of the SDP

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad \left\langle J,X\right\rangle \\ \text{ s.t. } &{}\quad \mathrm{trace}(X) = 1 \\ &{}\quad X_{ij} = 0 \quad \forall \{i,j\}\in E(G) \\ &{}\quad X\in {\mathscr {S}}_n^{+}, \end{array} \end{aligned}$$

where J is the n-by-n matrix of all ones. Note that \(\vartheta (G)\), being the optimal value of an SDP of polynomial size, can be computed to arbitrary precision in polynomial time. Hence \(\vartheta (G)\) is a polynomially computable upper bound on \(\alpha (G)\).

Several attempts to improve \(\vartheta (G)\) towards \(\alpha (G)\) have been made. One of the most recent ones is to include so-called exact subgraph constraints into the SDP defining \(\vartheta (G)\); these constraints make sure that for small subgraphs the solution lies in the respective squared stable set polytope Gaar and Rendl (2019). This approach is a generalization of one of the first approaches to improve \(\vartheta (G)\), introduced in Schrijver (1979), which consists of adding the constraint \(X\ge 0\). Compared to \(\vartheta (G)\) this leads to an even stronger bound on \(\alpha (G)\), as the copositive cone is better approximated. We denote by \(\vartheta _+(G)\) the optimal objective function value of the DNN

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad \left\langle J,X\right\rangle \\ \text{ s.t. } &{}\quad \mathrm{trace}(X) = 1 \\ &{}\quad X_{ij} = 0 \quad \forall \{i,j\}\in E(G) \\ &{}\quad X\in {\mathscr {S}}_n^{+}, \quad X\ge 0. \end{array} \end{aligned}$$
(20)

Note that in the DNN (20) the matrix \({\mathscr {A}}{\mathscr {A}}^{\top }\) is a diagonal matrix, which leads to an inexpensive update of y in the methods discussed.

5.2 Dual bounds for \(\vartheta _+(G)\)

As already discussed in Sect. 4, for a combinatorial optimization problem like the stable set problem, bounds on the objective function value are of huge importance.

The bound according to Jansson et al. (2008) can be used for computing bounds on \(\vartheta _+(G)\) very easily: We can set \(\bar{x} = 1\), as for every feasible solution X of (20) we have \(\mathrm{trace}(X) = 1\) and \(X \in {\mathscr {S}}_n^{+}\) and hence \(\lambda _{\max }(X) \le 1\).

The computation of the dual bound with the Nightjet procedure simplifies drastically. In particular there is no need to solve the linear program (18), since the solution can be computed explicitly. To be more precise, the following holds.

Lemma 3

We consider the primal DNN (20) to compute \(\vartheta _+(G)\) and the dual of it. Let \(y_t\) be the dual variable for the constraint \(\mathrm{trace}(X) = 1\) and \(y_e\) be the dual variable for the constraint \(X_{ij} = 0\) for every edge \(e = \{i,j\} \in E(G)\). Furthermore let \(\tilde{Z} \in {\mathscr {S}}_n^{+}\) and let \( M = \max \left\{ \tilde{Z}_{ij} \mid \{i,j\} \not \in E(G) \right\} . \)

If \(M \ge 0 \), then it is not possible to construct a dual feasible solution with this \(\tilde{Z}\). If \(-1< M < 0\), then we can redefine \(\tilde{Z}\) as \(\tilde{Z} = -\frac{1}{M}\tilde{Z},\) and obtain a new \(\tilde{Z}\) for which \(M = - 1\). If \(M \le -1\), then we obtain a dual feasible solution with

$$\begin{aligned} \tilde{y}_t&= \min \left\{ - 1 - \tilde{Z}_{ii} \mid i \in \{1, 2, \dots , n\}\right\} ,\\ \tilde{y}_e&= 2(- 1 - \tilde{Z}_{ij}) \quad \quad \forall e = \{i,j\} \in E(G),\\ \tilde{S}&= C -\tilde{Z} - {\mathscr {A}}^{\top }\tilde{y}. \end{aligned}$$

Proof

We first consider the dual of (20) in more detail. To be consistent with our notation we replace the objective function \(\max \,\left\langle J, X \right\rangle \) of (20) with the equivalent objective function \(-\min \,\left\langle -J, X \right\rangle \) in order to consider a primal minimization problem as in the primal DNN (3). We introduce one dual variable \(y_t\) for the constraint \(\mathrm{trace}(X) = 1\) and one dual variable \(y_e\) for the constraint \(X_{ij} = 0\) for every edge \(e = \{i,j\} \in E(G)\). Then the dual of (20) is given as

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad y_t\\ \text{ s.t. } &{}\quad y_t I + \sum \limits _{e=\{i,j\}\in E(G)} \frac{1}{2}\,y_e \left( e_ie_j^\top + e_je_i^\top \right) + Z + S = -J \\ &{}\quad Z\in {\mathscr {S}}_n^{+}, \quad S \in {\mathscr {S}}_n, \quad S\ge 0. \end{array} \end{aligned}$$
(21)

Now we apply Lemma 2 for (20). Thus we replace the dual variable Z with the fixed \(\tilde{Z} \in {\mathscr {S}}_n^{+}\) and the linear program (18) becomes

$$\begin{aligned} \begin{array}{ll} \max \, &{}\quad y_t\\ \text{ s.t. } &{}\quad y_t \le -1 - \tilde{Z}_{ii} \quad \forall i \in \{1,\dots ,n\}\\ &{}\quad \frac{1}{2}\,y_e \le -1 - \tilde{Z}_{ij} \quad \forall e=\{i,j\}\in E(G)\\ &{}\quad 0 \le -1 - \tilde{Z}_{ij} \quad \forall \{i,j\}\notin E(G),\ i\ne j. \end{array} \end{aligned}$$
(22)

Clearly this linear program is bounded, and detecting infeasibility or constructing an optimal solution is straightforward. Indeed, let \( M = \max \left\{ \tilde{Z}_{ij}\mid \{i,j\} \not \in E(G) \right\} \); then it is easy to see that (22) is infeasible if \(M > - 1\). However, if \(-1< M < 0\) holds, then we can redefine \(\tilde{Z}\) as \( \tilde{Z} = -\frac{1}{M}\tilde{Z}, \) and obtain a new \(\tilde{Z}\) for which \(M=-1\). On the contrary, if \( M \ge 0\), we cannot update \(\tilde{Z}\) in a straightforward way. If \(M \le -1\), then (22) is feasible and we can construct the optimal solution as

$$\begin{aligned} y_t&= \min \left\{ - 1 - \tilde{Z}_{ii}\mid i \in \{1, 2, \dots , n\} \right\} ,\\ y_e&= 2(- 1 - \tilde{Z}_{ij}) \quad \quad \forall e = \{i,j\} \in E(G). \end{aligned}$$

Then we let \(\tilde{S} = C -\tilde{Z} - {\mathscr {A}}^{\top }\tilde{y}\) and due to Lemma 2 this yields a feasible dual solution \((\tilde{y}, \tilde{S}, \tilde{Z})\). \(\square \)

Hence, for computing a dual bound for \(\vartheta _+(G)\) it is not necessary to solve the linear program (18); its solution can be written down explicitly. This explicit solution is used by the Nightjet procedure for \(\vartheta _+(G)\) to obtain \(\tilde{y}\). The computation of \(\tilde{Z}\) and \(\tilde{S}\) is the same as in the original Nightjet procedure. The pseudocode of the Nightjet procedure applied to the computation of \(\vartheta _+(G)\) can be found in Algorithm 8.

[Algorithm 8: The Nightjet procedure for \(\vartheta _+(G)\)]
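The following Python sketch implements the explicit construction of Lemma 3. We read the maximum M as being taken over off-diagonal non-edge positions, use 0-based vertex indices, and, following the minimization form used in the proof of Lemma 3, report \(-\tilde{y}_t\) as the resulting upper bound on \(\vartheta _+(G)\); these reading choices and all names are ours.

```python
import numpy as np

def nightjet_theta_plus(edges, Z):
    """Nightjet procedure specialized to the theta_plus DNN (cf. Algorithm 8).

    edges : list of pairs (i, j) with 0 <= i < j < n; Z : (n, n) symmetric matrix.
    Returns (upper bound on theta_plus, y_t, y_e, S_tilde) or None on failure.
    """
    n = Z.shape[0]
    lam, Q = np.linalg.eigh(Z)
    Zt = (Q * np.maximum(lam, 0.0)) @ Q.T               # project Z onto the PSD cone

    edge_set = {frozenset(e) for e in edges}
    non_edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if frozenset((i, j)) not in edge_set]
    M = max(Zt[i, j] for (i, j) in non_edges)

    if M >= 0:
        return None                                     # no dual feasible solution with this Z tilde
    if -1 < M < 0:
        Zt = -Zt / M                                    # rescale so that the new M equals -1

    y_t = min(-1 - Zt[i, i] for i in range(n))
    y_e = {(i, j): 2 * (-1 - Zt[i, j]) for (i, j) in edges}

    # S tilde = C - Z tilde - A^T y with C = -J (min-form convention of Lemma 3)
    S = -np.ones((n, n)) - Zt - y_t * np.eye(n)
    for (i, j), ye in y_e.items():
        S[i, j] -= ye / 2
        S[j, i] -= ye / 2

    # under the min-form convention, b^T y = y_t bounds -theta_plus(G) from below,
    # so -y_t is an upper bound on theta_plus(G) (our reading)
    return -y_t, y_t, y_e, S
```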

5.3 Comparison of the evolution of the dual bounds

In the following, we give a numerical comparison of the two procedures for the computation of bounds for \(\vartheta _+(G)\) on one instance from the second DIMACS implementation challenge Johnson and Trick (1996), namely johnson8_2_4. For this instance the stability number \(\alpha (G)\) and \(\vartheta _+(G)\) coincide and both are equal to 4.

In Fig. 1, we show the evolution of the bounds along the iterations for ADAL+, DADAL+, ConicADMM3c and DADMM3c. For each algorithm we report the dual objective function value (dualOfv), the bound computed according to Jansson et al. (2008) (EB) and the bound computed by the Nightjet procedure (NB) at every iteration.

Note that in some iterations the dual objective function value is not a bound on \(\vartheta _+(G) = 4\) and hence also not on \(\alpha (G)\). This is due to the fact that the solution considered is not dual feasible. (The criteria are satisfied only to moderate precision.)

We observe that for ADAL+, DADAL+ and ConicADMM3c the Nightjet bound is always less than or equal to the error bound, and in several iterations it is significantly better, in particular in the early iterations. Hence our Nightjet procedure is an effective tool to obtain dual bounds. Note that every ADMM keeps Z positive semidefinite along the iterations (see Table 1), and this may favor the Nightjet procedure.

Fig. 1: Evolution of the computed bounds on the instance johnson8_2_4

5.4 Computational setup

In our numerical experiments we compare the performance of ADAL+, DADAL+, ConicADMM3c and DADMM3c on 66 instances of the DNN (20) to compute \(\vartheta _+(G)\). The graphs are taken from the second DIMACS implementation challenge Johnson and Trick (1996). Note that in that challenge the task was to find a maximum clique of several graphs, so we consider the complement graphs of the graphs in Johnson and Trick (1996). In Table 2, for each instance on a graph G, we report its name (Problem) and its dimension (the number of vertices n and the number of edges m of G). The value of the Lovász theta function \(\vartheta (G)\) for many of these instances can be found in Giandomenico et al. (2013) and Malick et al. (2009); in this article we focus exclusively on \(\vartheta _+(G)\).

We implemented the four algorithms detailed in Sects. 2 and 3 in MATLAB R2019a. In all computations, we set the accuracy level \(\varepsilon \) to \(10^{-5}\) and we set a time limit of 3600 seconds CPU time. In both DADAL+ and DADMM3c we perform two iterations of Algorithm 3 in order to update (yV).

It is known that the performance of ADMMs strongly depends on the update of the penalty parameter \(\sigma \). In all implementations, we use the strategy described by Lorenz and Tran-Dinh (2019), so in iteration k we set

$$\begin{aligned} \sigma ^k = \frac{\Vert X^k \Vert }{\Vert Z^k \Vert }. \end{aligned}$$

The experiments were carried out on an Intel Core i7 processor running at 3.1 GHz under Linux.

5.5 Comparison between ADAL+ and DADAL+

In Table 3 we report the results obtained with ADAL+ and DADAL+ on the 66 instances of computing \(\vartheta _+(G)\) detailed in Table 2. We include the following data for the comparison: For each instance, we report its name (Problem) and its stability number (\(\alpha \)) and for each of the two algorithms, we report the dual objective function value obtained (d ofv), the bound obtained by computing the error bound described in Sect. 4.1 (EB), the bound obtained by applying the Nightjet procedure described in Sect. 4.2 (NB), the number of iterations (it) and the CPU time needed to satisfy the stopping criterion (time).

As a further comparison, we report in Fig. 2 the performance profiles of ADAL+ and DADAL+ with respect to the number of iterations and the CPU time. These performance profiles are obtained in the following way. Given our set of solvers \(\mathcal {S}\) and a set of problems \(\mathcal {P}\), we compare the performance of a solver \(s \in \mathcal {S}\) on problem \(p \in \mathcal {P}\) against the best performance obtained by any solver in \(\mathcal {S}\) on the same problem. To this end we define the performance ratio \( r_{p,s} = t_{p,s}/\min \{t_{p,s^\prime } \mid s^\prime \in \mathcal {S}\}, \) where \(t_{p,s}\) is the measure we want to compare, and we consider the cumulative distribution function \(\rho _s(\tau ) = |\{p\in \mathcal {P} \mid r_{p,s}\le \tau \}| /|\mathcal {P}|\). The performance profile for \(s \in \mathcal {S}\) is the plot of the function \(\rho _s\).
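For instance, the performance profiles can be computed and plotted as follows (a generic Python sketch; names are ours):

```python
import numpy as np
import matplotlib.pyplot as plt

def performance_profile(T, labels, tau_max=10.0):
    """T : (num_problems, num_solvers) array of performance measures t_{p,s}."""
    ratios = T / T.min(axis=1, keepdims=True)                   # r_{p,s}
    taus = np.linspace(1.0, tau_max, 200)
    for s, label in enumerate(labels):
        rho = [(ratios[:, s] <= tau).mean() for tau in taus]    # rho_s(tau)
        plt.step(taus, rho, where="post", label=label)
    plt.xlabel("tau"); plt.ylabel("rho_s(tau)"); plt.legend()
    plt.show()
```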

Note that both ADAL+ and DADAL+ stopped on 7 instances because of the time limit. In the performance profiles, we exclude those instances where at least one of the solvers exceeds the time limit.

Table 2 Data of the DIMACS instances considered in Johnson and Trick (1996)
Table 3 Comparison between ADAL+ and DADAL+ on DIMACS instances Johnson and Trick (1996)

It is clear from the results in Table 3 and from the performance profiles that DADAL+ performs far fewer iterations than ADAL+. However, this does not always correspond to an improvement in terms of computational time, as the double update of y is an expensive operation.

With respect to the CPU time, Fig. 2 shows that the performance of the two algorithms is similar, even though DADAL+ slightly outperforms ADAL+, as its curve always lies above the other one.

If we consider the dual objective function value in Table 3 we see that in fact the dual objective function value obtained by ADAL+ and DADAL+ is often not a bound, for example on the instances hamming6_4, c_fat200_1, san200_0_7_1, san400_0_9_1, c_fat500_1 and c_fat500_5. This shows that a procedure for obtaining a bound from the approximate solution is indeed of major importance.

Regarding the quality of the bounds, the Nightjet procedure obtains better bounds than the error bounds for the vast majority of the instances, both when applied as a post-processing phase for ADAL+ and for DADAL+. The improvement is particularly impressive on those instances where the time limit is exceeded. We want to further highlight that the bound obtained from the Nightjet procedure comes from a newly computed feasible dual solution. This means that applying the Nightjet procedure as post-processing not only yields a bound that is generally better than the one obtained by the error bounds, but also provides a dual feasible solution.

Fig. 2: Comparison between ADAL+ and DADAL+ on DIMACS instances Johnson and Trick (1996)

5.6 Comparison between ConicADMM3c and DADMM3c

In Table 4 we report the results obtained with ConicADMM3c and DADMM3c on the 66 instances of computing \(\vartheta _+(G)\) detailed in Table 2.

As before, we report the name of the instances, the stability number and, for each algorithm, the dual objective function value obtained, the bounds obtained by computing the error bound and by applying the Nightjet procedure, the number of iterations and the CPU time needed to satisfy the stopping criterion.

Table 4 Comparison between ConicADMM3c and DADMM3c on DIMACS instances Johnson and Trick (1996)

ConicADMM3c was not able to stop within the time limit on 11 instances, while DADMM3c was not able to stop within the time limit on 15 instances.

In general, DADMM3c needs to perform far fewer iterations and is slightly better than ConicADMM3c in terms of CPU time, as confirmed by the performance profiles shown in Fig. 3. As before, we did not include the instances that exceeded the time limit in the performance profiles.

Again, the Nightjet procedure obtains better bounds than the error bounds for the majority of the instances, both when applied as a post-processing phase for ConicADMM3c and for DADMM3c. However, there are cases (6 instances) in which the Nightjet procedure fails.

We finally mention that on several instances where the time limit was exceeded, the bounds obtained by DADMM3c are much better than those obtained by ConicADMM3c; see for example the instances p_hat1500_1, p_hat1500_2 and p_hat1500_3.

Fig. 3: Comparison between ConicADMM3c and DADMM3c on DIMACS instances Johnson and Trick (1996)

6 Conclusions

In this paper we propose to use a factorization of the dual matrix within two ADMMs for conic programming proposed in the literature. In particular, we use a first-order update of the dual variables in order to improve the performance of the ADMMs considered.

Our computational results on instances from a DNN relaxation of the stable set problem show that the factorization employed gives a significant improvement in the efficiency of the methods; in particular, we observe a drastic reduction in the number of iterations. We are confident that this can also be the case when dealing with other structured DNNs. The performance of DADMM3c may improve even further through a smart update of the rank of Z along the iterations. This is a topic for future investigation.

In the paper we also focus on how to obtain bounds on the primal optimal objective function value, since the dual objective function value obtained when using first-order methods to solve DNNs is not always guaranteed to serve as a bound, as the dual solution may be infeasible. We present two methods: one that adds a sufficient (negative) perturbation to the dual objective function value (error bounds) and one that constructs a dual feasible solution (Nightjet procedure). Both methods are computationally cheap and produce bounds close to the optimal objective function value of the DNN if the obtained solution is close to the optimal solution. The Nightjet procedure works particularly well for structured instances, like computing \(\vartheta _+\), but comes with the drawback that it might fail to produce a feasible solution. However, as long as the dual solution is reasonably close to the (unknown) optimal solution, this does not happen. We also observe that the Nightjet procedure works particularly well after ADAL+ and DADAL+. This is due to the fact that in these algorithms the dual matrix (which is the input for the Nightjet procedure) is positive semidefinite by construction. The two versions of the post-processing make our methods applicable within branch-and-bound frameworks in order to solve combinatorial optimization problems with DNN relaxations.

Our plan for future research is to apply the methods to other structured DNN relaxations. Furthermore, we will expand our methods to solve SDPs with general inequality constraints instead of just nonnegativity.