Abstract
This paper introduces a new storage-optimal first-order method, CertSDP, for solving a special class of semidefinite programs (SDPs) to high accuracy. The class of SDPs that we consider, the exact QMP-like SDPs, is characterized by low-rank solutions, a priori knowledge of the restriction of the SDP solution to a small subspace, and standard regularity assumptions such as strict complementarity. Crucially, we show how to use a certificate of strict complementarity to construct a low-dimensional strongly convex minimax problem whose optimizer coincides with a factorization of the SDP optimizer. From an algorithmic standpoint, we show how to construct the necessary certificate and how to solve the minimax problem efficiently. Our algorithms for strongly convex minimax problems with inexact prox maps may be of independent interest. We accompany our theoretical results with preliminary numerical experiments suggesting that CertSDP significantly outperforms current state-of-the-art methods on large sparse exact QMP-like SDPs.
Notes
Technically, these papers establish that the optimal values or optimal solutions of the SDP relaxation coincide with those of the underlying QCQP. Nonetheless, many of these sufficient conditions prove the intermediate result of strict complementarity.
In [76], a rank-k matrix \({\tilde{Y}}\in {\mathbb {S}}^n_+\) is a \((1+\zeta )\)-optimal rank-k approximation of an \(\epsilon \)-optimal solution \(Y_\epsilon \in {\mathbb {S}}^n_+\) if \(\left\Vert Y_\epsilon - {\tilde{Y}} \right\Vert _* \le (1+\zeta ) \left\Vert Y_\epsilon - [Y_\epsilon ]_k \right\Vert _*\) where \(\left\Vert \cdot \right\Vert _*\) is the nuclear norm and \([Y_\epsilon ]_k\) is the best rank-k approximation of \(Y_\epsilon \).
It is in fact true that the two sets are equal but only one direction is necessary in this proof.
References
Abbe, E., Bandeira, A.S., Hall, G.: Exact recovery in the stochastic block model. IEEE Trans. Inform. Theory 62(1), 471–487 (2015)
Alizadeh, F.: Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim. 5(1), 13–51 (1995)
Alizadeh, F., Haeberly, J.A., Overton, M.L.: Complementarity and nondegeneracy in semidefinite programming. Math. Program. 77, 111–128 (1997)
Argue, C.J., Kılınç-Karzan, F., Wang, A.L.: Necessary and sufficient conditions for rank-one generated cones. Math. Oper. Res. 48(1), 100–126 (2023)
Baes, M., Burgisser, M., Nemirovski, A.: A randomized mirror-prox method for solving structured large-scale matrix saddle-point problems. SIAM J. Optim. 23(2), 934–962 (2013)
Beck, A.: Quadratic matrix programming. SIAM J. Optim. 17(4), 1224–1238 (2007)
Beck, A., Drori, Y., Teboulle, M.: A new semidefinite programming relaxation scheme for a class of quadratic matrix problems. Oper. Res. Lett. 40(4), 298–302 (2012)
Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization, volume 2 of MPS-SIAM Series on Optimization. SIAM (2001)
Ben-Tal, A., Nemirovski, A.: Solving large scale polynomial convex problems on \(\ell _1\)/nuclear norm balls by randomized first-order algorithms. In: CoRR (2012)
Boumal, N., Voroninski, V., Bandeira, A.: The non-convex Burer–Monteiro approach works on smooth semidefinite programs. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Burer, S., Kılınç-Karzan, F.: How to convexify the intersection of a second order cone and a nonconvex quadratic. Math. Program. 162, 393–429 (2017)
Burer, S., Monteiro, R.D.C.: A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Program. 95, 329–357 (2003)
Burer, S., Yang, B.: The trust region subproblem with non-intersecting linear constraints. Math. Program. 149, 253–264 (2014)
Burer, S., Ye, Y.: Exact semidefinite formulations for a class of (random and non-random) nonconvex quadratic programs. Math. Program. 181, 1–17 (2019)
Candès, E.J., Eldar, Y.C., Strohmer, T., Voroninski, V.: Phase retrieval via matrix completion. SIAM Rev. 57(2), 225–251 (2015)
Carmon, Y., Duchi, J.C.: Analysis of Krylov subspace solutions of regularized nonconvex quadratic problems. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10728–10738 (2018)
Chambolle, A., Pock, T.: On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program. 159, 253–287 (2016)
Cifuentes, D.: On the Burer–Monteiro method for general semidefinite programs. Opt. Lett. 15(6), 2299–2309 (2021)
Cifuentes, D., Moitra, A.: Polynomial time guarantees for the Burer–Monteiro method. Adv. Neural. Inf. Process. Syst. 35, 23923–23935 (2022)
d’Aspremont, A., El Karoui, N.: A stochastic smoothing algorithm for semidefinite programming. SIAM J. Optim. 24(3), 1138–1177 (2014)
de Carli-Silva, M.K., Tunçel, L.: Strict complementarity in semidefinite optimization with elliptopes including the maxcut SDP. SIAM J. Optim. 29(4), 2650–2676 (2019)
Devolder, O., Glineur, F., Nesterov, Y.: First-Order Methods with Inexact Oracle: The Strongly Convex Case. Technical Report 2013016 (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1), 37–75 (2014)
Ding, L., Udell, M.: On the simplicity and conditioning of low rank semidefinite programs. SIAM J. Optim. 31(4), 2614–2637 (2021)
Ding, L., Wang, A.L.: Sharpness and well-conditioning of nonsmooth convex formulations in statistical signal recovery (2023). arXiv:2307.06873
Ding, L., Yurtsever, A., Cevher, V., Tropp, J.A., Udell, M.: An optimal-storage approach to semidefinite programming using approximate complementarity. SIAM J. Optim. 31(4), 2695–2725 (2021)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Fradkov, A.L., Yakubovich, V.A.: The S-procedure and duality relations in nonconvex problems of quadratic programming. Vestnik Leningrad Univ. Math. 6, 101–109 (1979)
Friedlander, M.P., Macêdo, I.: Low-rank spectral optimization via gauge duality. SIAM J. Sci. Comput. 38(3), A1616–A1638 (2016)
Garber, D., Kaplan, A.: On the efficient implementation of the matrix exponentiated gradient algorithm for low-rank matrix optimization. Math. Oper. Res. (2022)
Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42(6), 1115–1145 (1995)
Goldfarb, D., Scheinberg, K.: Interior point trajectories in semidefinite programming. SIAM J. Optim. 8(4), 871–886 (1998)
Hamedani, E.Y., Aybat, N.C.: A primal–dual algorithm with line search for general convex–concave saddle point problems. SIAM J. Optim. 31(2), 1299–1329 (2021)
Hazan, E., Koren, T.: A linear-time algorithm for trust region problems. Math. Program. 158, 363–381 (2016)
Ho-Nguyen, N., Kılınç-Karzan, F.: A second-order cone based approach for solving the trust region subproblem and its variants. SIAM J. Optim. 27(3), 1485–1512 (2017)
Jeyakumar, V., Li, G.Y.: Trust-region problems with linear inequality constraints: exact SDP relaxation, global optimality and robust optimization. Math. Program. 147, 171–206 (2014)
Jiang, R., Li, D.: Novel reformulations and efficient algorithms for the generalized trust region subproblem. SIAM J. Optim. 29(2), 1603–1633 (2019)
Juditsky, A., Nemirovski, A.: First order methods for non-smooth convex large-scale optimization, ii: utilizing problems structure. Optim. Mach. Learn. 30(9), 149–183 (2011)
Kılınç-Karzan, F., Wang, A.L.: Exactness in SDP relaxations of QCQPs: theory and applications. Tut. Oper. Res. Informs (2021)
Lan, G., Lu, Z., Monteiro, R.D.C.: Primal–dual first-order methods with \(o(1/\epsilon )\) iteration-complexity for cone programming. Math. Program. 126, 1–29 (2011)
Laurent, M., Poljak, S.: On a positive semidefinite relaxation of the cut polytope. Linear Algebra Appl. 223–224, 439–461 (1995)
Levy, K.Y., Yurtsever, A., Cevher, V.: Online adaptive methods, universality and acceleration. In: Advances in Neural Information Processing Systems (2018)
Locatelli, M.: Exactness conditions for an SDP relaxation of the extended trust region problem. Optim. Lett. 10(6), 1141–1151 (2016)
Locatelli, M.: KKT-based primal–dual exactness conditions for the Shor relaxation. J. Glob. Optim. 86(2), 285–301 (2023)
Lu, Z., Nemirovski, A., Monteiro, R.D.C.: Large-scale semidefinite programming via a saddle point mirror-prox algorithm. Math. Program. 109, 211–237 (2007)
Majumdar, A., Hall, G., Ahmadi, A.A.: Recent scalability improvements for semidefinite programming with applications in machine learning, control, and robotics. Annu. Rev. Control Robot. Auton. Syst. 3, 331–360 (2020)
Mixon, D.G., Villar, S., Ward, R.: Clustering subgaussian mixtures by semidefinite programming. Inf. Inference J. IMA 6(4), 389–415 (2017)
Moré, J.J.: Generalizations of the trust region problem. Optim. Methods Softw. 2(3–4), 189–209 (1993)
Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4, 553–572 (1983)
Nemirovski, A.: Prox-method with rate of convergence o(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex–concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nesterov, Y.: Excessive gap technique in nonsmooth convex minimization. SIAM J. Optim. 16(1), 235–249 (2005)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
Nesterov, Y.: Lectures on convex optimization. Springer Optimization and Its Applications, vol. 137. Springer (2018)
Nesterov, Y., Nemirovskii, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia (1994)
O’Donoghue, B., Chu, E., Parikh, N., Boyd, S.: Conic optimization via operator splitting and homogeneous self-dual embedding. J. Optim. Theory Appl. 169(3), 1042–1068 (2016)
Ouyang, Y., Xu, Y.: Lower complexity bounds of first-order methods for convex–concave bilinear saddle-point problems. Math. Program. 185, 1–35 (2021)
Palaniappan, B., Bach, F.: Stochastic variance reduction methods for saddle-point problems. In: Advances in Neural Information Processing Systems, vol. 29 (2016)
Raghavendra, P.: Optimal algorithms and inapproximability results for every CSP? In: Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, pp. 245–254 (2008)
Rujeerapaiboon, N., Schindler, K., Kuhn, D., Wiesemann, W.: Size matters: cardinality-constrained clustering and outlier detection via conic optimization. SIAM J. Optim. 29(2), 1211–1239 (2019)
Sard, A.: The measure of the critical values of differentiable maps. Bull. Am. Math. Soc. 48(12), 883–890 (1942)
Shinde, N., Narayanan, V., Saunderson, J.: Memory-efficient structured convex optimization via extreme point sampling. SIAM J. Math. Data Sci. 3(3), 787–814 (2021)
Shor, N.Z.: Dual quadratic estimates in polynomial and Boolean programming. Ann. Oper. Res. 25, 163–168 (1990)
Sion, M.: On general minimax theorems. Pac. J. Math. 8(1), 171–176 (1958)
Souto, M., Garcia, J.D., Veiga, Á.: Exploiting low-rank structure in semidefinite programming by approximate operator splitting. Optimization 1–28 (2020)
Sturm, J.F., Zhang, S.: On cones of nonnegative quadratic functions. Math. Oper. Res. 28(2), 246–267 (2003)
Tseng, P.: On accelerated proximal gradient methods for convex–concave optimization (2008)
Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38(1), 49–95 (1996)
Waldspurger, I., Waters, A.: Rank optimality for the Burer–Monteiro factorization. SIAM J. Optim. 30(3), 2577–2602 (2020)
Wang, A.L., Kılınç-Karzan, F.: A geometric view of SDP exactness in QCQPs and its applications (2020). arXiv:2011.07155
Wang, A.L., Kılınç-Karzan, F.: The generalized trust region subproblem: solution complexity and convex hull results. Math. Program. 191(2), 445–486 (2022)
Wang, A.L., Kılınç-Karzan, F.: On the tightness of SDP relaxations of QCQPs. Math. Program. 193(1), 33–73 (2022)
Wang, A.L., Lu, Y., Kılınç-Karzan, F.: Implicit regularity and linear convergence rates for the generalized trust-region subproblem. SIAM J. Optim. 33(2), 1250–1278 (2023)
Yang, H., Liang, L., Carlone, L., Toh, K.: An inexact projected gradient method with rounding and lifting by nonlinear programming for solving rank-one semidefinite relaxation of polynomial optimization. Math. Program. 201(1–2), 409–472 (2023)
Yurtsever, A., Fercoq, O., Cevher, V.: A conditional-gradient-based augmented Lagrangian framework. In: International Conference on Machine Learning, pp. 7272–7281 (2019)
Yurtsever, A., Tropp, J.A., Fercoq, O., Udell, M., Cevher, V.: Scalable semidefinite programming. SIAM J. Math. Data Sci. 3(1), 171–200 (2021)
Acknowledgements
This research is supported in part by ONR Grant N00014-19-1-2321 and AFOSR Grant FA9550-22-1-0365. The authors wish to thank the review team for their feedback and suggestions that led to an improved presentation of the material.
Appendices
Deferred proofs
Proof of Lemma 3
Let \(\varDelta :={\tilde{X}} - X_L\). Then,
where the second equality follows from expanding the square and the fact that \(L = {\tilde{L}} + {\tilde{\mu }}\). Moreover,
Combining these two inequalities gives
\(\square \)
The following proof is adapted from [54].
Proof of Lemma 4
It is evident that \(\phi _t(X)\) are quadratic matrix functions of the form (8) with \(V_0=X_0\) and \(\phi _0^*=Q(X_0)\). The remainder of the proof verifies the recurrences on \(V_{t+1}\) and \(\phi ^*_{t+1}\). We suppose that the stated form holds for some t, and we will show that it will hold for \(t+1\) as well. We compute
We deduce that \(V_{t+1} = (1-\alpha )V_t + \alpha \left( \varXi _t - \frac{1}{{\tilde{\mu }}}{\tilde{g}}_t\right) \). Noting that \(\phi _{t+1}^*=\phi _{t+1}(V_{t+1})\), and applying the recursive definition of \(\phi _{t+1}(X)\) gives us
where the third equation follows from substituting the expression for \(V_{t+1}\), and the last one from regrouping the terms. \(\square \)
The following proof is adapted from [54, Page 92].
Proof of Lemma 5
Note that
Therefore,
Then,
\(\square \)
Proof of Lemma 6
It is clear that \(Q(X_0)\le \phi _0^*\). Thus, consider \(X_{t+1}\) with \(t\ge 0\). By induction and Lemma 4,
As \(X_{t+1}\) satisfies \(Q_L(\varXi _t; X_{t+1}) \le Q^*(\varXi _t) + \epsilon _t\), we deduce (see Theorem 2) that
These two inequalities together lead to
It is straightforward to show that the two quantities on the final line are identically zero using the relations \(\alpha ^2 = {\tilde{\mu }} / {\tilde{L}}\) and \(\varXi _t = \tfrac{X_t + \alpha V_t}{1+\alpha }\) (see Lemma 5). \(\square \)
Proof of Lemma 7
The statement holds for \(t = 0\). Thus, consider \(\phi _{t+1}\) for \(t\ge 0\). By definition,
As \(X_{t+1}\) satisfies \(Q_L(\varXi _t; X_{t+1}) \le Q^*(\varXi _t) + \epsilon _t\), we deduce (see Theorem 2) that
Then, these inequalities combined with the inductive hypothesis give
\(\square \)
Proof of Corollary 1
Let \(X^*_\mathcal{U}\) denote the optimizer of (QMMP) so that \(Q(X^*_\mathcal{U}) = {{\,\textrm{Opt}\,}}_{\text {(QMMP)}}\). Then, Lemmas 6 and 7 give
Note also that by the definition of \(\phi _0(\cdot )\) and the \(\mu \)-strong convexity of Q, we have
Combining the two inequalities completes the proof. \(\square \)
Proof of Lemma 8
Let \({\tilde{\gamma }}\in \mathop {\mathrm {arg\,max}}\limits _{\gamma \in \mathcal{U}}q(\gamma ,X_0)\). By \(\mu \)-strong convexity of Q(X), we have that
In particular, taking \(X = \mathop {\mathrm {arg\,min}}\limits _{X\in {\mathbb {R}}^{(n-k)\times k}} Q(X)\) gives
where the last inequality follows from Assumption 4. This proves the first claim. Next, by Theorem 3, we have for all \(t\ge 0\) that \(Q(X_t)-Q(X_0)\le Q(X_t)-{{\,\textrm{Opt}\,}}_{\text {(QMMP)}} \le 2\mu \kappa ^2 R^2\) and hence
Using the assumption \(X_0 = 0_{(n-k)\times k}\) from Assumption 4 and applying the triangle inequality together with the bound \(\left\Vert \nabla _2\,q({\tilde{\gamma }}, X_0) \right\Vert _F^2\le \mu ^2\kappa ^2 R^2\) (also derived from Assumption 4), we deduce that for all \(t\ge 0\),
Then, as \(\varXi _{t+1} = X_{t+1} + \frac{1-\alpha }{1+\alpha }\left( X_{t+1} - X_t\right) \), we have
\(\square \)
Proof of Lemma 9
Recall that by definition, the linear operator \(\mathcal{G}\) maps \(\gamma \) to \(\sum _{i=1}^m \gamma _i \left( A_i \varXi _t + B_i\right) \). Thus, for any \(\gamma \in {\textbf{S}}^{m-1}\),
\(\square \)
Proof of Lemma 10
Let \(r :=\left\Vert \gamma ^{(i)} - \gamma ^* \right\Vert _2\). Using Assumption 5, we may bound the individual terms within the definition of \(r^{(i)}\) as
Thus, \(r^{(i)} \ge \min \left( \frac{{\hat{\mu }}}{2{\hat{\rho }}},\,\frac{{\hat{\mu }}}{2{\hat{\rho }}} - r\right) = \frac{{\hat{\mu }}}{2{\hat{\rho }}} - r\). Then, when \(r\le \frac{{\hat{\mu }}}{4{\hat{\rho }}}\), we have \(r^{(i)}>0\) and furthermore, \(r^{(i)}\ge r = \left\Vert \gamma ^{(i)} - \gamma ^* \right\Vert _2\). \(\square \)
Proof of Lemma 11
Begin by noting that for all \(\gamma \in \mathcal{U}^{(i)}\),
Let \({\tilde{\gamma }}\in \mathop {\mathrm {arg\,max}}\limits _{\gamma \in \mathcal{U}^{(i)}}q(\gamma ,0_{(n-k)\times k})\). Then,
Next, for \(\gamma \in {\textbf{S}}^{m-1}\)
\(\square \)
Lemma 13
Consider an instance of (14) generated by the random procedure in Sect. 5.2. Then equality holds throughout (14).
Proof
It suffices to show that \(\gamma ^*\) and \(T^*\) are feasible and achieve value \(\left\Vert X^* \right\Vert _F^2\) in the dual SDP (i.e., the third line of (14)).
Note that by the Schur complement theorem,
Here, \(\sim \) indicates matrix similarity. Thus, \(\gamma ^*\) and \(T^*\) are feasible in the dual SDP.
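For completeness, the standard Schur complement criterion invoked in this step is the following; here \(S\), \(B\), and \(C\) are generic placeholder blocks, not notation from the paper.

```latex
% Generic Schur complement criterion: for a symmetric block matrix with S \succ 0,
\begin{pmatrix} S & B \\ B^\intercal & C \end{pmatrix} \succeq 0
\quad\Longleftrightarrow\quad
C - B^\intercal S^{-1} B \succeq 0.
```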
Next,
\(\square \)
Strict complementarity in quadratic matrix programs
In this section, we show that a generic quadratic matrix program (QMP) in an \(n\times k\) dimensional matrix variable with at most k constraints satisfies strict complementarity (assuming only existence of primal and dual solutions).
We will need the following lemma stating that a generic bilinear system has only the trivial solutions. This lemma follows from basic dimension-counting arguments in algebraic geometry. However, we will instead prove the lemma directly using only elementary tools.
Lemma 14
Let \(n,p\in {\mathbb {N}}\) and consider the space \(({\mathbb {R}}^{n\times p})^{n+p-1}\). Let the collection \((A_i) = (A_1,\ldots , A_{n+p-1})\) denote an element of this space. Here, each \(A_i\in {\mathbb {R}}^{n\times p}\). Then, the collections \((A_i)\) for which the bilinear system
has a nontrivial solution (i.e., where \(x\in {\mathbb {R}}^n\) and \(y\in {\mathbb {R}}^p\) are both nonzero) forms a set of measure zero in \(({\mathbb {R}}^{n\times p})^{n+p-1}\).
Proof
Let \(\mathcal{S}\) be the exceptional set, i.e.,
By homogeneity, we may require that \(x\in {\mathbb {R}}^n\) has some coordinate equal to one. Similarly, we will require that \(y\in {\mathbb {R}}^p\) has some coordinate equal to one. Thus, we may decompose \(\mathcal{S}= \bigcup _{\ell =1}^n\bigcup _{r=1}^p \mathcal{S}_{\ell ,r}\), where
We will show that \(\mathcal{S}_{\ell ,r}\) has measure zero for each \(\ell \in [n]\) and \(r\in [p]\). Without loss of generality, let \(\ell = r= 1\).
Consider the affine space
Let \(\mathcal{F}_{1,1}: \mathcal{M}\rightarrow ({\mathbb {R}}^{n\times p})^{n+p-1}\) send the element \((x,y,B_1,\ldots ,B_{n+p-1})\) to \((A_1,\ldots ,A_{n+p-1})\) where
One may verify that \(\mathcal{F}_{1,1}\) is a smooth map. Furthermore, its domain has dimension \((n-1) + (p-1) + (np - 1)(n+p-1) = np(n+p-1) - 1\). This is one less than the dimension of the space \(\left( {\mathbb {R}}^{n\times p}\right) ^{n+p-1}\). It is known that the image of a Euclidean space under a smooth map into a Euclidean space of higher dimension must have Lebesgue measure zero (see Sard’s lemma [61]). Thus, \(\mathcal{F}_{1,1}(\mathcal{M})\) has Lebesgue measure zero.
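The dimension count above can be sanity-checked numerically; the helper functions below are illustrative only and not part of the paper.

```python
def dim_domain(n, p):
    # x has one coordinate pinned to 1 (n - 1 free), likewise y (p - 1 free);
    # each of the n + p - 1 matrices B_i has its (1,1) entry forced to 0 (np - 1 free).
    return (n - 1) + (p - 1) + (n * p - 1) * (n + p - 1)

def dim_codomain(n, p):
    # the ambient space (R^{n x p})^{n + p - 1}
    return n * p * (n + p - 1)

# the domain of F_{1,1} is exactly one dimension short of its codomain
for n in range(1, 10):
    for p in range(1, 10):
        assert dim_domain(n, p) == dim_codomain(n, p) - 1
```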
It remains to verify that \(\mathcal{S}_{1,1}\subseteq \mathcal{F}_{1,1}(\mathcal{M})\) (see Note 3). Suppose \((A_i)\in \mathcal{S}_{1,1}\) and let x, y with \(x_1 = y_1 = 1\) satisfy \(x^\intercal A_i y = 0\) for all \(i\in [n+p-1]\). Let
Note that \((B_i)_{1,1} = \frac{1}{\left\Vert x \right\Vert ^2\left\Vert y \right\Vert ^2}x^\intercal A_i y = 0\) for all \(i\in [n+p-1]\). The remaining sets \(\mathcal{S}_{\ell ,r}\) can be shown to have measure zero using analogous maps \(\mathcal{F}_{\ell ,r}\). This concludes the proof. \(\square \)
Lemma 15
Let \(n,k\in {\mathbb {N}}\) and consider the SDP relaxation of a QMP with k constraints in a variable of size \(n\times k\) and its dual:
There exists a subset \(\mathcal{E}\subseteq ({\mathbb {S}}^n)^{1+k} \times ({\mathbb {R}}^{n\times k})^{1+k}\) of measure zero such that if
and the primal and dual SDPs are both solvable, then strict complementarity holds and the primal and dual SDPs both have unique optimizers.
Proof
We will condition on the following bilinear system in the variables \((\gamma _\text {obj},\ldots ,\gamma _k)\in {\mathbb {R}}^{1+k}\) and \(x\in {\mathbb {R}}^n\) having no nontrivial solutions:
This is a homogeneous bilinear system in \(n + 1 + k\) variables with \(n+k\) constraints. Thus, by Lemma 14, this system has no nontrivial solutions outside an exceptional set \(\mathcal{E}\) of measure zero.
Let \((\gamma ^*,T^*)\) denote a dual optimum solution. We claim that \(A(\gamma ^*)\succ 0\). For the sake of contradiction, assume that \(x\in \ker (A(\gamma ^*))\) is nonzero. Then, by assumption, \(x\) and \((1,\gamma ^*)\) are not a solution to the bilinear system above, i.e., \(B(\gamma ^*)^\intercal x \ne 0\), and there exists a column of \(B(\gamma ^*)\), say the first column, that has nonzero inner product with x. This contradicts the feasibility of \((\gamma ^*, T^*)\). Specifically, for \(\alpha \in {\mathbb {R}}\),
Picking \(\alpha \) large or small enough makes this quantity negative, contradicting that the matrix on the left is positive semidefinite.
We have shown that for every dual optimum solution \((\gamma ^*, T^*)\), the above slack matrix has rank at least n. Similarly, any primal optimum solution \(Y^*\) must have rank at least k. We deduce that every primal optimum solution \(Y^*\) has rank exactly k and that for every dual optimum solution \((\gamma ^*, T^*)\), the slack matrix has rank exactly n. Now, these optimizers must correspond to faces of slices of \({\mathbb {S}}^{n+k}_+\). As the only faces of slices of \({\mathbb {S}}^{n+k}_+\) with constant rank are singleton sets, we deduce that there is a unique primal optimizer and a unique dual optimizer. \(\square \)
Additional experiments on phase-retrieval inspired SDP instances
We perform additional experiments on SDP instances inspired by the phase retrieval problem.
The phase retrieval problem seeks to learn a vector \(x^*\) given only the magnitudes of linear measurements of \(x^*\), and finds applications in imaging. In the Gaussian model of phase retrieval [15], we assume \(x^*\in {\mathbb {R}}^n\) is arbitrary and \(G\in {\mathbb {R}}^{m \times n}\) is entrywise Gaussian with an appropriate normalization. We are given
Here, the absolute value is taken entrywise. Equivalently, we are given the entrywise square of \(Gx^*\), or \(b = {{\,\textrm{diag}\,}}(Gx^*(x^*)^\intercal G^\intercal )\). In this setting, it is known that the PhaseLift SDP,
has \((x^*)(x^*)^\intercal \) as its unique solution with high probability once the number of observations m is roughly O(n). Recent work [25] shows that strict complementarity holds between this SDP and its dual with high probability in the same regime.
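As a small sanity check of the measurement model (not part of the paper's experiments), the entrywise square of \(Gx^*\) indeed matches \({{\,\textrm{diag}\,}}(Gx^*(x^*)^\intercal G^\intercal )\); the toy \(3\times 2\) instance below is arbitrary.

```python
# Check that the entrywise square of G x equals diag(G x x^T G^T)
# on an arbitrary toy instance.
G = [[1.0, -2.0], [0.5, 3.0], [2.0, 1.0]]
x = [1.0, 2.0]
n, m = len(x), len(G)

Gx = [sum(G[i][j] * x[j] for j in range(n)) for i in range(m)]
b = [v * v for v in Gx]  # entrywise square of G x

# i-th diagonal entry of G x x^T G^T
xxT = [[xi * xj for xj in x] for xi in x]
diag = [
    sum(G[i][j] * xxT[j][k] * G[i][k] for j in range(n) for k in range(n))
    for i in range(m)
]
assert all(abs(bi - di) < 1e-9 for bi, di in zip(b, diag))
```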
We note that the Gaussian model of phase retrieval requires storing the matrix G as part of the instance. This is a matrix of size \(O(n^2)\) and thus limits the size of our current experiments. Nonetheless, we expect the behavior we observe with these experiments to hold in the real setting of phase retrieval where the matrix G is highly structured and can be stored implicitly. We leave this as important future work.
We compare CertSDP (Algorithm 2), CSSDP [26], SketchyCGAL [76], ProxSDP [65], SCS [56], and Burer–Monteiro [12].
Random instance generation. We generate instances as follows. Suppose n is given. We set \(m = 5n\). We generate \(G\in {\mathbb {R}}^{m\times n}\) where each entry is independent N(0, 1/m). We then preprocess G so that its mth observation vector, i.e., the mth row of G, is parallel to \(e_n\). Next, we sample \(x^*\) uniformly from
Thus, this is a random instance of phase retrieval where we are given one highly-correlated observation.
Implementation details. The algorithms we test are mostly as described in Sect. 5.1. The major differences in implementation are described below:
- In the instances tested in Sect. 5, the \(A_i\) matrices encountered were sparse. In the phase retrieval problems we test in this appendix, the \(A_i\) matrices are dense but rank-one. The implementations of CertSDP, CSSDP, and SketchyCGAL are modified to handle such instances.
- Phase retrieval instances are likely to contain many dual optimal solutions that may not satisfy strict complementarity. Within CertSDP and CSSDP, we employ the Accelegrad algorithm to approximately solve
$$\begin{aligned}&\max _{\gamma \in {\mathbb {R}}^m} \beta ^\intercal \gamma + {\textrm{penalty}} \min \left( 0, \lambda _{\min }\left( I - G^\intercal {{\,\textrm{Diag}\,}}(\gamma )G\right) \right) \\&\quad + \min (0, \lambda _{1+2}(I - G^\intercal {{\,\textrm{Diag}\,}}(\gamma )G) - 0.1). \end{aligned}$$

Here, \(\lambda _{1+2}(\cdot )\) denotes the sum of the two smallest eigenvalues of a given matrix and is a concave function of its input. This penalization/regularization encourages solutions \(\gamma \) for which the second-smallest eigenvalue of \(I - G^\intercal {{\,\textrm{Diag}\,}}(\gamma )G\) is positive, so that \(A(\gamma )\succ 0\). We set \({{\textrm{penalty}}} = 10\).
- Recall that in Sect. 5, we replaced the random sketch in SketchyCGAL with a projection onto a submatrix to reflect the fact that for QMP instances, the goal is to recover the \((n-k)\times k\) top-right submatrix of the SDP optimizer. For the phase retrieval experiments, we employ the random sketch as originally described in [75].
Numerical results. Due to memory constraints associated with storing \(G\in {\mathbb {R}}^{m\times n}\), we test instances with size \(n = 30,\, 100,\,300\). We set the time limit to 50, 500 and 5000 s respectively. The results are summarized in Tables 4, 5 and 6. The average memory usage of the algorithms is plotted in Fig. 4. We compare the convergence behavior of CertSDP with that of CSSDP and SketchyCGAL on a single instance of each size in Fig. 5.
The results for these experiments are qualitatively similar to those of Appendix 5. We make a few additional observations:
- On these phase retrieval instances, the dual suboptimality decreases to \(\approx 10^{-3}\) before CertSDP seems to find a certificate of strict complementarity (see Fig. 5). This suggests that the value of \(\mu ^*\) in these instances is relatively small.
- CSSDP outperforms SketchyCGAL and also outperforms CertSDP initially. The “crossover” point where CertSDP outperforms CSSDP occurs only after CSSDP is able to produce a primal iterate with squared error \(\approx 10^{-7}\).
- CertSDP seems to suffer from numerical issues for \(n = 300\) and is unable to decrease the primal squared error beyond \(10^{-10}\). Nonetheless, CertSDP outperforms CSSDP and SketchyCGAL on all instances tested.
About this article
Cite this article
Wang, A.L., Kılınç-Karzan, F. Accelerated first-order methods for a class of semidefinite programs. Math. Program. (2024). https://doi.org/10.1007/s10107-024-02073-4