
A Faster Interior-Point Method for Sum-of-Squares Optimization


Abstract

We present a faster interior-point method for optimizing sum-of-squares (SOS) polynomials, which are a central tool in polynomial optimization and capture convex programming in the Lasserre hierarchy. Let \(p = \sum _i q^2_i\) be an n-variate SOS polynomial of degree 2d. Denoting by \(L:= \left( {\begin{array}{c}n+d\\ d\end{array}}\right) \) and \(U:= \left( {\begin{array}{c}n+2d\\ 2d\end{array}}\right) \) the dimensions of the vector spaces in which the \(q_i\)’s and p live, respectively, our algorithm runs in time \({\tilde{O}}(LU^{1.87})\). This is polynomially faster than state-of-the-art SOS and semidefinite programming solvers, which achieve runtime \({\tilde{O}}(L^{0.5}\min \{U^{2.37}, L^{4.24}\})\). The centerpiece of our algorithm is a dynamic data structure for maintaining the inverse of the Hessian of the SOS barrier function under the polynomial interpolant basis, which efficiently extends to multivariate SOS optimization, and requires maintaining spectral approximations to low-rank perturbations of elementwise (Hadamard) products. This is the main challenge and departure from recent IPM breakthroughs using inverse-maintenance, where low-rank updates to the slack matrix readily imply the same for the Hessian matrix.


Notes

  1. A subset \({\mathcal {K}}\subset \mathbb {R}^N\) is a convex cone if \(\forall \; x,y \in {\mathcal {K}}\) and \(\alpha , \beta \in \mathbb {R}_+\), \(\alpha x+ \beta y \in {\mathcal {K}}\).

  2. We use \(\widetilde{O}(\cdot )\) to hide \(U^{o(1)}\) and \(\log (1/\delta )\) factors.

  3. Any set of points in \(\mathbb {R}^n\) for which the evaluation of a polynomial in \(\mathcal {V}_{n,d}\) on these points uniquely defines the polynomial.

  4. This equation means \(\forall t \in \mathbb {R}^n\), \(\Lambda ([q_1(t), \ldots , q_U(t)]^\top ) = [p_1(t), \ldots , p_L(t)]^\top \cdot [p_1(t), \ldots , p_L(t)]\).

  5. When solving WSOS, [17] has running time \(O((kL)^{0.5} \cdot (kU L^2 + U^{\omega } + (kL)^{\omega }))\), and [14] has running time \(O((kL)^{0.5} \cdot (U^2 + (kL)^4) + U^{\omega } + (kL)^{2 \omega })\). Here, we were able to drop a factor in k due to the block-diagonal structure of the constraint matrix.

References

  1. Alman, J., Williams, V.V.: A refined laser method and faster matrix multiplication. In: Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 522–539. SIAM (2021)

  2. Ballinger, B., Blekherman, G., Cohn, H., Giansiracusa, N., Kelly, E., Schürmann, A.: Experimental study of energy-minimizing point configurations on spheres. Exp. Math. 18(3), 257–283 (2009)

  3. Bos, L., De Marchi, S., Sommariva, A., Vianello, M.: Computing multivariate Fekete and Leja points by numerical linear algebra. SIAM J. Numer. Anal. 48(5), 1984–1999 (2010)

  4. Barak, B., Hopkins, S.B., Kelner, J.A., Kothari, P.K., Moitra, A., Potechin, A.: A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM J. Comput. 48(2), 687–735 (2019)

  5. Bläser, M.: Fast matrix multiplication. In: Theory of Computing, pp. 1–60 (2013)

  6. Blekherman, G., Parrilo, P.A., Thomas, R.R.: Semidefinite Optimization and Convex Algebraic Geometry. SIAM (2012)

  7. Barak, B., Raghavendra, P., Steurer, D.: Rounding semidefinite programming hierarchies via global correlation. In: 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 472–481. IEEE (2011)

  8. Bachoc, C., Vallentin, F.: New upper bounds for kissing numbers from semidefinite programming. Technical report, Journal of the American Mathematical Society (2006)

  9. Cohen, M.B., Lee, Y.T., Song, Z.: Solving linear programs in the current matrix multiplication time. In: Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC) (2019)

  10. Eisenbrand, F., Grandoni, F.: On the complexity of fixed parameter clique and dominating set. Theor. Comput. Sci. 326(1–3), 57–67 (2004)

  11. Ghaddar, B., Marecek, J., Mevissen, M.: Optimal power flow as a polynomial optimization problem. IEEE Trans. Power Syst. 31, 539–546 (2016)

  12. Le Gall, F., Urrutia, F.: Improved rectangular matrix multiplication using powers of the Coppersmith-Winograd tensor. In: Proceedings of the 2018 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1029–1046. SIAM (2018)

  13. Heß, R., Henrion, D., Lasserre, J.-B., Pham, T.S.: Semidefinite approximations of the polynomial abscissa. SIAM J. Control Optim. 54(3), 1633–1656 (2016)

  14. Huang, B., Jiang, S., Song, Z., Tao, R., Zhang, R.: Solving SDP faster: a robust IPM framework and efficient implementation (2021)

  15. Hopkins, S.B., Kothari, P.K., Potechin, A., Raghavendra, P., Schramm, T., Steurer, D.: The power of sum-of-squares for detecting hidden structures. In: 58th IEEE Annual Symposium on Foundations of Computer Science, (FOCS), pp. 720–731. IEEE Computer Society (2017)

  16. Hopkins, S.B., Li, J.: Mixture models, robustness, and sum of squares proofs. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, (STOC), pp. 1021–1034. ACM (2018)

  17. Jiang, H., Kathuria, T., Lee, Y.T., Padmanabhan, S., Song, Z.: A faster interior point method for semidefinite programming. In: 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 910–918. IEEE (2020)

  18. Jiang, S., Man, Y., Song, Z., Yu, Z., Zhuo, D.: Fast graph neural tangent kernel via kronecker sketching (2021). arXiv preprint arXiv:2112.02446, AAAI’22

  19. Karmarkar, N.: A new polynomial-time algorithm for linear programming. In: Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC), pp. 302–311 (1984)

  20. Lasserre, J.B.: An Introduction to Polynomial and Semi-Algebraic Optimization. Cambridge Texts in Applied Mathematics. Cambridge University Press (2015)

  21. Laurent, M.: Sums of Squares, Moment Matrices and Optimization over Polynomials. Number 149 in The IMA Volumes in Mathematics and Its Applications Series, pp. 155–270. Springer, Germany (2009)

  22. Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, pp. 296–303 (2014)

  23. Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in \({\tilde{O}}(\sqrt{rank})\) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433. IEEE (2014)

  24. Lee, Y.T., Song, Z., Zhang, Q.: Solving empirical risk minimization in the current matrix multiplication time. In: Conference on Learning Theory (COLT), pp. 2140–2157. PMLR (2019)

  25. Nesterov, Y.: Squared functional systems and optimization problems. In: High Performance Optimization, pp. 405–440. Springer (2000)

  26. Nesterov, Y., Nemirovski, A.: Interior-point polynomial algorithms in convex programming. SIAM Studies in Applied Mathematics (1987)

  27. Pan, V.Y.: Structured Matrices and Polynomials. Birkhäuser, Boston (2001)

  28. Papp, D.: Optimal designs for rational function regression. J. Am. Stat. Assoc. 107(497), 400–411 (2012)

  29. Parrilo, P.: Sum of Squares: Theory and Applications: AMS Short Course, Sum of Squares: Theory and Applications, January 14–15, 2019, Baltimore, Maryland. American Mathematical Society, Providence (2020)

  30. Putinar, M., Vasilescu, F.-H.: Positive polynomials on semi-algebraic sets. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics 328(7), 585–589 (1999)

  31. Papp, D., Yildiz, S.: Sum-of-squares optimization without semidefinite programming. SIAM J. Optim. 29(1), 822–851 (2019)

  32. Roh, T., Dumitrescu, B., Vandenberghe, L.: Multidimensional FIR filter design via trigonometric sum-of-squares optimization. J. Sel. Top. Signal Process. 1(4), 641–650 (2007)

  33. Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization. Society for Industrial and Applied Mathematics (2001)

  34. Strang, G.: Karmarkar’s algorithm and its place in applied mathematics. Math. Intell. 9(2), 4–10 (1987)

  35. Sommariva, A., Vianello, M.: Computing approximate Fekete points by QR factorizations of Vandermonde matrices. Comput. Math. Appl. 57(8), 1324–1336 (2009)

  36. Song, Z., Yang, S., Zhang, R.: Does preprocessing help training over-parameterized neural networks? In: Advances in Neural Information Processing Systems, vol. 34 (2021)

  37. Song, Z., Zhang, L., Zhang R.: Training multi-layer over-parametrized neural network in subquadratic time (2021). arXiv preprint arXiv:2112.07628

  38. Tan, N.: On the power of Lasserre SDP hierarchy. Ph.D. thesis, EECS Department, University of California, Berkeley (2015)

  39. Vaidya, P.M.: Speeding-up linear programming using fast matrix multiplication. In: 30th Annual Symposium on Foundations of Computer Science (FOCS), pp. 332–337. IEEE (1989)

  40. van den Brand, J., Peng, B., Song, Z., Weinstein, O.: Training (overparametrized) neural networks in near-linear time. In: 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), vol. 185, pp. 63:1–63:15 (2021)

  41. Ye, Y., Todd, M.J., Mizuno, S.: An \({O}(\sqrt{n}{L})\)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1), 53–67 (1994)

Acknowledgements

The second author would like to thank Vissarion Fisikopoulos and Elias Tsigaridas for introducing him, from a practical perspective, to Sum-of-Squares optimization under the interpolant basis.

Supported by NSF CAREER award CCF-1844887. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757481-ScaleOpt). Supported by NSF CAREER Award CCF-1844887 and ISF Grant #3011005535.

Appendices

A Initialization

There exist standard techniques for transforming a convex program into a form with an easily obtainable strictly feasible point; see e.g. [41]. We follow the initialization procedure presented in [9, 17] and adapt it to SOS optimization. A similar initialization lemma exists for WSOS optimization.

Let the matrix \(P \in \mathbb {R}^{U \times L}\) and the operator \(\Lambda : \mathbb {R}^U \rightarrow \mathbb {R}^{L \times L}\), given by \(\Lambda (s) = P^{\top } {{\,\textrm{diag}\,}}(s) P\), be defined as in the interpolant basis paragraph of Sect. 3.
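
For concreteness, here is a minimal numerical sketch (our own code; the dimensions and the random P are placeholders, not data from the paper) of the operator \(\Lambda \) and of the gradient map \(g_{\Sigma ^*}\) used throughout this appendix.

```python
# Sketch of Lambda(s) = P^T diag(s) P and g_{Sigma*}(s) = diag(P Lambda(s)^{-1} P^T).
# P is a generic U x L matrix here; in the paper it evaluates a degree-d polynomial
# basis on U unisolvent interpolation points.
import numpy as np

def Lambda(P, s):
    # (U, L), (U,)  ->  (L, L) matrix P^T diag(s) P
    return P.T @ (s[:, None] * P)

def grad_sigma_star(P, s):
    # diag(P (P^T diag(s) P)^{-1} P^T), a vector in R^U
    Linv = np.linalg.inv(Lambda(P, s))
    return np.einsum('ij,jk,ik->i', P, Linv, P)   # row-wise diagonal, avoids the U x U product

U, L = 20, 6
rng = np.random.default_rng(0)
P = rng.standard_normal((U, L))
s = np.ones(U)                        # 1_U lies in the interior of Sigma*
print(Lambda(P, s).shape)             # (6, 6)
print(grad_sigma_star(P, s).sum())    # close to 6.0: trace of a projection of rank L
```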

Lemma A.1

(Initialization) Consider an instance of (SOS) that fulfills Slater’s condition, let R be an upper bound on the \(\ell _1\)-norm of the primal feasible solutions, i.e. all primal feasible x of (SOS) fulfill \(\Vert x\Vert _1 \le R\), and let \(\delta \in (0,1)\). We define \(\overline{A} \in \mathbb {R}^{(m+1)\times (U+2)}\), \(\overline{b} \in \mathbb {R}^{m + 1}\), and \(\overline{c} \in \mathbb {R}^{U + 2}\) as

$$\begin{aligned} \overline{A} = \begin{bmatrix} A &{} 0 &{} \frac{1}{R}b - A\overline{g}^0 \\ 1_U^\top &{} 1 &{} 0 \end{bmatrix},\;\; \overline{b} = \begin{bmatrix} \frac{1}{R}b \\ 1 + \left\langle 1_U, \overline{g}^0 \right\rangle \end{bmatrix},\;\; \overline{c} = \begin{bmatrix} \frac{\delta }{\Vert c\Vert _\infty } c \\ 0 \\ 1 \end{bmatrix}, \end{aligned}$$

and let

$$\begin{aligned} \overline{x}^0 = \begin{bmatrix} \overline{g}^0 \\ 1 \\ 1 \end{bmatrix} \in \mathbb {R}^{U + 2}, \;\; \overline{y}^0 = \begin{bmatrix} 0_m \\ -1 \end{bmatrix} \in \mathbb {R}^{m + 1},\; \text {and } \overline{s}^0 = \begin{bmatrix} 1_U + \frac{\delta }{\Vert c\Vert _\infty } c \\ 1 \\ 1 \end{bmatrix} \in \mathbb {R}^{U + 2}, \end{aligned}$$

where \(\overline{g}^0 = g_{\Sigma ^*}(\overline{s}^0_{[:U]}) \in \mathbb {R}^U\) for the gradient function \(g_{\Sigma ^*}(s):= {{\,\textrm{diag}\,}}(P(P^\top {{\,\textrm{diag}\,}}(s) P)^{-1} P^\top )\) that maps from \(\mathbb {R}^U\) to \(\mathbb {R}^U\). This defines the auxiliary primal-dual system

$$\begin{aligned} \begin{aligned} \min \;&\left\langle \overline{c}, \overline{x} \right\rangle \qquad \qquad&\max \;&\left\langle \overline{y}, \overline{b} \right\rangle \\ \overline{A}\overline{x}&= \overline{b}&\overline{A}^\top \overline{y} + \overline{s}&= \overline{c} \\ \overline{x}&\in \Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2&\overline{s}&\in \Sigma _{n,2d}^* \times \mathbb {R}_{\ge 0}^2\,. \end{aligned} \end{aligned}$$
(Aux-SOS)

Then \((\overline{x}^0, \overline{y}^0, \overline{s}^0)\) are feasible to the auxiliary system (Aux-SOS).

Further, under the canonical barrier (we use \(\overline{a}_{i}\) to denote the i-th column of \(\overline{A}\)):

$$\begin{aligned} \overline{F}_\eta (\overline{y})&= -\eta \left\langle \overline{y}, \overline{b} \right\rangle - \log \det \Big (\Lambda \big ((\overline{c} - \overline{A}^\top \overline{y})_{[:U]}\big )\Big ) - \log (\overline{c}_{U+1}- \left\langle \overline{a}_{U+1}, \overline{y} \right\rangle ) \\&\quad - \log (\overline{c}_{U+2}- \left\langle \overline{a}_{U+2}, \overline{y} \right\rangle ), \end{aligned}$$

we have that \(\Vert \overline{g}_{\eta ^0}(\overline{y}^0)\Vert _{\overline{H}(\overline{y}^0)^{-1}} = 0\) for \(\eta ^0 = 1\).

Further, for any solution \((\overline{x}, \overline{y}, \overline{s})\) to (Aux-SOS) with duality gap \(\le \delta ^2\), the rescaled restriction \(\widehat{x}:= R \cdot \overline{x}_{[:U]}\) fulfills

$$\begin{aligned} \begin{aligned} \left\langle c, \widehat{x} \right\rangle&\le \min _{Ax = b, x \in \Sigma _{n,2d}} \left\langle c, x \right\rangle + \delta \cdot R \Vert c\Vert _\infty , \\ \Vert A \widehat{x} - b\Vert _1&\le 8\delta L \cdot (LR \Vert A\Vert _\infty + \Vert b\Vert _1),\\ \widehat{x}&\in \Sigma _{n,2d}. \end{aligned} \end{aligned}$$

Proof

For a matrix M, let \({\text {proj}}(M)\) denote the orthogonal projection matrix onto the image (column space) of M. Then note that

$$\begin{aligned} \begin{aligned} \Vert \overline{g}^0\Vert _1&= \Big \Vert {{\,\textrm{diag}\,}}\Big (P\Big (P^\top {{\,\textrm{diag}\,}}(1_U + \frac{\delta }{\Vert c\Vert _\infty } c) P\Big )^{-1} P^\top \Big )\Big \Vert _1 \\&= {{\,\textrm{tr}\,}}\Big ( {{\,\textrm{diag}\,}}\big (1 + \frac{\delta }{\Vert c\Vert _\infty } c\big )^{-1} \cdot {\text {proj}}\big ({{\,\textrm{diag}\,}}(1_U + \frac{\delta }{\Vert c\Vert _\infty } c)^{1/2} \cdot P\big )\Big ) \\&= (1 \pm \delta )L, \end{aligned} \end{aligned}$$
(33)

where the last estimate follows as \({{\,\textrm{tr}\,}}({\text {proj}}(M)) = {\text {rk}}(M)\) for any matrix M, so the trace of the projection equals \({\text {rk}}(P) = L\), while the diagonal factor \({{\,\textrm{diag}\,}}(1_U + \frac{\delta }{\Vert c\Vert _\infty } c)^{-1}\) has entries in \([\frac{1}{1+\delta }, \frac{1}{1-\delta }]\).
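
Equation (33) can also be checked numerically; the sketch below (our own code, toy dimensions and random data) verifies the trace identity used above and that \(\Vert \overline{g}^0\Vert _1\) lies within a \((1 \pm O(\delta ))\) factor of L.

```python
# Numeric check of Eq. (33) with toy data: ||g0||_1 equals tr(D^{-1} proj(D^{1/2} P))
# for D = diag(1_U + (delta/||c||_inf) c), and this value is close to L.
import numpy as np

rng = np.random.default_rng(2)
U, L, delta = 20, 6, 0.1
P = rng.standard_normal((U, L))
c = rng.standard_normal(U)

d = 1.0 + delta / np.abs(c).max() * c                   # diagonal of D, entries in [1-delta, 1+delta]
g0 = np.diag(P @ np.linalg.inv(P.T @ (d[:, None] * P)) @ P.T)

Q = np.sqrt(d)[:, None] * P                             # D^{1/2} P
proj = Q @ np.linalg.inv(Q.T @ Q) @ Q.T                 # orthogonal projection onto Im(D^{1/2} P)
lhs = g0.sum()                                          # ||g0||_1 (entries are nonnegative)
rhs = np.trace(np.diag(1.0 / d) @ proj)

print(np.isclose(lhs, rhs))                             # True: the trace identity
print(abs(lhs - L) <= 2 * delta * L)                    # True: within a (1 +- O(delta)) factor of L
```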

Straightforward calculations show that \(\overline{A}\overline{x}^0 = \overline{b}\) and \(\overline{A}^\top \overline{y}^0 + \overline{s}^0 = \overline{c}\). Further, note that \(\overline{s}^0 > 0\) as \(\delta < 1\) and therefore \(\overline{s}^0 \in \mathbb {R}_+^{U+2} \subset \Sigma _{n,2d}^* \times \mathbb {R}_+^2\). The containment \(\mathbb {R}_+^U \subset \mathbb {R}_{\ge 0}^U \subset \Sigma _{n,2d}^*\) is clear by the characterisation \(\Sigma _{n,2d}^* = \{s: \Lambda (s) = P^\top {{\,\textrm{diag}\,}}(s) P \succeq 0\}\).
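
These straightforward calculations can be reproduced numerically. The sketch below (our own code, with made-up dimensions and random data in place of an actual SOS instance) constructs \(\overline{A}, \overline{b}, \overline{c}\) and \((\overline{x}^0, \overline{y}^0, \overline{s}^0)\) exactly as in the lemma and checks \(\overline{A}\overline{x}^0 = \overline{b}\) and \(\overline{A}^\top \overline{y}^0 + \overline{s}^0 = \overline{c}\).

```python
# Numerical sanity check of the feasibility claims above (toy data, not from the paper).
import numpy as np

rng = np.random.default_rng(1)
U, L, m = 20, 6, 4
P = rng.standard_normal((U, L))        # stand-in for the interpolant-basis matrix P
A = rng.standard_normal((m, U))
b = rng.standard_normal(m)
c = rng.standard_normal(U)
R, delta = 10.0, 0.1

s0_head = np.ones(U) + delta / np.abs(c).max() * c                     # first U coordinates of s0
g0 = np.diag(P @ np.linalg.inv(P.T @ (s0_head[:, None] * P)) @ P.T)    # g_{Sigma*}(s0_head)

A_bar = np.block([[A, np.zeros((m, 1)), (b / R - A @ g0)[:, None]],
                  [np.ones((1, U)), np.ones((1, 1)), np.zeros((1, 1))]])
b_bar = np.concatenate([b / R, [1.0 + g0.sum()]])
c_bar = np.concatenate([delta / np.abs(c).max() * c, [0.0, 1.0]])

x0 = np.concatenate([g0, [1.0, 1.0]])
y0 = np.concatenate([np.zeros(m), [-1.0]])
s0 = np.concatenate([s0_head, [1.0, 1.0]])

print(np.allclose(A_bar @ x0, b_bar))         # True: primal feasibility
print(np.allclose(A_bar.T @ y0 + s0, c_bar))  # True: dual feasibility
```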

Further, \(\overline{x}^0_{[:U]} = g_{\Sigma ^*}(\overline{s}^0_{[:U]}) = {{\,\textrm{diag}\,}}(P(P^\top {{\,\textrm{diag}\,}}(\overline{s}^0_{[:U]}) P)^{-1} P^\top ) \in \Sigma _{n,2d}\) is shown, e.g., in Proposition 3.3.3 of [33]. We make it explicit by showing that \(\langle \overline{x}^0_{[:U]}, {\hat{s}} \rangle \ge 0\) for any \({\hat{s}} \in \Sigma _{n,2d}^*\). Indeed,

$$\begin{aligned} \begin{aligned} \langle \overline{x}^0_{[:U]}, {\hat{s}} \rangle&= \left\langle {{\,\textrm{diag}\,}}(P(P^\top {{\,\textrm{diag}\,}}(\overline{s}^0_{[:U]}) P)^{-1} P^\top ), {\hat{s}} \right\rangle \\&= {{\,\textrm{tr}\,}}\big ( (P(P^\top {{\,\textrm{diag}\,}}(\overline{s}^0_{[:U]}) P)^{-1} P^\top ) \cdot {{\,\textrm{diag}\,}}({\hat{s}})\big ) \\&= {{\,\textrm{tr}\,}}{\big ((P^\top {{\,\textrm{diag}\,}}(\overline{s}^0_{[:U]}) P)^{-1} (P^\top {{\,\textrm{diag}\,}}({\hat{s}}) P) \big )}\\&\ge 0, \end{aligned} \end{aligned}$$

where the last inequality follows as the argument of \({{\,\textrm{tr}\,}}\) is a product of two positive semi-definite matrices, and the trace of such a product is nonnegative.

Define \(\overline{s} = \overline{c} - \overline{A}^{\top } \overline{y}\). The corresponding barrier is

$$\begin{aligned} \begin{aligned} \overline{F}_\eta (\overline{y})&= -\eta \left\langle \overline{y}, \overline{b} \right\rangle - \log \det \Big (\Lambda \big ((\overline{c} - \overline{A}^\top \overline{y})_{[:U]}\big )\Big ) - \log (\overline{c}_{U+1} - \left\langle \overline{a}_{U+1}, \overline{y} \right\rangle ) \\&\qquad - \log (\overline{c}_{U+2} - \left\langle \overline{a}_{U+2}, \overline{y} \right\rangle ) \\&= -\eta \left\langle \overline{y}, \overline{b} \right\rangle - \log \det \Big (\Lambda \big (\overline{s}_{[:U]}\big )\Big ) - \log (\overline{c}_{U+1} - y_{m+1}) \\&\qquad - \log \Big (\overline{c}_{U+2} - \left\langle \frac{1}{R}b - A\overline{g}^0, \overline{y}_{[:m]} \right\rangle \Big ), \end{aligned} \end{aligned}$$

and the gradient of \(\overline{F}_\eta \) is

$$\begin{aligned} \begin{aligned} \overline{g}_\eta (\overline{y})&= \begin{bmatrix} -\frac{\eta }{R} b + A g_{\Sigma ^*}(\overline{s}_{[:U]}) + \frac{1}{\overline{s}_{U+2}}(\frac{1}{R}b - A \overline{g}^0) \\ -\eta (1 + \left\langle 1_U, \overline{g}^0 \right\rangle ) + \left\langle 1_U, g_{\Sigma ^*}(\overline{s}_{[:U]}) \right\rangle + \frac{1}{\overline{s}_{U+1}} \end{bmatrix}. \end{aligned} \end{aligned}$$

Simple algebraic calculations now show that for \(\eta ^0 = 1\) we have \(\overline{g}_{\eta ^0}(\overline{y}^0) = 0_{m+1}\), and so in particular \(\Vert \overline{g}_{\eta ^0}(\overline{y}^0)\Vert _{\overline{H}(\overline{y}^0)^{-1}} \le \epsilon _N\), as required in Lemma 6.1. It remains to show that near-optimal solutions to the modified problem correspond to near-optimal solutions to the original problem.

Let \(\textrm{OPT}\) and \(\overline{\textrm{OPT}}\) denote the optimal values of the original and the modified program, respectively:

$$\begin{aligned} \textrm{OPT}:= \min _{Ax = b, x \in \Sigma _{n,2d}} \langle c,x \rangle , \qquad \overline{\textrm{OPT}}:= \min _{\overline{A}x = \overline{b}, \overline{x} \in \Sigma _{n,2d} \times \mathbb {R}^2 } \langle \overline{c},\overline{x} \rangle . \end{aligned}$$

Given any optimal \(x^* \in \mathbb {R}^U\) of the original primal of (SOS), consider the following \(\overline{x}^* \in \mathbb {R}^{U+2}\) fulfilling \(\overline{A} \overline{x}^* = \overline{b}, \overline{x}^* \in \Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\):

$$\begin{aligned} \overline{x}^* = \begin{bmatrix} \frac{1}{R} x^* \\ \left\langle 1_U, \overline{g}^0 \right\rangle + 1 - \frac{1}{R} \left\langle 1_U, x^* \right\rangle \\ 0 \end{bmatrix}. \end{aligned}$$

Note that indeed \(\overline{x}^*_{U+1} \in \mathbb {R}_{\ge 0}\): by (33) and the choice of R we have \(\overline{x}^*_{U+1} = \left\langle 1_U, \overline{g}^0 \right\rangle + 1 - \frac{1}{R} \left\langle 1_U, x^* \right\rangle \ge (1-\delta )L + 1 - 1 \ge 0\) for \(\delta < 1\). Therefore

$$\begin{aligned} \overline{\textrm{OPT}} \le \left\langle \overline{c}, \overline{x}^* \right\rangle = \begin{bmatrix} \frac{\delta }{\Vert c\Vert _\infty } c^\top&0&1 \end{bmatrix} \cdot \begin{bmatrix} \frac{1}{R} x^* \\ \left\langle 1_U, \overline{g}^0 \right\rangle + 1 - \frac{1}{R} \left\langle 1_U, x^* \right\rangle \\ 0 \end{bmatrix} = \frac{\delta \cdot \left\langle c, x^* \right\rangle }{R\Vert c\Vert _\infty } = \frac{\delta \cdot \textrm{OPT}}{R\Vert c\Vert _\infty }. \end{aligned}$$
(34)

Now given any feasible primal-dual solution \((\overline{x}, \overline{y}, \overline{s})\) to (Aux-SOS) with duality gap \(\le \delta ^2\) we have

$$\begin{aligned} \frac{\delta }{\Vert c\Vert _\infty } \left\langle c, \overline{x}_{[:U]} \right\rangle \le \frac{\delta }{\Vert c\Vert _\infty } \left\langle c, \overline{x}_{[:U]} \right\rangle + \overline{x}_{U+2} = \left\langle \overline{c}, \overline{x} \right\rangle \le \overline{\textrm{OPT}}+ \delta ^2 \le \frac{\delta }{R\Vert c\Vert _\infty } \textrm{OPT}+ \delta ^2, \end{aligned}$$
(35)

where the first inequality uses \(\overline{x}_{U+2} \in \mathbb {R}_{\ge 0}\) and the last inequality follows from (34). So, for \(\widehat{x}:= R \cdot \overline{x}_{[:U]}\), using (35) we have that

$$\begin{aligned} \left\langle c, \widehat{x} \right\rangle = R \left\langle c, \overline{x}_{[:U]} \right\rangle = \frac{R\Vert c\Vert _\infty }{\delta } \cdot \frac{\delta }{\Vert c\Vert _\infty } \left\langle c, \overline{x}_{[:U]} \right\rangle \le \frac{R\Vert c\Vert _\infty }{\delta } \Bigl (\frac{\delta }{R\Vert c\Vert _\infty } \textrm{OPT}+ \delta ^2\Bigr ) = \textrm{OPT}+ \delta R\Vert c\Vert _\infty . \end{aligned}$$

Recall that \(\mathbb {R}_{\ge 0}^U \subset \Sigma _{n,2d}^*\) and therefore, by taking dual cones, \(\Sigma _{n,2d} = (\Sigma _{n,2d}^*)^* \subset (\mathbb {R}_{\ge 0}^U)^* = \mathbb {R}_{\ge 0}^U\). In particular \(\overline{x}_{[:U]} \ge 0\) and so using the equality \(\left\langle 1_U, \overline{x}_{[:U]} \right\rangle + \overline{x}_{U+1} = 1 + \left\langle 1_U, \overline{g}^0 \right\rangle \) (which follows from \(\overline{A}\overline{x} = \overline{b}\) since \((\overline{x}, \overline{y}, \overline{s})\) is feasible) and \(\overline{x}_{U+1} \ge 0\) we can further bound

$$\begin{aligned} \Vert \overline{x}_{[:U]}\Vert _1 = \left\langle 1_U, \overline{x}_{[:U]} \right\rangle \le 1 + \left\langle 1_U, \overline{g}^0 \right\rangle \le 1 + (1 + \delta )L \le 2 L\,, \end{aligned}$$

where the penultimate inequality used (33). So we can further bound

$$\begin{aligned} \frac{1}{\Vert c\Vert _\infty } \left\langle c, \overline{x}_{[:U]} \right\rangle \ge - \frac{1}{\Vert c\Vert _\infty } \Vert c\Vert _\infty \Vert \overline{x}_{[:U]}\Vert _1 \ge - 2L. \end{aligned}$$
(36)

Therefore using (35) and (36) we get that

$$\begin{aligned} \overline{x}_{U+2}&\le \frac{\delta }{R\Vert c\Vert _\infty } \textrm{OPT}+ \delta ^2 - \frac{\delta }{\Vert c\Vert _\infty } \left\langle c, \overline{x}_{[:U]} \right\rangle \nonumber \\&\le \frac{\delta }{R\Vert c\Vert _\infty } \textrm{OPT}+ \delta ^2 + 2\delta L \le \delta + \delta ^2 + 2\delta L \le 4\delta L, \end{aligned}$$
(37)

where the penultimate inequality used \(\textrm{OPT}\le R\Vert c\Vert _\infty \) and the last inequality uses \(\delta \le 1\).

Feasibility of \(\overline{x}\) to \(\overline{A} \overline{x} = \overline{b}\) shows that \(A \overline{x}_{[:U]} + (\frac{1}{R}b - A\overline{g}^0)\overline{x}_{U+2} = \frac{1}{R} b\) and we therefore have

$$\begin{aligned} \begin{aligned} \Vert A\widehat{x} - b \Vert _1&= \Vert RA\overline{x}_{[:U]} - b\Vert _1 = \Vert (b - RA \overline{g}^0)\overline{x}_{U+2}\Vert _1 \le 4\delta L\cdot (R\Vert A\Vert _\infty \Vert \overline{g}^0\Vert _1 + \Vert b\Vert _1) \\&\le 8\delta L\cdot (LR \Vert A\Vert _\infty + \Vert b\Vert _1). \end{aligned} \end{aligned}$$

where the penultimate inequality used (37), and the last inequality follows from (33). \(\square \)

It is worth noting that the system \(\overline{A}\overline{x} = \overline{b}, \overline{x} \in \Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\) is no longer an instance of (SOS). This is in contrast to the analogous initialization techniques for linear programming [9] and semidefinite programming [17]. Nonetheless, it is not hard to see that optimization over a product cone such as \(\Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\) can be done efficiently whenever optimization over each factor can be done efficiently; see e.g. the application in [24]. Alternatively, note that the dual cone satisfies \((\Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2)^* = \Sigma _{n,2d}^* \times \mathbb {R}_{\ge 0}^2\). For \(s \in \Sigma _{n,2d}^*\) we used the barrier \(F(s) = - \log (\det (P^\top {{\,\textrm{diag}\,}}(s)P))\) throughout the paper. This barrier extends easily by setting

$$\begin{aligned} \overline{P}:= \begin{bmatrix} P &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{bmatrix} \end{aligned}$$

and setting the barrier for \(\overline{s} \in \Sigma _{n,2d}^* \times \mathbb {R}_{\ge 0}^2\) as

$$\begin{aligned} \overline{F}(\overline{s}) = - \log \det (\overline{P}^\top {{\,\textrm{diag}\,}}(\overline{s}) \overline{P}) = - \log (\det (P^\top {{\,\textrm{diag}\,}}(\overline{s}_{[:U]})P)) - \log \overline{s}_{U+1} - \log \overline{s}_{U+2}, \end{aligned}$$

which recovers the standard barrier for the product cone. Algorithm 1 can then be run with \(\overline{F}\) in place of F, where P is replaced by \(\overline{P}\).
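
A small sketch of this product-cone extension (our own code, toy dimensions and a random P): the log-det barrier applied to \(\overline{P}\) splits into the SOS barrier plus the two scalar log terms, as in the display above.

```python
# Check that -log det(P_bar^T diag(s_bar) P_bar) equals
# -log det(P^T diag(s_bar[:U]) P) - log s_bar[U] - log s_bar[U+1].
import numpy as np

rng = np.random.default_rng(3)
U, L = 20, 6
P = rng.standard_normal((U, L))
P_bar = np.block([[P, np.zeros((U, 2))],
                  [np.zeros((2, L)), np.eye(2)]])       # (U+2) x (L+2)

def barrier(P, s):
    sign, logdet = np.linalg.slogdet(P.T @ (s[:, None] * P))
    assert sign > 0                                      # the argument must be positive definite
    return -logdet

s_bar = np.concatenate([np.ones(U), [2.0, 3.0]])         # an interior point of Sigma* x R^2_{>0}
lhs = barrier(P_bar, s_bar)
rhs = barrier(P, s_bar[:U]) - np.log(s_bar[U]) - np.log(s_bar[U + 1])
print(np.isclose(lhs, rhs))                              # True: the barrier splits as claimed
```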

B Discussion on representation

In this paper we consider the dual formulation of (SOS), where \(A \in \mathbb {R}^{m \times U}\), \(b \in \mathbb {R}^m\), \(c \in \mathbb {R}^U\):

$$\begin{aligned} \begin{aligned} \max \;&\left\langle y, b \right\rangle \\ A^\top y + s&= c \\ s&\in \Sigma _{n,2d}^*\,. \end{aligned} \end{aligned}$$
(38)

Here m is the number of constraints, and w.l.o.g. we assume \(m \le U\).

We remark that [31] instead consider the following formulation:

$$\begin{aligned} \min \;&\left\langle c, x \right\rangle \\ A x&= b \\ x&\in \Sigma _{n,2d}^*\, . \end{aligned}$$

These two formulations are in fact equivalent, because Eq. (38) is equivalent to

$$\begin{aligned} \min \;&\left\langle {\hat{c}}, x \right\rangle \quad \\ {\hat{A}}x&= {\hat{b}} \\ x&\in \Sigma _{n,2d}^*\,. \end{aligned}$$

Here we define \(\hat{A} \in \mathbb {R}^{(U-m) \times U}\) to be a matrix that satisfies \(\ker ({\hat{A}}) = {\text {Im}}(A^\top )\), i.e., the rows of \(\hat{A}\) span the orthogonal complement of the row space of A. Further, we define \({\hat{c}} \in \mathbb {R}^U\) to be any vector that satisfies \(A {\hat{c}} = b\). Finally, we set \({\hat{b}}:= {\hat{A}} c \in \mathbb {R}^{U-m}\).
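
A numerical sketch of this reduction (our own code; \(\hat{A}\) is computed with scipy.linalg.null_space and \(\hat{c}\) by least squares, one valid choice rather than the paper's): every dual slack \(s = c - A^\top y\) satisfies \(\hat{A}s = \hat{b}\), and the objectives satisfy \(\left\langle y, b \right\rangle = \left\langle \hat{c}, c \right\rangle - \left\langle \hat{c}, s \right\rangle \), so maximizing one is minimizing the other.

```python
# Toy instance of the change of representation between the dual form (38)
# and the primal form with A_hat, b_hat, c_hat.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
m, U = 4, 20
A = rng.standard_normal((m, U))               # full row rank with probability 1
b = rng.standard_normal(m)
c = rng.standard_normal(U)

A_hat = null_space(A).T                       # (U - m) x U, ker(A_hat) = Im(A^T)
c_hat = np.linalg.lstsq(A, b, rcond=None)[0]  # some vector with A c_hat = b
b_hat = A_hat @ c

y = rng.standard_normal(m)                    # an arbitrary dual point
s = c - A.T @ y                               # its slack
print(np.allclose(A_hat @ s, b_hat))                  # True: s is feasible for the primal form
print(np.isclose(y @ b, c_hat @ c - c_hat @ s))       # True: objectives match up to a constant
```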

B.1 SOS under the monomial basis

In [31], the advantages and disadvantages of three different bases (monomial basis, Chebyshev basis, and interpolant basis) are discussed. Further, explicit expressions of \(\Lambda \) (defined in Theorem 3.4) are given for all three bases. Apart from its computational efficiency, [31] chooses the interpolant basis also because the interpolant basis representation is numerically stable, which is required for practical algorithms. As we mainly focus on the theoretical running time of SOS algorithms, in this section we further justify the choice of the interpolant basis in our algorithm. We want to add to the exposition of [31] and stress that the standard monomial basis does not seem suitable even for theoretical algorithms: it is unclear whether, in the monomial basis, an amortized runtime faster than the naive \(O(U^\omega )\) can be achieved. (A similar argument holds for the Chebyshev basis.)

The Hessian in the monomial basis is given by

$$\begin{aligned} H_{ij}(s) = \sum _{a + b = i, k + \ell = j} [\Lambda (s)^{-1}]_{ak}[\Lambda (s)^{-1}]_{b\ell }. \end{aligned}$$

It is unclear how low-rank update techniques could be applied here, as low-rank updates to \(\Lambda \) do not translate to low-rank updates to the Hessian H. Further, the structure of \(\Lambda \) itself in the multivariate case is far less understood than its counterpart in the interpolant basis, as we elaborate in the remainder of this section.
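
For intuition, here is a univariate sketch of this Hessian formula (our own code and naming; the test matrix is random): H is a dense \(U \times U\) matrix assembled from products of entries of \(\Lambda (s)^{-1}\), which is why a low-rank update to \(\Lambda \) has no obvious low-rank effect on H.

```python
# Univariate monomial-basis Hessian: H[i, j] = sum_{a+b=i, k+l=j} Minv[a, k] * Minv[b, l],
# where Minv = Lambda(s)^{-1} is (d+1) x (d+1) and H is (2d+1) x (2d+1) = U x U.
import numpy as np

def hessian_monomial_univariate(Minv):
    d = Minv.shape[0] - 1
    H = np.zeros((2 * d + 1, 2 * d + 1))
    for a in range(d + 1):
        for b in range(d + 1):
            for k in range(d + 1):
                for l in range(d + 1):
                    H[a + b, k + l] += Minv[a, k] * Minv[b, l]
    return H

d = 2
rng = np.random.default_rng(5)
B = rng.standard_normal((d + 1, d + 1))
Minv = np.linalg.inv(B @ B.T + np.eye(d + 1))   # inverse of some positive definite Lambda(s)
print(hessian_monomial_univariate(Minv).shape)  # (5, 5): dense U x U, built from the small Minv
```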

For n variables and degree d the monomial basis elements correspond to the terms \(x_1^{\alpha _1} \cdot \ldots \cdot x_n^{\alpha _n}\) for \(\alpha = (\alpha _1, \ldots , \alpha _n) \in \mathbb {N}^n\), \(\Vert \alpha \Vert _1 \le d\). When choosing both \(\textbf{p}\) and \(\textbf{q}\) to be monomial bases, \(\Lambda \) has a special structure. For any basis elements \(p_1, p_2 \in \textbf{p}\) we have that \(p_1 p_2\) itself is a monomial in \({\mathcal {V}}_{n,2d}\). As such, the coefficients \(\lambda _{ij} \in \mathbb {R}^U\) have the special form that \(\lambda _{ij}\) is zero everywhere but in the coordinate that corresponds to the element in \(\textbf{q}\) equalling \(p_1p_2\). As \(\Lambda \) is also uniquely defined by the images of the elements in \({\textbf{q}}\), we can write \(E_u \in \mathbb {R}^{L \times L}\) as \(E_u:= \Lambda (e_u)\) where \(e_u\) is the vector that is zero everywhere but in position u for some \(u \in [U]\). Let \(q_u \in \textbf{q}\) be the associated basis polynomial. Then we see that \((E_u)_{ij} = 1_{[p_ip_j = q_u]}\). It also follows that every matrix \(S \in {\text {Im}}(\Lambda )\) has at most U different entries and each entry is uniquely defined by the corresponding basis element in \(\textbf{q}\). While it is not known how this special structure could be exploited in general, a speedup is known for the univariate case \(n = 1\). Here, \({\text {Im}}(\Lambda )\) consists of all Hankel matrices: we have \(\textbf{q} = \{1, x, \ldots , x^{2d}\}\) and \(\textbf{p} = \{1, x, \ldots , x^d\}\), so for any vector \(v \in \mathbb {R}^{2d+1}\) we have that

$$\begin{aligned} \Lambda (v) = \begin{bmatrix} v_{0} &{} v_{1} &{} v_{2} &{} \ldots &{} \ldots &{}v_{d} \\ v_{1} &{} v_2 &{} &{} &{} &{}\vdots \\ v_{2} &{} &{} &{} &{} &{} \vdots \\ \vdots &{} &{} &{} &{} &{} v_{2d-2}\\ \vdots &{} &{} &{} &{} v_{2d-2}&{} v_{2d-1} \\ v_{d} &{} \ldots &{} \ldots &{} v_{2d-2} &{} v_{2d-1} &{} v_{2d} \end{bmatrix}. \end{aligned}$$

These highly structured matrices are known to be invertible in time \({{\tilde{O}}}(d^2)\) (see e.g. [27]).
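
This Hankel structure is easy to reproduce; the sketch below (our own helper, built on scipy.linalg.hankel) constructs \(\Lambda (v)\) for the univariate monomial basis.

```python
# Univariate monomial basis: Lambda(v)[i, j] = v[i + j] is a (d+1) x (d+1) Hankel matrix.
import numpy as np
from scipy.linalg import hankel

def lambda_univariate(v):
    d = (len(v) - 1) // 2                 # v has length 2d + 1
    return hankel(v[:d + 1], v[d:])       # first column v_0..v_d, last row v_d..v_{2d}

d = 3
v = np.arange(2 * d + 1, dtype=float)     # v_0, ..., v_{2d}
M = lambda_univariate(v)
print(M.shape)                            # (4, 4)
print(np.allclose(M, M.T))                # True: these Hankel matrices are symmetric
print(M[1, 2] == v[3])                    # True: entry (i, j) equals v_{i+j}
```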

As mentioned above even the bivariate case becomes far more complicated. Let \(n = 2\) and \(d = 2\), and pick the ordered bases as

$$\begin{aligned} {\textbf{q}} = \{1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3, x^4, x^3y, x^2y^2, xy^3, y^4\}\,, ~~ {\textbf{p}} = \{1, x, y, x^2, xy, y^2\}\,. \end{aligned}$$

Then for \(v \in \mathbb {R}^{\left( {\begin{array}{c}n+2d\\ 2d\end{array}}\right) } = \mathbb {R}^{15}\) we get for \({\textbf{p}} {\textbf{p}}^\top \) and the corresponding matrix \(\Lambda \) that

$$\begin{aligned} {\textbf{p}} {\textbf{p}}^\top = \begin{bmatrix} 1 &{} x &{} y &{} x^2 &{} xy &{} y^2\\ x &{} x^2 &{} xy &{} x^3 &{} x^2y &{} xy^2\\ y &{} xy &{} y^2 &{} x^2y &{} xy^2 &{} y^3 \\ x^2 &{} x^3 &{} x^2y &{} x^4 &{} x^3y &{} x^2y^2 \\ xy &{} x^2y &{} xy^2 &{} x^3y &{} x^2y^2 &{} xy^3 \\ y^2 &{} xy^2 &{} y^3 &{} x^2y^2 &{} xy^3 &{} y^4 \end{bmatrix}, \quad \Lambda (v) = \begin{bmatrix} v_0 &{} v_1 &{} v_2 &{} v_3 &{} v_4 &{} v_5 \\ v_1 &{} v_3 &{} v_4 &{} v_6 &{} v_7 &{} v_8 \\ v_2 &{} v_4 &{} v_5 &{} v_7 &{} v_8 &{} v_9 \\ v_3 &{} v_6 &{} v_7 &{} v_{10} &{} v_{11} &{} v_{12} \\ v_4 &{} v_7 &{} v_8 &{} v_{11} &{} v_{12} &{} v_{13} \\ v_5 &{} v_8 &{} v_9 &{} v_{12} &{} v_{13} &{} v_{14} \\ \end{bmatrix}. \end{aligned}$$

While \(\Lambda (v)\) still has some structure it is unclear how \((\Lambda (v))^{-1}\) could be computed more efficiently than in matrix multiplication time.
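
The multivariate pattern can be generated mechanically; the sketch below (our own indexing helper, assuming the graded monomial ordering used in the example above) rebuilds the \(6 \times 6\) matrix \(\Lambda (v)\) for \(n = d = 2\).

```python
# Build Lambda(v) in the monomial basis: Lambda(v)[i, j] = v[index of p_i * p_j in q].
import numpy as np
from itertools import combinations_with_replacement

def monomials(n, deg):
    # exponent vectors of all monomials in n variables of total degree <= deg,
    # ordered by degree (matching 1, x, y, x^2, xy, y^2, ... for n = 2)
    out = []
    for t in range(deg + 1):
        for comb in combinations_with_replacement(range(n), t):
            alpha = [0] * n
            for i in comb:
                alpha[i] += 1
            out.append(tuple(alpha))
    return out

def lambda_monomial(n, d, v):
    p = monomials(n, d)                                            # basis of V_{n,d}, size L
    q_index = {a: u for u, a in enumerate(monomials(n, 2 * d))}    # basis of V_{n,2d}, size U
    M = np.empty((len(p), len(p)))
    for i, ai in enumerate(p):
        for j, aj in enumerate(p):
            M[i, j] = v[q_index[tuple(x + y for x, y in zip(ai, aj))]]
    return M

n, d = 2, 2
U = len(monomials(n, 2 * d))                   # binom(n + 2d, 2d) = 15
print(lambda_monomial(n, d, np.arange(U)))     # reproduces the 6 x 6 pattern shown above
```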

C Proof of amortization lemma

We include the proof of Lemma 7.2 for completeness. The main difference between this proof and that of [14] is that we cut off at U/L instead of L.

Our proof makes use of the following two facts about \(\omega \) and \(\alpha \) (Lemma A.4 and Lemma A.5 of [9]).

Fact C.1

(Relation of \(\omega \) and \(\alpha \)) \(\omega \le 3 - \alpha \).

Fact C.2

(Upper bound of \({{\mathcal {T}}}_{\textrm{mat}}(n,n,r)\)) For any \(r \le n\), we have that \({{\mathcal {T}}}_{\textrm{mat}}(n,n,r) \le n^{2 + o(1)} + r^{\frac{\omega - 2}{1 - \alpha }} \cdot n^{2 - \frac{\alpha (\omega -2)}{(1 - \alpha )} + o(1)}\).
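
As a purely numeric illustration of Fact C.2 (our own helper, using the commonly cited estimates \(\omega \approx 2.373\) and \(\alpha \approx 0.314\)), one can evaluate the exponent of the resulting bound for various rectangular shapes \(r = n^{\rho }\).

```python
# Exponent e such that the Fact C.2 bound reads n^{e + o(1)}, for r = n^rho.
def tmat_exponent_bound(rho, omega=2.373, alpha=0.314):
    t = (omega - 2) / (1 - alpha)
    return max(2.0, 2.0 - alpha * t + rho * t)

for rho in [0.0, 0.3, 0.5, 1.0]:
    print(rho, round(tmat_exponent_bound(rho), 3))
# rho <= alpha gives exponent 2 (the rectangular part is "free");
# rho = 1 recovers roughly n^omega.
```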

Lemma C.3

(Restatement of Lemma 7.2) Let t denote the total number of iterations. Let \(r_i \in [L]\) be the rank for the i-th iteration for \(i \in [t]\). Assume \(r_i\) satisfies the following condition: for any vector \(g \in \mathbb {R}_+^L\) which is non-increasing, we have

$$\begin{aligned} \sum _{i=1}^t r_i \cdot g_{r_i} \le O(t \cdot \Vert g\Vert _2). \end{aligned}$$

If the cost of the i-th iteration is \(O({{\mathcal {T}}}_{\textrm{mat}}(U, U, \min \{L r_i, U\}))\) and \(\alpha \ge 5 - 2 \omega \), then the amortized cost per iteration is

$$\begin{aligned} U^{2 + o(1)} + U^{\omega - 1/2 + o(1)} \cdot L^{1/2}. \end{aligned}$$

Proof

For \(r_i\) that satisfies \(r_i \le U/L\), we have

$$\begin{aligned} \begin{aligned} {{\mathcal {T}}}_{\textrm{mat}}(U, U, L r_i) \le&~ U^{2 + o(1)} + (L r_i)^{\frac{\omega -2}{1-\alpha }} \cdot U^{2 - \frac{\alpha (\omega -2)}{1-\alpha } + o(1)} \\ =&~ U^{2 + o(1)} + U^{2 - \frac{\alpha (\omega -2)}{1-\alpha } + o(1)} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot r_i^{\frac{\omega -2}{1-\alpha }}, \end{aligned} \end{aligned}$$
(39)

where the first step follows from Fact C.2.

Define a sequence \(g \in \mathbb {R}_+^L\) such that for \(r \in [L]\),

$$\begin{aligned} \begin{aligned} g_r = {\left\{ \begin{array}{ll} r^{\frac{\omega -2}{1-\alpha }-1} &{} \text {if~} r \le U/L, \\ (U/L)^{\frac{\omega - 2}{1 - \alpha }} \cdot r^{-1} &{} \text {if~} r > U/L. \end{array}\right. } \end{aligned} \end{aligned}$$

Note that g is non-increasing because \(\frac{\omega -2}{1-\alpha } \le 1\) (Fact C.1). Then using the condition in the lemma statement, we have

$$\begin{aligned} \begin{aligned} \sum _{i=1}^t \min \{r_i^{\frac{\omega -2}{1-\alpha }}, (U/L)^{\frac{\omega -2}{1-\alpha }}\} =&~ \sum _{i=1}^t r_i \cdot g_{r_i} \\ \le&~ t \cdot \Vert g\Vert _2 \\ \le&~ t \cdot \Big (\int _{x=1}^{U/L} x^{\frac{2(\omega -2)}{1-\alpha }-2} \textrm{d}x + (U/L)^{\frac{2(\omega - 2)}{1 - \alpha }} \cdot \int _{x=U/L}^L x^{-2} \textrm{d}x \Big )^{1/2} \\ \le&~ t \cdot \Big ( c \cdot (U/L)^{\frac{2(\omega -2)}{1-\alpha }-1} + (U/L)^{2(\frac{\omega - 2}{1 - \alpha })} \cdot (U/L)^{-1}\Big )^{1/2} \\ =&~ t \cdot O((U/L)^{\frac{(\omega -2)}{1-\alpha }-1/2}), \end{aligned} \end{aligned}$$
(40)

where the first step follows from the definition of \(g \in \mathbb {R}^L\), the second step follows from the assumption \(\sum _{i=1}^t r_i \cdot g_{r_i} \le t \cdot \Vert g\Vert _2\) in the lemma statement, the third step follows from upper bounding the \(\ell _2\) norm \(\Vert g\Vert _2^2 = \sum _{r=1}^L g_r^2\) by the two integrals, and the fourth step follows since \(\frac{2(\omega -2)}{1-\alpha } \ge 1\) when \(\alpha \ge 5 - 2 \omega \), so the integral \(\int _{x=1}^{U/L} x^{\frac{2(\omega -2)}{1-\alpha }-2} \textrm{d}x = c \cdot x^{\frac{2(\omega -2)}{1-\alpha }-1}\big |_{1}^{U/L} = O\big ((U/L)^{\frac{2(\omega -2)}{1-\alpha }-1}\big )\), where \(c:= 1/(\frac{2(\omega -2)}{1-\alpha }-1)\).

Thus we have

$$\begin{aligned} \begin{aligned}&\sum _{i=1}^t {{\mathcal {T}}}_{\textrm{mat}}(U, U, \min \{L r_i, U\}) \\&\quad \le \sum _{i=1}^t \Big ( U^{2 + o(1)} + U^{2 - \frac{\alpha (\omega -2)}{1-\alpha } + o(1)} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot \min \{r_i^{\frac{\omega -2}{1-\alpha }}, (U/L)^{\frac{\omega -2}{1-\alpha }} \} \Big ) \\&\quad = t \cdot U^{2 + o(1)} + U^{2 - \frac{\alpha (\omega -2)}{1-\alpha } + o(1)} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot \sum _{i=1}^t \min \{r_i^{\frac{\omega -2}{1-\alpha }}, (U/L)^{\frac{\omega -2}{1-\alpha }}\} \\&\quad \le t \cdot U^{2 + o(1)} + U^{2 - \frac{\alpha (\omega -2)}{1-\alpha } + o(1)} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot t \cdot (U/L)^{\frac{(\omega -2)}{1-\alpha }-1/2} \\&\quad = t \cdot (U^{2 + o(1)} + U^{\omega - 1/2 + o(1)} \cdot L^{1/2}), \end{aligned} \end{aligned}$$

where the first step follows from Eq. (39) and \({{\mathcal {T}}}_{\textrm{mat}}(U,U,U) = U^{\omega } = U^{2 - \frac{\alpha (\omega -2)}{1-\alpha }} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot (U/L)^{\frac{\omega -2}{1-\alpha }}\), the second step follows from moving summation inside, the third step follows from Eq. (40), and the last step follows from adding the terms together. \(\square \)

Cite this article

Jiang, S., Natura, B. & Weinstein, O. A Faster Interior-Point Method for Sum-of-Squares Optimization. Algorithmica 85, 2843–2884 (2023). https://doi.org/10.1007/s00453-023-01112-4
