Abstract
We present a faster interior-point method for optimizing sum-of-squares (SOS) polynomials, which are a central tool in polynomial optimization and capture convex programming in the Lasserre hierarchy. Let \(p = \sum _i q^2_i\) be an n-variate SOS polynomial of degree 2d. Denoting by \(L:= \left( {\begin{array}{c}n+d\\ d\end{array}}\right) \) and \(U:= \left( {\begin{array}{c}n+2d\\ 2d\end{array}}\right) \) the dimensions of the vector spaces in which the \(q_i\)’s and p live respectively, our algorithm runs in time \({\tilde{O}}(LU^{1.87})\). This is polynomially faster than state-of-the-art SOS and semidefinite programming solvers, which achieve runtime \({\tilde{O}}(L^{0.5}\min \{U^{2.37}, L^{4.24}\})\). The centerpiece of our algorithm is a dynamic data structure for maintaining the inverse of the Hessian of the SOS barrier function under the polynomial interpolant basis, which efficiently extends to multivariate SOS optimization, and requires maintaining spectral approximations to low-rank perturbations of elementwise (Hadamard) products. This is the main challenge and departure from recent IPM breakthroughs using inverse-maintenance, where low-rank updates to the slack matrix readily imply the same for the Hessian matrix.
Notes
A subset \({\mathcal {K}}\subset \mathbb {R}^N\) is a convex cone if \(\forall \; x,y \in {\mathcal {K}}\) and \(\alpha , \beta \in \mathbb {R}_+\), \(\alpha x+ \beta y \in {\mathcal {K}}\).
We use \(\widetilde{O}(\cdot )\) to hide \(U^{o(1)}\) and \(\log (1/\delta )\) factors.
Any set of points in \(\mathbb {R}^n\) for which the evaluation of a polynomial in \(\mathcal {V}_{n,d}\) on these points uniquely defines the polynomial.
This equation means \(\forall t \in \mathbb {R}^n\), \(\Lambda ([q_1(t), \ldots , q_U(t)]^\top ) = [p_1(t), \ldots , p_L(t)]^\top \cdot [p_1(t), \ldots , p_L(t)]\).
When solving WSOS, [17] has running time \(O((kL)^{0.5} \cdot (kU L^2 + U^{\omega } + (kL)^{\omega }))\), and [14] has running time \(O((kL)^{0.5} \cdot (U^2 + (kL)^4) + U^{\omega } + (kL)^{2 \omega })\). Here, we were able to drop a factor in k due to the block-diagonal structure of the constraint matrix.
References
Alman, J., Williams, V.V.: A refined laser method and faster matrix multiplication. In: Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 522–539. SIAM (2021)
Ballinger, B., Blekherman, G., Cohn, H., Giansiracusa, N., Kelly, E., Schürmann, A.: Experimental study of energy-minimizing point configurations on spheres. Exp. Math. 18(3), 257–283 (2009)
Bos, L., De Marchi, S., Sommariva, A., Vianello, M.: Computing multivariate Fekete and Leja points by numerical linear algebra. SIAM J. Numer. Anal. 48(5), 1984–1999 (2010)
Barak, B., Hopkins, S.B., Kelner, J.A., Kothari, P.K., Moitra, A., Potechin, A.: A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM J. Comput. 48(2), 687–735 (2019)
Bläser, M.: Fast matrix multiplication. In: Theory of Computing, pp. 1–60 (2013)
Blekherman, G., Parrilo, P.A., Thomas, R.R.: Semidefinite Optimization and Convex Algebraic Geometry. SIAM (2012)
Barak, B., Raghavendra, P., Steurer, D.: Rounding semidefinite programming hierarchies via global correlation. In: 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 472–481. IEEE (2011)
Bachoc, C., Vallentin, F.: New upper bounds for kissing numbers from semidefinite programming. Technical report, Journal of the American Mathematical Society (2006)
Cohen, M.B., Lee, Y.T., Song, Z.: Solving linear programs in the current matrix multiplication time. In: Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC) (2019)
Eisenbrand, F., Grandoni, F.: On the complexity of fixed parameter clique and dominating set. Theor. Comput. Sci. 326(1–3), 57–67 (2004)
Ghaddar, B., Marecek, J., Mevissen, M.: Optimal power flow as a polynomial optimization problem. IEEE Trans. Power Syst. 31, 539–546 (2016)
Le Gall, F., Urrutia, F.: Improved rectangular matrix multiplication using powers of the Coppersmith–Winograd tensor. In: Proceedings of the 2018 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1029–1046. SIAM (2018)
Heß, R., Henrion, D., Lasserre, J.-B., Pham, T.S.: Semidefinite approximations of the polynomial abscissa. SIAM J. Control Optim. 54(3), 1633–1656 (2016)
Huang, B., Jiang, S., Song, Z., Tao, R., Zhang, R.: Solving SDP faster: a robust IPM framework and efficient implementation (2021)
Hopkins, S.B., Kothari, P.K., Potechin, A., Raghavendra, P., Schramm, T., Steurer, D.: The power of sum-of-squares for detecting hidden structures. In: 58th IEEE Annual Symposium on Foundations of Computer Science, (FOCS), pp. 720–731. IEEE Computer Society (2017)
Hopkins, S.B., Li, J.: Mixture models, robustness, and sum of squares proofs. In: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, (STOC), pp. 1021–1034. ACM (2018)
Jiang, H., Kathuria, T., Lee, Y.T., Padmanabhan, S., Song, Z.: A faster interior point method for semidefinite programming. In: 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 910–918. IEEE (2020)
Jiang, S., Man, Y., Song, Z., Yu, Z., Zhuo, D.: Fast graph neural tangent kernel via kronecker sketching (2021). arXiv preprint arXiv:2112.02446, AAAI’22
Karmarkar, N.: A new polynomial-time algorithm for linear programming. In: Proceedings of the 16th Annual ACM Symposium on Theory of Computing (STOC), pp. 302–311 (1984)
Lasserre, J.B.: An Introduction to Polynomial and Semi-Algebraic Optimization. Cambridge Texts in Applied Mathematics. Cambridge University Press (2015)
Laurent, M.: Sums of Squares, Moment Matrices and Optimization over Polynomials. Number 149 in The IMA Volumes in Mathematics and Its Applications Series, pp. 155–270. Springer, Germany (2009)
Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, pp. 296–303 (2014)
Lee, Y.T., Sidford, A.: Path finding methods for linear programming: solving linear programs in \({\tilde{O}}(\sqrt{rank})\) iterations and faster algorithms for maximum flow. In: 2014 IEEE 55th Annual Symposium on Foundations of Computer Science (FOCS), pp. 424–433. IEEE (2014)
Lee, Y.T., Song, Z., Zhang, Q.: Solving empirical risk minimization in the current matrix multiplication time. In: Conference on Learning Theory (COLT), pp. 2140–2157. PMLR (2019)
Nesterov, Y.: Squared functional systems and optimization problems. In: High Performance Optimization, pp. 405–440. Springer (2000)
Nesterov, Y., Nemirovski, A.: Interior-Point Polynomial Algorithms in Convex Programming. SIAM Studies in Applied Mathematics. SIAM (1994)
Pan, V.Y.: Structured Matrices and Polynomials. Birkhäuser, Boston (2001)
Papp, D.: Optimal designs for rational function regression. J. Am. Stat. Assoc. 107(497), 400–411 (2012)
Parrilo, P.: Sum of Squares: Theory and Applications: AMS Short Course, Sum of Squares: Theory and Applications, January 14–15, 2019, Baltimore, Maryland. American Mathematical Society, Providence (2020)
Putinar, M., Vasilescu, F.-H.: Positive polynomials on semi-algebraic sets. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics 328(7), 585–589 (1999)
Papp, D., Yildiz, S.: Sum-of-squares optimization without semidefinite programming. SIAM J. Optim. 29(1), 822–851 (2019)
Roh, T., Dumitrescu, B., Vandenberghe, L.: Multidimensional FIR filter design via trigonometric sum-of-squares optimization. J. Sel. Top. Signal Process. 1(4), 641–650 (2007)
Renegar, J.: A Mathematical View of Interior-Point Methods in Convex Optimization. Society for Industrial and Applied Mathematics (2001)
Strang, G.: Karmarkar’s algorithm and its place in applied mathematics. Math. Intell. 9(2), 4–10 (1987)
Sommariva, A., Vianello, M.: Computing approximate Fekete points by QR factorizations of Vandermonde matrices. Comput. Math. Appl. 57(8), 1324–1336 (2009)
Song, Z., Yang, S., Zhang, R.: Does preprocessing help training over-parameterized neural networks? In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Song, Z., Zhang, L., Zhang, R.: Training multi-layer over-parametrized neural network in subquadratic time (2021). arXiv preprint arXiv:2112.07628
Tan, N.: On the power of Lasserre SDP hierarchy. Ph.D. thesis, EECS Department, University of California, Berkeley (2015)
Vaidya, P.M.: Speeding-up linear programming using fast matrix multiplication. In: 30th Annual Symposium on Foundations of Computer Science (FOCS), pp. 332–337. IEEE (1989)
van den Brand, J., Peng, B., Song, Z., Weinstein, O.: Training (overparametrized) neural networks in near-linear time. In: 12th Innovations in Theoretical Computer Science Conference (ITCS 2021), vol. 185, pp. 63:1–63:15 (2021)
Ye, Y., Todd, M.J., Mizuno, S.: An \({O}(\sqrt{n}{L})\)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res. 19(1), 53–67 (1994)
Acknowledgements
The second author would like to thank Vissarion Fisikopoulos and Elias Tsigaridas for introducing him from a practical perspective to Sum-of-Square optimization under the interpolant basis.
Supported by NSF CAREER award CCF-1844887. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 757481-ScaleOpt). Supported by NSF CAREER Award CCF-1844887 and ISF Grant #3011005535.
Appendices
A Initialization
There exist standard techniques to transform a convex program into a form that has an easily obtainable strictly feasible point, see e.g. [41]. We follow the initialization procedure of [9, 17] and adapt it to SOS optimization. A similar initialization lemma holds for WSOS optimization.
Let the matrix \(P \in \mathbb {R}^{U \times L}\) and the operator \(\Lambda : \mathbb {R}^U \rightarrow \mathbb {R}^{L \times L}\), defined by \(\Lambda (s) = P^{\top } {{\,\textrm{diag}\,}}(s) P\), be as in the interpolant-basis paragraph of Sect. 3.
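As a concrete illustration, here is a minimal numpy sketch of the operator \(\Lambda \) and the gradient map \(g_{\Sigma ^*}\) used below, with a random matrix standing in for the actual interpolation matrix P (all names are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
U, L = 10, 4
P = rng.standard_normal((U, L))  # stand-in for the U x L interpolation matrix

def Lam(s):
    # Lambda(s) = P^T diag(s) P, an L x L matrix
    return P.T @ (s[:, None] * P)

def grad(s):
    # g(s) = diag(P Lambda(s)^{-1} P^T), a vector in R^U
    return np.einsum('ul,lk,uk->u', P, np.linalg.inv(Lam(s)), P)

s0 = np.ones(U)
g0 = grad(s0)
# Basic identity: <s, g(s)> = tr(Lambda(s)^{-1} Lambda(s)) = L
assert np.isclose(s0 @ g0, L)
```

The identity \(\langle s, g_{\Sigma ^*}(s)\rangle = L\) checked at the end mirrors the trace computation \({{\,\textrm{tr}\,}}({\text {proj}}(P)) = {\text {rk}}(P) = L\) appearing in the proof.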
Lemma A.1
(Initialization) Given an instance of (SOS) that fulfills Slater’s condition, let R be an upper bound on the \(\ell _1\)-norm of the primal feasible solutions, i.e. all primal feasible x of (SOS) fulfill \(\Vert x\Vert _1 \le R\), and let \(\delta \in (0,1)\). We define \(\overline{A} \in \mathbb {R}^{(m+1)\times (U+2)}\), \(\overline{b} \in \mathbb {R}^{m + 1}\), and \(\overline{c} \in \mathbb {R}^{U + 2}\) as
and let
where \(\overline{g}^0 = g_{\Sigma ^*}(\overline{s}^0_{[:U]}) \in \mathbb {R}^U\) for the gradient function \(g_{\Sigma ^*}(s):= {{\,\textrm{diag}\,}}(P(P^\top {{\,\textrm{diag}\,}}(s) P)^{-1} P^\top )\) that maps from \(\mathbb {R}^U\) to \(\mathbb {R}^U\). This defines the auxiliary primal-dual system
Then \((\overline{x}^0, \overline{y}^0, \overline{s}^0)\) are feasible to the auxiliary system (Aux-SOS).
Further, under the canonical barrier (we use \(\overline{a}_{i}\) to denote the i-th column of \(\overline{A}\)):
we have that \(\Vert \overline{g}_{\eta ^0}(\overline{y}^0)\Vert _{\overline{H}(\overline{y}^0)^{-1}} = 0\) for \(\eta ^0 = 1\).
Further, for any solution \((\overline{x}, \overline{y}, \overline{s})\) to (Aux-SOS) with duality gap \(\le \delta ^2\), its rescaled restriction \(\widehat{x}:= R \cdot \overline{x}_{[:U]}\) fulfills
Proof
Let \({\text {proj}}\) denote the orthogonal projection matrix that maps a matrix onto its image space. Then note that
where the last inequality follows as \({{\,\textrm{tr}\,}}({\text {proj}}(P)) = {\text {rk}}(P) = L\).
Straightforward calculations show that \(\overline{A}\overline{x}^0 = \overline{b}\) and \(\overline{A}^\top \overline{y}^0 + \overline{s}^0 = \overline{c}\). Further, note that \(\overline{s}^0 > 0\) as \(\delta < 1\) and therefore \(\overline{s}^0 \in \mathbb {R}_+^{U+2} \subset \Sigma _{n,2d}^* \times \mathbb {R}_+^2\). The containment \(\mathbb {R}_+^U \subset \mathbb {R}_{\ge 0}^U \subset \Sigma _{n,2d}^*\) is clear by the characterisation \(\Sigma _{n,2d}^* = \{s: \Lambda (s) = P^\top {{\,\textrm{diag}\,}}(s) P \succeq 0\}\).
Further, \(\overline{x}^0_{[:U]} = g_{\Sigma ^*}(\overline{s}^0_{[:U]}) = {{\,\textrm{diag}\,}}(P(P^\top {{\,\textrm{diag}\,}}(\overline{s}^0_{[:U]}) P)^{-1} P^\top ) \in \Sigma _{n,2d}\) is shown, e.g., in Proposition 3.3.3 of [33]. We make it explicit by showing that \(\langle \overline{x}^0_{[:U]}, {\hat{s}} \rangle \ge 0\) for any \({\hat{s}} \in \Sigma _{n,2d}^*\). Indeed,
where the last inequality follows as the argument in \({{\,\textrm{tr}\,}}\) is a product of positive semi-definite matrices.
Define \(\overline{s} = \overline{c} - \overline{A}^{\top } \overline{y}\). The corresponding barrier is
and gradient of the system is
Simple algebraic calculations now show that for \(\eta ^0 = 1\) we have \(\overline{g}_{\eta ^0}(\overline{y}^0) = 0_{m+1}\), and so in particular \(\Vert \overline{g}_{\eta ^0}(\overline{y}^0)\Vert _{\overline{H}(\overline{y}^0)^{-1}} \le \epsilon _N\) as required by Lemma 6.1. It remains to show that near-optimal solutions to the modified problem correspond to near-optimal solutions to the original problem.
Let \(\textrm{OPT}\) resp. \(\overline{\textrm{OPT}}\) be the objective values of the original resp. modified program:
Given any optimal \(x^* \in \mathbb {R}^U\) of the original primal of (SOS), consider the following \(\overline{x}^* \in \mathbb {R}^{U+2}\) fulfilling \(\overline{A} \overline{x}^* = \overline{b}, \overline{x}^* \in \Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\):
Note that indeed \(\overline{x}^*_{U+1} \in \mathbb {R}_{\ge 0}\), as by (33) we have \(\overline{x}^*_{U+1} = \left\langle 1_U, \overline{g}^0 \right\rangle + 1 - \frac{1}{R} \left\langle 1_U, x^* \right\rangle \ge (1-\delta )L + 1 - 1 \ge 0\) for \(\delta < 1\) and by choice of R. Therefore
Now given any feasible primal-dual solution \((\overline{x}, \overline{y}, \overline{s})\) to (Aux-SOS) with duality gap \(\le \delta ^2\) we have
where the first inequality uses \(\overline{x}_{U+2} \in \mathbb {R}_{\ge 0}\) and the last inequality follows from (34). So, for \(\widehat{x}:= R \cdot \overline{x}_{[:U]}\), using (35) we have that
Recall that \(\mathbb {R}_{\ge 0}^U \subset \Sigma _{n,2d}^*\) and therefore, taking dual cones, \(\Sigma _{n,2d} \subset \mathbb {R}_{\ge 0}^U\). In particular \(\overline{x}_{[:U]} \ge 0\), and so using the equality \(\left\langle 1_U, \overline{x}_{[:U]} \right\rangle + \overline{x}_{U+1} = 1 + \left\langle 1_U, \overline{g}^0 \right\rangle \) (which follows from \(\overline{A}\overline{x} = \overline{b}\) since \((\overline{x}, \overline{y}, \overline{s})\) is feasible) and \(\overline{x}_{U+1} \ge 0\) we can further bound
where the penultimate inequality used (33). So we can further bound
Therefore using (35) and (36) we get that
where the penultimate inequality used \(\textrm{OPT}\le R\Vert c\Vert _\infty \) and the last inequality uses \(\delta \le 1\).
Feasibility of \(\overline{x}\) to \(\overline{A} \overline{x} = \overline{b}\) shows that \(A \overline{x}_{[:U]} + (\frac{1}{R}b - A\overline{g}^0)\overline{x}_{U+2} = \frac{1}{R} b\) and we therefore have
where the penultimate inequality used (37), and the last inequality follows from (33). \(\square \)
It is worth noting that the system \(\overline{A}\overline{x} = \overline{b}, \overline{x} \in \Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\) is no longer an instance of (SOS). This is in contrast to similar initialization techniques for Linear Programming [9] or Semidefinite Programming [17]. Nonetheless, it is not hard to see that optimization over a product cone such as \(\Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2\) can be done efficiently if optimization over each factor can be done efficiently, see e.g. the application in [24]. Alternatively, note that the dual cone satisfies \((\Sigma _{n,2d} \times \mathbb {R}_{\ge 0}^2)^* = \Sigma _{n,2d}^* \times \mathbb {R}_{\ge 0}^2\). For \(s \in \Sigma _{n,2d}^*\) we used the barrier \(F(s) = - \log (\det (P^\top {{\,\textrm{diag}\,}}(s)P))\) throughout the paper. This barrier extends easily by setting
and setting the barrier for \(\overline{s} \in \Sigma _{n,2d}^* \times \mathbb {R}_{\ge 0}^2\) as
which recovers the standard barrier for the product cone. Algorithm 1 can now equally be run with \(\overline{F}\) instead of F, with P replaced by \(\overline{P}\).
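The product-cone barrier can be sketched numerically. The block-diagonal form of \(\overline{P}\) below (P padded with a \(2 \times 2\) identity block) is our assumption of the intended construction; with it, \(-\log \det (\overline{P}^\top {{\,\textrm{diag}\,}}(\overline{s}) \overline{P})\) splits into the SOS barrier plus the two log terms of the nonnegative-orthant barrier:

```python
import numpy as np

rng = np.random.default_rng(1)
U, L = 10, 4
P = rng.standard_normal((U, L))  # stand-in for the interpolation matrix

def F(s):
    # barrier F(s) = -log det(P^T diag(s) P) on the dual SOS cone
    sign, logdet = np.linalg.slogdet(P.T @ (s[:, None] * P))
    return -logdet

# Assumed extension: Pbar is P with an appended 2 x 2 identity block, so the
# extended Gram matrix is block diagonal and the determinant factorizes.
Pbar = np.block([[P, np.zeros((U, 2))], [np.zeros((2, L)), np.eye(2)]])

def Fbar(sbar):
    sign, logdet = np.linalg.slogdet(Pbar.T @ (sbar[:, None] * Pbar))
    return -logdet

sbar = np.abs(rng.standard_normal(U + 2)) + 0.1  # strictly positive point
assert np.isclose(Fbar(sbar),
                  F(sbar[:U]) - np.log(sbar[U]) - np.log(sbar[U + 1]))
```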
B Discussion on representation
In this paper we consider the dual formulation of (SOS), where \(A \in \mathbb {R}^{m \times U}\), \(b \in \mathbb {R}^m\), \(c \in \mathbb {R}^U\):
Here m is the number of constraints, and w.l.o.g. we assume \(m \le U\).
We remark that [31] instead consider the following formulation:
These two formulations are in fact equivalent, because Eq. (38) is equivalent to
Here we define \(\hat{A} \in \mathbb {R}^{(U-m) \times U}\) to be a matrix that satisfies \(\ker ({\hat{A}}) = {\text {Im}}(A^\top )\), i.e., the rows of \(\hat{A}\) span the orthogonal complement of the row space of A. Further we define \({\hat{c}} \in \mathbb {R}^U\) to be any vector that satisfies \(A {\hat{c}} = b\). Finally, we define \({\hat{b}}:= {\hat{A}} c \in \mathbb {R}^{U-m}\).
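The construction of \(\hat{A}\), \(\hat{c}\), and \(\hat{b}\) can be sketched as follows; the SVD-based choice of \(\hat{A}\) is one possible realization (assuming A has full row rank), not the paper's prescription:

```python
import numpy as np

rng = np.random.default_rng(2)
m, U = 3, 8
A = rng.standard_normal((m, U))  # hypothetical constraint matrix, full row rank
b = rng.standard_normal(m)
c = rng.standard_normal(U)

# Ahat: rows span the orthogonal complement of the rows of A,
# so ker(Ahat) = Im(A^T). The last U - m right singular vectors work.
_, _, Vt = np.linalg.svd(A)
Ahat = Vt[m:]                                  # (U - m) x U
# chat: any vector with A chat = b; the least-squares solution suffices.
chat = np.linalg.lstsq(A, b, rcond=None)[0]
bhat = Ahat @ c

# Sanity checks of the stated identities:
assert np.allclose(Ahat @ A.T, 0)  # rows of Ahat annihilate Im(A^T)
assert np.allclose(A @ chat, b)
```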
B.1 SOS under the monomial basis
In [31], the advantages and disadvantages of three different bases (monomial basis, Chebyshev basis, and interpolant basis) are discussed, and explicit expressions of \(\Lambda \) (defined in Theorem 3.4) are given for all three. Apart from its computational efficiency, [31] chooses the interpolant basis also because it is numerically stable, which is required for practical algorithms. As we mainly focus on the theoretical running time of SOS algorithms, in this section we further justify the choice of the interpolant basis in our algorithm. We want to add to the exposition of [31] and stress that the standard monomial basis does not seem suitable even for theoretical algorithms: it is unclear whether, in the monomial basis, an amortized runtime faster than the naive \(O(U^\omega )\) can be achieved. (A similar argument holds for the Chebyshev basis.)
The Hessian in the monomial basis is given by
It is unclear how low-rank update techniques could be applied here, as low-rank updates to \(\Lambda \) do not translate to low-rank updates to the Hessian H. Further, the structure of \(\Lambda \) itself in the multivariate case is far less understood than its counterpart in the interpolant basis, as we elaborate in the remainder of this section.
For n variables and degree d the monomial basis elements correspond to the terms \(x_1^{\alpha _1} \cdot \ldots \cdot x_n^{\alpha _n}\) for \(\alpha = (\alpha _1, \ldots , \alpha _n) \in \mathbb {N}^n\), \(\Vert \alpha \Vert _1 \le d\). When choosing both \(\textbf{p}\) and \(\textbf{q}\) to be the monomial basis, \(\Lambda \) has a special structure. For any basis elements \(p_1, p_2 \in \textbf{p}\) we have that \(p_1 p_2\) itself is a monomial in \({\mathcal {V}}_{n,2d}\). As such, the coefficients \(\lambda _{ij} \in \mathbb {R}^U\) have the special form that \(\lambda _{ij}\) is zero everywhere but in the coordinate that corresponds to the element in \(\textbf{q}\) equalling \(p_1p_2\). As \(\Lambda \) is also uniquely defined by the images of the elements in \({\textbf{q}}\), we can write \(E_u \in \mathbb {R}^{L \times L}\) as \(E_u:= \Lambda (e_u)\) where \(e_u\) is the vector that is zero everywhere but in position u for some \(u \in [U]\). Let \(q_u \in \textbf{q}\) be the associated basis polynomial. Then we see that \((E_u)_{ij} = 1_{[p_ip_j = q_u]}\). It also follows that every matrix \(S \in {\text {Im}}(\Lambda )\) has at most U different entries and each entry is uniquely defined by the corresponding basis element in \(\textbf{q}\). While it is not known how this special structure could be exploited in general, a speedup is known for the univariate case \(n = 1\). Here, \({\text {Im}}(\Lambda )\) are all Hankel matrices: We have \(\textbf{q} = \{1, x, \ldots , x^{2d}\}\) and \(\textbf{p} = \{1, x, \ldots , x^d\}\) so for any vector \(u \in \mathbb {R}^{2d+1}\) we have that
These highly structured matrices are known to be invertible in time \({{\tilde{O}}}(d^2)\) (see e.g. [27]).
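The Hankel structure described above is easy to verify directly. The following sketch builds \(\Lambda (u)\) and the basis matrices \(E_k\) for the univariate monomial basis (the \({\tilde{O}}(d^2)\) Hankel solver itself is not shown):

```python
import numpy as np

d = 3
U, L = 2 * d + 1, d + 1  # q = {1, x, ..., x^{2d}}, p = {1, x, ..., x^d}

def Lam_monomial(u):
    # Under the univariate monomial basis, Lambda(u)_{ij} = u_{i+j}
    # (0-indexed): a Hankel matrix built from u in R^{2d+1}.
    return np.array([[u[i + j] for j in range(L)] for i in range(L)])

u = np.arange(U, dtype=float)
H = Lam_monomial(u)
# Hankel structure: entries are constant along anti-diagonals.
for i in range(L):
    for j in range(L):
        assert H[i, j] == u[i + j]

# The basis matrices E_k = Lambda(e_k) satisfy (E_k)_{ij} = 1 iff x^i x^j = x^k.
E2 = Lam_monomial(np.eye(U)[2])
assert E2[0, 2] == E2[1, 1] == E2[2, 0] == 1 and E2.sum() == 3
```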
As mentioned above, even the bivariate case is far more complicated. Let \(n = 2\) and \(d = 2\), and pick the ordered bases as
Then for \(v \in \mathbb {R}^{\binom{n+2d}{2d}} = \mathbb {R}^{15}\) we get for \({\textbf{p}} {\textbf{p}}^\top \) and the corresponding matrix \(\Lambda \) that
While \(\Lambda (v)\) still has some structure it is unclear how \((\Lambda (v))^{-1}\) could be computed more efficiently than in matrix multiplication time.
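To make the bivariate structure concrete, the following sketch builds \(\Lambda (v)\) for \(n = d = 2\) under a hypothetical graded-lexicographic ordering of the monomials (the paper's ordering may differ); it only illustrates that each entry of \(\Lambda (v)\) is a coordinate of v, so the matrix has at most U distinct entries:

```python
import numpy as np

n, d = 2, 2

def monomials(deg):
    # exponent vectors (a1, a2) with a1 + a2 <= deg, graded-lex order (assumed)
    return sorted(((a, b) for a in range(deg + 1) for b in range(deg + 1 - a)),
                  key=lambda e: (e[0] + e[1], e))

p_basis = monomials(d)        # L = 6 basis monomials of degree <= d
q_basis = monomials(2 * d)    # U = 15 basis monomials of degree <= 2d
L, U = len(p_basis), len(q_basis)
assert (L, U) == (6, 15)

def Lam(v):
    # (Lambda(v))_{ij} = v_k where q_k = p_i * p_j, i.e. exponents add up.
    idx = {e: k for k, e in enumerate(q_basis)}
    return np.array([[v[idx[(pi[0] + pj[0], pi[1] + pj[1])]]
                      for pj in p_basis] for pi in p_basis])

v = np.arange(U, dtype=float)
M = Lam(v)
# Each entry of Lam(v) is some coordinate of v: at most U distinct entries.
assert M.shape == (L, L) and len(set(M.ravel())) <= U
```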
C Proof of amortization lemma
We include the proof of Lemma 7.2 for completeness. The main difference between this proof and that of [14] is that we cut off at U/L instead of L.
Our proof makes use of the following two facts about \(\omega \) and \(\alpha \) (Lemma A.4 and Lemma A.5 of [9]).
Fact C.1
(Relation of \(\omega \) and \(\alpha \)) \(\omega \le 3 - \alpha \).
Fact C.2
(Upper bound of \({{\mathcal {T}}}_{\textrm{mat}}(n,n,r)\)) For any \(r \le n\), we have that \({{\mathcal {T}}}_{\textrm{mat}}(n,n,r) \le n^{2 + o(1)} + r^{\frac{\omega - 2}{1 - \alpha }} \cdot n^{2 - \frac{\alpha (\omega -2)}{(1 - \alpha )} + o(1)}\).
Lemma A.3
(Restatement of Lemma 7.2) Let t denote the total number of iterations. Let \(r_i \in [L]\) be the rank for the i-th iteration for \(i \in [t]\). Assume \(r_i\) satisfies the following condition: for any vector \(g \in \mathbb {R}_+^L\) which is non-increasing, we have
If the cost of the i-th iteration is \(O({{\mathcal {T}}}_{\textrm{mat}}(U, U, \min \{L r_i, U\}))\), then for \(\alpha \ge 5 - 2 \omega \) the amortized cost per iteration is
Proof
For \(r_i\) that satisfies \(r_i \le U/L\), we have
where the first step follows from Fact C.2.
Define a sequence \(g \in \mathbb {R}_+^L\) such that for \(r \in [L]\),
Note that g is non-increasing because \(\frac{\omega -2}{1-\alpha } \le 1\) (Fact C.1). Then using the condition in the lemma statement, we have
where the first step follows from the definition of \(g \in \mathbb {R}^L\), the second step follows from the assumption \(\sum _{i=1}^t r_i \cdot g_{r_i} \le t \cdot \Vert g\Vert _2\) in the lemma statement, the third step follows from upper bounding the \(\ell _2\) norm \(\Vert g\Vert _2^2 = \sum _{r=1}^L g_r^2\), and the fourth step follows since \(\frac{2(\omega -2)}{1-\alpha } \ge 1\) when \(\alpha \ge 5 - 2 \omega \), so the integral \(\int _{x=1}^{U/L} x^{\frac{2(\omega -2)}{1-\alpha }-2} \textrm{d}x = c \cdot x^{\frac{2(\omega -2)}{1-\alpha }-1}\big |_{1}^{U/L} = O\big ((U/L)^{\frac{2(\omega -2)}{1-\alpha }-1}\big )\) where \(c:= 1/(\frac{2(\omega -2)}{1-\alpha }-1)\).
Thus we have
where the first step follows from Eq. (39) and \({{\mathcal {T}}}_{\textrm{mat}}(U,U,U) = U^{\omega } = U^{2 - \frac{\alpha (\omega -2)}{1-\alpha }} \cdot L^{\frac{\omega -2}{1-\alpha }} \cdot (U/L)^{\frac{\omega -2}{1-\alpha }}\), the second step follows from moving summation inside, the third step follows from Eq. (40), and the last step follows from adding the terms together. \(\square \)
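As a sanity check, the exponent facts used in this proof can be verified numerically with the current bounds \(\omega \approx 2.373\) [1, 32] and \(\alpha \approx 0.314\) [22] (values plugged in here for illustration only):

```python
# Current fast matrix multiplication exponents (approximate published bounds).
omega, alpha = 2.373, 0.314

# Fact C.1: omega <= 3 - alpha.
assert omega <= 3 - alpha

# Hypothesis of Lemma A.3: alpha >= 5 - 2*omega.
assert alpha >= 5 - 2 * omega

# The exponent (omega - 2)/(1 - alpha) in Fact C.2 lies in [1/2, 1], so the
# sequence g_r = r^{(omega-2)/(1-alpha) - 1} is non-increasing (needs e <= 1)
# and the integral bound in the proof converges (needs 2e >= 1).
e = (omega - 2) / (1 - alpha)
assert 0.5 <= e <= 1
```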
Jiang, S., Natura, B. & Weinstein, O. A Faster Interior-Point Method for Sum-of-Squares Optimization. Algorithmica 85, 2843–2884 (2023). https://doi.org/10.1007/s00453-023-01112-4