Abstract
We consider the sum-of-squares hierarchy of approximations for the problem of minimizing a polynomial f over the boolean hypercube \({{\mathbb {B}}^{n}=\{0,1\}^n}\). This hierarchy provides for each integer \(r \in {\mathbb {N}}\) a lower bound \(\smash {f_{({r})}}\) on the minimum \(f_{\min }\) of f, given by the largest scalar \(\lambda \) for which the polynomial \(f - \lambda \) is a sum-of-squares on \({\mathbb {B}}^{n}\) with degree at most 2r. We analyze the quality of these bounds by estimating the worst-case error \(f_{\min }- \smash {f_{({r})}}\) in terms of the least roots of the Krawtchouk polynomials. As a consequence, for fixed \(t \in [0, 1/2]\), we can show that this worst-case error in the regime \(r \approx t \cdot n\) is of the order \(1/2 - \sqrt{t(1-t)}\) as n tends to \(\infty \). Our proof combines classical Fourier analysis on \({\mathbb {B}}^{n}\) with the polynomial kernel technique and existing results on the extremal roots of Krawtchouk polynomials. This link to roots of orthogonal polynomials relies on a connection between the hierarchy of lower bounds \(\smash {f_{({r})}}\) and another hierarchy of upper bounds \(\smash {f^{({r})}}\), for which we are also able to establish the same error analysis. Our analysis extends to the minimization of a polynomial over the q-ary cube \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\). Furthermore, our results apply to the setting of matrix-valued polynomials.
This is a preview of subscription content, access via your institution.
Notes
Technically, the program (31) allows for the density to be a sum of squares, whereas the program (35) requires the density to be an actual square. This is no true restriction, though, since, as a straightforward convexity argument shows, the optimum solution to (31) can in fact always be chosen to be a square [20].
Note that as p is assumed to be real-valued, the coefficients \(\lambda _i\) must be real. Indeed, for each \(a \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), we have \(\langle p, \chi _a \rangle _{\mu } = \lambda _{|a|} \Vert \chi _a\Vert ^2 = \lambda _{|a|} \Vert \chi ^{-1}_{a}\Vert ^2 = \langle p, \overline{\chi _a} \rangle _{\mu } = \overline{\langle p, \chi _a \rangle _{\mu }}\).
References
Alon, N., Naor, A.: Approximating the cut-norm via Grothendieck’s inequality. In: 36th Annual ACM Symposium on Theory of Computing, pp. 72–80 (2004)
Arora, S., Berger, E., Hazan, E., Kindler, G., Safra, M.: On non-approximability for quadratic programs. In: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pp. 206–215 (2005)
Balas, E., Ceria, S., Cornuéjols, G.: A lift-and-project cutting plane algorithm for mixed 0–1 programs. Math. Program. 58, 295–324 (1993)
Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Mach. Learn. 46(1–3), 89–113 (2004)
Barak, B., Steurer, D.: Sum-of-squares proofs and the quest toward optimal algorithms. In Proceedings of International Congress of Mathematicians (ICM) (2014)
Charikar, M., Wirth, A.: Maximizing quadratic programs: Extending Grothendieck’s inequality. In: Proceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science, pp. 54–60 (2004)
de Klerk, E., Laurent, M.: A survey of semidefinite programming approaches to the generalized problem of moments and their error analysis. In: Araujo C., Benkart G., Praeger C., Tanbay B. (eds), World Women in Mathematics 2018. Association for Women in Mathematics Series, vol. 20, pp. 17–56. Springer, Cham (2019)
de Klerk, E., Laurent, M.: Convergence analysis of a Lasserre hierarchy of upper bounds for polynomial minimization on the sphere. Math. Program. (2020). https://doi.org/10.1007/s10107-019-01465-1
de Klerk, E., Laurent, M.: Worst-case examples for Lasserre’s measure based hierarchy for polynomial optimization on the hypercube. Math. Oper. Res. 45(1), 86–98 (2020)
Doherty, A.C., Wehner, S.: Convergence of SDP hierarchies for polynomial optimization on the hypersphere (2012). arXiv:1210.5048
Fang, K., Fawzi, H.: The sum-of-squares hierarchy on the sphere, and applications in quantum information theory. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01537-7
Fawzi, H., Saunderson, J., Parrilo, P.A.: Sparse sums of squares on finite abelian groups and improved semidefinite lifts. Math. Program. 160(1–2), 149–191 (2016)
Goemans, M., Williamson, D.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42(6), 1115–1145 (1995)
Karlin, A.R., Mathieu, C., Nguyen, C.T.: Integrality gaps of linear and semi-definite programming relaxations for knapsack. In: Günlük, O., Woeginger, G.J. (eds.) Integer Programming and Combinatoral Optimization, pp. 301–314. Springer, Berlin, Heidelberg (2011)
Kurpisz, A., Leppänen, S., Mastrolilli, M.: Tight Sum-of-Squares lower bounds for binary polynomial optimization problems. In: Chatzigiannakis, I., et al. (eds) 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016) vol. 78, pp. 1–14 (2016)
Lasserre, J.B.: Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11(3), 796–817 (2001)
Lasserre, J.B.: A max-cut formulation of \(0/1\) programs. Oper. Res. Lett. 44, 158–164 (2016)
Lasserre, J.B.: An explicit exact sdp relaxation for nonlinear 0–1 programs. In: Aardal, K., Gerards, B. (eds.) Integer Programming and Combinatorial Optimization, pp. 293–303, Springer, Berlin, Heidelberg (2001)
Lasserre, J.B.: Moments. Positive Polynomials and Their Applications. Imperial College Press, London (2009)
Lasserre, J.B.: A new look at nonnegativity on closed sets and polynomial optimization. SIAM J. Optim. 21(3), 864–885 (2010)
Laurent, M.: A comparison of the Sherali–Adams, Lovász-Schrijver and Lasserre relaxations for 0-1 programming. Math. Oper. Res., 28(3):470–496 (2003)
Laurent, M.: Lower bound for the number of iterations in semidefinite hierarchies for the cut polytope. Math. Oper. Res. 28(4), 871–883 (2003)
Laurent, M.: Semidefinite representations for finite varieties. Math. Program. 109, 1–26 (2007)
Laurent, M.: Sums of squares, moment matrices and optimization over polynomials. In: Putinar, M., Sullivant, S. (eds.) Emerging Applications of Algebraic Geometry, Vol. 149 of IMA Volumes in Mathematics and its Applications, pp. 157–270. Springer (2009)
Lee, J.R., Raghavendra, P., Steurer, D.: Lower bounds on the size of semidefinite programming relaxations. In: STOC ’15: Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 567–576 (2015)
Levenshtein, V.I.: Universal bounds for codes and designs. In: Handbook of Coding Theory, vol. 9, pp. 499–648. North-Holland, Amsterdam (1998)
Lovász, L., Schrijver, A.: Cones of matrices and set-functions and 0–1 optimization. SIAM J. Optim. 1, 166–190 (1991)
Macwilliams, F., Sloane, N.: The Theory of Error Correcting Codes, volume 16 of North-Holland Mathematical Library. Elsevier (1983)
Natanson, I.P.: Constructive Function Theory, Vol. I Uniform Approximation (1964)
Nie, J., Schweighofer, M.: On the complexity of Putinar’s positivstellensatz. J. Complex. 23(1), 135–150 (2007)
O’Donnell, R.: SOS is not obviously automatizable, even approximately. In: 8th Innovations in Theoretical Computer Science Conference vol. 59, pp. 1–10 (2017)
Parrilo, P.A.: Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. Ph.D thesis, California Institure of Technology (2000)
Raghavendra, P., Weitz, B.: On the bit complexity of sum-of-squares proofs. In: 44th International Colloquium on Automata, Languages, and Programming, vol. 80, pp. 1–13 (2017)
Reznick, B.: Uniform denominators in Hilbert’s seventeenth problem. Math. Z. 220(1), 75–97 (1995)
Rothvoss, T.: The Lasserre hierarchy in approximation algorithms. Lecture Notes for the MAPSP 2013 Tutorial (2013)
Sakaue, S., Takeda, A., Kim, S., Ito, N.: Exact semidefinite programming relaxations with truncated moment matrix for binary polynomial optimization problems. SIAM J. Optim. 27(1), 565–582 (2017)
Scherer, C.W., Hol, C.W.J.: Matrix sum-of-squares relaxations for robust semi-definite programs. Math. Program. 107, 189–211 (2006)
Schweighofer, M.: On the complexity of Schmüdgen’s positivstellensatz. J. Complex. 20(4), 529–543 (2004)
Sherali, H.D., Adams, W.P.: A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J. Discret. Math. 3, 411–430 (1990)
Slot, L., Laurent, M.: Improved convergence analysis of Lasserre’s measure-based upper bounds for polynomial minimization on compact sets. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01468-3
Szegö, G.: Orthogonal Polynomials. vol. 23 in American Mathematical Society colloquium publications. American Mathematical Society (1959)
Slot, L., Laurent, M.: Near-optimal analysis of of Lasserre’s univariate measure-based bounds for multivariate polynomial optimization. Math. Program. (2020). https://doi.org/10.1007/s10107-020-01586-y
Terras, A.: Fourier Analysis on Finite Groups and Applications. London Mathematical Society Student Texts, Cambridge University Press, Cambridge (1999)
Tunçel, L.: Polyhedral and Semidefinite Programming Methods in Combinatorial Optimization. Fields Institute Monograph (2010)
Vallentin, F.: Semidefinite programs and harmonic analysis. arXiv:0809.2017 (2008)
Acknowledgements
We wish to thank Sven Polak and Pepijn Roos Hoefgeest for several useful discussions. We also thank the anonymous referees for their helpful comments and suggestions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work is supported by the European Union’s Framework Programme for Research and Innovation Horizon 2020 under the Marie Skłodowska-Curie Actions Grant Agreement No. 764759 (MINOA)
Appendices
The q-ary cube
In this section, we indicate how our results for the boolean cube \({\mathbb {B}}^{n}\) may be extended to the q-ary cube \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n} = \{0, 1, \dots , q-1\}^n\) when \(q > 2\) is a fixed integer. Here \({\mathbb {Z}}/ q{\mathbb {Z}}\) denotes the cyclic group of order q, so that \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}={\mathbb {B}}^{n}\) when \(q=2\). The lower bound \(\smash {f_{({r})}}\) for the minimum of a polynomial f over \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) is defined analogously to the case \(q=2\); namely we set
where the condition means that \(f(x) - \lambda \) agrees with a sum of squares \(s \in \varSigma [x]_{2r}\) for all \(x \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) or, alternatively, that \(f - \lambda - s\) belongs to the ideal generated by the polynomials \(x_i(x_i -1)\ldots (x_i - q + 1)\) for \(i \in [n]\). Similarly, the upper bound \(\smash {f^{({r})}}\) is defined as in (7) after equipping \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) with the uniform measure \(\mu \). The parameters \(\smash {f_{({r})}}\) and \(\smash {f^{({r})}}\) may again be computed by solving a semidefinite program of size polynomial in n for fixed \(r, q \in {\mathbb {N}}\), see [23].
As before d(x, y) denotes the Hamming distance and |x| denotes the Hamming weight (number of nonzero components). Note that, for \(x,y\in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), d(x, y) can again be expressed as a polynomial in x, y, with degree \(q-1\) in each of x and y.
We will prove Theorem 11 below, which can be seen as an analog of Corollary 1 for \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\). The general structure of the proof is identical to that of the case \(q=2\). We therefore only give generalizations of arguments as necessary. For reasons that will become clear later, it is most convenient to consider the sum-of-squares bound \(\smash {f_{({r})}}\) on the minimum \(f_{\min }\) of a polynomial f with degree at most \((q-1)d\) over \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), where \(d\le n\) is fixed.
Fourier analysis on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) and Krawtchouk polynomials Consider the space
consisting of the polynomials on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) with complex coefficients. We equip \({\mathcal {R}}[{x}]\) with its natural inner product
where \(\mu \) is the uniform measure on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\). The space \({\mathcal {R}}[{x}]\) has dimension \(|\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}| = q^n\) over \({\mathbb {C}}\) and it is spanned by the polynomials of degree up to \((q-1)n\). The reason we now need to work with polynomials with complex coefficients is that the characters have complex coefficients when \(q>2\).
Let \(\psi = e^{2\pi i / q}\) be a primitive q-th root of unity. For \(a \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), the associated character \(\chi _a \in {\mathcal {R}}[{x}]\) is defined by:
So (14) is indeed the special case of this definition when \(q=2\). The set of all characters \(\{ \chi _a : a \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n} \}\) forms an orthogonal basis for \({\mathcal {R}}[{x}]\) w.r.t. the above inner product \(\langle \cdot , \cdot \rangle _{\mu }\). A character \(\chi _a\) can be written as a polynomial of degree \((q-1) \cdot |a|\) on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), i.e., we have \(\chi _a \in {\mathcal {R}}[{x}]_{(q-1)|a|}\) for all \(a \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\).
As before, we have the direct sum decomposition into pairwise orthogonal subspaces:
where \(H_i\) is spanned by the set \(\{ \chi _a : |a| = i \}\) and \(H_i\subseteq {\mathbb {R}}[x]_{(q-1)i}\). The components \(H_i\) are invariant and irreducible under the action of \({{\,\mathrm{Aut}\,}}(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n})\), which is generated by the coordinate permutations and the action of \(\text {Sym}(q)\) on individual coordinates. Hence any \(p \in {\mathcal {R}}[{x}]\) of degree at most \((q-1)d\) can be (uniquely) decomposed as:
As before \({\mathrm{St}}(0) \subseteq {{\,\mathrm{Aut}\,}}(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n})\) denotes the stabilizer of \(0 \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), which is generated by the coordinate permutations and the permutations in \(\text {Sym}(q)\) fixing 0 in \(\{0,1,\ldots ,q-1\}\) at any individual coordinate. We note for later reference that the subspace of \(H_i\) invariant under action of \({\mathrm{St}}(0)\) is of dimension one, and is spanned by the zonal spherical function:
The Krawtchouk polynomials introduced in Sect. 2 have the following generalization in the q-ary setting:
Analogously to relation (20), the Krawtchouk polynomials \({\mathcal {K}}^n_k\) (\(0\le k\le n\)) are pairwise orthogonal w.r.t. the discrete measure \(\omega \) on [0 : n] given by:
To be precise, we have:
As \({\mathcal {K}}^n_k(0) = (q-1)^k {n \atopwithdelims ()k} = \Vert {\mathcal {K}}^n_k \Vert ^2_{\omega }\), we may normalize \({\mathcal {K}}^n_k\) by setting:
so that \(\widehat{{\mathcal {K}}}^n_k\) satisfies \(\max _{t=0}^n \widehat{{\mathcal {K}}}^n_k(t) = \widehat{{\mathcal {K}}}^n_k(0) = 1\) (cf. 23).
We have the following connection (cf. 22) between the characters and the Krawtchouk polynomials:
Note that for all \(a, x, y \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), we have:
Hence, for any \(x,y\in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\), we also have (cf. 21):
Invariant kernels In analogy to the binary case \(q=2\), for a degree r univariate polynomial \(u\in {\mathbb {R}}[t]_r\) we define the associated polynomial kernel \(K(x,y) := u^2(d(x,y))\) (\(x,y\in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\)) and the associated kernel operator:
Note that K(x, y) is a polynomial on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) with degree \(2r(q-1)\) in each of x and y. Let us decompose the univariate polynomial \(u(t)^2\) in the Krawtchouk basis as
Then the kernel operator \({\mathbf {K}}\) acts as follows on characters: for \(z \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\),
which can be seen by retracing the proof of Theorem 6, and we obtain the Funk–Hecke formula (recall (28)): for any polynomial \(p \in {\mathcal {R}}[{x}]_{(q-1)d}\) with Harmonic decomposition \(p=p_0+\cdots +p_d\),
Performing the analysis It remains to find a univariate polynomial \(u \in {\mathbb {R}}[t]\) of degree r with \(u^2(t) = \sum _{i=0}^{2r}\lambda _i {\mathcal {K}}^n_i(t)\) for which \(\lambda _0 = 1\) and the other scalars \(\lambda _i\) are close to 1. As before (cf. 29), we have:
So we would like to minimize \(\sum _{i=1}^{2r} (1-\lambda _i)\). We are therefore interested in the inner Lasserre hierarchy applied to the minimization of the function \(g(t) = d - \sum _{i = 0}^d \widehat{{\mathcal {K}}}^n_i(t)\) on the set [0 : n] (equipped with the measure \(\omega \) from (47)). We show first that this function g again has a nice linear upper estimator.
Lemma 18
We have:
for all \(k \le n\).
Proof
The proof is almost identical to that of Lemma 4. Let \(t\in [0:n-1]\) and \(0 < k \le d\). Consider the elements \(1^t 0^{n-t}, 1^{t+1} 0^{n-t-1} \in \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) from Lemma 2. Then we have:
where for the inequality we note that \(\chi _a(1^{t} 0^{n-t}) = \chi _a(1^{t+1} 0^{n-t-1})\) if \(a_{t+1} = 0\). As \({\mathcal {K}}^n_k(0) = (q-1)^k {n \atopwithdelims ()k}\), this implies that:
This shows the first inequality of (49). The second inequality follows using the triangle inequality, a telescope summation argument and the fact that \(\widehat{{\mathcal {K}}}^n_k(0)=1\). \(\square \)
From Lemma 18 we obtain that the function \(g(t)= d - \sum _{i = 0}^d \widehat{{\mathcal {K}}}^n_i(t) \) admits the following linear upper estimator: \(g(t) \le d(d+1) \cdot (t/n)\) for \(t \in [0:n]\). Now the same arguments as used for the case \(q=2\) enable us to conclude:
and, when \(d(d+1) \xi _{r+1,q}^n / n \le 1/2\),
Here \(C_d\) is a constant depending only on d and \(\xi _{r+1,q}^n\) is the least root of the Krawtchouk polynomial \({\mathcal {K}}^n_{r+1, q}\). Note that as the kernel \(K(x,y) = u^2(d(x,y))\) is of degree \(2(q-1)r\) in x (and y), we are only able to analyze the corresponding levels \((q-1)r\) of the hierarchies. We come back below to the question on how to show the existence of the above constant \(C_d\).
But first we finish the analysis. Having shown analogs of Theorem 1 and Theorem 3 in this setting, it remains to state the following more general version of Theorem 2, giving information about the smallest roots of the q-ary Krawtchouk polynomials.
Theorem 10
([26], Section 5) Fix \(t\in [0,{q-1\over q}]\). Then the smallest roots \(\xi ^n_{r,q}\) of the q-ary Krawtchouk polynomials \({\mathcal {K}}^n_{r, q}\) satisfy:
Here the above limit means that, for any sequences \((n_j)_j\) and \((r_j)_j\) of integers such that \(\lim _{j\rightarrow \infty } n_j=\infty \) and \(\lim _{j\rightarrow \infty }r_j/n_j=t\), we have \(\lim _{j\rightarrow \infty } \xi _{r_j,q}^{n_j}/n_j =\varphi _q(t)\).
Note that for \(q=2\) we have \(\varphi _q(t) ={1\over 2} -\sqrt{t(1-t)}\), which is the function \(\varphi (t)\) from (5). To avoid technical details we only quote in Theorem 10 the asymptotic analog of Theorem 2 (and not the exact bound on the root \(\xi ^n_{r,q}\) for any n). Therefore we have shown the following q-analog of Corollary 1.
Theorem 11
Fix \(d\le n\) and for \(n,r\in {\mathbb {N}}\) write
There exists a constant \(C_d>0\) (depending also on q) such that, for any \(t\in [0,{q-1\over q}]\), we have:
and, if \(d(d+1)\cdot \varphi _q(t) \le 1/2\), then we also have:
Here \(\varphi _q(t)\) is the function defined in (50). Recall that the limit notation \(r/n\rightarrow t\) means that the claimed convergence holds for any sequences \((n_j)_j\) and \((r_j)_j\) of integers such that \(\lim _{j\rightarrow \infty } n_j=\infty \) and \(\lim _{j\rightarrow \infty }r_j/n_j=t\).
For reference, the function \(\varphi _q(t)\) is shown for several values of q in Fig. 1.
The function \(\varphi _q(t)\) for several values of q. Note that the case \(q=2\) corresponds to the function \(\varphi (t)\) of (5)
A generalization of Lemma 5 The arguments above omit a generalization of Lemma 5, which is instrumental to show the existence of the constant \(C_d\) claimed above. In other words, we still need to show that if \(p : \mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n} \rightarrow {\mathbb {R}}\) is a polynomial of degree \((q-1)d\) on \(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n}\) with harmonic decomposition \(p=p_0+\ldots +p_d\), there then exists a constant \(\gamma _d > 0\) (independent of n) such that:
Then, as in the binary case, we may set \(C_d=d(d+1)\gamma _d\). The proof given in Sect. 6 for the case \(q=2\) applies almost directly to the general case, and we only generalize certain steps as required. So consider again the parameters:
Lemmas 12 and 13, which show that the optimum solution p to \(\rho (n, d, k)\) may be assumed to be invariant under \({\mathrm{St}}(0) \subseteq {{\,\mathrm{Aut}\,}}(\mathbb ({\mathbb {Z}}/ q{\mathbb {Z}})^{n})\), clearly apply to the case \(q > 2\) as well. That is to say, we may assume p is of the form:Footnote 2
where \(X_i = \sum _{|a| = i}\chi _a \in H_i\) is the zonal spherical function of degree \((q-1)i\) (cf. 46 and 18). Using (48), we obtain a reformulation of \(\rho (n, d, k)\) as an LP (cf. 41):
For \(k \in {\mathbb {N}}\), let \(\widehat{{\mathcal {K}}}^\infty _k(t) := \lim _{n \rightarrow \infty } \widehat{{\mathcal {K}}}^n_k(nt) = \Big (1 - \frac{q}{q-1}t\Big )^k\) and consider the program (cf. 43):
As before, we have \(\rho (n, d, k) \le \rho (\infty , d, k)\), noting that (the proofs of) Lemmas 16 and 17 may be applied directly to the case \(q>2\). From there, it suffices to show \(\rho (\infty , d, k) < \infty \), which can be argued in an analogous way to the case \(q=2\).
Matrix-valued polynomials
In this section, we show how the arguments used for the proofs of our main results in Theorems 1 and 3 may be applied in the setting of matrix-valued polynomials, thereby proving Theorems 4 and 5.
Recall that \({\mathcal {S}}^k\) is the space of \(k\times k\) real symmetric matrices and \({\mathcal {S}}^k[x] \subseteq {\mathbb {R}}^{k \times k}[x]={\mathbb {R}}[x]^{k\times k}\) is the space of n-variate polynomials whose coefficients lie in \({\mathcal {S}}^k\). Given a polynomial matrix \(F\in {\mathcal {S}}^k[x]\) we consider the matrix-valued polynomial optimization problem:
for which we have the outer Lasserre hierarchy:
and the inner Lasserre hierarchy:
Here, the set \(\varSigma ^{k \times k}_r\) consists of all sum-of-squares polynomial matrices \(S \in {\mathcal {S}}^k[x]\), of the form:
The outer hierarchy (proof of Theorem4) We generalize the outline of Sect. 1.5 to the matrix-valued setting. Let \(F\in {\mathcal {S}}^k[x]\) be the polynomial matrix of degree d to be optimized, and assume w.l.o.g. that \(0 \le \Vert F\Vert _\infty \le 1\). Here, and throughout this section, \(\Vert F\Vert _\infty := \max _{x \in {\mathbb {B}}^{n}} \Vert F(x) \Vert \) is the largest absolute value of an eigenvalue of F(x) over \({\mathbb {B}}^{n}\). A kernel \(K\) of the form \(K(x, y) = u^2(d(x,y))\) with \(u \in {\mathbb {R}}[t]_r\) (cf. 8) induces a linear operator \({\mathbf {K}}\) on \({\mathcal {S}}^k[x]\) by:
If \(P(x) \succeq 0\) for all \(x \in {\mathbb {B}}^{n}\), then the polynomial \({\mathbf {K}}P\) is a sum-of-squares polynomial matrix of degree at most 2r on \({\mathbb {B}}^{n}\). Indeed, we then have:
Given \(\delta > 0\) to be determined later, set \({\tilde{F}} = F + \delta I\). Assuming that \({\mathbf {K}}\) is non-singular, we can write \(F = {\mathbf {K}}({\mathbf {K}}^{-1} {\tilde{F}})\). Therefore, assuming that \({\mathbf {K}}^{-1} {\tilde{F}}\) is positive semidefinite over \({\mathbb {B}}^{n}\), we find that \(F + \delta I\) is a sum-of-squares polynomial matrix of degree 2r on \({\mathbb {B}}^{n}\), and thus that \(F_{\min } - F_{(r)} \le \delta \).
To guarantee positive semidefiniteness of \({\tilde{F}}\), it suffices to ensure that (cf. 9):
Indeed, as the smallest eigenvalue of \({\tilde{F}}(x)\) is at least \(\delta \) for each \(x \in {\mathbb {B}}^{n}\), the smallest eigenvalue of \({\mathbf {K}}^{-1}F(x)\) must then be at least zero.
As in the case of scalar-valued polynomials, the eigenvalues of \({\mathbf {K}}\) are given by the coefficients \(\lambda _i\) in the expansion \({u^2(t) = \sum _{i = 0}^{2r} \lambda _i {\mathcal {K}}^n_i(t)}\). Indeed, if \(P \in {\mathcal {S}}^k[x]\) is a polynomial matrix of degree d then we may decompose it into harmonic components entry-wise to obtain \({P(x) = \sum _{i=0}^d P_i(x)}\) and (cf. Theorem 6):
It remains to express the quantity \(\Vert {\mathbf {K}}^{-1} - I\Vert \) in terms of the eigenvalues \(\lambda _i\) of \({\mathbf {K}}\), after which the proof proceeds as in the case of scalar polynomials. For this, note that:
where \(\gamma _d\) is the constant of Lemma 5 (cf. 34). The last inequality relies on the following generalization of Lemma 5, whose proof here is essentially as given in [11].
Lemma 19
Let \(P(x) = \sum _{i=0}^d P_i(x)\) be a polynomial matrix of degree d, decomposed into harmonic components. If \(\gamma _d\) is the constant of Lemma 5, we then have:
Proof
For any matrix \(M \in {\mathcal {S}}^k\), its spectral norm is \(\Vert M \Vert _{} = \max _{y \in {\mathbb {R}}^k} \{ |y^\top M y| : \Vert y\Vert = 1\}\). Therefore, we have:
For fixed y, the function \(p^y : x \mapsto y^\top P(x) y\) is a (scalar) polynomial on \({\mathbb {B}}^{n}\) of degree d, whose harmonic components are given by \(p^y_i : x \mapsto y^\top P_i(x) y\). Therefore, we may invoke Lemma 5 to bound:
and conclude that \(\Vert P_i\Vert _\infty \le \gamma _d \Vert P\Vert _\infty \). \(\square \)
The inner hierarchy (proof of Theorem5) We generalize the arguments of Sect. 5 to the matrix-valued setting. Let F again be the polynomial matrix of degree d to be optimized, and assume w.l.o.g. that \(0 \le \Vert F\Vert _\infty \le 1\) and that the minimum in the optimization problem (53) is attained at 0, i.e., that \(F_{\min } = \lambda _{\min } \big (F(0)\big )\).
As in the scalar case, we work to reduce problem (55) to a (now matrix-valued) instance of the inner hierachy in one variable. Note first that \(F^{(r)} \le F_\mathrm{sym}^{(r)}\) for each \(r \in {\mathbb {N}}\), where \(F_\mathrm{sym}^{(r)}\) is obtained by restricting the optimization in (55) to polynomial matrices S(x) of the form \(S(x) = U(|x|)\). Writing \({\widehat{F}}\) for the univariate polynomial matrix satisfying
we find (cf. 38):
It remains to analyze the program (56). We first give a linear upper estimator for \({\widehat{F}}\) (cf. Lemma 9).
Lemma 20
For all \(t \in [0:n]\), we have:
where \(C={\widehat{F}}(0)\) is a constant matrix with \(\lambda _{\min }(C) = 0\).
Proof
We may write \({\widehat{F}}(t) = \sum _{i=0}^d \varLambda _i \widehat{{\mathcal {K}}}^n_i(t)\) for certain \(\varLambda _i \in {\mathcal {S}}^k\). We then have:
making use of Lemma 19 and (24) for the final inequality. It remains to note that \(\sum _{i=0}^d \varLambda _i = {\widehat{F}}(0)\), and that \(\lambda _{\min } ({\widehat{F}}(0)) = 0\) by assumption. \(\square \)
As \({\widehat{F}}(t) \preceq {\widehat{G}}(t)\) for all \(t \in [0:n]\), we have \(F_\mathrm{sym}^{(r)} \le {\widehat{G}}^{(r)}_{[0:n], \omega }\), where:
is the inner Lasserre hierarchy for G computed on [0 : n] w.r.t. the measure \(\omega \). To conclude the argument, we prove the following generalization of Theorem 7 (see also Remark 1).
Corollary 3
Let \(G(t) = ct \cdot I + C\) be a linear matrix-valued polynomial with \(c>0\) and \(\lambda _{\min }(C) = 0\). Then we have:
where \(\xi ^n_{r+1}\) is the least root of the degree \(r+1\) Krawtchouk polynomial.
Proof
Let u be a unit eigenvector for C corresponding to (one of its) zero eigenvalues. Then for any univariate sum-of-squares polynomial \(s \in \varSigma _r\), the matrix-valued polynomial \(U(t) = s(t) u u^\top \) is a sum-of-squares polynomial matrix of degree 2r. Furthermore, for such a U we have:
and
Therefore, writing \(g(t) = ct\), and making use of Theorem 7 and Remark 1, we have:
This concludes the proof. \(\square \)
Rights and permissions
About this article
Cite this article
Slot, L., Laurent, M. Sum-of-squares hierarchies for binary polynomial optimization. Math. Program. 197, 621–660 (2023). https://doi.org/10.1007/s10107-021-01745-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10107-021-01745-9
Keywords
- Binary polynomial optimization
- Lasserre hierarchy
- Sum-of-squares polynomials
- Fourier analysis
- Krawtchouk polynomials
- Polynomial kernels
- Semidefinite programming
- Polynomial matrices