Abstract
This work derives upper bounds on the convergence rate of the moment-sum-of-squares hierarchy with correlative sparsity for global minimization of polynomials on compact basic semialgebraic sets. The main conclusion is that both sparse hierarchies based on the Schmüdgen and Putinar Positivstellensätze enjoy a polynomial rate of convergence that depends on the size of the largest clique in the sparsity graph but not on the ambient dimension. Interestingly, the sparse bounds outperform the best currently available bounds for the dense hierarchy when the maximum clique size is sufficiently small compared to the ambient dimension and the performance is measured by the running time of an interior point method required to obtain a bound on the global minimum of a given accuracy.
1 Introduction
This work provides rates of convergence for the sums-of-squares hierarchy with correlative sparsity. For a positive \(n\in \mathbb {N}\), consider the polynomial optimization problem
where f is an element of the ring \(\mathbb {R}[x]\) of polynomials in \(x=(x_1,\dots ,x_n)\), and \(S(\textbf{g})\) is a basic compact semialgebraic set determined by a finite collection of polynomials \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\) by \(S(\textbf{g})=\{x\in \mathbb {R}^n:g_i(x)\ge 0,\;i=1,\dots ,{\bar{k}}\}\). An approach to attack this problem, first proposed by Lasserre [10] and Parrilo [20], is as follows: Imagine we knew that \(f(x)-\lambda \) could be written as
with \(g_0(x)=1\) and \(\sigma _j\) and \(\sigma _J\) being sum-of-squares (SOS) polynomials. Then the right-hand side of each of these equations would clearly be nonnegative on \(S(\textbf{g})\), so we would know that \(f_{\min }\ge \lambda \). By bounding the degree of the SOS polynomials, we obtain the following two hierarchies of lower bounds:
where \(\Sigma [x]\) is the convex cone of all sum-of-squares polynomials. These satisfy \(\textrm{lb}_q(f,r)\le \textrm{lb}_p(f,r)\le f_{\min }\). The lower bound \( \textrm{lb}_q(f,r)\) is associated to a so-called quadratic module certificate, while \( \textrm{lb}_p(f,r)\) corresponds to a preordering certificate; this terminology is justified by the definitions in Sect. 1.2. The well-known Putinar and Schmüdgen Positivstellensätze [21, 23], respectively, guarantee that these bounds converge to \(f_{\min }\) as \(r\rightarrow +\infty \), the former with the additional assumption that the associated quadratic module be Archimedean. Here we prove sparse quantitative versions of these results.
Polynomial optimization schemes have generated substantial interest due to their abundant fields of application; see for example [12, 13]. The first proof of convergence, without a convergence rate, was given by Lasserre [10] using the Archimedean Positivstellensatz due to Putinar [21]. Eventually, rates of convergence were obtained; initially in [18] these were logarithmic in the degree of the polynomials involved, and later on they were improved [2, 4, 14, 25] (using ideas of [3, 22]) to polynomial rates; refer to Table 1. The crux of the argument used to obtain those rates is a bound on the deformation incurred by a polynomial that is strictly positive on the domain of interest as it passes through an integral operator that closely approximates the identity and is associated with a strictly positive polynomial kernel, itself composed of sums of squares and similar to the Christoffel–Darboux and Jackson kernels (see Definition 10). More recently, an exponential convergence rate was obtained in [1] under the additional assumption that the Hessian is positive definite at all global minimizers.
The techniques used to obtain these results generally involve linear operators on the space of polynomials (mostly Christoffel–Darboux kernel operators; see [25]) that are close to the identity and that, for positive polynomials, are easily (usually, by construction) proved to output polynomials that are sums of squares and/or sums of their products with the functions in \(\textbf{g}\). All of these results, however, deal with the dense case.
In this work, we treat the case where the problem possesses the so-called correlative sparsity, where each function \(g_i\) depends only on a certain subset of variables and the function f decomposes as a sum of functions depending only on these subsets of variables. This structure can be exploited in order to define sparse lower bounds that are cheaper to compute but possibly weaker. Nevertheless, these sparse lower bounds allow one to tackle large-scale polynomial optimization problems arising from various applications including roundoff error bounds in computer arithmetic, quantum correlations and robustness certification of deep networks; see the recent survey [15]. In [11] Lasserre proved that these sparse lower bounds converge as the degree of the SOS multipliers tends to infinity provided the variable groups satisfy the so-called running intersection property (see Definition 1). A shorter and more direct proof was provided in [6], and was adapted in [17] to obtain a sparse variant of Reznick’s Positivstellensatz. In this work, we show polynomial rates of convergence for sparse hierarchies based on both Schmüdgen and Putinar Positivstellensätze. Importantly, we obtain rates that depend only on the size of the largest clique in the sparsity graph rather than the overall ambient dimension. This allows the perhaps surprising conclusion that, asymptotically, the sparse hierarchy is more accurate than the dense hierarchy for a given computation time of an optimization method, provided that the size of the largest clique is no more than the square root of the ambient dimension. This assumes that the running time of the optimization method is governed by the size of the largest PSD block and the number of such blocks in the semidefinite programming reformulations of the dense and sparse SOS problems, which is the case for interior point methods as well as the most commonly used first-order methods.
To the best of our knowledge, these are the first quantitative results of this kind. Our proof techniques rely on an adaptation of [6], utilize heavily the recent results from [2, 14], and can thus be seen as a generalization of these works to the sparse setting.
Since our results are very technical and their full statement necessitates the introduction of a lot of notation, we have prepared a rough summary, presented next, to give the reader a glimpse of what they are. We caution the reader that this summary does not fully spell out all the details and definitions, for which we refer to the next section.
Definition 1
A collection \(\{J_1,\dots ,J_\ell \}\) of subsets \(J_j\subset \{1,\dots ,n\}\) satisfies the running intersection property if for all \(1\le k\le \ell -1\) we have
Theorem 2
(Rough summary of results) Let:
-
\(n>0\), \(\ell >0\), and \({\bar{k}}\) be positive integers,
-
\({\textbf{r}}_1,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\) be some multi-indices with \(r_{j,i}\ge 1\),
-
\(J_1,\dots ,J_\ell \subset \{1,\dots ,n\}\) be sets of indices satisfying (RIP),
-
\(p_1,\dots ,p_\ell \) be polynomials such that \(p_j\) depends only on the variables \(x_i\) with \(i\in J_j\), and its degree in the variable \(x_i\) is \(\le r_{j,i}\),
-
\(p=p_1+p_2+\dots +p_\ell \) be a polynomial that is the sum of the polynomials \(p_1,\dots ,p_\ell \).
We will denote by \(|J_j|\) the cardinality of the set \(J_j\).
-
i.
(Schmüdgen-type, Theorem 6) Assume that, for large-enough \(c_1>0\) (determined explicitly in the statement of Theorem 6 and depending only on n and \(J_1,\dots , J_\ell \)),
$$\begin{aligned} p(x)\ge c_1\frac{\Vert p\Vert _{L^\infty ([-1,1]^n)}}{r_{j,i}^{\frac{2}{3+\max _j|J_j|}}} \end{aligned}$$(1)for all \(1\le i\le n\), \(1\le j\le \ell \), and \(x\in [-1,1]^n\). Then p can be written as a sum \(p=h_1+\dots +h_\ell \) of polynomials \(h_j\) that belong to the respective preorderings generated by the polynomials \(1-x_i^2\) with \(i\in J_j\) and only depend on the variables \(x_i\) with \(i\in J_j\). This means that there are polynomials \(\sigma _{j,K}\) that are sums of squares, that depend only on \(x_i\) for \(i\in J_j\), and such that
$$\begin{aligned} h_j=\sum _{K\subset J_j}\sigma _{j,K}\prod _{m \in K}(1-x_m^2). \end{aligned}$$The sum is taken over all (possibly empty) subsets K of \(J_j\); the product on the right is understood to be 1 when \(K=\emptyset \). Moreover, the degree of each term \(\sigma _{j,K}\prod _{m\in K}(1-x_m^2)\) in the variable \(x_i\) is bounded by \(r_{j,i}\).
-
ii.
(Putinar-type, Theorem 8) Additionally we let
-
\(K_1,\dots ,K_\ell \subset \{1,\dots ,{\bar{k}}\}\) be sets of indices,
-
\(g_1,\dots ,g_{\bar{k}}\) be polynomials such that, if \(m\in K_j\) then \(g_m\) depends only on the variables \(x_i\) with \(i\in J_j\), and satisfying some additional technical assumptions.
Now, instead of assuming (1), we assume that, for large-enough \(c_2,c_3>0\) (determined more-or-less explicitly in the statement of Theorem 8 and depending only on n, \(J_1,\dots ,J_\ell \), \(g_1,\dots ,g_{{\bar{k}}}\)),
$$\begin{aligned} p(x)\ge c_2\frac{(\Vert p\Vert _{L^\infty (S(\textbf{g}))}\deg p_j \sum _i{{\,\textrm{Lip}\,}}p_i)^{c_3}}{r_{j,i}^{\frac{1}{13+3|J_j|}}}, \end{aligned}$$for x in the set \(S(\textbf{g})=\{x\in \mathbb {R}^n:g_j(x)\ge 0,\;j=1,\dots ,{\bar{k}}\}\), and for all \(1\le i\le n\), \(1\le j\le \ell \). Then p can be written as a sum \(p=h_1+\dots +h_\ell \) of polynomials \(h_j\) that belong to the respective quadratic modules generated by the polynomials \((g_i)_{i\in K_j}\) and only depend on the variables \((x_i)_{i\in J_j}\). This means that there are polynomials \(\sigma _{j,0}\) and \(\sigma _{j,k}\) that are sums of squares, that depend only on \(x_i\) for \(i\in J_j\), and such that
$$\begin{aligned} h_j=\sigma _{j,0}+\sum _{k\in K_j}\sigma _{j,k}g_k. \end{aligned}$$Moreover, the degree of each term \(\sigma _{j,0}\) and \(\sigma _{j,k}g_k\) in the variable \(x_i\) is bounded by \(r_{j,i}+2\).
-
Although the exponents of \(r_{j,i}\) in the above statement are often much smaller than 2, making the rates slower than those previously obtained for the dense case (see Table 1), we have also analyzed the complexity involved in solving the corresponding optimization problems, showing that the sparse hierarchies may outperform the dense ones in certain situations.
We proceed to summarize this analysis. As a proxy for the complexity of certifying, via the sums-of-squares hierarchies, a lower bound on the minimum of a polynomial p to accuracy \(\varepsilon >0\), we use the size and number of the positive-semidefinite (PSD) blocks in the corresponding semidefinite program (SDP).
We denote:
-
\(B_{\textrm{dense}}(\varepsilon )\) the size of the PSD block in the dense case, which is equal to the number \(\left( {\begin{array}{c}n+r\\ r\end{array}}\right) \) of monomials of degree at most r in n variables, where in our argument we take r of the order \(O(\varepsilon ^{-1/2})\), as in the best results listed in Table 1,
-
\(B_{\textrm{sparseSchm}}(\varepsilon )\) the size of the largest PSD block in the sparse case of Theorem 2i. (i.e., optimizing over \([-1,1]^n\) using a Schmüdgen-style scheme), multiplied by the number \(\ell \) of blocks,
-
\(B_{\textrm{sparsePut}}(\varepsilon )\) the size of the largest PSD block in the sparse case of Theorem 2ii. (i.e., optimizing over \(S(\textbf{g})\) using a Putinar-style scheme), multiplied by the number \(\ell \) of blocks.
The quotients
being \(<1\) thus indicates that our rates are better than the dense ones.
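To make this comparison concrete, the block counts can be evaluated numerically, ignoring constants and taking the degrees at their asymptotic orders. The parameters below (for instance \(n=100\), \(|J_j|=4\), and \(\ell =97\), as for a chain of 4-cliques) are our illustrative choices, not values fixed by the text:

```python
from math import ceil, comb

def b_dense(n, eps):
    r = ceil(eps ** -0.5)                 # dense degree r = O(eps^{-1/2})
    return comb(n + r, r)                 # monomials of degree <= r in n vars

def b_sparse_schm(n, J, ell, eps):
    # largest sparse block: degree O(eps^{-(|J|+3)/2}) in |J| variables,
    # multiplied by the number ell of blocks
    r = ceil(eps ** (-(J + 3) / 2))
    return ell * comb(J + r, r)

# n = 100 with cliques of size 4 in a chain pattern (ell = n - 3 = 97);
# here n > |J|(|J| + 3) = 28, so the ratio should tend to 0 as eps -> 0
n, J, ell = 100, 4, 97
ratios = [b_sparse_schm(n, J, ell, eps) / b_dense(n, eps)
          for eps in (1e-2, 1e-3, 1e-4, 1e-5)]
assert all(a > b for a, b in zip(ratios, ratios[1:]))  # strictly decreasing
assert ratios[-1] < 1                                  # sparse eventually wins
```

Note that for moderate \(\varepsilon \) the sparse count is far larger; the advantage only appears once \(\varepsilon \) is small enough, in line with Propositions 7 and 9.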
Proposition 3
(Summary of SDP size results) With some technical assumptions, we have:
-
1.
(Proposition 7) If \(n>|J_j|^2+3|J_j|\) for all \(1\le j\le \ell \), then, for minimizing p over \([-1,1]^n\), we have
$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparseSchm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=0. \end{aligned}$$ -
2.
(Proposition 9) If \(n>6|J_j|^2+26|J_j|\) for all \(1\le j\le \ell \), then, for minimizing p over \(S(\textbf{g})\), we have
$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparsePut}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=0. \end{aligned}$$
In other words, our rates improve on those already found for the dense case as long as n is sufficiently large with respect to the sizes \(|J_j|\) of the blocks of variables indexed by the sets \(J_1,\dots , J_\ell \), and \(\varepsilon >0\) is sufficiently small.
Let us turn to some concrete examples. The functions in our examples were considered before in [27, §6.1], where sums-of-squares algorithms leveraging sparsity were benchmarked on them.
Example 4
Consider the chain singular function for \(x\in [-1,1]^{n}\):
where \(P=\{1,3,5,\dots ,n-3\}\) and n is a multiple of 4. Then we can take \(J_j=\{j,j+1,j+2,j+3\}\) for \(j=1,\dots , n-3\), so that these sets satisfy (RIP) and \(|J_j|=4\). The proofs of Propositions 7 and 9 show that in this case we have, for large-enough n and as \(\varepsilon \searrow 0\),
Example 5
Consider the Broyden banded function, defined for each \(n\in \mathbb {N}\) by
where \(P_i=\{j:j\ne i,\;\max (1,i-5)\le j\le \min (n,i+1)\}\), on the box \([-1,1]^{n}\). Then \(J_i=P_i\cup \{i\}\), and \(|J_i|\le 7\), and these sets satisfy (RIP). The proofs of Propositions 7 and 9 show that in this case we have, for large-enough n and as \(\varepsilon \searrow 0\),
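The clique structure of this example can be verified programmatically. A minimal sketch, with our function names and with the running intersection check written directly from Definition 1:

```python
def broyden_cliques(n):
    """Variable groups J_i = P_i ∪ {i} for the Broyden banded function,
    with P_i = {j : j != i, max(1, i-5) <= j <= min(n, i+1)}."""
    cliques = []
    for i in range(1, n + 1):
        P = {j for j in range(max(1, i - 5), min(n, i + 1) + 1) if j != i}
        cliques.append(P | {i})
    return cliques

def rip(cs):
    # running intersection property: the overlap of each new set with the
    # union of the previous ones is contained in a single earlier set
    return all(
        any((cs[k] & set().union(*cs[:k])) <= cs[s] for s in range(k))
        for k in range(1, len(cs)))

Js = broyden_cliques(20)
assert max(len(J) for J in Js) <= 7   # |J_i| <= 7 as claimed
assert rip(Js)                        # the natural ordering satisfies (RIP)
```

Here \(J_{i}\) is an interval of at most seven consecutive indices, so each overlap \(J_{k+1}\cap (J_1\cup \dots \cup J_k)\) sits inside \(J_k\), which is why (RIP) holds in the natural ordering.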
The paper is organized as follows. The results are detailed below in Sect. 1.2 and further discussed in Sect. 1.2.1, after a brief interlude to establish some notations in Sect. 1.1. Some machinery is developed in Sects. 2 and 3, regarding variants of the Jackson kernel and some approximation theory, respectively, and the proofs of the main theorems are presented in Sect. 4.
1.1 Notation
Denote by \(\mathbb {R}\) the set of real numbers, by \(\mathbb {N}\) the set of positive integers, and by \(\mathbb {N}_0=\{0,1,\dots \}\) the set of nonnegative integers. Denote by \(e_1,\dots ,e_n\) the vectors of the standard basis of Euclidean space \(\mathbb {R}^n\).
For a Lipschitz continuous function \(f:[-1,1]^n\rightarrow \mathbb {R}\), we set
We take this to be at least 1 to simplify estimates below.
A multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) is an n-tuple of nonnegative integers \(i_k\), and its weight is denoted by
For a multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) and \(J\subset \{1,\dots ,n\}\), we write \(I\subseteq J\) to indicate that the index k of every nonzero entry \(i_k>0\) is contained in J, that is, \(k\in J\) for all \(1\le k\le n\) such that \(i_k>0\).
Similarly, given a multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) and a subset \(J\subseteq \mathbb {N}_0\), we let \(I_{J}\) be the multi-index whose k-th entry is either \(i_k\) if \(k\in J\) or 0 if \(k\notin J\).
For two multi-indices \(I\) and \(I'\), we write \(I\le I'\) if the entrywise inequalities \(i_k\le i'_k\) hold for all \({1}\le k\le n\).
We distinguish two special multi-indices:
We denote \(x^I=x_1^{i_1}x_2^{i_2}\dots x_n^{i_n}\). Also, we denote the Hamming weight of \(I\in \mathbb {N}_0^n\) by
In other words, \(w(I)\) is the number of nonzero entries in I.
We denote the space of polynomials in n variables by \(\mathbb {R}[x]_{}\), and within this set we distinguish the subspace \(\mathbb {R}[x]_{d}\) of polynomials of total degree at most d. We denote, for a polynomial \(p(x)=\sum _{I}c_Ix^I\), by \({{\,\mathrm{\overline{\deg }}\,}}p\) the vector whose i-th entry is the degree of p in \(x_i\),
Observe that \(\deg p\le \left| {{\,\mathrm{\overline{\deg }}\,}}p\right| = \sum _{k=1}^n \max _{c_I \ne 0} i_k\).
Set also
Given a subset \(J\subset \{1,\dots ,n\}\), we let \(\mathbb {R}[x_{J}]_{}\) denote the set of polynomials in the variables \(\{x_j\}_{j\in J}\). For a multi-index \({\textbf{r}}=(r_1,\dots ,r_n)\in \mathbb {N}_0^n\), we let \(\mathbb {R}[x_{}]_{{\textbf{r}}}\) denote the set of polynomials p such that, if \(p(x)=\sum _{I}c_Ix^I\) for some real numbers \(c_I\in \mathbb {R}\), then for each \(I=(i_1,\dots ,i_n)\) with \(c_I\ne 0\) we also have \(i_k\le r_k\) for \(1\le k\le n\). Finally, we let \(\mathbb {R}[x_{J}]_{{\textbf{r}}}=\mathbb {R}[x_{J}]_{}\cap \mathbb {R}[x_{}]_{{\textbf{r}}}\); in other words, \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\) is the set of polynomials p with \({{\,\mathrm{\overline{\deg }}\,}}p\le {\textbf{r}}\) in the variables \(\{x_j:j\in J\}\subseteq \{x_1,\dots ,x_n\}\).
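Membership in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\) can be tested directly on a coefficient representation. A small sketch; the dictionary encoding of a polynomial by its exponent tuples is our choice, not notation from the text:

```python
def in_RxJr(p, J, r):
    """Check p ∈ R[x_J]_r: every monomial x^I with c_I != 0 uses only
    variables indexed by J, and its degree in x_k is at most r_k.
    p is a dict {exponent tuple I: coefficient c_I}; tuple position k
    (0-based) encodes the variable x_{k+1}."""
    return all(
        all((i == 0 or (k + 1) in J) and i <= r[k]
            for k, i in enumerate(I))
        for I, c in p.items() if c != 0)

# p = 3*x1^2*x3 - x3^4 in the variables (x1, x2, x3)
p = {(2, 0, 1): 3.0, (0, 0, 4): -1.0}
assert in_RxJr(p, J={1, 3}, r=(2, 0, 4))
assert not in_RxJr(p, J={1, 2}, r=(2, 0, 4))   # x3 appears but 3 ∉ J
assert not in_RxJr(p, J={1, 3}, r=(2, 0, 3))   # degree in x3 exceeds r_3 = 3
```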
Given a set X, we write \(X^n\) to denote the product
We denote by \(\Vert \cdot \Vert _\infty \) the supremum norm on \([-1,1]^n\).
The notation \(\lceil s\rceil \) stands for the least integer \(\ge s\).
1.2 Results
Let \({\Sigma [x_{J}]}\) denote the set of polynomials p that are sums of squares of polynomials in \(\mathbb {R}[x_{J}]_{}\), that is, of the form \(p=p_1^2+\dots +p_\ell ^2\) for \(p_1,\dots ,p_\ell \in \mathbb {R}[x_{J}]_{}\).
Let \({\bar{k}}\in \mathbb {N}\) and let \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\) be a collection of polynomials \(g_i\in \mathbb {R}[x]_{}\) defining a set
For convenience, denote also \(g_0=1\). To the collection \(\textbf{g}\) and a multi-index \({\textbf{r}}\), we associate the (variable- and degree-wise truncated) quadratic module
Similarly, we have a (variable- and degree-wise truncated) preordering
where
Denote, for \(j=2,3,\dots ,\ell \),
Then (RIP) is the condition that, for all \(1\le k\le \ell -1\), there is some \(s\le k\) such that \(\mathcal {J}_{k+1}\subset J_s\).
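This condition can be checked mechanically. A minimal sketch in Python; the function name and list-of-sets encoding are ours:

```python
def satisfies_rip(cliques):
    """Running intersection property: for each k, the overlap
    J_{k+1} ∩ (J_1 ∪ ... ∪ J_k) must be contained in a single
    earlier set J_s with s <= k."""
    for k in range(1, len(cliques)):
        seen = set().union(*cliques[:k])
        overlap = cliques[k] & seen
        if not any(overlap <= cliques[s] for s in range(k)):
            return False
    return True

# A chained pattern satisfies (RIP) ...
assert satisfies_rip([{1, 2}, {2, 3}, {3, 4}])
# ... while here {1, 3} meets both earlier sets but is contained in neither:
assert not satisfies_rip([{1, 2}, {3, 4}, {1, 3}])
```

Note that (RIP) depends on the ordering of the sets: reordering the cliques may make the property fail or hold.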
1.2.1 Sparse Schmüdgen-type representation on \([-1,1]^n\)
Let \({\overline{L}}:=\sum _{k=1}^\ell {{\,\textrm{Lip}\,}}p_k\), \(M:=\max _{\begin{array}{c} 1\le k\le \ell \\ 1\le m\le n \end{array}}({{\,\mathrm{\overline{\deg }}\,}}p_k)_m\), and \({\overline{J}}:=\max _{1\le k\le \ell }|J_k|\).
Theorem 6
Let \(n>0\) and \(\ell \ge 2\), and let \({\textbf{r}}_1,{\textbf{r}}_2,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\),
be nowhere-vanishing multi-indices.
Let also \(J_1,\dots ,J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP).
Let \(p=p_1+p_2+\dots +p_\ell \) be a polynomial that is the sum of finitely many polynomials \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\). Then if \(p\ge \varepsilon \) on \([-1,1]^n\), we have
as long as, for all \(1\le j\le \ell \) and all \(1\le i\le n\),
For small enough \(0<\varepsilon <4C_{\textrm{Jac}}(\ell +2){\overline{J}}\,{\overline{L}}/M\), this boils down to
with
The proof is presented in Sect. 4.1.
Discussion
Solving the dense problem considered in [14] using the sum-of-squares hierarchy reduces to a semidefinite program whose largest PSD block has size \(\left( {\begin{array}{c}n+r\\ r\end{array}}\right) \), which typical optimization methods (e.g., interior point or first order) can solve in an amount of time proportional to a power of
at least when certain non-degeneracy conditions are satisfied [5]. The bounds we find in Theorem 6 (in the case in which \(J_j\) is the largest of the sets \(J_1,\dots ,J_\ell \)) give a bound for the complexity of the leading term as (the same power of)
The reason we have \(|{\textbf{r}}_j|\le |J_j|C'\varepsilon ^{-|J_j|-3}\) is that \(r_{j,i}\le O(\varepsilon ^{-\frac{|J_j|+3}{2}})\) and there are at most \(|J_j|\) values of i with \(r_{j,i}\ne 0\).
Proposition 7
If \(n>|J_j|(|J_j|+3)\) for all \(j=1,\dots ,\ell \), then we have
Thus, if the size of the largest clique is of the order of the square root of the ambient dimension n or smaller, the sparse bound outperforms the best available dense bound when performance is measured by the amount of time required by an optimization method to find a bound of a given accuracy \(\varepsilon \).
Proof of Proposition 7
By Lemma 20 we have, as \(\varepsilon \searrow 0\),
and this tends to 0 if the sparsity of the polynomial p is such that \(n>|J_j|(|J_j|+3)\). \(\square \)
1.2.2 Sparse Putinar-type representation on arbitrary domains
For a set of polynomials \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\), denote
and, for a subset \(K\subset \{1,\dots ,{\bar{k}}\}\), denote
Theorem 8
Let \(n>0\), \({\bar{k}}>0\), \(\ell \ge 2\), \(J_1,\dots ,J_\ell \subset \{1,\dots ,n\}\), \({\textbf{r}}_1,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), and \(p=p_1+\dots +p_\ell \) with \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\).
Assume that the sets \(J_1,\dots ,J_\ell \) satisfy (RIP).
Let \(K_1,\dots ,K_\ell \subset \{1,\dots ,{\bar{k}}\}\) and let \(\textbf{g}=\{g_1,\dots ,g_{\bar{k}}\}\subset \mathbb {R}[x]_{}\) be a collection of \({\bar{k}}\) polynomials such that, if \(i\in K_j\) for some \(1\le j\le \ell \), then \(g_i\in \mathbb {R}[x_{J_j}]_{}\). Assume that
Assume that \(S(\textbf{g})\subset [-1,1]^n\) and that there exist polynomials \(s_{j,i}\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\), \(j=1,\dots ,\ell \), \(i\in \{0\}\cup K_j\), such that the Archimedean conditions
hold; that is to say, we assume that \(1-\sum _{i\in J_j}x_i^2\in \mathcal Q_{{\textbf{r}}_j,J_j}(\textbf{g}_{K_j})\). Let \({\mathsf c}_1,\dots ,{\mathsf c}_{\ell }\ge 1\) and \({\mathsf L}_1,\dots ,{\mathsf L}_{\ell }\ge 1\) be constants such that
Then there are constants \({\textbf{C}_{j}}>0\), depending only on \(\textbf{g}\), \(J_1,\dots ,J_\ell \), such that, if \(p\ge \varepsilon >0\) on \(S(\textbf{g})\), we have
as long as, for all \(1\le j\le \ell \) and \(1\le k\le n\),
The proof of the theorem can be found in Sect. 4.2.
Discussion By the same arguments we used in the discussion at the end of the previous section, if we assume \({\mathsf L}_1=\dots ={\mathsf L}_{{\bar{k}}}=1\), the bounds we find in Theorem 8 give a bound for the complexity of the leading term as (a power of)
at least when certain non-degeneracy conditions are satisfied [5]. The assumption \({\mathsf L}_1=\dots ={\mathsf L}_{{\bar{k}}}=1\) is realized, for example, when the so-called constraint qualification condition holds: at each point \(x\in S(\textbf{g})\), the active constraints \(g_{i_1},\dots ,g_{i_l}\) (i.e., those satisfying \(g_{i_j}(x)=0\)) have linearly independent gradients \(\nabla g_{i_1}(x),\dots ,\nabla g_{i_l}(x)\). The latter statement is proved in [2, Thm 2.11].
In this case, we have:
Proposition 9
If \(n>|J_j|(26+6|J_j|)\) for all \(j=1,\dots ,\ell \) and if \({\mathsf L}_1=\dots ={\mathsf L}_{{\bar{k}}}=1\), then we have
Again the implication is that the sparse bound asymptotically outperforms the dense bound provided that the largest clique is sufficiently small.
Proof of Proposition 9
Lemma 20 gives
which tends to 0 if \(\frac{n}{2}>|J_j|(13+3|J_j|)\). \(\square \)
Organization of the paper The proof of Theorem 6 can be seen as a variable-separated version of the proof in [14], which relies on the Jackson kernel. Therefore in Sect. 2 we derive the suitable ingredients for sparse Jackson kernels while carefully taking into account each variable separately.
A strategy is also required to write a positive polynomial p that is known to be a sum \(p=p_1+\dots +p_\ell \) with \(p_i\in \mathbb {R}[x_{J_i}]_{}\) as a similar sum \(p=h_1+\dots +h_\ell \) but now with \(h_j\in \mathbb {R}[x_{J_j}]_{}\) and \(h_j\ge 0 \) on \([-1,1]^{|J_j|}\); this is done in Sect. 3.
Section 4 gives the proofs of Theorems 6 and 8, together with the statement and proof of Lemma 20, which was used in the proofs of Propositions 7 and 9 above.
2 The sparse Jackson kernel
In this section, we derive Corollary 13, one of the main ingredients of the proof of Theorem 6. The corollary guarantees that polynomials bounded from below by \(\varepsilon >0\) on \([-1,1]^n\) are in the preordering \(\mathcal P_{\textbf{r},J}(\{1-x_i^2\}_{i\in J})\) (defined in Sect. 1.2) for \({\textbf{r}}\) large enough. The corollary follows from Theorem 11, which gives a refined estimate of the distance between a polynomial p and its preimage under a Jackson-style operator that treats each variable separately. We begin with some preliminaries and a useful lemma.
The measure \(\mu _{n}\) on the box \([-1,1]^n\) defined by
is known as the (normalized) Chebyshev measure; it is a probability measure on \([-1,1]^n\). It induces the inner product
and the norm \(\Vert f\Vert _{\mu _{n}}=\sqrt{\langle f , f \rangle _{\mu _{n}}}\).
For \(k=0,1,\dots \), we let \(T_k\in \mathbb {R}[x]_{}\) be the univariate Chebyshev polynomial of degree k, defined by
The Chebyshev polynomials satisfy \(|T_k(x)|\le 1\) for all \(x\in [-1,1]\), and
For a multi-index \(I=(i_1,\dots ,i_n)\), we let
be the multivariate Chebyshev polynomials, which then satisfy (see for example [28, §II.A.1]), for multi-indices \(I\) and \(I'\),
Thus \(p\in \mathbb {R}[x]_{d}\) can be expanded as \(p=\sum _{|I|\le d}2^{w(I)}\langle p , T_I \rangle _{\mu _{n}}T_I\).
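In the univariate case, the recurrence, the orthogonality relations and this expansion can be verified numerically with Gauss–Chebyshev quadrature; the nodes \(\cos ((2m+1)\pi /2N)\) with equal weights 1/N integrate polynomials of degree at most \(2N-1\) exactly against \(\mu _1\). A small sketch:

```python
from math import cos, pi

N = 32                      # Gauss–Chebyshev nodes: exact up to degree 2N - 1
nodes = [cos((2 * m + 1) * pi / (2 * N)) for m in range(N)]

def T(k, x):                # Chebyshev recurrence T_{k+1} = 2x T_k - T_{k-1}
    a, b = 1.0, x
    for _ in range(k):
        a, b = b, 2 * x * b - a
    return a

def inner(f, g):            # <f, g> with respect to the Chebyshev measure mu_1
    return sum(f(x) * g(x) for x in nodes) / N

# orthogonality: <T_j, T_k> = 0 for j != k, 1 for j = k = 0, 1/2 otherwise
assert abs(inner(lambda x: T(3, x), lambda x: T(5, x))) < 1e-12
assert abs(inner(lambda x: T(0, x), lambda x: T(0, x)) - 1.0) < 1e-12
assert abs(inner(lambda x: T(4, x), lambda x: T(4, x)) - 0.5) < 1e-12

# the expansion p = sum_k 2^{w(k)} <p, T_k> T_k recovers p(x) = x^3
p = lambda x: x ** 3
c = [inner(p, lambda x, k=k: T(k, x)) * (2 if k >= 1 else 1) for k in range(4)]
assert abs(sum(ck * T(k, 0.3) for k, ck in enumerate(c)) - p(0.3)) < 1e-12
```

The recovered coefficients are \(c_1=3/4\) and \(c_3=1/4\), matching the identity \(x^3=\tfrac{3}{4}T_1(x)+\tfrac{1}{4}T_3(x)\).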
If we let, for a finite collection \(\Lambda \subseteq \mathbb {R}\times \mathbb {N}_0^n\) of pairs \((\lambda ,I)\) of a real number \(\lambda \) and a multi-index \(I\),
then, for any \(p\in \mathbb {R}[x]_{}\), we have
This means that, if we set all the nonzero numbers \(\lambda \) equal to 1, then \({\textbf{K}_{}^{\Lambda }}\) is the identity operator in the linear span of \( \{T_I:{\exists \lambda \ne 0\;\mathrm {s.t.}\;}(\lambda ,I)\in \Lambda \}\subseteq \mathbb {R}[x]_{d}\).
We let, for \(r,k\in \mathbb {N}\),
and
We set, for \({\textbf{r}}=(r_1,\dots ,r_n)\in \mathbb {N}_0^n\),
and
Then \({K_{{\textbf{r}}}^{\textrm{Jac}}}=K_{}^{\Lambda _{\textbf{r}}}\) is the (\({\textbf{r}}\)-adapted) Jackson kernel, and its associated linear operator \({\textbf{K}_{}^{\Lambda _{\textbf{r}}}}\) is denoted \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\).
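The univariate properties of the Jackson coefficients and kernel (items (iv), (v) and (viii) of Lemma 10 below, in the case \(n=1\)) can be checked numerically. The closed-form expression for \(\lambda _k^r\) used here is our transcription of the coefficients from [14] and should be treated as an assumption, since the defining formulas are not reproduced above:

```python
from math import cos, sin, tan, pi

def jackson_lambda(k, r):
    # univariate Jackson coefficient lambda_k^r with theta = pi/(r + 2);
    # our transcription of the formula from [14] (assumption)
    t = pi / (r + 2)
    return ((r + 2 - k) * cos(k * t) + sin(k * t) / tan(t)) / (r + 2)

def T(k, x):
    # Chebyshev polynomials via the recurrence T_{k+1} = 2x T_k - T_{k-1}
    a, b = 1.0, x
    for _ in range(k):
        a, b = b, 2 * x * b - a
    return a

def jackson_kernel(r, x, y):
    # univariate K_r^Jac(x, y) = sum_{k <= r} 2^{w(k)} lambda_k^r T_k(x) T_k(y)
    return jackson_lambda(0, r) + 2 * sum(
        jackson_lambda(k, r) * T(k, x) * T(k, y) for k in range(1, r + 1))

# 0 < lambda_k^r <= 1 and 1 - lambda_k^r <= pi^2 k^2 / (r + 2)^2
for r in range(1, 40):
    for k in range(r + 1):
        lam = jackson_lambda(k, r)
        assert 0 < lam <= 1 + 1e-12
        assert 1 - lam <= pi ** 2 * k ** 2 / (r + 2) ** 2 + 1e-12

# nonnegativity of the kernel on [-1, 1]^2 (spot check on a grid)
grid = [i / 10 for i in range(-10, 11)]
assert all(jackson_kernel(8, x, y) >= -1e-9 for x in grid for y in grid)
```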
Lemma 10
Let \({\textbf{r}}\in \mathbb {N}_0^n\) be a multi-index. The operator \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) defined above has the following properties:
-
i.
\({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(\mathbb {R}[x_{J}]_{{\textbf{r}}})\subseteq \mathbb {R}[x_{J}]_{{\textbf{r}}}\).
-
ii.
We have
$$\begin{aligned} {\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(T_I)={\left\{ \begin{array}{ll} \lambda ^{\textbf{r}}_IT_I, &{} I\le {\textbf{r}},\\ 0,&{}\text {otherwise.} \end{array}\right. } \end{aligned}$$In particular, \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(1)=1\).
-
iii.
\({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) is invertible in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\) with \(J=\{i:1\le i\le n,\;r_i>0\}\).
-
iv.
\(0< \lambda ^{\textbf{r}}_I\le 1\) for all \(0\le I\le {\textbf{r}}\).
-
v.
For \(I=(i_1,\dots ,i_n)\) and \( {\textbf{r}}=(r_1,\dots ,r_n)\) in \(\mathbb {N}_0^n\),
$$\begin{aligned} |1-\lambda ^{\textbf{r}}_I|=1-\lambda ^{\textbf{r}}_I\le n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}. \end{aligned}$$ -
vi.
For \(I=(i_1,\dots ,i_n)\) and \( {\textbf{r}}=(r_1,\dots ,r_n)\) in \(\mathbb {N}_0^n\) that verify (9), we have
$$\begin{aligned} \left| 1-\frac{1}{\lambda ^{\textbf{r}}_I}\right| \le 2n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}. \end{aligned}$$ -
vii.
Let \(p \in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) with \(p(x)\ge 0\) for all \(x\in [-1,1]^n\) and \(\Vert p\Vert _\infty \le 1\), where \({\textbf{r}}=(r_1,\dots ,r_n)\) is a multi-index such that \(I\le {\textbf{r}}\) for all \(I\in {\mathcal I_{p}}\). Assume that, for all \(I=(i_1,\dots ,i_n)\in {\mathcal I_{p}}\), condition (9) is verified. Then we have
$$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p)-p\right\| _\infty \le 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{\begin{array}{c} I\in {\mathcal I_{p}}\\ 1\le j\le n \end{array}}\left[ 2^{w(I)/2}\frac{i_j^2}{(r_j+2)^2}\right] . \end{aligned}$$ -
viii.
\({K_{{\textbf{r}}}^{\textrm{Jac}}}(x,y)\ge 0\) for all \(x,y\in [-1,1]^n\).
-
ix.
If \(p\in \mathbb {R}[x]_{}\) is such that \(p(x)\ge 0\) for \(x\in [-1,1]^n\), then \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)(x)\ge 0\) for all \(x\in [-1,1]^n\).
Proof
Throughout, we follow [14].
Item (ii) is immediate from the definitions and (7). Item (i) follows from item (ii) and the fact that \(\{T_I:I\le {\textbf{r}},\;I\subseteq J\}\) is a basis for \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\).
Observe that item (ii) means that \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) is diagonal in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\), so in order to prove item (iii) it suffices to show that \(\lambda ^{\textbf{r}}_I>0\) for all \(I\le {\textbf{r}}\), \(I\subseteq J\). This follows immediately from item (iv), which in turn follows from the definition of \(\lambda ^{\textbf{r}}_I\) and [14, Proposition 6(ii)], which shows that \(0<\lambda _k^r\le 1\) for all \(0\le k\le r\).
Similarly, by [14, Proposition 6(iii)] we have that, if \(k\le r\), then
Thus, if \(\gamma _j=1-\lambda _{i_j}^{r_j}\le \pi ^2i_j^2/(r_j+2)^2 \) and \(\gamma =\max _j\gamma _j\), we also have, using Bernoulli’s inequality [14, Lemma 11],
This shows item (v). Using it, we can prove item (vi) as follows: condition (9) implies, by item (v), that \(|1-\lambda ^{\textbf{r}}_I|\le 1/2\), and hence \(|\lambda ^{\textbf{r}}_I|\ge 1/2\), so
leveraging item (v) again.
Let us show item (vii). From items (ii) and (iii), we have
because \(|T_I(x)|\le 1\) for all \(x\in [-1,1]^n\). Plugging in the estimate from item (vi.), we get
where we have also used
which follows from (7).
To prove item (viii), let, for a fixed multi-index \({\textbf{r}}\),
and observe that
where \({\textbf{K}_{x_k}^{\Lambda _{k}}}\) is the operator \({\textbf{K}_{}^{\Lambda _{k}}}\) acting in the variable \(x_k\), i.e.,
Equation (8) follows from the identity
that can be checked from the definitions. Item (viii) then follows from the well-known fact that \({K_{(r)}^{\textrm{Jac}}}(x,y)\ge 0\) for all \(r\in \mathbb {N}_0\) and all \(x,y\in [-1,1]\); see for example [28, §II.C.2–3].
Item (ix) follows immediately from item (viii). \(\square \)
Theorem 11
We have \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(\mathbb {R}[x_{J}]_{{\textbf{r}}})\subseteq \mathbb {R}[x_{J}]_{{\textbf{r}}}\), and if \(p(x)\ge 0\) on \([-1,1]^n\) then \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\ge 0\) on \([-1,1]^n\).
Also, we have:
-
P1.
If \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) satisfies \(p(x)\ge 0\) for all \(x\in [-1,1]^n\),
$$\begin{aligned}{\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\in \mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J}).\end{aligned}$$ -
P2.
Let \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) be a polynomial that satisfies \(0\le p(x)\le 1\) for all \(x\in [-1,1]^n\), and for all \(I=(i_1,\dots ,i_n)\in {\mathcal I_{p}}\), assume that \({\textbf{r}}=(r_1,\dots ,r_n)\) verifies
$$\begin{aligned} \frac{i_j^2}{(r_j+2)^2}\le \frac{1}{2\pi ^2n},\quad 1\le j\le n. \end{aligned}$$(9)Assume that
$$\begin{aligned} \varepsilon \ge 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{I\in {{\mathcal I_{p}}}}\left[ 2^{w(I)/2}\max _j\frac{i_j^2}{(r_j+2)^2}\right] >0. \end{aligned}$$Then
$$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )-(p+\varepsilon )\right\| _\infty \le \varepsilon . \end{aligned}$$
Proof
The first statement of the theorem corresponds to Lemma 10(i) and 10(ix). Property P2 follows from Lemma 10(vii).
Let us prove property P1. Take a finite subset \(\{z_i\}_{i}\) of \([-1,1]^n\) and a corresponding set of positive weights \(\{w_i\}_i\subset \mathbb {R}\) giving a quadrature rule for integration of polynomials \(q\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\), so that
Then we have, for p as in the statement of P1,
with \(w_ip(z_i)\ge 0\). Since, by Lemma 10(viii) and Theorem 12 below, \({K_{{\textbf{r}}}^{\textrm{Jac}}}(z_i,x)\) is in \(\mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J})\), so is \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\). \(\square \)
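In the univariate case, one classical instance of such a positive-weight quadrature rule is the Gauss–Chebyshev rule, whose nodes are the Chebyshev zeros and whose weights are all equal; it integrates polynomials of degree at most \(2N-1\) exactly against \(\mu _1\). A sketch (the even moments of \(\mu _1\) are \(\left( {\begin{array}{c}2j\\ j\end{array}}\right) /4^j\), a standard fact we use for the check):

```python
from math import cos, pi, comb

def gauss_chebyshev(N):
    # N-point Gauss–Chebyshev rule for dmu_1 = dx / (pi * sqrt(1 - x^2)):
    # nodes are the Chebyshev zeros, all weights equal 1/N;
    # exact for polynomials of degree <= 2N - 1
    return [cos((2 * m + 1) * pi / (2 * N)) for m in range(N)], [1.0 / N] * N

nodes, weights = gauss_chebyshev(6)
for d in range(2 * 6):  # all degrees covered by exactness
    approx = sum(w * x ** d for x, w in zip(nodes, weights))
    exact = comb(d, d // 2) / 4 ** (d // 2) if d % 2 == 0 else 0.0
    assert abs(approx - exact) < 1e-12
```

The positivity of the weights is what makes the combination \(\sum _i w_ip(z_i)\,{K_{{\textbf{r}}}^{\textrm{Jac}}}(z_i,x)\) in the proof a conic combination of kernel sections.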
Theorem 12
([8, Th. 10.3]) If \(p \in \mathbb {R}[y]\) is a univariate polynomial of degree d nonnegative on the interval \([a,b] \subset \mathbb {R}\), then
where \(\Sigma _d\) is the cone of sum-of-squares polynomials of degree at most d.
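For orientation, we recall the classical Markov–Lukács form of such representations (a standard fact, stated here as background rather than quoted from [8]):

```latex
% p >= 0 on [a,b], with sigma_0, sigma_1 sums of squares:
p(y) = \sigma_0(y) + (b-y)(y-a)\,\sigma_1(y),
  \qquad \deg p = 2k,\ \ \sigma_0 \in \Sigma_{2k},\ \sigma_1 \in \Sigma_{2k-2};\\
p(y) = (b-y)\,\sigma_0(y) + (y-a)\,\sigma_1(y),
  \qquad \deg p = 2k+1,\ \ \sigma_0,\ \sigma_1 \in \Sigma_{2k}.
```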
Corollary 13
If \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) satisfies \(0\le p(x)\le 1\) for all \(x\in [-1,1]^n\), then
for all multi-indices \({\textbf{r}}\) satisfying (9) and
Here, \({\textbf{r}}=(r_1,\dots ,r_n)\) and \(I=(i_1,\dots ,i_n)\).
Proof
By property P2 in Theorem 11,
Thus, \(({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )\ge 0\) on \([-1,1]^n\). By property P1 and Lemma 10(i),
\(\square \)
The rest of this section is devoted to results used in the proof of Theorem 11.
3 Sparse approximation theory
In this section, we prove a useful result, Lemma 15, that is crucial to our proof of Theorem 6. Given a positive polynomial \(f\ge \varepsilon >0\) for which we know there is an expression of the type \(f=f_1+\dots +f_\ell \), with each \(f_j\) depending only on the variables indexed by a subset \(J_j\subset \{1,\dots ,n\}\) (but with \(f_j\) not necessarily positive), the lemma gives us positive polynomials \(h_1,\dots , h_\ell \) such that \(f=h_1+\dots +h_\ell \) and \(h_j\) depends only on the variables indexed by the subset \(J_j\). To prove the lemma we need some preliminaries.
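A toy instance (ours, for illustration; it is not part of the lemma's proof, which requires an approximation argument) may help fix ideas: take \(n=3\), \(J_1=\{1,2\}\), \(J_2=\{2,3\}\), \(f_1=-x_2\) and \(f_2=x_2+1\), so that \(f=f_1+f_2=1\ge \varepsilon =1\) although \(f_1\) is negative somewhere on \([-1,1]^3\). A correction term depending only on the shared variable \(x_2\) repairs the signs:

```latex
% Transfer the polynomial p(x_2) = x_2 + 1/2, supported on the overlap
% J_1 \cap J_2 = \{2\}, from one summand to the other:
h_1 = f_1 + p = \tfrac{1}{2}, \qquad h_2 = f_2 - p = \tfrac{1}{2},
% so f = h_1 + h_2 with h_j \in \mathbb{R}[x_{J_j}] and h_1, h_2 > 0.
```

In the lemma, the transferred polynomial is a Jackson-type approximation of a min-type function of the shared variables; in this toy case no approximation is needed.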
For \(1\le i\le n\) and a function \(f:[-1,1]^n\rightarrow \mathbb {R}\), let
Theorem 14
Let \(f\in C^0([-1,1]^n)\) be a Lipschitz function with variable-wise Lipschitz constants \({{\,\textrm{Lip}\,}}_1 f,\dots ,{{\,\textrm{Lip}\,}}_nf\). Then there is a constant \(C_{\textrm{Jac}}>0\) such that, for each multi-index \(\textbf{m}=(m_1,\dots ,m_n)\in \mathbb {N}^n\), there is a polynomial \(p\in \mathbb {R}[x]_{\textbf{m}}\) such that
and
The constant \(C_{\textrm{Jac}}\) does not depend on n, f, or \(\textbf{m}\).
Proof
Jackson [7, p. 2–6] proved that there is a constant \(C>0\) (independent of \(n,f,\textbf{m}\)) such that, if \(g:\mathbb {R}\rightarrow \mathbb {R}\) is Lipschitz and \(\pi \)-periodic, \(g(0)=g(\pi )\), then
For a multivariate Lipschitz function \(g:\mathbb {R}^n\rightarrow \mathbb {R}\) and a multi-index \(\textbf{m}=(m_1,\dots ,m_n)\in \mathbb {N}^n\), let
Then we have, using the triangle inequality and the single-variable inequality (10) at each step,
The function \(\prod _j(\sin m_j\theta _j/m_j\sin \theta _j)^4\) is, for each j, a polynomial of degree \(m_j\) in \(\cos \theta _j\) (cf. [7, p. 3]). If we replace f with its Lipschitz extension to \([-2,2]^n\) and apply the results above to \(g(\theta )=f(2\cos \theta _1,\dots ,2\cos \theta _n)\), we get a polynomial \(L_n(g)(\theta )\) in \(\cos \theta _1,\dots ,\cos \theta _n\) satisfying the above inequality. Thus
is a polynomial with \({{\,\mathrm{\overline{\deg }}\,}}p\le \textbf{m}\) that satisfies (cf. [7, p. 13–14])
since \({{\,\textrm{Lip}\,}}_i g\le 2{{\,\textrm{Lip}\,}}_i f\). This proves the first statement, setting \(C_{\textrm{Jac}}=2C\). We also have
and, by linearity and monotonicity of \(L_n\),
whence
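The rate \({{\,\textrm{Lip}\,}}f/m\) in Theorem 14 is easy to observe numerically. The sketch below uses Chebyshev interpolation of the 1-Lipschitz function \(|x|\) as a convenient stand-in for the Jackson-type operator of the proof (the function, degrees, and tolerance are chosen for illustration only):

```python
import numpy as np

def cheb_error(f, m):
    """Sup-norm error (on a fine grid) of the degree-m Chebyshev
    interpolant of f on [-1, 1], built on first-kind Chebyshev nodes."""
    nodes = np.cos((2 * np.arange(m + 1) + 1) * np.pi / (2 * (m + 1)))
    coeffs = np.polynomial.chebyshev.chebfit(nodes, f(nodes), m)
    grid = np.linspace(-1.0, 1.0, 2001)
    return float(np.max(np.abs(np.polynomial.chebyshev.chebval(grid, coeffs) - f(grid))))

# f(x) = |x| is 1-Lipschitz but not differentiable at 0;
# the observed errors decay roughly like C/m, a Jackson-type rate.
errs = [cheb_error(np.abs, m) for m in (10, 20, 40, 80)]
assert errs[-1] < 0.05 and errs[-1] < errs[0]
```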
Lemma 15
(a version of [6, Lemma 3]) Let \(J_1,\dots , J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP). Suppose \(f=f_1+\dots +f_\ell \) with \(\ell \ge 2\), \(f_j\in \mathbb {R}[x_{J_j}]_{}\). Let \(\varepsilon >0\) be such that \(f\ge \varepsilon \) on \(S(\textbf{g})\subseteq [-1,1]^n\). Pick numbers \(\epsilon ,\eta >0\) so that
Set, for \(2\le l\le \ell \),
with \(\mathcal {J}_l\) as in (2), and \(D_{1,m}=D_{2,m}\).
Then \(f=h_1+\dots +h_\ell \) for some \(h_j\in \mathbb {R}[x_{J_j}]_{}\) with \(h_j\ge \eta \) on \(S(\textbf{g})\subseteq [-1,1]^n\) and
where \(\bar{D}_{j,m}\) is the multi-index whose k-th entry equals \(D_{j,m}\) if \(k\in \mathcal {J}_j=J_j\cap \bigcup _{i<j}J_i\) and 0 otherwise, and the maximum is taken entry-wise.
Additionally, if \({{\,\textrm{Lip}\,}}f\) denotes the Lipschitz constant of f on \([-1,1]^n\), then
Finally, we have
Remark 16
If \(S(\textbf{g})=[-1,1]^n\), we also have the obvious estimate \(\Vert h_j\Vert _\infty \le \Vert f\Vert _\infty \), which follows from \(0\le h_j\le f\).
Proof
In order to prove the result by induction, let us first consider the case \(\ell =2\). In this case, \(\varepsilon =\epsilon \) and \(\epsilon >2\eta \). Assume that \(J_1\cap J_2\ne \emptyset \). For a subset \(J\subset \{1,\dots ,n\}\), let \(\pi _J\) denote the projection onto the variables with indices in J, that is, \(\pi _J(x)=(x_i)_{i\in J}\in [-1,1]^J\) for \(x\in [-1,1]^n\).
Define \(g:[-1,1]^{J_1\cap J_2}\rightarrow \mathbb {R}\) by
The function g is Lipschitz continuous on \([-1,1]^{J_1\cap J_2}\). To see why, let \(x,x'\in [-1,1]^{J_1\cap J_2}\) and pick \(z,z'\in \pi _{J_2{{\setminus }}J_1}(S(\textbf{g}))\subseteq [-1,1]^{J_2{{\setminus }}J_1}\) minimizing \(f_2(x,z)\) and \(f_2(x',z')\), respectively. Then
where \({{\,\textrm{Lip}\,}}(f_2)\) denotes the Lipschitz constant of \(f_2\) on \([-1,1]^{n}\).
The function g also satisfies
on \(S(\textbf{g})\). The second inequality follows from the definition of g, and the first one can be shown by taking \((x,y,z)\in S(\textbf{g})\) with \(x\in [-1,1]^{J_1\cap J_2}\), \(y\in [-1,1]^{J_1{\setminus } J_2}\), and \(z\in [-1,1]^{J_2{\setminus }J_1}\), taking care to pick z only after x has been chosen, in such a way that the minimum in the definition of g is realized there, that is, so that \(g(x)=f_2(x,z)-\varepsilon /2\) holds (this is possible by compactness of \(S(\textbf{g})\) and continuity of \(f_2\)); then we have
For \(j\in J_1\cap J_2\), let
Set \(m_j=0\) for all other \(1\le j\le n\), and \(\textbf{m}=(m_1,\dots ,m_n)=\bar{D}_{2,2}\). Then Theorem 14 gives a polynomial \(p_2\) such that
Also,
Let
so that \(f=h_1+h_2\), \(h_1\ge \eta \) and \(h_2\ge \eta \) on \(S(\textbf{g})\), and \(h_j\in \mathbb {R}[x_{J_j}]_{}\).
The bound (12) follows from the definition of \(h_j\) and (13). Observe also that, by the last part of Theorem 14,
Finally, we have
so
For the induction step, let \(\ell \ge 3\) and set \(\tilde{f}=f_1+\dots +f_{\ell -1}-(\ell -2)(\epsilon -\eta )\), so that we have \(f-(\ell -2)(\epsilon -\eta )=\tilde{f}+f_\ell \ge \epsilon \) since \(f\ge \varepsilon =(\ell -1)\epsilon -(\ell -2)\eta \). The proof for the case \(\ell =2\) with \(\varepsilon =\varepsilon _{\ell -1}\) gives a polynomial \(p_\ell \in \mathbb {R}[x_{\mathcal {J}_\ell }]_{}\) such that
on \(S(\textbf{g})\), and with \({{\,\mathrm{\overline{\deg }}\,}}p_\ell =\bar{D}_{\ell ,\ell }\) and \({{\,\textrm{Lip}\,}}p_\ell \le 2{{\,\textrm{Lip}\,}}f_\ell \). Analogously to (14),
Write
where \(f_j'=f_j-p_\ell \) for the largest j with \(\mathcal {J}_\ell \subset J_j\) (which must happen for some j, by (RIP)) and \(f_k'=f_k\) for all other \(k\ne j\). Thus \(f'_j\in \mathbb {R}[x_{J_j}]_{}\),
The induction hypothesis applies to the polynomial
This means that there are polynomials \(h_1,\dots ,h_{\ell -1}\) such that
-
\(f_1'+\dots +f_{\ell -1}'=f_1+\dots +f_{\ell -1}-p_\ell =h_1+\dots +h_{\ell -1}\),
-
\(h_j\in \mathbb {R}[x_{J_j}]_{}\) for all \(1\le j\le \ell -1\),
-
\(h_j\ge \eta \) for all \(1\le j\le \ell -1\),
-
We have, for all \(1\le j\le \ell -1\),
$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}h_j&\le \max ({{\,\mathrm{\overline{\deg }}\,}}f'_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell -1,\ell })\\&\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,{{\,\mathrm{\overline{\deg }}\,}}p_\ell ,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell -1,\ell })\\&= \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell }). \end{aligned}$$
Observe that the second index in each \(\bar{D}_{k,\ell }\) is \(\ell \) because of the accumulation of Lipschitz constants resulting from the estimate (16).
-
We have, for all \(1\le j\le \ell -1\), again because of (16),
$$\begin{aligned} {{\,\textrm{Lip}\,}}h_j\le 3\sum _{k=j}^\ell {{\,\textrm{Lip}\,}}f_k. \end{aligned}$$ -
We have, for all \(1\le j\le \ell -1\), using (15),
$$\begin{aligned} \Vert h_j\Vert _\infty&\le 3\times 2^{\ell -2}\sum _{k=1}^{\ell -1} \Vert f'_k\Vert _\infty \le 3\times 2^{\ell -2} \left( \sum _{k=1}^{\ell -1} \Vert f_k\Vert _\infty +\Vert p_\ell \Vert _\infty \right) \\&\le 3\times 2^{\ell -1}\sum _{k=1}^\ell \Vert f_k\Vert _\infty . \end{aligned}$$
Let
Then again \(f_1+\dots +f_\ell =h_1+\dots +h_\ell \), \(h_\ell \in \mathbb {R}[x_{J_\ell }]_{}\), \(h_\ell \ge \eta \) on \(S(\textbf{g})\), \({{\,\mathrm{\overline{\deg }}\,}}h_\ell \le \max ({{\,\mathrm{\overline{\deg }}\,}}f_\ell ,\bar{D}_{\ell ,\ell })\), \({{\,\textrm{Lip}\,}}h_\ell \le {{\,\textrm{Lip}\,}}f_\ell +{{\,\textrm{Lip}\,}}p_\ell \le 3{{\,\textrm{Lip}\,}}f_\ell \), \(\Vert h_\ell \Vert _\infty \le \Vert f_\ell \Vert _\infty +\Vert p_\ell \Vert _\infty \le 3\Vert f_\ell \Vert _\infty \le 3\times 2^{\ell -1} \sum _{j=1}^\ell \Vert f_j\Vert _\infty \), so the lemma is proven. \(\square \)
4 Proofs
4.1 Proof of Theorem 6
Overview Theorem 6 follows from Theorem 17, which presents a more detailed bound, together with the definitions of \(\overline{L},M,\overline{J}\). To prove the latter theorem, we first use the sparse approximation theory developed in Sect. 3 to represent the sparse polynomial p as a sum of positive polynomials \(h_1+\dots +h_{\ell }\), each of them depending on a clique of variables \(J_j\). We then use Corollary 13 to see that each \(h_j\) belongs to the preordering.
Theorem 17
Let \(n>0\) and \(\ell \ge 2\), and let \({\textbf{r}}_1,{\textbf{r}}_2,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\), be nowhere-vanishing multi-indices. Let also \(J_1,\dots ,J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP). Let \(p=p_1+p_2+\dots +p_\ell \) be a polynomial that is the sum of finitely many polynomials \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\). Then if \(p\ge \varepsilon \) on \([-1,1]^n\), we have
as long as, for all \(1\le j\le \ell \) and all \(1\le i\le n\),
and
Proof of Theorem 17
Let
Apply Lemma 15 with \(\textbf{g}=0\), so that \(S(\textbf{g})=[-1,1]^n\). From the lemma, we get polynomials \(h_1,\dots ,h_\ell \) with
Here,
since \(\varepsilon _j-2\eta =\eta (\ell +2)/2\ell \). We also set
Thus
Apply Corollary 13 to each of the polynomials
to see that, for
(recall that \({\mathcal I_{H_j}}\) is the set of multi-indices \(I=(i_1, \dots ,i_n)\) corresponding to exponents of \(x_1, \dots ,x_n\) in the terms appearing in \(H_j\), and \(w(I)\) is the number of nonzero entries in I) we have
when applying the corollary, note that (18) implies (9) in this case because, if \(I=(i_1,\dots ,i_n)\in {\mathcal I_{H_j}}\), then
by the definition of \(\bar{D}_l\). Observe that (21) means also that
Note that we have \({{\,\mathrm{\overline{\deg }}\,}}H_j={{\,\mathrm{\overline{\deg }}\,}}h_j\), \({\mathcal I_{H_j}}{\setminus }{\mathcal I_{h_j}}=\emptyset \), and \({\mathcal I_{h_j}}{\setminus }{\mathcal I_{H_j}}\subseteq \{(0,\dots ,0)\}\) since the powers of all terms in \(h_j\) and in \(H_j\) are the same, with the only possible exception of the constant term, which may appear in one of these and vanish in the other. Now, going back to our choice (19) of \(\eta \) and using (17), we have
for all \(j \in \{1,\ldots ,\ell \}\). Notice that after separating two of the \(|J_j|+2\) terms in the product and removing the \(+1\) factor from them, we obtain
where we have used the definition of \(\bar{D}_l\), together with the fact that each factor has been replaced by one that is smaller or equal (the original expression contains the maximum of these quantities in each factor). Next, use \({{\,\mathrm{\overline{\deg }}\,}}H_j\le \max ({{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_j,\dots , \bar{D}_\ell )\) as well as \(w(I)\le |J_j|\) for every multi-index I in \({\mathcal I_{H_j}}\), which is true because \(H_j\in \mathbb {R}[x_{J_j}]_{}\), yielding
since we have \(\max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j\le \Vert p\Vert _\infty \). With this bound for \(\eta \), together with the fact that \(\min _{[-1,1]^n}h_j\ge \eta \), we get
so that, by (20) and (22), \(h_j\in \mathcal P_{{\textbf{r}},J_j}(\{1-x_i^2\}_{i\in J_j})\) and hence
\(\square \)
4.2 Proof of Theorem 8
Overview For this proof, we first use the sparse approximation theory developed in Sect. 3 to represent the sparse polynomial p as a sum of positive polynomials \(h_1+\dots +h_{\ell }\), each of them depending on a clique of variables \(J_j\). We then work with each of these polynomials \(h_j\) using the tools developed by Baldi–Mourrain [2] to write \(h_j={\hat{f}}_j+\hat{q}_j\), where \(\hat{q}_j\) is by construction an element of the corresponding quadratic module, and \({\hat{f}}_j\) is strictly positive on \([-1,1]^n\). Thus Corollary 13 can be applied to \({\hat{f}}_j\), which shows that it belongs to the preordering; one then argues (also following the ideas of [2]) that the preordering is contained in the quadratic module, so that \({\hat{f}}_j\) is contained in the latter as well. In sum, this shows that \(h_j\) is in the quadratic module, which is what we want. Most of the heavy lifting goes into estimating the minimum of \({\hat{f}}_j\) to justify the application of Corollary 13.
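Schematically, with \({\hat{f}}_j\) and \(\hat{q}_j\) as introduced below, the memberships established in the proof chain together as follows:

```latex
h_j = \hat{f}_j + \hat{q}_j, \qquad
\hat{q}_j \in \mathcal{Q}_{\textbf{r}_j+\textbf{2},J_j}(\textbf{g}_{K_j}), \qquad
\hat{f}_j \ge \tfrac{\varepsilon}{4(\ell+2)} \text{ on } [-1,1]^n, \\
\hat{f}_j \in \mathcal{P}_{\textbf{r}_j,J_j}(\{1-x_i^2\}_{i\in J_j})
  \subseteq \mathcal{Q}_{\textbf{r}_j+\textbf{2},J_j}\Bigl(1-\sum_{i\in J_j}x_i^2\Bigr)
  \subseteq \mathcal{Q}_{\textbf{r}_j+\textbf{2},J_j}(\textbf{g}_{K_j}).
% first inclusion: Lemma 19; second inclusion: assumption (4);
% hence h_j \in \mathcal{Q}_{\textbf{r}_j+\textbf{2},J_j}(\textbf{g}_{K_j}).
```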
Proof of Theorem 8
For each \(j=1,\dots , \ell \), pick \({\textbf{C}_{j}}>0\) such that the following two bounds are satisfied:
Note that these only depend on \(\textbf{g}\) and \(J_1,\dots ,J_\ell \).
Apply Lemma 15 to \(f=p\), \(f_i=p_i\), \(\epsilon =3\varepsilon /2\ell \), \(\eta =\varepsilon /2(\ell +2)\) to get polynomials \(h_1,\dots , h_\ell \) such that
and
In the dense setting, Baldi–Mourrain [2] construct a family of single-variable polynomials
providing useful approximation properties that we have adapted to the (separated-variables) sparse setting and collected in Lemma 18. To state this, we set, for all \(j=1,\dots ,\ell \) and for \((t_j,m_j)\in \mathbb {N}\times \mathbb {N}\) as well as for \(s_j>0\),
Let us give an idea of what these functions do. The single-variable polynomial \({\mathsf h}_{t_j,m_j}\) is of degree \(m_j\) and roughly speaking approximates the function that equals 1 on \((-\infty ,0)\) and \(1/{t_j}\) elsewhere. Thus \(q_{j,t_j,m_j}\) almost vanishes (for large \(t_j\)) on \(S(\textbf{g}_{K_j})\), and outside of this domain it is roughly a sum of multiples of the negative parts of \(\textbf{g}_{K_j}\)’s entries. The definition of \(f_{j,s_j,t_j,m_j}\) is engineered to obtain a polynomial that is almost equal to \(h_j\) in \(S(\textbf{g}_{K_j})\) yet remains positive throughout \([-1,1]^n\). Instead of going into the details of the construction, we record the properties we need in the following lemma. \(\square \)
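The qualitative behavior of \({\mathsf h}_{t_j,m_j}\) can be illustrated with a generic stand-in (this is not the Baldi–Mourrain construction; the target function, smoothing width, and degree below are arbitrary choices made only for this picture): a polynomial close to 1 on the negative axis and close to \(1/t\) past a narrow transition zone.

```python
import numpy as np

t, delta, m = 10.0, 0.1, 300   # illustration parameters, chosen arbitrarily

def target(u):
    """Equals 1 for u <= 0 and 1/t for u >= delta, with a C^1 cubic ramp between."""
    s = np.clip(u / delta, 0.0, 1.0)
    return 1.0 + (1.0 / t - 1.0) * (3 * s**2 - 2 * s**3)

# Degree-m Chebyshev interpolant of the smoothed step on [-1, 1].
nodes = np.cos((2 * np.arange(m + 1) + 1) * np.pi / (2 * (m + 1)))
coeffs = np.polynomial.chebyshev.chebfit(nodes, target(nodes), m)

neg = np.linspace(-1.0, -0.01, 500)        # left of the transition zone
pos = np.linspace(delta + 0.01, 1.0, 500)  # right of the transition zone
assert np.max(np.abs(np.polynomial.chebyshev.chebval(neg, coeffs) - 1.0)) < 0.05
assert np.max(np.abs(np.polynomial.chebyshev.chebval(pos, coeffs) - 1.0 / t)) < 0.05
```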
Lemma 18
(a version of [2, Props. 2.13, 3.1, and 3.2, Lem. 3.5]) Assume (3) and the Archimedean conditions (4) are satisfied. Then for each \(j=1,\dots ,\ell \) there are values \(s_j,t_j, m_j\) of the parameters involved in Definition (27) and Definition (28), such that the following holds with the shorthands
-
i.
[2, Prop. 3.1] gives
$$\begin{aligned} {\hat{f}}_j(x)\ge \frac{1}{2}\min _{y\in S(\textbf{g}_{K_j})}h_j(y)\ge \frac{\eta }{2}=\frac{\varepsilon }{4(\ell +2)}\quad \text {for all}\quad x\in [-1,1]^n. \end{aligned}$$ -
ii.
We have \(\hat{q}_j\in \mathcal Q_{\textbf{r},J_j}(\textbf{g})\) for all multi-indices \(\textbf{r}=(r_1,\dots ,r_n)\) with
$$\begin{aligned} r_i\ge (2m_{j}+ 1)\max _{k\in K_j}({{\,\mathrm{\overline{\deg }}\,}}g_k)_i. \end{aligned}$$ -
iii.
[2, eq. (20)] gives the existence of a constant \({C_m}>0\) such that
$$\begin{aligned} m_{j}\le {C_m}{\mathsf c}_j^{\frac{4}{3}}{\bar{k}}^{\frac{1}{3}}2^{4{\mathsf L}_j}( \deg h_j )^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}. \end{aligned}$$ -
iv.
[2, eq. (16)] gives the existence of a constant \({C_f}>0\) such that
$$\begin{aligned} \Vert {\hat{f}}_{j}\Vert _\infty \le {C_f}\Vert h_j\Vert _\infty 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j(\deg h_j)^{2{\mathsf L}_j}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-{\mathsf L}_j}. \end{aligned}$$ -
v.
[2, eq. (17)] gives the existence of a constant \({C_d}>0\) such that
$$\begin{aligned} \deg {\hat{f}}_{j}\le {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)(\deg h_j)^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}. \end{aligned}$$
Item (ii.) followsFootnote 5 from \(\deg {\mathsf h}_{t_j,m_j}=m_j\) and the definition of \(q_{j,t_j,m_j}\). The proofs of the other items can be found in the indicated sources.
Take \(s_j,t_j,m_j,{\hat{f}}_j, \hat{q}_j\) for \(j=1,\dots ,\ell \) satisfying the properties (i)–(v) collected in Lemma 18.
Continuing with the proof of Theorem 8, denote
Since \({\hat{f}}_{j}\ge \varepsilon /4(\ell +2)\) on \([-1,1]^n\), we may apply Corollary 13 with \(p=F_{j}\) to get that
as long as
and (9) are verified.
In this context, the condition (9) required in Corollary 13 is equivalent to the theorem’s assumption (6); let us show how this works: First, using \(i_k\le ({{\,\mathrm{\overline{\deg }}\,}}F_j)_k\), \({{\,\mathrm{\overline{\deg }}\,}}F_j\le {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_j\le (\deg {\hat{f}}_j)\textbf{1}\), Lemma 18(v.), we get
Now use Eq. (26) to get that this is
where we have also used the fact that
and the last estimate from (25). Next, use (11), \(|\mathcal {J}_j|\le |J_j|\), \(\varepsilon _i-2\eta _i=\frac{\ell +6}{2\ell (\ell +2)}\varepsilon \) to get
where we have additionally used Eq. (23) and our assumption (6); this is precisely (9).
We would next like to show that
Let us first explain why this will be enough to prove the theorem. Once we have (33), by Lemma 19\({\hat{f}}_j\) is also contained in \(\mathcal Q_{\textbf{r}_j+\textbf{2},J_j}(1-\sum _{i\in J_j}x_i^2)\), and it is our assumption (4) that \(\mathcal Q_{\textbf{r}_j+\textbf{2},J_j}(1-\sum _{i\in J_j}x_i^2)\subseteq \mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j})\). In other words, we have
By Lemma 18(ii.), \(\hat{q}_{j}\) also belongs to \(\mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j})\), so we can conclude that
which is equivalent to the conclusion of the theorem.
Thus we need to prove (33). Let us first show that
implies (33). Observe that (30) is equivalent to
If (34) were true, we would then have
So in view of (31) and (35), we would indeed have \({\hat{f}}_j\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j})\), which is (33).
Let us now collect some preliminary estimates that will help us to prove (34). For \(I\in {\mathcal I_{F_j}}\) we have \(w(I)\le |J_j|\) so we estimate
We also estimate
Now we will estimate \( \max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}\) from above. Using Lemma 18(i.) and (iv.), we get
Use (25), (26) and (32) to see that this is
For the last line, we have used the definition of \(\bar{D}_{l,m}\) as in Lemma 15.
Additionally, we obtain the following estimate
The first inequality comes from (28), the second one from (26) and Lemma 18(ii.), the third one from Lemma 18(iii.), and the last one from (25), (32), and (26). Compare with Lemma 18(v.).
With those estimates under our belt, we now turn to showing that (34) is true. Using (38), (36), (37), as well as \({{\,\mathrm{\overline{\deg }}\,}}F_j\le {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_j\), we can start to estimate the right-hand side of (34) by
Next, denote
This will help us to reorganize and consolidate the terms. Use (39) to see that this is
Now use (11) as well as \(\varepsilon _i-2\eta _i=\frac{\ell +6}{2\ell (\ell +2)}\varepsilon \) to see that the above is bounded by
Finally use (24) and then (5) to get that the above is less than
This shows that (34) holds, and hence also (33), which proves the theorem. \(\square \)
Lemma 19
([2, Lemma 3.8]) Let \(J\subset \{1,\dots ,n\}\), and let \({\textbf{r}}=(r_1,\dots ,r_n)\) be a multi-index such that \(r_i>0\) only if \(i\in J\).
The quadratic module \(\mathcal Q_{{\textbf{r}}+\textbf{2},J}(1-\sum _{i\in J}x_i^2)\) contains the preordering \(\mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J})\).
Proof
This follows from the identity
$$\begin{aligned} 1-x_i^2=\Bigl (1-\sum _{k\in J}x_k^2\Bigr )+\sum _{k\in J{\setminus }\{i\}}x_k^2 \end{aligned}$$
and
$$\begin{aligned} (1-x_i^2)(1-x_j^2)=\Bigl (1-\sum _{k\in J}x_k^2\Bigr )+\sum _{k\in J{\setminus }\{i,j\}}x_k^2+x_i^2x_j^2, \end{aligned}$$
in which every right-hand term other than \(1-\sum _{k\in J}x_k^2\) is a sum of squares; products of more than two generators are handled by iterating the second identity.
The increase of \(\textbf{2}\) in \({\textbf{r}}\) stems from the fact that \(\deg (1-x_i^2)=2\) while the degree of the right-hand side above is 4. \(\square \)
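A quick numerical sanity check of the two-generator identity \((1-x_1^2)(1-x_2^2)=(1-x_1^2-x_2^2-x_3^2)+x_3^2+x_1^2x_2^2\) behind this containment (our reconstruction of the computation in [2, Lemma 3.8], for \(J=\{1,2,3\}\), \(i=1\), \(j=2\); every right-hand term other than the first is a sum of squares, which is what the containment needs, and the identity is also easy to expand by hand):

```python
import random

random.seed(0)
# Check (1 - x1^2)(1 - x2^2) == (1 - x1^2 - x2^2 - x3^2) + x3^2 + x1^2 * x2^2
# at random points in [-1, 1]^3.
for _ in range(1000):
    x1, x2, x3 = (random.uniform(-1, 1) for _ in range(3))
    lhs = (1 - x1**2) * (1 - x2**2)
    rhs = (1 - x1**2 - x2**2 - x3**2) + x3**2 + x1**2 * x2**2
    assert abs(lhs - rhs) < 1e-12
```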
4.3 An asymptotic lemma
Lemma 20
For \(a,b,c,d,p,q>0\), with \(cq-ap\ne 0\), we have
In other words, as \(\varepsilon \searrow 0\),
Proof
Recall that the Stirling series can be used [19, p. 293–294] to see that
for all large \(n>0\). We use it to get that,
Notice that the terms in both (42)—where the non-constant terms cancel out—and in (44) are asymptotically much smaller than the absolute value of the denominator, which tends to \(+\infty \). Line (43) is
As \(\varepsilon \searrow 0\), the factors inside the logarithm tend to 1, so the numerator tends to 0, while the denominator tends to \(\pm \infty \), and the quotient tends to 0. Let us now show that the remaining two lines (40)–(41) together tend to 1 in the limit. Now,
so we get
In this quotient, both the numerator and the denominator tend to \(\pm \infty \), so we can apply l'Hôpital's rule: if the limit of the quotient of the derivatives exists, then the original limit equals it. Taking this limit gives
\(\square \)
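For reference, the form of the Stirling series used in the proof above is the standard expansion (see [19, p. 293–294]):

```latex
\log n! \;=\; n\log n \;-\; n \;+\; \tfrac{1}{2}\log (2\pi n)
  \;+\; \frac{1}{12n} \;+\; O\!\left(n^{-3}\right), \qquad n\rightarrow \infty .
```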
Notes
This means that there are \(R>0\) and \(\sigma _j\in \Sigma [x]\) such that \(R-\Vert x\Vert ^2=\sum _{j=0}^{{\bar{k}}}\sigma _j g_j(x)\), with \(g_0=1\).
Apart from assumptions of scale (3) and an Archimedean condition (4), for this simplified version of the results we have for simplicity taken \({\mathsf L}_j=1\) and \({\mathsf c}_j=1\) in the statement of Theorem 8. With these values, the exponent of \(\varepsilon \) in (5) is \((238+55|J_j|)/9\), which we estimate from below with the simpler expression \(26+6|J_j|\) to derive the statement of Theorem 2.
This is a version of the Łojasiewicz inequality, and its validity (with appropriate constants \({\mathsf c}_j,{\mathsf L}_j\)) for semialgebraic functions is justified in [2, Thm. 2.3] and the papers cited therein.
With \({\mathsf L}_1=\dots ={\mathsf L}_{{\bar{k}}}=1\), the exponent of \(\varepsilon \) in (5) is \((238+55|J_j|)/9\), which we estimate from below with the simpler expression \(26+6|J_j|\) to derive the exponent of \(\varepsilon \) here.
This calculation is slightly different to the one in [2, Lem. 3.5] because the definition of \(q_{j,t_j,m_j}\) (or in their notations, \(f-p\)) differs from the one given there in that the functions \({\mathsf h}_j\) are squared here, an idea we take from the exposition of the results of [2] in the dissertation of L. Baldi and that is advantageous because then \(q_{j,t_j,m_j}\in \mathcal Q_{{\textbf{r}},J_j}(\textbf{g})\) automatically. This requires taking \(m_j\) twice as large, and we absorb this difference into the constant \({C_m}\).
References
Bach, F., Rudi, A.: Exponential convergence of sum-of-squares hierarchies for trigonometric polynomials (2022). arXiv:2211.04889
Baldi, L., Mourrain, B.: On the effective Putinar’s Positivstellensatz and moment approximation. Math. Program. 200, 71–103 (2023)
Doherty, A.C., Wehner, S.: Convergence of SDP hierarchies for polynomial optimization on the hypersphere (2012). arXiv:1210.5048
Fang, K., Fawzi, H.: The sum-of-squares hierarchy on the sphere and applications in quantum information theory. Math. Program. 190(1), 331–360 (2021)
Gribling, S., Polak, S., Slot, L.: A note on the computational complexity of the moment-SOS hierarchy for polynomial optimization. In: Proceedings of the 2023 International Symposium on Symbolic and Algebraic Computation, ISSAC '23, pp. 280–288. ACM, New York (2023)
Grimm, D., Netzer, T., Schweighofer, M.: A note on the representation of positive polynomials with structured sparsity. Arch. Math. 89(5), 399–403 (2007)
Jackson, D.: The Theory of Approximation. Colloquium Publications, vol. 11. American Mathematical Society, Providence (1930)
Karlin, S., Shapley, L.S.: Geometry of Moment Spaces, vol. 12. American Mathematical Society, Providence (1953)
Kirschner, F., De Klerk, E.: Convergence rates of RLT and Lasserre-type hierarchies for the generalized moment problem over the simplex and the sphere. Optim. Lett. 16, 2191–2208 (2022)
Lasserre, J.B.: Global optimization with polynomials and the problem of moments. SIAM J. Optim. 11(3), 796–817 (2001)
Lasserre, J.B.: Convergent SDP-relaxations in polynomial optimization with sparsity. SIAM J. Optim. 17(3), 822–843 (2006)
Lasserre, J.B.: Moments, Positive Polynomials and Their Applications, vol. 1. World Scientific, Singapore (2009)
Laurent, M.: Sums of squares, moment matrices and optimization over polynomials. In: Putinar, M., Sullivant, S. (eds.) Emerging Applications of Algebraic Geometry, pp. 157–270. Springer, Berlin (2009)
Laurent, M., Slot, L.: An effective version of Schmüdgen’s Positivstellensatz for the hypercube. Optim. Lett. 17, 515–530 (2023)
Magron, V., Wang, J.: Sparse Polynomial Optimization: Theory and Practice. Series on Optimization and Its Applications. World Scientific Press, Singapore (2023)
Mai, N.H.A., Magron, V.: On the complexity of Putinar–Vasilescu’s Positivstellensatz. J. Complex. 72, 101663 (2022)
Mai, N.H.A., Magron, V., Lasserre, J.: A sparse version of Reznick’s Positivstellensatz. Math. Oper. Res. 48(2), 812–833 (2022)
Nie, J., Schweighofer, M.: On the complexity of Putinar’s Positivstellensatz. J. Complex. 23(1), 135–150 (2007)
Olver, F.: Asymptotics and Special Functions. CRC Press, Boca Raton (1997)
Parrilo, P.A.: Semidefinite programming relaxations for semialgebraic problems. Math. Program. 96(2), 293–320 (2003)
Putinar, M.: Positive polynomials on compact semi-algebraic sets. Indiana Univ. Math. J. 42(3), 969–984 (1993)
Reznick, B.: Uniform denominators in Hilbert’s seventeenth problem. Math. Z. 220(1), 75–97 (1995)
Schmüdgen, K.: The K-moment problem for compact semi-algebraic sets. Math. Ann. 289(1), 203–206 (1991)
Schweighofer, M.: On the complexity of Schmüdgen’s Positivstellensatz. J. Complex. 20(4), 529–543 (2004)
Slot, L.: Sum-of-squares hierarchies for polynomial optimization and the Christoffel–Darboux kernel. SIAM J. Optim. 32(4), 2612–2635 (2022)
Slot, L., Laurent, M.: Sum-of-squares hierarchies for binary polynomial optimization. Math. Program. 197, 621–660 (2022)
Waki, H., Kim, S., Kojima, M., Muramatsu, M.: Sums of squares and semidefinite program relaxations for polynomial optimization problems with structured sparsity. SIAM J. Optim. 17(1), 218–242 (2006)
Weiße, A., Wellein, G., Alvermann, A., Fehske, H.: The kernel polynomial method. Rev. Mod. Phys. 78(1), 275 (2006)
Acknowledgements
This work has been supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions, grant agreement 813211 (POEMA), by the AI Interdisciplinary Institute ANITI funding, through the French “Investing for the Future PIA3” program under the Grant agreement n\(^\circ \) ANR-19-PI3A-0004 as well as by the National Research Foundation, Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme. This work was also co-funded by the European Union under the project ROBOPROX (reg. no. CZ.02.01.01/00/22_008/0004590). R. Ríos-Zertuche was also partially funded by UiT Aurora Center MASCOT.
Funding
Open access funding provided by UiT The Arctic University of Norway (incl University Hospital of North Norway).
Ethics declarations
Conflict of interest
The authors have no interests to declare.
Cite this article
Korda, M., Magron, V. & Ríos-Zertuche, R. Convergence rates for sums-of-squares hierarchies with correlative sparsity. Math. Program. (2024). https://doi.org/10.1007/s10107-024-02071-6