1 Introduction

This work provides rates of convergence for the sums-of-squares hierarchy with correlative sparsity. For a positive \(n\in \mathbb {N}\), consider the polynomial optimization problem

$$\begin{aligned} f_{\min {}} {:}{=}\min _{x\in S(\textbf{g})}f(x) \end{aligned}$$

where f is an element of the ring \(\mathbb {R}[x]\) of polynomials in \(x=(x_1,\dots ,x_n)\), and \(S(\textbf{g})\) is a basic compact semialgebraic set determined by a finite collection of polynomials \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\) by \(S(\textbf{g})=\{x\in \mathbb {R}^n:g_i(x)\ge 0,\;i=1,\dots ,{\bar{k}}\}\). One approach to this problem, first proposed by Lasserre [10] and Parrilo [20], is as follows: imagine we knew that \(f(x)-\lambda \) could be written as

$$\begin{aligned} f(x)-\lambda =\sum _{j=0}^{\bar{k}}\sigma _j g_j(x)\quad \text {or}\quad f(x)-\lambda =\sum _{J\subseteq \{1,\dots ,{\bar{k}}\}} \sigma _J \prod _{j\in J}g_j(x), \end{aligned}$$

with \(g_0(x)=1\) and \(\sigma _j\) and \(\sigma _J\) being sum-of-squares (SOS) polynomials. Then the right-hand side of each of these equations would clearly be nonnegative on \(S(\textbf{g})\), so we would know that \(f_{\min }\ge \lambda \). By bounding the degrees of the SOS polynomials, we obtain the following two hierarchies of lower bounds:

$$\begin{aligned} \textrm{lb}_q(f,r)&\textstyle =\max \left\{ \lambda \in \mathbb {R}:f-\lambda =\sum \limits _{j=0}^{\bar{k}}\sigma _j g_j, \textstyle \deg ( \sigma _j g_j)\le 2r,\;\sigma _j\in \Sigma [x]\right\} ,\\ \textrm{lb}_p(f,r)&=\textstyle \max \left\{ \lambda \in \mathbb {R}:f-\lambda = \sum \limits _{J\subseteq \{1,\dots ,{\bar{k}}\}} \sigma _J\prod _{j\in J}g_j, \textstyle \deg \left( \sigma _J\prod _{j\in J}g_j\right) \le 2r,\; \sigma _J\in \Sigma [x]\right\} , \end{aligned}$$

where \(\Sigma [x]\) is the convex cone of all sum-of-squares polynomials. These satisfy \(\textrm{lb}_q(f,r)\le \textrm{lb}_p(f,r)\le f_{\min }\). The lower bound \( \textrm{lb}_q(f,r)\) is associated to a so-called quadratic module certificate, while \( \textrm{lb}_p(f,r)\) corresponds to a preordering certificate; this terminology is justified by the definitions in Sect. 1.2. The well-known Putinar and Schmüdgen Positivstellensätze [21, 23], respectively, guarantee that these bounds converge to \(f_{\min }\) as \(r\rightarrow +\infty \), the former with the additional assumption that the associated quadratic module be Archimedean.Footnote 1 Here we prove sparse quantitative versions of these results.
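To make the programs above concrete, here is a minimal sketch of \(\textrm{lb}_q(f,r)\) as a semidefinite program, on a toy univariate instance of our own choosing and assuming the cvxpy modeling library is available; it is an illustration, not the code used in the paper.

```python
# Minimal sketch (ours, not from the paper): compute lb_q(f, r) for the toy
# instance f(x) = x^2 - x on S(g) = [-1, 1], g_1(x) = 1 - x^2 (true minimum -1/4).
# SOS multipliers are encoded by PSD Gram matrices, and the identity
# f - lambda = sigma_0 + sigma_1 * (1 - x^2) is imposed coefficient by coefficient.
import cvxpy as cp

r = 2
deg = 2 * r
f = [0.0, -1.0, 1.0, 0.0, 0.0]              # f(x) = x^2 - x, coefficients in 1, x, x^2, ...

lam = cp.Variable()                          # the candidate lower bound lambda
Q0 = cp.Variable((r + 1, r + 1), PSD=True)   # Gram matrix of sigma_0, deg sigma_0 <= 2r
Q1 = cp.Variable((r, r), PSD=True)           # Gram matrix of sigma_1, deg(sigma_1 g_1) <= 2r

def coeff(Q, k):
    """Coefficient of x^k in m(x)^T Q m(x) with m = (1, x, ..., x^d); 0 if out of range."""
    d = Q.shape[0] - 1
    if k < 0 or k > 2 * d:
        return 0
    return sum(Q[i, k - i] for i in range(max(0, k - d), min(d, k) + 1))

constraints = [coeff(Q0, k) + coeff(Q1, k) - coeff(Q1, k - 2)
               == f[k] - (lam if k == 0 else 0) for k in range(deg + 1)]
cp.Problem(cp.Maximize(lam), constraints).solve()
print(lam.value)                             # ~ -0.25 already at this small r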

Polynomial optimization schemes have generated substantial interest due to their abundant fields of application; see for example [12, 13]. The first proof of convergence, without a convergence rate, was given by Lasserre [10] using the Archimedean Positivstellensatz due to Putinar [21]. Eventually, rates of convergence were obtained; initially, in [18], these were logarithmic in the degree of the polynomials involved, and later on they were improved [2, 4, 14, 25] (using ideas of [3, 22]) to polynomial rates; refer to Table 1. The crux of the argument used to obtain those rates is a bound on the deformation incurred by a polynomial strictly positive on the domain of interest as it passes through an integral operator that closely approximates the identity and is associated with a strictly positive polynomial kernel, itself composed of sums of squares and similar to the Christoffel–Darboux and Jackson kernels (see Definition 10). More recently, an exponential convergence rate was obtained in [1] under the additional assumption that the Hessian is positive definite at all global minimizers.

The techniques used to obtain these results generally involve linear operators on the space of polynomials (mostly Christoffel–Darboux kernel operators; see [25]) that are close to the identity and that, for positive polynomials, are easily (usually, by construction) proved to output polynomials that are sums of squares and/or of their products with the functions in \(\textbf{g}\). All of these results deal, however, with the dense case.

Table 1 Known results on the asymptotic error of Lasserre’s hierarchies of lower bounds; based in part on [25, Table 1]

In this work, we treat the case where the problem possesses so-called correlative sparsity: each function \(g_i\) depends only on a certain subset of the variables, and f decomposes as a sum of functions depending only on these subsets of variables. This structure can be exploited to define sparse lower bounds that are cheaper to compute but possibly weaker. Nevertheless, these sparse lower bounds allow one to tackle large-scale polynomial optimization problems arising from various applications, including roundoff error bounds in computer arithmetic, quantum correlations, and robustness certification of deep networks; see the recent survey [15]. In [11] Lasserre proved that these sparse lower bounds converge as the degree of the SOS multipliers tends to infinity, provided the variable groups satisfy the so-called running intersection property (see Definition 1). A shorter and more direct proof was provided in [6] and was adapted in [17] to obtain a sparse variant of Reznick's Positivstellensatz. In this work, we show polynomial rates of convergence for sparse hierarchies based on both the Schmüdgen and Putinar Positivstellensätze. Importantly, we obtain rates that depend only on the size of the largest clique in the sparsity graph rather than on the overall ambient dimension. This allows the perhaps surprising conclusion that, asymptotically, the sparse hierarchy is more accurate than the dense hierarchy for a given computation time of an optimization method, provided that the size of the largest clique is no more than the square root of the ambient dimension. This assumes that the running time of the optimization method is governed by the size of the largest PSD block and the number of such blocks in the semidefinite programming reformulations of the dense and sparse SOS problems, which is the case for interior-point methods as well as the most commonly used first-order methods.

To the best of our knowledge, these are the first quantitative results of this kind. Our proof techniques rely on an adaptation of [6] and draw heavily on the recent results of [2, 14]; they can thus be seen as a generalization of these works to the sparse setting.

Since our results are very technical and their full statement requires a good deal of notation, we first present a rough summary to give the reader a glimpse of what they are. We stress that this summary does not spell out all the details and definitions, for which we refer to the next section.

Definition 1

A collection \(\{J_1,\dots ,J_\ell \}\) of subsets \(J_j\subseteq \{1,\dots ,n\}\) satisfies the running intersection property if for all \(1\le k\le \ell -1\) we have

$$\begin{aligned} J_{k+1}\cap \bigcup _{j=1}^kJ_j\subset J_s\quad \mathrm{for\, some} \,s\le k. \end{aligned}$$
(RIP)
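In executable form, (RIP) can be checked directly from the definition; the sketch below is our own helper, not part of the paper, and tests whether a list of cliques, in the given order, has the property.

```python
# Executable form of Definition 1 (our own helper): check whether the cliques
# J_1, ..., J_ell, in the given order, satisfy (RIP).
def has_rip(cliques):
    """cliques: list of sets J_1, ..., J_ell."""
    for k in range(1, len(cliques)):              # index k here is k+1 in the paper
        union_prev = set().union(*cliques[:k])
        if not any(cliques[k] & union_prev <= cliques[s] for s in range(k)):
            return False
    return True

print(has_rip([{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]))  # True (a chain)
print(has_rip([{1, 2}, {3, 4}, {2, 3}]))                    # False in this order
```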

Theorem 2

(Rough summary of results) Let:

  • \(n\), \(\ell \), and \({\bar{k}}\) be positive integers,

  • \({\textbf{r}}_1,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\) be some multi-indices with \(r_{j,i}\ge 1\),

  • \(J_1,\dots ,J_\ell \subset \{1,\dots ,n\}\) be sets of indices satisfying (RIP),

  • \(p_1,\dots ,p_\ell \) be polynomials such that \(p_j\) depends only on the variables \(x_i\) with \(i\in J_j\), and its degree in the variable \(x_i\) is \(\le r_{j,i}\),

  • \(p=p_1+p_2+\dots +p_\ell \) be a polynomial that is the sum of the polynomials \(p_1,\dots ,p_\ell \).

We will denote by \(|J_j|\) the cardinality of the set \(J_j\).

  i.

    (Schmüdgen-type, Theorem 6) Assume that, for large-enough \(c_1>0\) (determined explicitly in the statement of Theorem 6 and depending only on n and \(J_1,\dots , J_\ell \)),

    $$\begin{aligned} p(x)\ge c_1\frac{\Vert p\Vert _{L^\infty ([-1,1]^n)}}{r_{j,i}^{\frac{2}{3+\max _j|J_j|}}} \end{aligned}$$
    (1)

    for all \(1\le i\le n\), \(1\le j\le \ell \), and \(x\in [-1,1]^n\). Then p can be written as a sum \(p=h_1+\dots +h_\ell \) of polynomials \(h_j\) that belong to the respective preorderings generated by the polynomials \(1-x_i^2\) with \(i\in J_j\) and only depend on the variables \(x_i\) with \(i\in J_j\). This means that there are polynomials \(\sigma _{j,K}\) that are sums of squares, that depend only on \(x_i\) for \(i\in J_j\), and such that

    $$\begin{aligned} h_j=\sum _{K\subset J_j}\sigma _{j,K}\prod _{m \in K}(1-x_m^2). \end{aligned}$$

    The sum is taken over all (possibly empty) subsets K of \(J_j\); the product on the right is understood to be 1 when \(K=\emptyset \). Moreover, the degree of each term \(\sigma _{j,K}\prod _{m\in K}(1-x_m^2)\) in the variable \(x_i\) is bounded by \(r_{j,i}\).

  ii.

    (Putinar-type, Theorem 8) Additionally we let

    • \(K_1,\dots ,K_\ell \subset \{1,\dots ,{\bar{k}}\}\) be sets of indices,

    • \(g_1,\dots ,g_{\bar{k}}\) be polynomials such that, if \(m\in K_j\) then \(g_m\) depends only on the variables \(x_i\) with \(i\in J_j\), and satisfying some additional technical assumptions.Footnote 2

    Now, instead of assuming (1), we assume that, for large-enough \(c_2,c_3>0\) (determined more-or-less explicitly in the statement of Theorem 8 and depending only on n, \(J_1,\dots ,J_\ell \), \(g_1,\dots ,g_{{\bar{k}}}\)),

    $$\begin{aligned} p(x)\ge c_2\frac{(\Vert p\Vert _{L^\infty (S(\textbf{g}))}\deg p_j \sum _i{{\,\textrm{Lip}\,}}p_i)^{c_3}}{r_{j,i}^{\frac{1}{13+3|J_j|}}}, \end{aligned}$$

    for x in the set \(S(\textbf{g})=\{x\in \mathbb {R}^n:g_j(x)\ge 0,\;j=1,\dots ,{\bar{k}}\}\), and for all \(1\le i\le n\), \(1\le j\le \ell \). Then p can be written as a sum \(p=h_1+\dots +h_\ell \) of polynomials \(h_j\) that belong to the respective quadratic modules generated by the polynomials \((g_i)_{i\in K_j}\) and only depend on the variables \((x_i)_{i\in J_j}\). This means that there are polynomials \(\sigma _{j,0}\) and \(\sigma _{j,k}\) that are sums of squares, that depend only on \(x_i\) for \(i\in J_j\), and such that

    $$\begin{aligned} h_j=\sigma _{j,0}+\sum _{k\in K_j}\sigma _{j,k}g_k. \end{aligned}$$

    Moreover, the degree of each term \(\sigma _{j,0}\) and \(\sigma _{j,k}g_k\) in the variable \(x_i\) is bounded by \(r_{j,i}+2\).

Although the exponents of \(r_{j,i}\) in the above statement are often much smaller than 2, hence making the rates slower than those that had been obtained (see Table 1) for the dense case, we have also analyzed the complexity involved in solving the corresponding optimization problems, showing that the sparse hierarchies may outperform the dense ones in certain situations.

We proceed to summarize this analysis. When using the sums-of-squares hierarchies, we use, as a proxy for the complexity necessary to obtain a certificate that a lower bound of the minimum of a polynomial p has been found to \(\varepsilon >0\) accuracy, the size and number of the positive-semidefinite (PSD) blocks in the corresponding semidefinite program (SDP).

We denote:

  • \(B_{\textrm{dense}}(\varepsilon )\) the size of the PSD block in the dense case, which is equal to the number \(\left( {\begin{array}{c}n+r\\ r\end{array}}\right) \) of monomials of degree at most r in n variables; in our argument we take r of the order \(O(\varepsilon ^{-1/2})\), as in the best results listed in Table 1,

  • \(B_{\textrm{sparseSchm}}(\varepsilon )\) the size of the largest PSD block in the sparse case of Theorem 2i. (i.e., optimizing over \([-1,1]^n\) using a Schmüdgen-style scheme), multiplied by the number \(\ell \) of blocks,

  • \(B_{\textrm{sparsePut}}(\varepsilon )\) the size of the largest PSD block in the sparse case of Theorem 2ii. (i.e., optimizing over \(S(\textbf{g})\) using a Putinar-style scheme), multiplied by the number \(\ell \) of blocks.

The quotients

$$\begin{aligned} \frac{B_{\textrm{sparseSchm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}\qquad \text {and}\qquad \frac{B_{\textrm{sparsePut}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )} \end{aligned}$$

being \(<1\) is thus indicative of our rates being better than the dense ones.

Proposition 3

(Summary of SDP size results) With some technical assumptions, we have:

  1.

    (Proposition 7) If \(n>|J_j|^2+3|J_j|\) for all \(1\le j\le \ell \), then, for minimizing p over \([-1,1]^n\), we have

    $$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparseSchm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=0. \end{aligned}$$
  2.

    (Proposition 9) If \(n>6|J_j|^2+26|J_j|\) for all \(1\le j\le \ell \), then, for minimizing p over \(S(\textbf{g})\), we have

    $$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparsePut}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=0. \end{aligned}$$

In other words, our rates improve on those already found for the dense case as long as n is sufficiently large with respect to the sizes \(|J_j|\) of the blocks of variables indexed by the sets \(J_1,\dots , J_\ell \), and \(\varepsilon >0\) is sufficiently small.
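As a sanity check of these limits, the quotient can be tabulated numerically; the sketch below is ours and instantiates the block sizes of Sect. 1.2.1 with hypothetical constants \(C=C'=1\), so only the trend, not the absolute values, is meaningful.

```python
# Numeric sketch of Proposition 3(1) with hypothetical constants C = C' = 1 and
# the chain-like pattern of Example 4: n = 40, |J_j| = 4, ell = n - 3 = 37.
from math import comb

def B_dense(n, eps, C=1.0):
    r = int(C * eps ** -0.5)                 # r = O(eps^{-1/2}), cf. Table 1
    return comb(n + r, r)

def B_sparse_schm(J, ell, eps, C1=1.0):
    rj = int(C1 * eps ** (-(J + 3) / 2))     # r_{j,i} = O(eps^{-(|J|+3)/2})
    return ell * comb(J + J * rj, J * rj)    # largest block times number of blocks

n, J, ell = 40, 4, 37
for eps in (1e-6, 1e-8, 1e-10):
    print(eps, B_sparse_schm(J, ell, eps) / B_dense(n, eps))
# The quotient eventually decays like eps^{(n - |J|(|J|+3))/2} = eps^6, but the
# constants push the crossover to very small eps.
```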

Let us turn to some concrete examples. The functions in our examples were considered before in [27, §6.1], where sums-of-squares algorithms leveraging sparsity were benchmarked on them.

Example 4

Consider the chain singular function for \(x\in [-1,1]^{n}\):

$$\begin{aligned} f_{\textrm{cs}}(x)=\sum _{i\in P}\left( (x_i+10x_{i+1})^2+5(x_{i+2}-x_{i+3})^2+(x_{i+1}-2x_{i+2})^4+10(x_i-10x_{i+3})^4\right) , \end{aligned}$$

where \(P=\{1,3,5,\dots ,n-3\}\) and n is a multiple of 4. Then we can take \(J_j=\{j,j+1,j+2,j+3\}\) for \(j=1,\dots , n-3\), so that these sets satisfy (RIP) and \(|J_j|=4\). The proofs of Propositions 7 and 9 show that in this case we have, for large-enough n and as \(\varepsilon \searrow 0\),

$$\begin{aligned} \frac{B_{\textrm{sparseSchm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=O(\varepsilon ^{\frac{n}{2}-14}) \qquad \text {and}\qquad \frac{B_{\textrm{sparsePut}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=O(\varepsilon ^{\frac{n}{2}-62}). \end{aligned}$$

Example 5

Consider the Broyden banded function, defined for each \(n\in \mathbb {N}\) by

$$\begin{aligned} f_{\textrm{Bb}}(x)=\sum _{i=1}^n\left( x_i(2+5x_i^2)+1-\sum _{j\in P_i}(1+x_j)x_j\right) ^2, \end{aligned}$$

where \(P_i=\{j:j\ne i,\;\max (1,i-5)\le j\le \min (n,i+1)\}\), on the box \([-1,1]^{n}\). Then \(J_i=P_i\cup \{i\}\), and \(|J_i|\le 7\), and these sets satisfy (RIP). The proofs of Propositions 7 and 9 show that in this case we have, for large-enough n and as \(\varepsilon \searrow 0\),

$$\begin{aligned} \frac{B_{\textrm{sparseSchm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=O(\varepsilon ^{\frac{n}{2}-35}) \qquad \text {and}\qquad \frac{B_{\textrm{sparsePut}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=O(\varepsilon ^{\frac{n}{2}-143}). \end{aligned}$$
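As an executable companion to this example (our own sketch, not from the paper), the cliques \(J_i\) and the function \(f_{\textrm{Bb}}\) can be set up and checked as follows.

```python
# Sketch for Example 5 (our own helper code): build the Broyden banded cliques
# J_i = P_i union {i}, verify (RIP), and evaluate f_Bb at a sample point.
import numpy as np

n = 20
P = [{j for j in range(max(1, i - 5), min(n, i + 1) + 1) if j != i}
     for i in range(1, n + 1)]
J = [P[i - 1] | {i} for i in range(1, n + 1)]
assert all(len(Ji) <= 7 for Ji in J)

def has_rip(cliques):
    """Definition 1, checked in the given order."""
    for k in range(1, len(cliques)):
        inter = cliques[k] & set().union(*cliques[:k])
        if not any(inter <= cliques[s] for s in range(k)):
            return False
    return True

print(has_rip(J))              # True: consecutive bands overlap along a chain

def f_Bb(x):                   # x is 0-indexed, variables are 1-indexed
    return sum((x[i - 1] * (2 + 5 * x[i - 1] ** 2) + 1
                - sum((1 + x[j - 1]) * x[j - 1] for j in P[i - 1])) ** 2
               for i in range(1, n + 1))

print(f_Bb(np.zeros(n)))       # equals n: every summand is 1 at x = 0
```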

The paper is organized as follows. The results are detailed below in Sect. 1.2 and further discussed in Sect. 1.2.1, after a brief interlude to establish some notations in Sect. 1.1. Some machinery is developed in Sects. 2 and 3, regarding variants of the Jackson kernel and some approximation theory, respectively, and the proofs of the main theorems are presented in Sect. 4.

1.1 Notation

Denote by \(\mathbb {R}\) the set of real numbers, by \(\mathbb {N}\) the set of positive integers, and by \(\mathbb {N}_0=\{0,1,\dots \}\) the set of nonnegative integers. Denote by \(e_1,\dots ,e_n\) the vectors of the standard basis of Euclidean space \(\mathbb {R}^n\).

For a Lipschitz continuous function \(f:[-1,1]^n\rightarrow \mathbb {R}\), we set

$$\begin{aligned} {{\,\textrm{Lip}\,}}f=\max \left( 1,\sup _{x,y\in [-1,1]^n}\frac{|f(x)-f(y)|}{\Vert x-y\Vert }\right) . \end{aligned}$$

We take this to be at least 1 to simplify estimates below.

A multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) is an n-tuple of nonnegative integers \(i_k\), and its weight is denoted by

$$\begin{aligned} |I|=\sum _{k=1}^n i_k. \end{aligned}$$

For a multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) and \(J\subset \{1,\dots ,n\}\), we write \(I\subseteq J\) to indicate that the index k of every nonzero entry \(i_k>0\) is contained in J, that is, \(k\in J\) for all \(1\le k\le n\) such that \(i_k>0\).

Similarly, given a multi-index \(I=(i_1,\dots ,i_n)\in \mathbb {N}_0^n\) and a subset \(J\subseteq \{1,\dots ,n\}\), we let \(I_{J}\) be the multi-index whose k-th entry is either \(i_k\) if \(k\in J\) or 0 if \(k\notin J\).

For two multi-indices \(I\) and \(I'\), we write \(I\le I'\) if the entrywise inequalities \(i_k\le i'_k\) hold for all \({1}\le k\le n\).

We distinguish two special multi-indices:

$$\begin{aligned} \textbf{1}=(1,1,\dots ,1)\qquad \text {and}\qquad \textbf{2}=(2,2,\dots ,2). \end{aligned}$$

We denote \(x^I=x_1^{i_1}x_2^{i_2}\dots x_n^{i_n}\). Also, we denote the Hamming weight of \(I\in \mathbb {N}_0^n\) by

$$\begin{aligned} w(I)=\#\{k:i_k>0,\;1\le k\le n\}. \end{aligned}$$

In other words, \(w(I)\) is the number of nonzero entries in I.

We denote the space of polynomials in n variables by \(\mathbb {R}[x]_{}\), and within this set we distinguish the subspace \(\mathbb {R}[x]_{d}\) of polynomials of total degree at most d. We denote, for a polynomial \(p(x)=\sum _{I}c_Ix^I\), by \({{\,\mathrm{\overline{\deg }}\,}}p\) the vector whose i-th entry is the degree of p in \(x_i\),

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}p=\left( \max _{c_I\ne 0}i_1,\max _{c_I\ne 0}i_2,\dots ,\max _{c_I\ne 0}i_n \right) . \end{aligned}$$

Observe that \(\deg p\le \left| {{\,\mathrm{\overline{\deg }}\,}}p\right| = \sum _{k=1}^n \max _{c_I \ne 0} i_k\).

Set also

$$\begin{aligned} {{\mathcal I_{p}}}=\{I\in \mathbb {N}_0^n:c_I\ne 0\}. \end{aligned}$$

Given a subset \(J\subset \{1,\dots ,n\}\), we let \(\mathbb {R}[x_{J}]_{}\) denote the set of polynomials in the variables \(\{x_j\}_{j\in J}\). For a multi-index \({\textbf{r}}=(r_1,\dots ,r_n)\in \mathbb {N}_0^n\), we let \(\mathbb {R}[x_{}]_{{\textbf{r}}}\) denote the set of polynomials p such that, if \(p(x)=\sum _{I}c_Ix^I\) for some real numbers \(c_I\in \mathbb {R}\), then for each \(I=(i_1,\dots ,i_n)\) with \(c_I\ne 0\) we also have \(i_k\le r_k\) for \(1\le k\le n\). Finally, we let \(\mathbb {R}[x_{J}]_{{\textbf{r}}}=\mathbb {R}[x_{J}]_{}\cap \mathbb {R}[x_{}]_{{\textbf{r}}}\); in other words, \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\) is the set of polynomials p with \({{\,\mathrm{\overline{\deg }}\,}}p\le {\textbf{r}}\) in the variables \(\{x_j:j\in J\}\subseteq \{x_1,\dots ,x_n\}\).
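To fix these conventions, here is a small sketch of ours representing a polynomial as a dictionary from exponent tuples to coefficients, with \({{\,\mathrm{\overline{\deg }}\,}}\), the Hamming weight w, and membership in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\); all names are illustrative.

```python
# Our notation helpers: a polynomial is a dict {exponent tuple: coefficient}.
def overline_deg(p, n):
    """Entrywise maximum exponent over monomials with nonzero coefficient."""
    return tuple(max((I[k] for I, c in p.items() if c != 0), default=0)
                 for k in range(n))

def hamming_weight(I):
    """w(I): number of nonzero entries of the multi-index I."""
    return sum(1 for i in I if i > 0)

def in_RxJr(p, J, r, n):
    """p in R[x_J]_r: only variables in J appear, and deg-bar p <= r entrywise."""
    db = overline_deg(p, n)
    vars_ok = all(all(I[k] == 0 or (k + 1) in J for k in range(n))
                  for I in p if p[I] != 0)
    return vars_ok and all(db[k] <= r[k] for k in range(n))

p = {(2, 1, 0): 3.0, (0, 3, 0): -1.0}          # 3 x1^2 x2 - x2^3 in R[x1, x2, x3]
print(overline_deg(p, 3))                      # (2, 3, 0)
print(in_RxJr(p, J={1, 2}, r=(2, 3, 0), n=3))  # True
```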

Given a set X, we write \(X^n\) to denote the product

$$\begin{aligned} X^n=\underbrace{X\times X\times \dots \times X}_n. \end{aligned}$$

We denote by \(\Vert \cdot \Vert _\infty \) the supremum norm on \([-1,1]^n\).

The notation \(\lceil s\rceil \) stands for the least integer \(\ge s\).

1.2 Results

Let \({\Sigma [x_{J}]}\) denote the set of polynomials p that are sums of squares of polynomials in \(\mathbb {R}[x_{J}]_{}\), that is, of the form \(p=p_1^2+\dots +p_\ell ^2\) for \(p_1,\dots ,p_\ell \in \mathbb {R}[x_{J}]_{}\).

Let \({\bar{k}}\in \mathbb {N}\) and let \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\) be a collection of polynomials \(g_i\in \mathbb {R}[x]_{}\) defining a set

$$\begin{aligned} S(\textbf{g})=\{x\in \mathbb {R}^n:g_i(x)\ge 0,\;i=1,\dots ,{\bar{k}}\}. \end{aligned}$$

For convenience, denote also \(g_0=1\). To the collection \(\textbf{g}\), a multi-index \({\textbf{r}}\), and a subset \(J\subseteq \{1,\dots ,n\}\), we associate the (variable- and degree-wise truncated) quadratic module

$$\begin{aligned} \mathcal Q_{{\textbf{r}},J}(\textbf{g})=\left\{ \sum _{i=0}^{{\bar{k}}}\sigma _ig_i:\sigma _i\in {\Sigma [x_{J}]},{{\,\mathrm{\overline{\deg }}\,}}(\sigma _ig_i)\le {\textbf{r}}\right\} . \end{aligned}$$

Similarly, we have a (variable- and degree-wise truncated) preordering

$$\begin{aligned} \mathcal P_{{\textbf{r}},J}(\textbf{g})&=\mathcal Q_{{\textbf{r}},J}(\{g_K:K\subseteq \{1,\dots ,{\bar{k}}\}\})\\&=\left\{ \sum _{K\subseteq \{1,\dots ,{\bar{k}}\}}\sigma _{K}g_{K}:\sigma _{K}\in {\Sigma [x_{J}]},\;{{\,\mathrm{\overline{\deg }}\,}}(\sigma _{K}g_{K})\le {\textbf{r}}\right\} , \end{aligned}$$

where

$$\begin{aligned} g_K=\prod _{i\in K}g_i. \end{aligned}$$

Denote, for \(j=2,3,\dots ,\ell \),

$$\begin{aligned} \mathcal {J}_j=J_j\cap \bigcup _{k<j}J_k. \end{aligned}$$
(2)

Then (RIP) is the condition that, for all \(1\le k\le \ell -1\), there is some \(s\le k\) such that \(\mathcal {J}_{k+1}\subset J_s\).

1.2.1 Sparse Schmüdgen-type representation on \([-1,1]^n\)

Let \({\overline{L}}:=\sum _{k=1}^\ell {{\,\textrm{Lip}\,}}p_k\), \(M:=\max _{\begin{array}{c} 1\le k\le \ell \\ 1\le m\le n \end{array}}({{\,\mathrm{\overline{\deg }}\,}}p_k)_m\), and \({\overline{J}}:=\max _{1\le k\le \ell }|J_k|\).

Theorem 6

Let \(n>0\) and \(\ell \ge 2\), and let \({\textbf{r}}_1,{\textbf{r}}_2,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\), be nowhere-vanishing multi-indices.

Let also \(J_1,\dots ,J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP).

Let \(p=p_1+p_2+\dots +p_\ell \) be a polynomial that is the sum of finitely many polynomials \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\). Then if \(p\ge \varepsilon \) on \([-1,1]^n\), we have

$$\begin{aligned} p\in \mathcal P_{{\textbf{r}}_1,J_1}(\{1-x_i^2\}_{i\in J_1})+\dots +\mathcal P_{{\textbf{r}}_\ell ,J_\ell }(\{1-x_i^2\}_{i\in J_\ell }) \end{aligned}$$

as long as, for all \(1\le j\le \ell \) and all \(1\le i\le n\),

$$\begin{aligned} r_{j,i}^2\ge \frac{2^{{\overline{J}}+3}(\ell +2)n\pi ^2\Vert p\Vert _\infty }{\varepsilon }\displaystyle \left( \max \left( M,4C_{\textrm{Jac}}(\ell +2)\frac{{\overline{J}}\,{\overline{L}}}{\varepsilon }\right) +2\right) ^{{\overline{J}}+2} . \end{aligned}$$

For small enough \(0<\varepsilon <4C_{\textrm{Jac}}(\ell +2){\overline{J}}\,{\overline{L}}/M\), this boils down to

$$\begin{aligned} r_{j,i}\ge \sqrt{\frac{A\,\Vert p\Vert _\infty }{\varepsilon ^{{\overline{J}}+3}}}= O(\varepsilon ^{-\frac{{\overline{J}}+3}{2}}), \end{aligned}$$

with

$$\begin{aligned} A=n\pi ^2(4C_{\textrm{Jac}}{\overline{J}}\,{\overline{L}}+2)^{{\overline{J}}+2}(2(\ell +2))^{{\overline{J}}+3}. \end{aligned}$$

The proof is presented in Sect. 4.1.
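To get a sense of scale, the sketch below (ours) evaluates the right-hand side of the first display of Theorem 6 for sample parameters; since Theorem 14 only asserts the existence of \(C_{\textrm{Jac}}\), a hypothetical value \(C_{\textrm{Jac}}=1\) is plugged in.

```python
# Order-of-magnitude evaluation of the degree bound in Theorem 6 (our sketch).
# C_Jac is only known to exist (Theorem 14), so a hypothetical C_Jac = 1 is used.
import math

def r_min(eps, n, ell, Jbar, Lbar, M, p_inf, C_jac=1.0):
    """Smallest r_{j,i} allowed by the displayed bound."""
    inner = max(M, 4 * C_jac * (ell + 2) * Jbar * Lbar / eps) + 2
    rhs = (2 ** (Jbar + 3) * (ell + 2) * n * math.pi ** 2 * p_inf / eps
           * inner ** (Jbar + 2))
    return math.ceil(math.sqrt(rhs))

# e.g. n = 40 variables, ell = 37 cliques of size 4, modest Lipschitz data:
for eps in (0.5, 0.1, 0.02):
    print(eps, r_min(eps, n=40, ell=37, Jbar=4, Lbar=10.0, M=4, p_inf=1.0))
```

The resulting degrees are enormous, which is consistent with the bound being of asymptotic rather than practical interest.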

Discussion

Solving the dense problem considered by [14] using the sum-of-squares hierarchy reduces to a semidefinite program with the largest PSD block of size \(\left( {\begin{array}{c}n+r\\ r\end{array}}\right) \) that typical optimization methods (e.g., interior point or first order) can solve in an amount of time proportional to a power of

$$\begin{aligned} \left( {\begin{array}{c}n+r\\ r\end{array}}\right) \approx \left( {\begin{array}{c}n+C\varepsilon ^{-1/2}\\ C\varepsilon ^{-1/2}\end{array}}\right) =:B_{\textrm{dense} }(\varepsilon ), \end{aligned}$$

at least when certain non-degeneracy conditions are satisfied [5]. The bounds we find in Theorem 6 —in the case in which \(J_j\) is the largest of the sets \(J_1,\dots ,J_\ell \)— give a bound for the complexity of the leading term as (the same power of)

$$\begin{aligned} \ell \left( {\begin{array}{c}|J_j|+|{{\textbf{r}}_j}|\\ |{{\textbf{r}}_j}|\end{array}}\right) \le \ell \left( {\begin{array}{c}|J_j|(1+C'\varepsilon ^{-\frac{|J_j|+3}{2}})\\ |J_j|C'\varepsilon ^{-\frac{|J_j|+3}{2}}\end{array}}\right) =:B_{\textrm{sparseSchm}}(\varepsilon ). \end{aligned}$$

The reason we have \(|{\textbf{r}}_j|\le |J_j|C'\varepsilon ^{-\frac{|J_j|+3}{2}}\) is that \(r_{j,i}\le O(\varepsilon ^{-\frac{|J_j|+3}{2}})\) and there are at most \(|J_j|\) values of i with \(r_{j,i}\ne 0\).

Proposition 7

If \(n>|J_j|(|J_j|+3)\) for all \(j=1,\dots ,\ell \), then we have

$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparse Schm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=0. \end{aligned}$$

Thus, if the size of the largest clique is of the order of square root of the ambient dimension n or smaller, the sparse bound outperforms the best available dense bound if the performance is measured by the amount of time required by an optimization method to find a bound of a given accuracy \(\varepsilon \).

Proof of Proposition 7

By Lemma 20 we have, as \(\varepsilon \searrow 0\),

$$\begin{aligned} \frac{B_{\textrm{sparse Schm}}(\varepsilon )}{B_{\textrm{dense}}(\varepsilon )}=\frac{\displaystyle \left( {\begin{array}{c}|J_j|(1+C'\varepsilon ^{-\frac{|J_j|+3}{2}})\\ |J_j|C'\varepsilon ^{-\frac{|J_j|+3}{2}}\end{array}}\right) }{\left( {\begin{array}{c}n+C\varepsilon ^{-1/2}\\ C\varepsilon ^{-1/2}\end{array}}\right) }=O(\varepsilon ^{\frac{1}{2}(n-|J_j|(|J_j|+3))}), \end{aligned}$$

and this tends to 0 if the sparsity of the polynomial p is such that \(n>|J_j|(|J_j|+3)\). \(\square \)

1.2.2 Sparse Putinar-type representation on arbitrary domains

For a set of polynomials \(\textbf{g}=\{g_1,\dots ,g_{{\bar{k}}}\}\), denote

$$\begin{aligned} S(\textbf{g})=\{x\in \mathbb {R}^n:\hbox {} g_i(x)\ge 0\, \hbox {for all}\, i=1,\dots ,{\bar{k}}\} \end{aligned}$$

and, for a subset \(K\subset \{1,\dots ,{\bar{k}}\}\), denote

$$\begin{aligned} \textbf{g}_{K}=\{g_i:i\in K\}. \end{aligned}$$

Theorem 8

Let \(n>0\), \({\bar{k}}>0\), \(\ell \ge 2\), \(J_1,\dots ,J_\ell \subset \{1,\dots ,n\}\), \({\textbf{r}}_1,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), and \(p=p_1+\dots +p_\ell \) with \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\).

Assume that the sets \(J_1,\dots ,J_\ell \) satisfy (RIP).

Let \(K_1,\dots ,K_\ell \subset \{1,\dots ,{\bar{k}}\}\) and let \(\textbf{g}=\{g_1,\dots ,g_{\bar{k}}\}\subset \mathbb {R}[x]_{}\) be a collection of \({\bar{k}}\) polynomials such that, if \(i\in K_j\) for some \(1\le j\le \ell \), then \(g_i\in \mathbb {R}[x_{J_j}]_{}\). Assume that

$$\begin{aligned} \Vert g_j\Vert _\infty \le \frac{1}{2},\quad j=1,2,\dots , {\bar{k}}. \end{aligned}$$
(3)

Assume that \(S(\textbf{g})\subset [-1,1]^n\) and that there exist polynomials \(s_{j,i}\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\), \(j=1,\dots ,\ell \), \(i\in \{0\}\cup K_j\), such that the Archimedean conditions

$$\begin{aligned} 1-\sum _{i\in J_j}x_i^2=s_{j,0}(x_{J_j})^2+\sum _{i\in K_j}s_{j,i}(x_{J_j})^2g_i(x_{J_j}),\quad j=1,\dots , \ell , \end{aligned}$$
(4)

hold; that is to say, we assume that \(1-\sum _{i\in J_j}x_i^2\in \mathcal Q_{{\textbf{r}}_j,J_j}(\textbf{g}_{K_j})\). Let \({\mathsf c}_1,\dots ,{\mathsf c}_{\ell }\ge 1\) and \({\mathsf L}_1,\dots ,{\mathsf L}_{\ell }\ge 1\) be constants such thatFootnote 3

$$\begin{aligned} {\text {dist}}(x,S(\textbf{g}_{K_j}))^{{\mathsf L}_j}\le - {\mathsf c}_j\min \left\{ \{0\}\cup \left\{ g_i(x):i\in K_j\right\} \right\} \quad \text {for all}\quad x\in [-1,1]^n. \end{aligned}$$

Then there are constants \({\textbf{C}_{j}}>0\), depending only on \(\textbf{g}\), \(J_1,\dots ,J_\ell \), such that, if \(p\ge \varepsilon >0\) on \(S(\textbf{g})\), we have

$$\begin{aligned} p\in \mathcal Q_{{\textbf{r}}_1+\textbf{2},J_1}(\textbf{g}_{K_1})+\dots +\mathcal Q_{{\textbf{r}}_\ell +\textbf{2},J_\ell }(\textbf{g}_{K_\ell }) \end{aligned}$$

as long as, for all \(1\le j\le \ell \) and \(1\le k\le n\),

$$\begin{aligned} (r_{j,k}+2)^2\ge {\textbf{C}_{j}}\frac{4(\ell +2)\left( \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}(\deg p_j\sum _{i=1}^\ell {{\,\textrm{Lip}\,}}p_i)^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}}{\varepsilon ^{1+{\mathsf L}_j+\frac{4{\mathsf L}_j+1}{3}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}}, \end{aligned}$$
(5)
$$\begin{aligned} (r_{j,k}+2)^2\ge {\textbf{C}_{j}}\left( \frac{\left( \sum _{i=1}^\ell \Vert p_i\Vert _\infty \right) ^{\frac{4{\mathsf L}_j+1}{3}}(\deg p_j\sum _{i=1}^\ell {{\,\textrm{Lip}\,}}p_i)^{\frac{8{\mathsf L}_j}{3}}}{ \varepsilon ^{\frac{12{\mathsf L}_j+1}{3}}}\right) ^2. \end{aligned}$$
(6)

The proof of the theorem can be found in Sect. 4.2.

Discussion By the same arguments we used in the discussion at the end of the previous section, if we assume \({\mathsf L}_1=\dots ={\mathsf L}_{\ell }=1\), the bounds we find in Theorem 8 give a bound for the complexity of the leading term as (a power of)Footnote 4

$$\begin{aligned} \ell \left( {\begin{array}{c}|J_j|+|{{\textbf{r}}_j}|\\ |{{\textbf{r}}_j}|\end{array}}\right) \approx \left( {\begin{array}{c}|J_j|(1+C''\varepsilon ^{-13-3|J_j|})\\ |J_j|C''\varepsilon ^{-13-3|J_j|}\end{array}}\right) =:B_{\text {sparsePut}}, \end{aligned}$$

at least when certain non-degeneracy conditions are satisfied [5]. The assumption \({\mathsf L}_1=\dots ={\mathsf L}_{\ell }=1\) is realized, for example, when the so-called constraint qualification condition holds: at each point \(x\in S(\textbf{g})\), all the active constraints \(g_{i_1},\dots ,g_{i_l}\) (i.e., those satisfying \(g_{i_j}(x)=0\)) have linearly independent gradients \(\nabla g_{i_1}(x),\dots ,\nabla g_{i_l}(x)\); this latter statement is proved in [2, Thm 2.11].

In this case, we have:

Proposition 9

If \(n>|J_j|(26+6|J_j|)\) for all \(j=1,\dots ,\ell \) and if \({\mathsf L}_1=\dots ={\mathsf L}_{\ell }=1\), then we have

$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{B_{\textrm{sparsePut}}}{B_{\textrm{dense}}}=0. \end{aligned}$$

Again the implication is that the sparse bound asymptotically outperforms the dense bound provided that the largest clique is sufficiently small.

Proof of Proposition 9

Lemma 20 gives

$$\begin{aligned} \frac{B_{\textrm{sparsePut}}}{B_{\textrm{dense}}}= \frac{\displaystyle \left( {\begin{array}{c}|J_j|(1+C''\varepsilon ^{-13-3|J_j|})\\ |J_j|C'' \varepsilon ^{-13-3|J_j|}\end{array}}\right) }{\left( {\begin{array}{c}n+C\varepsilon ^{-1/2}\\ C\varepsilon ^{-1/2}\end{array}}\right) }=O(\varepsilon ^{\frac{n}{2}-|J_j|(13+3|J_j|)}), \end{aligned}$$

which tends to 0 if \(\frac{n}{2}>|J_j|(13+3|J_j|)\). \(\square \)

Organization of the paper The proof of Theorem 6 can be seen as a variable-separated version of the proof in [14], which relies on the Jackson kernel. Therefore in Sect. 2 we derive the suitable ingredients for sparse Jackson kernels while carefully taking into account each variable separately.

A strategy is also required to write a positive polynomial p that is known to be a sum \(p=p_1+\dots +p_\ell \) with \(p_i\in \mathbb {R}[x_{J_i}]_{}\) as a similar sum \(p=h_1+\dots +h_\ell \) but now with \(h_j\in \mathbb {R}[x_{J_j}]_{}\) and \(h_j\ge 0 \) on \([-1,1]^{|J_j|}\); this is done in Sect. 3.

Section 4 gives the proofs of Theorems 6 and 8, together with the statement and proof of Lemma 20, which was used in the proofs of Propositions 7 and 9 above.

2 The sparse Jackson kernel

In this section, we derive Corollary 13, one of the main ingredients of the proof of Theorem 6. The corollary guarantees that polynomials bounded from below by \(\varepsilon >0\) on \([-1,1]^n\) are in the preordering \(\mathcal P_{\textbf{r},J}(\{1-x_i^2\}_{i\in J})\) (defined in Sect. 1.2) provided the multi-index \({\textbf{r}}\) is large enough relative to \(\varepsilon \). The corollary follows from Theorem 11, which gives a refined estimate of the distance between a polynomial p and its preimage under a Jackson-style operator that treats each variable separately. We begin with some preliminaries and a useful lemma.

The measure \(\mu _{n}\) on the box \([-1,1]^n\) defined by

$$\begin{aligned} d\mu _{n}(x){:}{=}\frac{dx_1}{\pi \sqrt{1-x_1^2}}\cdots \frac{dx_n}{\pi \sqrt{1-x_n^2}},\quad x=(x_1,\dots ,x_n)\in [-1,1]^n, \end{aligned}$$

is known as the (normalized) Chebyshev measure; it is a probability measure on \([-1,1]^n\). It induces the inner product

$$\begin{aligned} \langle f , g \rangle _{\mu _{n}}{:}{=}\int _{[-1,1]^n}f(x)g(x)\,d\mu _{n}(x) \end{aligned}$$

and the norm \(\Vert f\Vert _{\mu _{n}}=\sqrt{\langle f , f \rangle _{\mu _{n}}}\).

For \(k=0,1,\dots \), we let \(T_k\in \mathbb {R}[x]_{}\) be the univariate Chebyshev polynomial of degree k, defined by

$$\begin{aligned} T_k(\cos \theta ){:}{=}\cos (k\theta ),\quad \theta \in \mathbb {R}. \end{aligned}$$

The Chebyshev polynomials satisfy \(|T_k(x)|\le 1\) for all \(x\in [-1,1]\), and

$$\begin{aligned} \langle T_a , T_b \rangle _{\mu _{1}}=\int _{-1}^1\frac{T_a(x)T_b(x)}{\pi \sqrt{1-x^2}}dx={\left\{ \begin{array}{ll}0,&{}a\ne b,\\ 1,&{}a=b=0,\\ \frac{1}{2},&{}a=b\ne 0.\end{array}\right. } \end{aligned}$$

For a multi-index \(I=(i_1,\dots ,i_n)\), we let

$$\begin{aligned} T_I(x_1,\dots ,x_n){:}{=}T_{i_1}(x_1)T_{i_2}(x_2)\cdots T_{i_n}(x_n) \end{aligned}$$

be the multivariate Chebyshev polynomials, which then satisfy (see for example [28, §II.A.1]), for multi-indices \(I\) and \(I'\),

$$\begin{aligned} \deg T_I=|I|\quad \text {and}\quad \langle T_I , T_{I'} \rangle _{\mu _{n}}={\left\{ \begin{array}{ll} 0,&{}I\ne I',\\ 2^{-w(I)},&{}I=I'. \end{array}\right. } \end{aligned}$$
(7)

Thus \(p\in \mathbb {R}[x]_{d}\) can be expanded as \(p=\sum _{|I|\le d}2^{w(I)}\langle p , T_I \rangle _{\mu _{n}}T_I\).
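The orthogonality relations (7) are easy to probe numerically; the sketch below is ours and uses the univariate Gauss–Chebyshev nodes, for which averaging over the nodes integrates polynomials of degree \(<2m\) exactly against \(\mu _1\).

```python
# Numeric check of (7) in one variable (our sketch): averaging over the m
# Gauss-Chebyshev nodes x_k = cos((2k+1)pi/(2m)) integrates polynomials of
# degree < 2m exactly against the probability measure mu_1.
import numpy as np

def cheb_inner(a, b, m=64):
    theta = (2 * np.arange(m) + 1) * np.pi / (2 * m)
    return np.mean(np.cos(a * theta) * np.cos(b * theta))  # T_k(cos t) = cos(k t)

print(cheb_inner(3, 5))   # ~0    (a != b)
print(cheb_inner(0, 0))   # 1     (a = b = 0)
print(cheb_inner(4, 4))   # 0.5   (a = b != 0)
```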

If we let, for a finite collection \(\Lambda \subseteq \mathbb {R}\times \mathbb {N}_0^n\) of pairs \((\lambda ,I)\) of a real number \(\lambda \) and a multi-index \(I\),

$$\begin{aligned} K_{}^{\Lambda }(x,y)=\sum _{(\lambda ,I)\in \Lambda }2^{w(I)}\lambda \, T_I(x)T_I(y),\quad x,y\in \mathbb {R}^n, \end{aligned}$$

then, for any \(p\in \mathbb {R}[x]_{}\), we have

$$\begin{aligned} {\textbf{K}_{}^{\Lambda }}(p)(x){:}{=}\int _{[-1,1]^n}K_{}^{\Lambda }(x,y)p(y)\,d\mu _{n}(y)=\sum _{(\lambda ,I)\in \Lambda } 2^{w(I)}\lambda \langle p , T_I \rangle _{\mu _{n}} T_I(x). \end{aligned}$$

This means that, if we set all the nonzero numbers \(\lambda \) equal to 1, then \({\textbf{K}_{}^{\Lambda }}\) is the identity operator on the linear span of \( \{T_I:{\exists \lambda \ne 0\;\mathrm {s.t.}\;}(\lambda ,I)\in \Lambda \}\).

We let, for \(r,k\in \mathbb {N}\),

$$\begin{aligned} \lambda ^r_k=\frac{1}{r+2}\left( (r+2-k)\cos \tfrac{\pi k}{r+2}+\frac{\sin \frac{\pi k}{r+2}}{\sin \frac{\pi }{r+2}}\cos \tfrac{\pi }{r+2}\right) ,\quad 1\le k\le r, \end{aligned}$$

and

$$\begin{aligned} \lambda ^r_0=1. \end{aligned}$$

We set, for \({\textbf{r}}=(r_1,\dots ,r_n)\in \mathbb {N}_0^n\),

$$\begin{aligned} \lambda ^{\textbf{r}}_I=\prod _{j=1}^n\lambda _{i_j}^{r_j} \end{aligned}$$

and

$$\begin{aligned} \Lambda _{\textbf{r}}=\{(\lambda ^{\textbf{r}}_I,I):I\le {\textbf{r}}\}. \end{aligned}$$

Then \({K_{{\textbf{r}}}^{\textrm{Jac}}}=K_{}^{\Lambda _{\textbf{r}}}\) is the (\({\textbf{r}}\)-adapted) Jackson kernel, and its associated linear operator \({\textbf{K}_{}^{\Lambda _{\textbf{r}}}}\) is denoted \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\).
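The coefficients \(\lambda ^r_k\) are straightforward to tabulate; the following sketch (ours) checks numerically the property in Lemma 10(iv) below and the univariate bound \(1-\lambda ^r_k\le \pi ^2k^2/(r+2)^2\) quoted in its proof.

```python
# Tabulating the Jackson coefficients lambda^r_k (our sketch), with a numeric
# check of 0 < lambda^r_k <= 1 and 1 - lambda^r_k <= pi^2 k^2 / (r+2)^2.
import numpy as np

def jackson_lambda(r):
    k = np.arange(r + 1, dtype=float)
    t = np.pi / (r + 2)
    lam = ((r + 2 - k) * np.cos(k * t)
           + np.sin(k * t) / np.sin(t) * np.cos(t)) / (r + 2)
    lam[0] = 1.0                      # lambda^r_0 = 1 by definition
    return lam

r = 12
lam = jackson_lambda(r)
k = np.arange(r + 1)
assert np.all(lam > 0) and np.all(lam <= 1 + 1e-12)
assert np.all(1 - lam <= np.pi ** 2 * k ** 2 / (r + 2) ** 2 + 1e-12)
print(np.round(lam, 4))               # decreasing from 1 towards 0
```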

Lemma 10

Let \({\textbf{r}}\in \mathbb {N}_0^n\) be a multi-index. The operator \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) defined above has the following properties:

  i.

    \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(\mathbb {R}[x_{J}]_{{\textbf{r}}})\subseteq \mathbb {R}[x_{J}]_{{\textbf{r}}}\).

  ii.

    We have

    $$\begin{aligned} {\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(T_I)={\left\{ \begin{array}{ll} \lambda ^{\textbf{r}}_IT_I, &{} I\le {\textbf{r}},\\ 0,&{}\text {otherwise.} \end{array}\right. } \end{aligned}$$

    In particular, \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(1)=1\).

  iii.

    \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) is invertible in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\) with \(J=\{i:1\le i\le n,\;r_i>0\}\).

  iv.

    \(0< \lambda ^{\textbf{r}}_I\le 1\) for all \(0\le I\le {\textbf{r}}\).

  v.

    For \(I=(i_1,\dots ,i_n)\) and \( {\textbf{r}}=(r_1,\dots ,r_n)\) in \(\mathbb {N}_0^n\),

    $$\begin{aligned} |1-\lambda ^{\textbf{r}}_I|=1-\lambda ^{\textbf{r}}_I\le n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}. \end{aligned}$$
  vi.

    For \(I=(i_1,\dots ,i_n)\) and \( {\textbf{r}}=(r_1,\dots ,r_n)\) in \(\mathbb {N}_0^n\) that verify (9), we have

    $$\begin{aligned} \left| 1-\frac{1}{\lambda ^{\textbf{r}}_I}\right| \le 2n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}. \end{aligned}$$
  vii.

    Let \({\textbf{r}}=(r_1,\dots ,r_n)\) be a multi-index and let \(p \in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) with \(p(x)\ge 0\) for all \(x\in [-1,1]^n\) and \(\Vert p\Vert _\infty \le 1\). Assume that, for all \(I=(i_1,\dots ,i_n)\in {\mathcal I_{p}}\), condition (9) is verified. Then we have

    $$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p)-p\right\| _\infty \le 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{\begin{array}{c} I\in {\mathcal I_{p}}\\ 1\le j\le n \end{array}}\left[ 2^{w(I)/2}\frac{i_j^2}{(r_j+2)^2}\right] . \end{aligned}$$
  viii.

    \({K_{{\textbf{r}}}^{\textrm{Jac}}}(x,y)\ge 0\) for all \(x,y\in [-1,1]^n\).

  ix.

    If \(p\in \mathbb {R}[x]_{}\) is such that \(p(x)\ge 0\) for \(x\in [-1,1]^n\), then \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)(x)\ge 0\) for all \(x\in [-1,1]^n\).

Proof

Throughout, we follow [14].

Item (ii) is immediate from the definitions and (7). Item (i) follows from item (ii) and the fact that \(\{T_I:I\le {\textbf{r}},\;I\subseteq J\}\) is a basis for \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\).

Observe that item (ii) means that \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\) is diagonal in \(\mathbb {R}[x_{J}]_{{\textbf{r}}}\), so in order to prove item (iii) it suffices to show that \(\lambda ^{\textbf{r}}_I>0\) for all \(I\le {\textbf{r}}\), \(I\subseteq J\). This follows immediately from item (iv), which in turn follows from the definition of \(\lambda ^{\textbf{r}}_I\) and [14, Proposition 6(ii)], which shows that \(0<\lambda _k^r\le 1\) for all \(0\le k\le r\).

Similarly, by [14, Proposition 6(iii)] we have that, if \(k\le r\), then

$$\begin{aligned} \left| 1-\lambda ^r_k\right| =1-\lambda ^r_k\le \frac{\pi ^2 k^2}{(r+2)^2}. \end{aligned}$$

Thus, if \(\gamma _j=1-\lambda _{i_j}^{r_j}\le \pi ^2i_j^2/(r_j+2)^2 \) and \(\gamma =\max _j\gamma _j\), we also have, using Bernoulli’s inequality [14, Lemma 11],

$$\begin{aligned} 1-\lambda ^{\textbf{r}}_I&=1-\prod _{j=1}^n\lambda ^{r_j}_{i_j}\\&=1-\prod _{j=1}^n(1-\gamma _j)\\&\le 1-(1-\gamma )^n \\&\le n\gamma \\&\le n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}. \end{aligned}$$

This shows item (v). Using it, we can prove item (vi) as follows: condition (9) implies, by item (v), that \(|1-\lambda ^{\textbf{r}}_I|\le 1/2\), and hence \(|\lambda ^{\textbf{r}}_I|\ge 1/2\), so

$$\begin{aligned} \left| 1-\frac{1}{\lambda ^{\textbf{r}}_I}\right| =\frac{|1-\lambda ^{\textbf{r}}_I|}{|\lambda ^{\textbf{r}}_I|}\le 2n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}, \end{aligned}$$

leveraging item (v) again.

Let us show item (vii). From items (ii) and (iii), we have

$$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p)-p\right\| _\infty&=\left\| \sum _{I}\left[ \frac{1}{\lambda ^{\textbf{r}}_I}2^ {w(I)}\langle p , T_I \rangle _{\mu _{n}} T_{I}-2^{w(I)}\langle p , T_I \rangle _{\mu _{n}}T_{I}\right] \right\| _\infty \\&\le \sum _{I}2^{w(I)}|\langle p , T_I \rangle _{\mu _{n}}|\left| 1-\frac{1}{\lambda ^{\textbf{r}}_I}\right| , \end{aligned}$$

because \(|T_I(x)|\le 1\) for all \(x\in [-1,1]^n\). Plugging in the estimate from item (vi), we get

$$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p)-p\right\| _\infty&\le \sum _{I}2^{w(I)/2+1}n\pi ^2\max _j\frac{i_j^2}{(r_j+2)^2}\\&\le 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{I\in {\mathcal I_{p}}} \left[ 2^{w(I)/2}\max _j\frac{i_j^2}{(r_j+2)^2}\right] , \end{aligned}$$

where we have also used

$$\begin{aligned} |\langle p , T_I \rangle _{\mu _{n}}|\le \Vert p\Vert _{\mu _{n}}\Vert T_I\Vert _{\mu _{n}}\le \Vert T_I\Vert _{\mu _{n}}=2^{-w(I)/2}, \end{aligned}$$

which follows from (7).

To prove item (viii), let, for a fixed multi-index \({\textbf{r}}\),

$$\begin{aligned} \Lambda _k&=\{(\lambda _I^{\textbf{r}},I):I\le (0,\dots ,0,r_k,0,\dots ,0)\}\\&=\{(\lambda _{i_k}^{r_k},(0,\dots ,0,i_k,0,\dots ,0)):i_k\le r_k\},\quad 1\le k\le n, \end{aligned}$$

and observe that

$$\begin{aligned} {\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}={\textbf{K}_{}^{\Lambda _{\textbf{r}}}}={\textbf{K}_{x_1}^{\Lambda _{1}}}\circ {\textbf{K}_{x_2}^{\Lambda _{2}}}\circ \dots \circ {\textbf{K}_{x_n}^{\Lambda _{n}}} \end{aligned}$$
(8)

where \({\textbf{K}_{x_k}^{\Lambda _{k}}}\) is the operator \({\textbf{K}_{}^{\Lambda _{k}}}\) acting in the variable \(x_k\), i.e.,

$$\begin{aligned} {\textbf{K}_{x_k}^{\Lambda _k}}(p)(x)=\int _{-1}^{1}K_{}^{\Lambda _k}(x_k,y) \,p(x_1,\dots ,x_{k-1},y,x_{k+1},\dots ,x_n)\,d\mu _{1}(y). \end{aligned}$$

Equation (8) follows from the identity

$$\begin{aligned} K_{}^{\Lambda _{\textbf{r}}}(x,y)&=K_{}^{\Lambda _{1}}(x_1,y_1)K_{}^{\Lambda _{2}}(x_2,y_2)\cdots K_{}^{\Lambda _{n}}(x_n,y_n)\\&={K_{{(r_1)}}^{\textrm{Jac}}}(x_1,y_1){K_{{(r_2)}}^{\textrm{Jac}}}(x_2,y_2)\cdots {K_{{(r_n)}}^{\textrm{Jac}}}(x_n,y_n), \end{aligned}$$

that can be checked from the definitions. Item (viii) then follows from the well-known fact that \({K_{(r)}^{\textrm{Jac}}}(x,y)\ge 0\) for all \(r\in \mathbb {N}_0\) and all \(x,y\in [-1,1]\); see for example [28, §II.C.2–3].

Item (ix) follows immediately from item (viii). \(\square \)

Theorem 11

We have \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(\mathbb {R}[x_{J}]_{{\textbf{r}}})\subseteq \mathbb {R}[x_{J}]_{{\textbf{r}}}\), and if \(p(x)\ge 0\) on \([-1,1]^n\) then \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\ge 0\) on \([-1,1]^n\).

Also, we have:

  P1.

    If \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) satisfies \(p(x)\ge 0\) for all \(x\in [-1,1]^n\), then

    $$\begin{aligned}{\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\in \mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J}).\end{aligned}$$
  P2.

    Let \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) be a polynomial that satisfies \(0\le p(x)\le 1\) for all \(x\in [-1,1]^n\), and for all \(I=(i_1,\dots ,i_n)\in {\mathcal I_{p}}\), assume that \({\textbf{r}}=(r_1,\dots ,r_n)\) verifies

    $$\begin{aligned} \frac{i_j^2}{(r_j+2)^2}\le \frac{1}{2\pi ^2n},\quad 1\le j\le n. \end{aligned}$$
    (9)

    Assume that

    $$\begin{aligned} \varepsilon \ge 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{I\in {{\mathcal I_{p}}}}\left[ 2^{w(I)/2}\max _j\frac{i_j^2}{(r_j+2)^2}\right] >0. \end{aligned}$$

    Then

    $$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )-(p+\varepsilon )\right\| _\infty \le \varepsilon . \end{aligned}$$

Proof

The first statement of the theorem corresponds to Lemma 10(i) and 10(ix). Property P2 follows from Lemma 10(vii).

Let us prove property P1. Take a finite subset \(\{z_i\}_{i}\) of \([-1,1]^n\) and a corresponding set of positive weights \(\{w_i\}_i\subset \mathbb {R}\) giving a quadrature rule for the integration of polynomials \(q\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\), so that

$$\begin{aligned} \int _{[-1,1]^n}q(x)\,d\mu _{n}(x)=\sum _iw_iq(z_i)\quad \text {for all}\quad q\in \mathbb {R}[x_{J}]_{{\textbf{r}}}. \end{aligned}$$

Then we have, for p as in the statement of P1,

$$\begin{aligned} {\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)(x)=\sum _{i}w_ip(z_i){K_{{\textbf{r}}}^{\textrm{Jac}}}(z_i,x), \end{aligned}$$

with \(w_ip(z_i)\ge 0\). Since, by Lemma 10(viii) and Theorem 12 below, \({K_{{\textbf{r}}}^{\textrm{Jac}}}(z_i,x)\) is in \(\mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J})\), so is \({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}(p)\). \(\square \)

Theorem 12

([8, Th. 10.3]) If \(p \in \mathbb {R}[y]\) is a univariate polynomial of degree d nonnegative on the interval \([a,b] \subset \mathbb {R}\), then

$$\begin{aligned} {\left\{ \begin{array}{ll} p = \sigma _0 + \sigma _1(b-y)(y-a),\quad \sigma _0\in \Sigma _d[y], \;\;\;\;\,\sigma _1 \in \Sigma _{d-2}[y] &{} d \mathrm {\;\, even,} \\ p = \sigma _0(y-a) + \sigma _1(b-y),\quad \sigma _0\in \Sigma _{d-1}[y], \;\sigma _1 \in \Sigma _{d-1}[y] &{} d \mathrm {\;\, odd}, \end{array}\right. } \end{aligned}$$

where \(\Sigma _d\) is the cone of sums of squares of polynomials of degree at most d.

Corollary 13

If \(p\in \mathbb {R}[x_{J}]_{{\textbf{r}}}\) satisfies \(0\le p(x)\le 1\) for all \(x\in [-1,1]^n\), then

$$\begin{aligned} p+\varepsilon \in \mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J}) \end{aligned}$$

for all multi-indices \({\textbf{r}}\) satisfying (9) and

$$\begin{aligned} \varepsilon \ge 2n\pi ^2\left( \prod _{1\le k\le n}(({{\,\mathrm{\overline{\deg }}\,}}p)_k+1)\right) \max _{I\in {{\mathcal I_{p}}}}\left( 2^{w(I)/2}\max _{1\le j\le n}\frac{i_j^2}{(r_j+2)^2}\right) . \end{aligned}$$

Here, \({\textbf{r}}=(r_1,\dots ,r_n)\) and \(I=(i_1,\dots ,i_n)\).

Proof

By property P2 in Theorem 11,

$$\begin{aligned} \left\| ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )-(p+\varepsilon )\right\| _\infty \le \varepsilon . \end{aligned}$$

Thus, \(({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )\ge 0\) on \([-1,1]^n\). By property P1 and Lemma 10(i),

$$\begin{aligned} p+\varepsilon ={\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}}\circ ({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )\in \mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J}). \end{aligned}$$

\(\square \)
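In one variable, the mechanism of this proof is easy to run: by Lemma 10(ii) the operator is diagonal in the Chebyshev basis, so \(({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}\) divides the k-th Chebyshev coefficient by \(\lambda ^r_k\). The sketch below (our own toy instance) confirms that \(({\textbf{K}_{{\textbf{r}}}^{\textrm{Jac}}})^{-1}(p+\varepsilon )\) stays uniformly close to \(p+\varepsilon \).

```python
# Univariate illustration of the proof mechanism (our toy instance): K_Jac is
# diagonal in the Chebyshev basis, so its inverse divides the k-th Chebyshev
# coefficient by lambda^r_k; the resulting perturbation of p + eps is small.
import numpy as np
from numpy.polynomial import chebyshev as C

def jackson_lambda(r):
    k = np.arange(r + 1, dtype=float)
    t = np.pi / (r + 2)
    lam = ((r + 2 - k) * np.cos(k * t)
           + np.sin(k * t) / np.sin(t) * np.cos(t)) / (r + 2)
    lam[0] = 1.0
    return lam

p = C.Chebyshev([0.5, 0.0, 0.25, 0.0, 0.25])        # 0 <= p <= 1 on [-1, 1]
r, eps = 60, 0.05
lam = jackson_lambda(r)
q = C.Chebyshev(p.coef / lam[:p.coef.size]) + eps   # (K_Jac)^{-1}(p + eps); lambda_0 = 1
x = np.linspace(-1, 1, 1001)
print(np.max(np.abs(q(x) - (p(x) + eps))))          # < eps, so q >= 0 on [-1, 1]
```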

The rest of this section is devoted to results used in the proof of Theorem 11.

3 Sparse approximation theory

In this section, we prove a useful result, Lemma 15, that is crucial to our proof of Theorem 6. Given a positive polynomial \(f\ge \varepsilon >0\) for which we know there is an expression of the type \(f=f_1+\dots +f_\ell \), with each \(f_j\) depending only on the variables indexed by a subset \(J_j\subset \{1,\dots ,n\}\) (but with \(f_j\) not necessarily positive), the lemma gives us positive polynomials \(h_1,\dots , h_\ell \) such that \(f=h_1+\dots +h_\ell \) and \(h_j\) depends only on the variables indexed by the subset \(J_j\). To prove the lemma we need some preliminaries.
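Before the formal development, the two-block mechanism used in the proof of Lemma 15 is easy to visualize numerically. The sketch below uses our own toy instance, with \(S(\textbf{g})=[-1,1]^2\), \(J_1=\{1\}\), \(J_2=\{1,2\}\): it computes \(g(x_1)=\min _{x_2}f_2(x_1,x_2)-\varepsilon /2\) on a grid and checks that \(f_1+g\) and \(f_2-g\) stay \(\ge \varepsilon /2\); the polynomial approximation of g (Theorem 14) is omitted.

```python
# Numeric sketch (our toy instance) of the two-block step of Lemma 15.
import numpy as np

f1 = lambda x1: x1 ** 2
f2 = lambda x1, x2: x1 * x2 + 1.5        # f = f1 + f2 >= 1.25 on [-1, 1]^2
eps = 0.5

x1 = np.linspace(-1, 1, 401)
x2 = np.linspace(-1, 1, 401)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")
g = f2(X1, X2).min(axis=1) - eps / 2     # minimize over the variables in J_2 \ J_1

print((f1(x1) + g).min() >= eps / 2)                # True
print((f2(X1, X2) - g[:, None]).min() >= eps / 2)   # True
```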

For \(1\le i\le n\) and a function \(f:[-1,1]^n\rightarrow \mathbb {R}\), let

$$\begin{aligned} {{\,\textrm{Lip}\,}}_if=\sup _{\begin{array}{c} x\in [-1,1]^n\\ y\in [-1,1] \end{array}}\frac{|f(x)-f(x_1,\dots ,x_{i-1},y,x_{i+1},\dots ,x_n)|}{|x_i-y|}. \end{aligned}$$

Theorem 14

Let \(f\in C^0([-1,1]^n)\) be a Lipschitz function with variable-wise Lipschitz constants \({{\,\textrm{Lip}\,}}_1 f,\dots ,{{\,\textrm{Lip}\,}}_nf\). Then there is a constant \(C_{\textrm{Jac}}>0\) such that, for each multi-index \(\textbf{m}=(m_1,\dots ,m_n)\in \mathbb {N}^n\), there is a polynomial \(p\in \mathbb {R}[x]_{\textbf{m}}\) such that

$$\begin{aligned} \sup _{x\in [-1,1]^n}|f(x)-p(x)|\le C_{\textrm{Jac}}\sum _{i=1}^n\frac{{{\,\textrm{Lip}\,}}_i f}{m_i} \end{aligned}$$

and

$$\begin{aligned} {{\,\textrm{Lip}\,}}_i p\le 2{{\,\textrm{Lip}\,}}_i f. \end{aligned}$$

The constant \(C_{\textrm{Jac}}\) does not depend on n, f, or \(\textbf{m}\).

Proof

Jackson [7, p. 2–6] proved that there is a constant \(C>0\) (independent of \(n,f,\textbf{m}\)) such that, if \(g:\mathbb {R}\rightarrow \mathbb {R}\) is Lipschitz and \(\pi \)-periodic, \(g(0)=g(\pi )\), then

$$\begin{aligned} \left| g(\theta )-\frac{\int _{-\pi /2}^{\pi /2} g(\theta -\vartheta )\left( \frac{\sin m\vartheta }{m\sin \vartheta }\right) ^4d\vartheta }{\int _{-\pi /2}^{\pi /2}\left( \frac{\sin m\vartheta }{m\sin \vartheta }\right) ^4d\vartheta }\right| \le \frac{C{{\,\textrm{Lip}\,}}g}{m},\quad m\in \mathbb {N},\ \theta \in \mathbb {R}. \end{aligned}$$
(10)

For a multivariate Lipschitz function \(g:\mathbb {R}^n\rightarrow \mathbb {R}\) and a multi-index \(\textbf{m}=(m_1,\dots ,m_n)\in \mathbb {N}^n\), let

$$\begin{aligned} L_i(g)(\theta ) =\frac{\int _{-\pi /2}^{\pi /2}\dots \int _{-\pi /2}^{\pi /2} g(\theta _1-\vartheta _1,\dots ,\theta _i-\vartheta _i,\theta _{i+1},\dots ,\theta _n)\prod _{j=1}^i\left( \frac{\sin m_j\vartheta _j}{m_j\sin \vartheta _j}\right) ^4d\vartheta _1\dots d\vartheta _i}{\prod _{j=1}^i\int _{-\pi /2}^{\pi /2}\left( \frac{\sin m_j\vartheta _j}{m_j\sin \vartheta _j}\right) ^4 d\vartheta _j}. \end{aligned}$$

Then we have, using the triangle inequality and the single-variable inequality (10) at each step,

$$\begin{aligned} |g(\theta )-L_n(g)(\theta )|&\le |g(\theta )-L_1(g)(\theta )|+|L_1(g)(\theta )-L_2(g)(\theta )|+\dots +|L_{n-1}(g)(\theta )-L_{n}(g)(\theta )|\\&\le C\left( \frac{{{\,\textrm{Lip}\,}}_1g}{m_1}+\dots +\frac{{{\,\textrm{Lip}\,}}_n g}{m_n}\right) . \end{aligned}$$

The function \(\prod _j(\sin m_j\theta _j/m_j\sin \theta _j)^4\) is a polynomial of degree \(m_j\) in \(\cos \theta _j\) (cf. [7, p. 3]). If we replace f with its Lipschitz extension to \([-2,2]^n\) and apply the results above to \(g(\theta )=f(2\cos \theta _1,\dots ,2\cos \theta _n)\) we get a polynomial \(L_n(g)(\theta )\) in \(\cos \theta _1,\dots ,\cos \theta _n\) satisfying the above inequality. Thus

$$\begin{aligned} p(x)=L_n(g)(\arccos (x_1/2),\dots ,\arccos (x_n/2)),\quad x\in [-2,2]^n, \end{aligned}$$

is a polynomial with \({{\,\mathrm{\overline{\deg }}\,}}p\le \textbf{m}\) that satisfies (cf. [7, p. 13–14])

$$\begin{aligned} |f(x)-p(x)|\le C\left( \frac{{{\,\textrm{Lip}\,}}_1 g}{m_1}+\dots +\frac{{{\,\textrm{Lip}\,}}_n g}{m_n}\right) \le 2C\left( \frac{{{\,\textrm{Lip}\,}}_1 f}{m_1}+\dots +\frac{{{\,\textrm{Lip}\,}}_n f}{m_n}\right) , \end{aligned}$$

since \({{\,\textrm{Lip}\,}}_i g\le 2{{\,\textrm{Lip}\,}}_i f\). This proves the first statement, setting \(C_{\textrm{Jac}}=2C\). We also have

$$\begin{aligned} \left| \frac{d}{dx}\arccos (x/2)\right| =\frac{1}{2\sqrt{1-(x/2)^2}}\le \frac{1}{\sqrt{3}}\quad \text {for}\quad x\in [-1,1], \end{aligned}$$

and, by linearity and monotonicity of \(L_n\),

$$\begin{aligned} |L_ng(\theta )-L_ng(\theta _1,\dots ,\theta _{i-1},\theta _i+t,\theta _{i+1},\dots ,\theta _n)| \le \left| L_n(|t|{{\,\textrm{Lip}\,}}_i g)(\theta )\right| = |t|{{\,\textrm{Lip}\,}}_i g\le 2|t|{{\,\textrm{Lip}\,}}_i f, \end{aligned}$$

whence

$$\begin{aligned} {{\,\textrm{Lip}\,}}_ip&={{\,\textrm{Lip}\,}}_iL_n(g)(\arccos (x_1/2),\dots ,\arccos (x_n/2))\\&\le {{\,\textrm{Lip}\,}}_i L_n(g)\left| \frac{d}{dx_i}\arccos \frac{x_i}{2}\right| \le {{\,\textrm{Lip}\,}}_iL_n(g)\le 2{{\,\textrm{Lip}\,}}_if. \qquad \square \end{aligned}$$
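Inequality (10) can also be probed numerically; in the sketch below (ours) the kernel average of the \(\pi \)-periodic Lipschitz function \(g(t)=|\sin t|\) is computed by quadrature on a grid, and the scaled error stays bounded as m grows.

```python
# Numeric sketch of Jackson's inequality (10) for g(t) = |sin t|, Lip g = 1.
import numpy as np

def jackson_avg(g, theta, m, n_quad=4001):
    """Normalized Jackson-kernel average of g, by quadrature on [-pi/2, pi/2]."""
    v = np.linspace(-np.pi / 2, np.pi / 2, n_quad)
    with np.errstate(invalid="ignore", divide="ignore"):
        w = (np.sin(m * v) / (m * np.sin(v))) ** 4
    w[n_quad // 2] = 1.0                    # limit value of the kernel at v = 0
    num = (g(theta[:, None] - v[None, :]) * w).mean(axis=1)
    return num / w.mean()

g = lambda t: np.abs(np.sin(t))             # pi-periodic with Lip g = 1
theta = np.linspace(0.0, np.pi, 200)
for m in (4, 8, 16, 32):
    err = np.max(np.abs(g(theta) - jackson_avg(g, theta, m)))
    print(m, round(err, 4), round(m * err, 4))   # m * err stays bounded, cf. (10)
```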

Lemma 15

(a version of [6, Lemma 3]) Let \(J_1,\dots , J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP). Suppose \(f=f_1+\dots +f_\ell \) with \(\ell \ge 2\), \(f_j\in \mathbb {R}[x_{J_j}]_{}\). Let \(\varepsilon >0\) be such that \(f\ge \varepsilon \) on \(S(\textbf{g})\subseteq [-1,1]^n\). Pick numbers \(\epsilon ,\eta >0\) so that

$$\begin{aligned} \varepsilon =(\ell -1)\epsilon -(\ell -2)\eta \quad \text {and}\quad \epsilon >2\eta . \end{aligned}$$

Set, for \(2\le l\le \ell \),

(11)

with \(\mathcal {J}_l\) as in (2), and \(D_{1,m}=D_{2,m}\).

Then \(f=h_1+\dots +h_\ell \) for some \(h_j\in \mathbb {R}[x_{J_j}]_{}\) with \(h_j\ge \eta \) on \(S(\textbf{g})\subseteq [-1,1]^n\) and

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}h_j\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,\bar{D}_{j,\ell },\bar{D}_{j+1,\ell },\dots , \bar{D}_{\ell ,\ell })_{J_j} \end{aligned}$$
(12)

where \(\bar{D}_{j,m}\) is the multi-index whose k-th entry equals \(D_{j,m}\) if \(k\in \mathcal {J}_j=J_j\cap \bigcup _{i<j}J_i\) and 0 otherwise, and the maximum is taken entry-wise.

Additionally, denoting by \({{\,\textrm{Lip}\,}}f_k\) the Lipschitz constant of \(f_k\) on \([-1,1]^n\), we have

$$\begin{aligned} {{\,\textrm{Lip}\,}}h_j\le 3\sum _{k=j}^\ell {{\,\textrm{Lip}\,}}f_k. \end{aligned}$$

Finally, we have

$$\begin{aligned} \Vert h_j\Vert _\infty \le 3\times 2^{\ell -1}\sum _{k=1}^\ell \Vert f_k\Vert _\infty . \end{aligned}$$

Remark 16

If \(S(\textbf{g})=[-1,1]^n\), we also have the obvious estimate \(\Vert h_j\Vert _\infty \le \Vert f\Vert _\infty \), which follows from \(0\le h_j\le f\).

Proof

In order to prove the result by induction, let us first consider the case \(\ell =2\). In this case, \(\varepsilon =\epsilon \) and \(\epsilon >2\eta \). Assume that \(J_1\cap J_2\ne \emptyset \). For a subset \(J\subset \{1,\dots ,n\}\), let \(\pi _J\) denote the projection onto the variables with indices in J, that is, \(\pi _J(x)=(x_i)_{i\in J}\in [-1,1]^J\) for \(x\in [-1,1]^n\).

Define \(g:[-1,1]^{J_1\cap J_2}\rightarrow \mathbb {R}\) by

$$\begin{aligned} g(x):=\min _{y\in \pi _{J_2{\setminus }J_1}(S(\textbf{g}))\subseteq [-1,1]^{J_2{\setminus }J_1}} f_2(x,y)-\frac{\varepsilon }{2}, \qquad x\in [-1,1]^{J_1\cap J_2}. \end{aligned}$$

The function g is Lipschitz continuous on \([-1,1]^{J_1\cap J_2}\). To see why, let \(x,x'\in [-1,1]^{J_1\cap J_2}\) and pick \(y,y'\in \pi _{J_2{{\setminus }}J_1}(S(\textbf{g}))\subseteq [-1,1]^{J_2{{\setminus }}J_1}\) minimizing \(f_2(x,y)\) and \(f_2(x',y')\), respectively. Then

$$\begin{aligned} |g(x)-g(x')|=|f_2(x,y)-f_2(x',y')| \le \max (|f_2(x,y)-f_2(x',y)|,|f_2(x,y')-f_2(x',y')|)\le {{\,\textrm{Lip}\,}}(f_2)|x-x'|, \end{aligned}$$

where \({{\,\textrm{Lip}\,}}(f_2)\) denotes the Lipschitz constant of \(f_2\) on \([-1,1]^{n}\).

The function g also satisfies

$$\begin{aligned} f_1+g\ge \frac{\varepsilon }{2} \quad \text {and}\quad f_2-g\ge \frac{\varepsilon }{2} \end{aligned}$$

on \(S(\textbf{g})\). The second inequality follows from the definition of g, and the first one can be shown taking \((x,y,z)\in S(\textbf{g})\) with \(x\in [-1,1]^{J_1\cap J_2}\), \(y\in [-1,1]^{J_1{\setminus } J_2}\), and \(z\in [-1,1]^{J_2{\setminus }J_1}\), taking care to pick y only after x has been chosen, in such a way that the minimum is in the definition of g is realized there, that is, \(g(x)=f_2(x,y)-\varepsilon /2\) holds (this is possible by compactness of \(S(\textbf{g})\) and continuity of f); then we have

$$\begin{aligned} f_1(x,z)+g(x)=f_1(x,z)+f_2(x,y)-\frac{\varepsilon }{2}=f(x,y,z)-\frac{\varepsilon }{2}\ge \frac{\varepsilon }{2}. \end{aligned}$$

For \(j\in J_1\cap J_2\), let

$$\begin{aligned} m_j=D_{2,2}. \end{aligned}$$

Set \(m_j=0\) for all other \(1\le j\le n\), and \(\textbf{m}=(m_1,\dots ,m_n)=\bar{D}_{2,2}\). Then Theorem 14 gives a polynomial \(p_2\) such that

$$\begin{aligned} \Vert g-p_2\Vert _\infty \le C_{\textrm{Jac}}\sum _{j\in J_1\cap J_2}\frac{{{\,\textrm{Lip}\,}}_jg}{m_j}\le C_{\textrm{Jac}}|J_1\cap J_2|{{\,\textrm{Lip}\,}}(f_2)\frac{2}{D_{2,2}} \le \frac{\varepsilon }{2}-\eta . \end{aligned}$$

Also,

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}p_2\le \textbf{m}=\bar{D}_{2,2}. \end{aligned}$$
(13)

Let

$$\begin{aligned} h_1{:}{=}f_1+p_2\qquad \text {and}\qquad h_2{:}{=}f_2-p_2, \end{aligned}$$

so that \(f=h_1+h_2\), \(h_1\ge \eta \) and \(h_2\ge \eta \) on \(S(\textbf{g})\), and \(h_j\in \mathbb {R}[x_{J_j}]_{}\).

The bound (12) follows from the definition of \(h_j\) and (13). Observe also that, by the last part of Theorem 14,

$$\begin{aligned} {{\,\textrm{Lip}\,}}p_2\le 2{{\,\textrm{Lip}\,}}g\le 2{{\,\textrm{Lip}\,}}(f_2). \end{aligned}$$

Finally, we have

$$\begin{aligned} \Vert p_2\Vert _\infty \le \Vert g\Vert _\infty +\frac{\varepsilon }{2}-\eta \le \Vert f_2\Vert _\infty +\varepsilon -\eta \le 2\Vert f_2\Vert _\infty , \end{aligned}$$
(14)

so

$$\begin{aligned} \Vert h_j\Vert _\infty \le \Vert f_j\Vert _\infty +\Vert p_2\Vert _\infty \le \Vert f_j\Vert _\infty +2\Vert f_2\Vert _\infty \le 3(\Vert f_1\Vert _\infty +\Vert f_2\Vert _\infty ). \end{aligned}$$

For the induction step, let \(\ell \ge 3\) and set \(\tilde{f}=f_1+\dots +f_{\ell -1}-(\ell -2)(\epsilon -\eta )\), so that we have \(f-(\ell -2)(\epsilon -\eta )=\tilde{f}+f_\ell \ge \epsilon \) since \(f\ge \varepsilon =(\ell -1)\epsilon -(\ell -2)\eta \). The proof for the case \(\ell =2\), applied with \(\varepsilon =\epsilon \), gives a polynomial \(p_\ell \in \mathbb {R}[x_{\mathcal {J}_\ell }]_{}\) such that

$$\begin{aligned} \tilde{f} -p_\ell \ge \eta \quad \text {and}\quad f_\ell +p_\ell \ge \eta \end{aligned}$$

on \(S(\textbf{g})\), and with \({{\,\mathrm{\overline{\deg }}\,}}p_\ell \le \bar{D}_{\ell ,\ell }\), \({{\,\textrm{Lip}\,}}p_\ell \le 2{{\,\textrm{Lip}\,}}f_\ell \), and, analogously to (14),

$$\begin{aligned} \Vert p_\ell \Vert _\infty \le 2\Vert f_\ell \Vert _\infty . \end{aligned}$$
(15)

Write

$$\begin{aligned} f_1'+\dots +f'_{\ell -1}=f_1+\dots +f_{\ell -1}-p_\ell , \end{aligned}$$

where \(f_j'=f_j-p_\ell \) for the largest j with \(\mathcal {J}_\ell \subset J_j\) (which must happen for some j, by (RIP)) and \(f_k'=f_k\) for all other \(k\ne j\). Thus \(f'_j\in \mathbb {R}[x_{J_j}]_{}\),

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}f_j'\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,{{\,\mathrm{\overline{\deg }}\,}}p_\ell )\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,\bar{D}_{\ell ,\ell }), \qquad {{\,\textrm{Lip}\,}}f_j'\le {{\,\textrm{Lip}\,}}f_j+{{\,\textrm{Lip}\,}}p_\ell . \end{aligned}$$
(16)

The induction hypothesis applies to the polynomial

$$\begin{aligned} f_1'+\dots +f_{\ell -1}'=\tilde{f}+(\ell -2)(\epsilon -\eta )-p_\ell \ge (\ell -2)\epsilon -(\ell -3)\eta . \end{aligned}$$

This means that there are polynomials \(h_1,\dots ,h_{\ell -1}\) such that

  • \(f_1'+\dots +f_{\ell -1}'=f_1+\dots +f_{\ell -1}-p_\ell =h_1+\dots +h_{\ell -1}\),

  • \(h_j\in \mathbb {R}[x_{J_j}]_{}\) for all \(1\le j\le \ell -1\),

  • \(h_j\ge \eta \) for all \(1\le j\le \ell -1\),

  • We have, for all \(1\le j\le \ell -1\),

    $$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}h_j&\le \max ({{\,\mathrm{\overline{\deg }}\,}}f'_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell -1,\ell })\\&\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,{{\,\mathrm{\overline{\deg }}\,}}p_\ell ,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell -1,\ell })\\&\le \max ({{\,\mathrm{\overline{\deg }}\,}}f_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell }). \end{aligned}$$

    Observe that the second index in each \(\bar{D}_{k,\ell }\) is \(\ell \) because of the accumulation of Lipschitz constants resulting from the estimate (16).

  • We have, for all \(1\le j\le \ell -1\), again because of (16),

    $$\begin{aligned} {{\,\textrm{Lip}\,}}h_j\le 3\sum _{k=j}^\ell {{\,\textrm{Lip}\,}}f_k. \end{aligned}$$
  • We have, for all \(1\le j\le \ell -1\), using (15),

    $$\begin{aligned}{} & {} \Vert h_j\Vert _\infty \le 3\times 2^{\ell -2}\sum _{k=1}^{\ell -1} \Vert f'_k\Vert _\infty \le 3\times 2^{\ell -2} \left( \sum _{k=1}^{\ell -1} \Vert f_k\Vert _\infty +\Vert p_\ell \Vert _\infty \right) \\{} & {} \le 3\times 2^{\ell -1}\sum _{k=1}^\ell \Vert f_k\Vert _\infty . \end{aligned}$$
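Spelling out the last two inequalities above (a one-line verification): by (15), \(\Vert p_\ell \Vert _\infty \le 2\Vert f_\ell \Vert _\infty \), so

$$\begin{aligned} \sum _{k=1}^{\ell -1}\Vert f_k\Vert _\infty +\Vert p_\ell \Vert _\infty \le \sum _{k=1}^{\ell -1}\Vert f_k\Vert _\infty +2\Vert f_\ell \Vert _\infty \le 2\sum _{k=1}^{\ell }\Vert f_k\Vert _\infty \quad \text {and}\quad 3\times 2^{\ell -2}\cdot 2=3\times 2^{\ell -1}. \end{aligned}$$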

Let

$$\begin{aligned} h_\ell =f_\ell +p_\ell . \end{aligned}$$

Then again \(f_1+\dots +f_\ell =h_1+\dots +h_\ell \), \(h_\ell \in \mathbb {R}[x_{J_\ell }]_{}\), \(h_\ell \ge \eta \) on \(S(\textbf{g})\), \({{\,\mathrm{\overline{\deg }}\,}}h_\ell \le \max ({{\,\mathrm{\overline{\deg }}\,}}f_\ell ,\bar{D}_{\ell ,\ell })\), \({{\,\textrm{Lip}\,}}h_\ell \le {{\,\textrm{Lip}\,}}f_\ell +{{\,\textrm{Lip}\,}}p_\ell \le 3{{\,\textrm{Lip}\,}}f_\ell \), and \(\Vert h_\ell \Vert _\infty \le \Vert f_\ell \Vert _\infty +\Vert p_\ell \Vert _\infty \le 3\Vert f_\ell \Vert _\infty \le 3\times 2^{\ell -1} \sum _{j=1}^\ell \Vert f_j\Vert _\infty \), so the lemma is proven. \(\square \)
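To illustrate the mechanism of the proof in the case \(\ell =2\), consider the following toy example (ours, not part of the lemma): let \(n=3\), \(J_1=\{1,2\}\), \(J_2=\{2,3\}\), \(S(\textbf{g})=[-1,1]^3\), and

$$\begin{aligned} f_1=x_1x_2+\tfrac{3}{2},\qquad f_2=x_2x_3+\tfrac{3}{2},\qquad f=f_1+f_2\ge \varepsilon =1\quad \text {on } [-1,1]^3. \end{aligned}$$

Then \(g(x_2)=\min _{x_3\in [-1,1]}f_2(x_2,x_3)-\varepsilon /2=1-|x_2|\), and indeed \(f_2-g=x_2x_3+\tfrac{1}{2}+|x_2|\ge \tfrac{\varepsilon }{2}\) and \(f_1+g=x_1x_2+\tfrac{5}{2}-|x_2|\ge \tfrac{\varepsilon }{2}\) on \([-1,1]^3\). Since g is Lipschitz but not a polynomial, it is the Jackson-type approximation of Theorem 14 that produces the polynomial \(p_2\approx g\), at the price of the degree bound (13).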

4 Proofs

4.1 Proof of Theorem 6

Overview Theorem 6 follows from Theorem 17, which presents a more detailed bound, together with the definitions of \(\overline{L},M,\overline{J}\). To prove the latter theorem, we first use the sparse approximation theory developed in Sect. 3 to represent the sparse polynomial p as a sum of positive polynomials \(h_1+\dots +h_{\ell }\), each of them depending on a clique of variables \(J_j\). We then use Corollary 13 to see that each \(h_j\) belongs to the preordering.

Theorem 17

Let \(n>0\) and \(\ell \ge 2\), and let \({\textbf{r}}_1,{\textbf{r}}_2,\dots ,{\textbf{r}}_\ell \in \mathbb {N}^n\), \({\textbf{r}}_j=(r_{j,1},\dots ,r_{j,n})\), be nowhere-vanishing multi-indices. Let also \(J_1,\dots ,J_\ell \) be subsets of \(\{1,\dots ,n\}\) satisfying (RIP), and let \(p=p_1+p_2+\dots +p_\ell \) be a sum of polynomials \(p_j\in \mathbb {R}[x_{J_j}]_{{\textbf{r}}_j}\). If \(p\ge \varepsilon \) on \([-1,1]^n\), then

$$\begin{aligned} p\in \mathcal P_{{\textbf{r}}_1,J_1}(\{1-x_i^2\}_{i\in J_1})+\dots +\mathcal P_{{\textbf{r}}_\ell ,J_\ell }(\{1-x_i^2\}_{i\in J_\ell }) \end{aligned}$$

as long as, for all \(1\le j\le \ell \) and all \(1\le i\le n\),

$$\begin{aligned} (r_{j,i}+2)^2&\ge \frac{ 2^{\frac{|J_j|}{2}+2}(\ell +2)\Vert p\Vert _\infty n\pi ^2}{\varepsilon }\nonumber \\&\qquad \cdot \prod _{1\le m\le n}\left( \max \left[ ({{\,\mathrm{\overline{\deg }}\,}}p_j)_m,\max _{\begin{array}{c} j\le l\le \ell \\ m\in \mathcal {J}_l \end{array}}\frac{4C_{\textrm{Jac}}(\ell +2)|\mathcal {J}_l|\sum _{t=l}^\ell {{\,\textrm{Lip}\,}}p_t}{\varepsilon } \right] +2\right) \nonumber \\&\qquad \cdot \max _{l\in J_j}\left[ ({{\,\mathrm{\overline{\deg }}\,}}p_j)_l,\max _{\begin{array}{c} j\le q\le \ell \\ l\in \mathcal {J}_q \end{array}}\frac{4C_{\textrm{Jac}}(\ell +2)|\mathcal {J}_q|\sum _{t=q}^\ell {{\,\textrm{Lip}\,}}p_t}{\varepsilon } \right] ^{2}, \end{aligned}$$
(17)

and

$$\begin{aligned} {(r_{j,i}+2)^2}\ge 2\pi ^2n\max \left[ \max _{1\le m\le n}({{\,\mathrm{\overline{\deg }}\,}}p_j)_m,\max _{j\le k\le \ell }\frac{4C_{\textrm{Jac}}(\ell +2)|\mathcal {J}_k|\sum _{t=k}^\ell {{\,\textrm{Lip}\,}}p_t}{\varepsilon } \right] ^2 . \end{aligned}$$
(18)

Proof of Theorem 17

Let

$$\begin{aligned} \epsilon = \frac{\varepsilon +(\ell -2)\eta }{\ell }\qquad \text {and}\qquad \eta =\frac{\varepsilon }{2(\ell +2)}. \end{aligned}$$
(19)
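Explicitly, substituting the second equation of (19) into the first, these choices read

$$\begin{aligned} \eta =\frac{\varepsilon }{2(\ell +2)},\qquad \epsilon =\frac{(3\ell +2)\,\varepsilon }{2\ell (\ell +2)},\qquad \epsilon -\eta =\frac{(\ell +1)\,\varepsilon }{\ell (\ell +2)}. \end{aligned}$$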

Apply Lemma 15 with \(\textbf{g}=0\), so that \(S(\textbf{g})=[-1,1]^n\). From the lemma, we get polynomials \(h_1,\dots ,h_\ell \) with

$$\begin{aligned}{} & {} \quad h_j\in \mathbb {R}[x_{J_j}]_{},\\{} & {} \quad p=h_1+\dots +h_\ell ,\\{} & {} \quad h_j\ge \eta \text { on }[-1,1]^n\text { for }1\le j\le \ell ,\\{} & {} \quad {{\,\mathrm{\overline{\deg }}\,}}h_j\le \max ({{\,\mathrm{\overline{\deg }}\,}}p_j, \bar{D}_j,\bar{D}_{j+1},\dots ,\bar{D}_\ell ). \end{aligned}$$

Here,

since \(\varepsilon _j-2\eta =\eta (\ell +2)/2\ell \). We also set

$$\begin{aligned} D_1:=D_2. \end{aligned}$$

Thus

$$\begin{aligned} |\bar{D}_l|=|\mathcal {J}_l|D_l\quad \text {for}\quad 2\le l\le \ell . \end{aligned}$$

Apply Corollary 13 to each of the polynomials

$$\begin{aligned} H_j=\frac{h_j-\min _{[-1,1]^n}h_j}{\max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j} \end{aligned}$$

to see that, for

$$\begin{aligned} \epsilon _j\ge 2n\pi ^2\left( \prod _{1\le k\le n}\left( ({{\,\mathrm{\overline{\deg }}\,}}H_j)_k+1\right) \right) \max _{I\in {\mathcal I_{H_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) , \end{aligned}$$
(20)

(recall that \({\mathcal I_{H_j}}\) is the set of multi-indices \(I=(i_1, \dots ,i_n)\) corresponding to exponents of \(x_1, \dots ,x_n\) in the terms appearing in \(H_j\), and \(w(I)\) is the number of nonzero entries in I) we have

$$\begin{aligned} H_j+\epsilon _j\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j}); \end{aligned}$$
(21)

when applying the corollary, note that (18) implies (9) in this case because, if \(I=(i_1,\dots ,i_n)\in {\mathcal I_{H_j}}\), then

$$\begin{aligned} i_k\le & {} ({{\,\mathrm{\overline{\deg }}\,}}H_j)_k\le ({{\,\mathrm{\overline{\deg }}\,}}h_j)_k \le \max ({{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_j,\dots ,\bar{D}_\ell )\\\le & {} \max \left[ \max _{1\le m\le n}({{\,\mathrm{\overline{\deg }}\,}}p_j)_m,\max _{j\le k\le \ell }\frac{4C_{\textrm{Jac}}(\ell +2)|\mathcal {J}_k|\sum _{t=k}^\ell {{\,\textrm{Lip}\,}}p_t}{\varepsilon }\right] , \end{aligned}$$

by the definition of \(\bar{D}_l\). Observe that (21) also means that

$$\begin{aligned} h_j-\min _{[-1,1]^n}h_j+\epsilon _j\left( \max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j\right) \in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j}). \end{aligned}$$
(22)

Note that we have \({{\,\mathrm{\overline{\deg }}\,}}H_j={{\,\mathrm{\overline{\deg }}\,}}h_j\), \({\mathcal I_{H_j}}{\setminus }{\mathcal I_{h_j}}=\emptyset \), and \({\mathcal I_{h_j}}{\setminus }{\mathcal I_{H_j}}\subseteq \{(0,\dots ,0)\}\) since the powers of all terms in \(h_j\) and in \(H_j\) are the same, with the only possible exception of the constant term, which may appear in one of these and vanish in the other. Now, going back to our choice (19) of \(\eta \) and using (17), we have

for all \(j \in \{1,\ldots ,\ell \}\). Notice that after separating two of the \(|J_j|+2\) terms in the product and removing the \(+1\) factor from them, we obtain

where we have used the definition of \(\bar{D}_l\), as well as the fact that each factor has been replaced by a smaller or equal one, since the original expression contains the maximum of these in each factor. Next, use \({{\,\mathrm{\overline{\deg }}\,}}H_j\le \max ({{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_j,\dots , \bar{D}_\ell )\) as well as \(w(I)\le |J_j|\) for every multi-index I in \({\mathcal I_{H_j}}\), which is true because \(H_j\in \mathbb {R}[x_{J_j}]_{}\), yielding

$$\begin{aligned} \eta&\ge \Vert p\Vert _\infty \left( 2n\pi ^2\left( \prod _{1\le k\le n}\left( \left( {{\,\mathrm{\overline{\deg }}\,}}H_j\right) _k+1\right) \right) \max _{I\in {\mathcal I_{H_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) \right) \\&\ge \left( \max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j\right) \\&\qquad \cdot \left( 2n\pi ^2\left( \prod _{1\le k\le n}\left( ({{\,\mathrm{\overline{\deg }}\,}}H_j)_k+1\right) \right) \max _{I\in {\mathcal I_{H_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) \right) , \end{aligned}$$

since we have \(\max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j\le \Vert p\Vert _\infty \). With this bound for \(\eta \), together with the fact that \(\min _{[-1,1]^n}h_j\ge \eta \), we get

$$\begin{aligned} h_j&\ge h_j-\min _{[-1,1]^n}h_j+\eta \\&\ge h_j-\min _{[-1,1]^n}h_j +\left( \max _{[-1,1]^n}h_j-\min _{[-1,1]^n}h_j\right) \\&\qquad \cdot \left( 2n\pi ^2\left( \prod _{1\le k\le n}\left( ({{\,\mathrm{\overline{\deg }}\,}}H_j)_k+1\right) \right) \max _{I\in {\mathcal I_{H_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) \right) , \end{aligned}$$

so that, by (20) and (22), \(h_j\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j})\) and hence

$$\begin{aligned} p=h_1+\dots +h_\ell \in \mathcal P_{{\textbf{r}}_1,J_1}(\{1-x_i^2\}_{i\in J_1})+\dots +\mathcal P_{{\textbf{r}}_\ell ,J_\ell }(\{1-x_i^2\}_{i\in J_\ell }). \end{aligned}$$

\(\square \)

4.2 Proof of Theorem 8

Overview For this proof, we first use the sparse approximation theory developed in Sect. 3 to represent the sparse polynomial p as a sum of positive polynomials \(h_1+\dots +h_{\ell }\), each of them depending on a clique of variables \(J_j\). We then work with each of these polynomials \(h_j\) using the tools developed by Baldi–Mourrain [2] to write \(h_j={\hat{f}}_j+\hat{q}_j\), where \(\hat{q}_j\) is by construction an element of the corresponding quadratic module, and \({\hat{f}}_j\) is strictly positive on \([-1,1]^n\). Thus Corollary 13 can be applied to \({\hat{f}}_j\), which shows that it belongs to the preordering, and then one argues (also following the ideas of [2]) that the preordering is contained in the quadratic module, hence giving that \({\hat{f}}_j\) is contained in the latter as well. In sum, this shows that \(h_j\) is in the quadratic module, which is what we want. Most of the heavy lifting goes into estimating the minimum of \({\hat{f}}_j\) to justify the application of Corollary 13.

Proof of Theorem 8

For each \(j=1,\dots , \ell \), pick \({\textbf{C}_{j}}>0\) such that the following two bounds are satisfied:

$$\begin{aligned} {\textbf{C}_{j}}&\ge 2\pi ^2|J_j|^{1+\frac{16{\mathsf L}_j}{3}}{C_d}^2 C_{\textrm{Jac}}^{\frac{16{\mathsf L}_j}{3}}2^{1+2(4+3\frac{8}{3}){\mathsf L}_j}3^{\frac{(16+8\ell ) {\mathsf L}_j+2}{3}}{\bar{k}}^{-\frac{2}{3}}{\mathsf c}_j^{\frac{8}{3}}(\max _{i\in K_j}\deg g_i)^2(2(\ell +2))^{8{\mathsf L}_j},\end{aligned}$$
(23)
$$\begin{aligned} {\textbf{C}_{j}}&\ge {C_f}(C_{\textrm{Jac}}{C_m})^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})} |J_j|\pi ^22^{ 4{\mathsf L}_j+\frac{|J_j|}{2}+1+(1+\frac{4{\mathsf L}_j+1}{3})(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3}) }\nonumber \\&\qquad \times 3^{ \ell ({\mathsf L}_j+1)+(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3}) }(\ell +2)^{1+{\mathsf L}_j+\frac{4{\mathsf L}_j+1}{3}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}{\bar{k}}\nonumber \\&\qquad \times {\mathsf c}_j^{ 1+ \frac{3}{4}(2{\mathsf L}_j + |J_j|+2)(1+\frac{8{\mathsf L}_j}{3})} \left( \sum _{i=j}^\ell |\mathcal {J}_i|^{2(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\right) \nonumber \\&\qquad \cdot (\max _{k\in K_j}\deg g_k+1)^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}. \end{aligned}$$
(24)

Note that these only depend on \(\textbf{g}\) and \(J_1,\dots ,J_\ell \).

Apply Lemma 15 to \(f=p\), \(f_i=p_i\), \(\epsilon =3\varepsilon /2\ell \), \(\eta =\varepsilon /2(\ell +2)\) to get polynomials \(h_1,\dots , h_\ell \) such that

$$\begin{aligned} p=h_1+\dots +h_\ell ,\quad h_i\in \mathbb {R}[x_{J_i}]_{},\quad h_i(x)\ge \eta =\frac{\varepsilon }{2(\ell +2)} \;\text {for}\;x\in S(\textbf{g}), \end{aligned}$$
(25)

and

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}h_i\le \max ({{\,\mathrm{\overline{\deg }}\,}}p_i,\bar{D}_{i,\ell },\dots ,\bar{D}_{\ell ,\ell })_{J_i}. \end{aligned}$$
(26)

In the dense setting, Baldi–Mourrain [2] construct a family of single-variable polynomials

$$\begin{aligned} ({\mathsf h}_{t,m})_{(t,m)\in \mathbb {N}\times \mathbb {N}} \end{aligned}$$

providing useful approximation properties that we have adapted to the (separated-variables) sparse setting and collected in Lemma 18. To state this, we set, for all \(j=1,\dots ,\ell \) and for \((t_j,m_j)\in \mathbb {N}\times \mathbb {N}\) as well as for \(s_j>0\),

$$\begin{aligned}{} & {} q_{j, t_j, m_j}(x):=\sum _{i\in K_j}{\mathsf h}_{t_j,m_j}\left( g_i(x)\right) ^2g_i(x), \end{aligned}$$
(27)
$$\begin{aligned}{} & {} f_{j,s_j, t_j, m_j}(x):=h_j(x)-s_j \,q_{j, t_j,m_j}(x). \end{aligned}$$
(28)

Let us give an idea of what these functions do. The single-variable polynomial \({\mathsf h}_{t_j,m_j}\) is of degree \(m_j\) and roughly speaking approximates the function that equals 1 on \((-\infty ,0)\) and \(1/{t_j}\) elsewhere. Thus \(q_{j,t_j,m_j}\) almost vanishes (for large \(t_j\)) on \(S(\textbf{g}_{K_j})\), and outside of this domain it is roughly a sum of multiples of the negative parts of \(\textbf{g}_{K_j}\)’s entries. The definition of \(f_{j,s_j,t_j,m_j}\) is engineered to obtain a polynomial that is almost equal to \(h_j\) in \(S(\textbf{g}_{K_j})\) yet remains positive throughout \([-1,1]^n\). Instead of going into the details of the construction, we record the properties we need in Lemma 18 below.
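Heuristically (a back-of-the-envelope computation on our part, not used in the formal argument): on \(S(\textbf{g}_{K_j})\) every \(g_i\) with \(i\in K_j\) is nonnegative, so \({\mathsf h}_{t_j,m_j}(g_i)\approx 1/t_j\) there, while at a point where \(g_i(x)<0\) we have \({\mathsf h}_{t_j,m_j}(g_i(x))\approx 1\); hence

$$\begin{aligned} q_{j,t_j,m_j}(x)\approx \frac{1}{t_j^2}\sum _{i\in K_j}g_i(x)\quad \text {on } S(\textbf{g}_{K_j}),\qquad q_{j,t_j,m_j}(x)\approx \sum _{\begin{array}{c} i\in K_j\\ g_i(x)<0 \end{array}}g_i(x)\quad \text {elsewhere}. \end{aligned}$$

For large \(t_j\), the correction \(s_j\,q_{j,t_j,m_j}\) thus barely perturbs \(h_j\) on \(S(\textbf{g}_{K_j})\), while outside of it \(q_{j,t_j,m_j}\) is negative, so subtracting \(s_j\,q_{j,t_j,m_j}\) pushes \(f_{j,s_j,t_j,m_j}\) up, compensating for the possible negativity of \(h_j\).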

Lemma 18

(a version of [2, Props. 2.13, 3.1, and 3.2, Lem. 3.5]) Assume that (3) and the Archimedean conditions (4) are satisfied. Then for each \(j=1,\dots ,\ell \) there are values \(s_j,t_j, m_j\) of the parameters involved in definitions (27) and (28), such that the following holds with the shorthands

$$\begin{aligned} {\hat{f}}_j=f_{j,s_j,t_j,m_j}\qquad \text {and}\qquad {\hat{q}}_j=s_jq_{j,t_j,m_j}: \end{aligned}$$
(29)
  i.

    [2, Prop. 3.1] gives

    $$\begin{aligned} {\hat{f}}_j(x)\ge \frac{1}{2}\min _{y\in S(\textbf{g}_{K_j})}h_j(y)\ge \frac{\eta }{2}=\frac{\varepsilon }{4(\ell +2)}\quad \text {for all}\quad x\in [-1,1]^n. \end{aligned}$$
  ii.

    We have \(\hat{q}_j\in \mathcal Q_{\textbf{r},J_j}(\textbf{g})\) for all multi-indices \(\textbf{r}=(r_1,\dots ,r_n)\) with

    $$\begin{aligned} r_i\ge (2m_{j}+ 1)\max _{k\in K_j}({{\,\mathrm{\overline{\deg }}\,}}g_k)_i. \end{aligned}$$
  iii.

    [2, eq. (20)] gives the existence of a constant \({C_m}>0\) such that

    $$\begin{aligned} m_{j}\le {C_m}{\mathsf c}_j^{\frac{4}{3}}{\bar{k}}^{\frac{1}{3}}2^{4{\mathsf L}_j}( \deg h_j )^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}. \end{aligned}$$
  iv.

    [2, eq. (16)] gives the existence of a constant \({C_f}>0\) such that

    $$\begin{aligned} \Vert {\hat{f}}_{j}\Vert _\infty \le {C_f}\Vert h_j\Vert _\infty 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j(\deg h_j)^{2{\mathsf L}_j}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-{\mathsf L}_j}. \end{aligned}$$
  v.

    [2, eq. (17)] gives the existence of a constant \({C_d}>0\) such that

    $$\begin{aligned} \deg {\hat{f}}_{j}\le {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)(\deg h_j)^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}. \end{aligned}$$

Item (ii.) follows (see Footnote 5) from \(\deg {\mathsf h}_{t_j,m_j}=m_j\) and the definition of \(q_{j,t_j,m_j}\). The proofs of the other items can be found in the indicated sources.

Take \(s_j,t_j,m_j,{\hat{f}}_j, \hat{q}_j\) for \(j=1,\dots ,\ell \) satisfying the properties (i)–(v) collected in Lemma 18.

Continuing with the proof of Theorem 8, denote

$$\begin{aligned} F_j{:}{=}\frac{{\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}}{\max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}}. \end{aligned}$$

Since \({\hat{f}}_{j}\ge \varepsilon /4(\ell +2)\) on \([-1,1]^n\), we may apply Corollary 13 with \(p=F_{j}\) to get that

$$\begin{aligned} F_j+\epsilon _j\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j}) \end{aligned}$$
(30)

as long as

$$\begin{aligned} \epsilon _j\ge 2|J_j|\pi ^2\left( \prod _{i\in J_j}(({{\,\mathrm{\overline{\deg }}\,}}F_j)_i+1)\right) \max _{I=(i_1,\dots ,i_n)\in {\mathcal I_{F_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) , \end{aligned}$$
(31)

and (9) are verified.

In this context, the condition (9) required in Corollary 13 is equivalent to the theorem’s assumption (6); let us show how this works: First, using \(i_k\le ({{\,\mathrm{\overline{\deg }}\,}}F_j)_k\), \({{\,\mathrm{\overline{\deg }}\,}}F_j\le {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_j\le (\deg {\hat{f}}_j)\textbf{1}\), and Lemma 18(v.), we get

$$\begin{aligned} 2\pi ^2 |J_j|i_k^2\le & {} 2\pi ^2|J_j|\left( {{\,\mathrm{\overline{\deg }}\,}}F_j\right) ^2_k\le 2\pi ^2|J_j|(\deg {\hat{f}}_j)^2\\\le & {} 2\pi ^2|J_j|\left( {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)(\deg h_j)^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}\right) ^2. \end{aligned}$$

Now use Eq. (26) to get that this is

$$\begin{aligned}&\le 2\pi ^2|J_j| \\&\qquad \cdot \left( {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)\max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell }\right) ^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}\right) ^2 \\&\le 2\pi ^2|J_j|\left( {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)\max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell }\right) ^{\frac{8{\mathsf L}_j}{3}}\left( \frac{3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)}\right) ^{\frac{4{\mathsf L}_j+1}{3}}\right) ^2, \end{aligned}$$

where we have also used the fact that

$$\begin{aligned} \Vert h_j\Vert _\infty \le 3\times 2^{\ell -1}\sum _{i=1}^\ell \Vert p_i\Vert _\infty \le 3^\ell \sum _{i=1}^\ell \Vert p_i\Vert _\infty , \end{aligned}$$
(32)

and the last estimate from (25). Next, use (11), \(|\mathcal {J}_j|\le |J_j|\), \(\varepsilon _i-2\eta _i=\frac{\ell +6}{2\ell (\ell +2)}\varepsilon \) to get

$$\begin{aligned}&2\pi ^2|J_j|i^2_k \\&\quad \le 2\pi ^2|J_j| \left( {C_d}2^{4{\mathsf L}_j}{\bar{k}}^{-\frac{1}{3}}{\mathsf c}_j^{\frac{4}{3}}(\max _{i\in K_j}\deg g_i)(\deg p_j)^{\frac{8{\mathsf L}_j}{3}}\left( \frac{2C_{\textrm{Jac}}|J_j|(3\sum _{i=1}^\ell {{\,\textrm{Lip}\,}}p_i)}{\frac{\ell +6}{2\ell (\ell +2)}\varepsilon }\right) ^{\frac{8{\mathsf L}_j}{3}}\right. \\&\qquad \ \cdot \left. \left( \frac{ 3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)}\right) ^{\frac{4{\mathsf L}_j+1}{3}}\right) ^2 \\&\quad \le {\textbf{C}_{j}}\left( \left( \sum _{i=1}^\ell \Vert p_i\Vert _\infty \right) ^{\frac{4{\mathsf L}_j+1}{3}}(\deg p_j)^{\frac{8{\mathsf L}_j}{3}}\frac{\left( \sum _{i=1}^{\ell }{{\,\textrm{Lip}\,}}p_i\right) ^{\frac{8{\mathsf L}_j}{3}}}{\varepsilon ^{\frac{(8+4){\mathsf L}_j+1}{3}}}\right) ^2\le (r_{j,k}+2)^2, \end{aligned}$$

where we have additionally used Eq. (23) and our assumption (6); this is precisely (9).

We would next like to show that

$$\begin{aligned} {\hat{f}}_{j}\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j}). \end{aligned}$$
(33)

Let us first explain why this will be enough to prove the theorem. Once we have (33), by Lemma 19, \({\hat{f}}_j\) is also contained in \(\mathcal Q_{\textbf{r}_j+\textbf{2},J_j}(1-\sum _{i\in J_j}x_i^2)\), and it is our assumption (4) that \(\mathcal Q_{\textbf{r}_j+\textbf{2},J_j}(1-\sum _{i\in J_j}x_i^2)\subseteq \mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j})\). In other words, we have

$$\begin{aligned} {\hat{f}}_j\in \mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j}). \end{aligned}$$

By Lemma 18(ii.), \(\hat{q}_{j}\) also belongs to \(\mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j})\), so we can conclude that

$$\begin{aligned} h_j\in \mathcal Q_{{\textbf{r}}_j+\textbf{2},J_j}(\textbf{g}_{K_j}), \end{aligned}$$

which is equivalent to the conclusion of the theorem.

Thus we need to prove (33). Let us first show that

$$\begin{aligned} \frac{\eta }{2}= & {} \frac{\varepsilon }{4(\ell +2)} \ge (\max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}) \;2|J_j|\pi ^2\left( \prod _{i\in J_j}(({{\,\mathrm{\overline{\deg }}\,}}F_j)_i+1)\right) \nonumber \\{} & {} \times \max _{I=(i_1,\dots ,i_n)\in {\mathcal I_{F_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) \end{aligned}$$
(34)

implies (33). Observe that (30) is equivalent to

$$\begin{aligned} {\hat{f}}_j-\min _{[-1,1]^n}{\hat{f}}_j+\epsilon _j(\max _{[-1,1]^n}{\hat{f}}_j-\min _{[-1,1]^n}{\hat{f}}_j)\in \mathcal P_{{\textbf{r}}_j ,J_j}(\{1-x_i^2\}_{i\in J_j}). \end{aligned}$$
(35)

If (34) were true, we would then have

$$\begin{aligned} {\hat{f}}_j&\ge {\hat{f}}_j-\min _{[-1,1]^n}{\hat{f}}_j+\frac{\varepsilon }{4(\ell +2)}\\&\ge {\hat{f}}_j-\min _{[-1,1]^n}{\hat{f}}_j + (\max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}) \;2|J_j|\pi ^2\left( \prod _{i\in J_j}(({{\,\mathrm{\overline{\deg }}\,}}F_j)_i+1)\right) \\&\quad \times \max _{I=(i_1,\dots ,i_n)\in {\mathcal I_{F_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) . \end{aligned}$$

So in view of (31) and (35), we would indeed have \({\hat{f}}_j\in \mathcal P_{{\textbf{r}}_j,J_j}(\{1-x_i^2\}_{i\in J_j})\), which is (33).

Let us now collect some preliminary estimates that will help us to prove (34). For \(I\in {\mathcal I_{F_j}}\) we have \(w(I)\le |J_j|\), so we estimate

$$\begin{aligned} 2^{w(I)/2}\le 2^{|J_j|/2}. \end{aligned}$$
(36)

We also estimate

$$\begin{aligned} \frac{i^2_k}{(r_{j,k}+2)^2}\le \frac{\left( {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_{j}\right) ^2_k}{\min _{1\le l\le n}(r_{j,l}+2)^2}. \end{aligned}$$
(37)

Now we will estimate \( \max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}\) from above. Using Lemma 18(i.) and (iv.), we get

$$\begin{aligned} \max _{[-1,1]^n} {\hat{f}}_{j} -\min _{[-1,1]^n} {\hat{f}}_{j}&\le {C_f}\Vert h_j\Vert _\infty 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j(\deg h_j)^{2{\mathsf L}_j}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-{\mathsf L}_j}\\&\quad -\frac{\varepsilon }{4(\ell +2)}. \end{aligned}$$

Use (25), (26) and (32) to see that this is

$$\begin{aligned}&\max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}\nonumber \\&\quad \le {C_f}\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j\max (\deg p_j,|\bar{D}_{j,\ell }|,\dots ,|\bar{D}_{\ell ,\ell }|)^{2{\mathsf L}_j}\nonumber \\&\qquad \cdot \left( \frac{\varepsilon }{2(\ell +2)3^\ell \sum _i\Vert p_i\Vert _\infty }\right) ^{-{\mathsf L}_j}\nonumber \\&\quad = {C_f}\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j\max (\deg p_j,|\mathcal {J}_j| D_{j,\ell },\dots ,|\mathcal {J}_\ell | D_{\ell ,\ell })^{2{\mathsf L}_j}\nonumber \\&\qquad \cdot \left( \frac{\varepsilon }{2(\ell +2)3^\ell \sum _i\Vert p_i\Vert _\infty }\right) ^{-{\mathsf L}_j}. \end{aligned}$$
(38)

For the last line, we have used the definition of \(\bar{D}_{l,m}\) as in Lemma 15.

Additionally, we obtain the following estimate:

$$\begin{aligned} {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_{j}&\le \nonumber \max \left( {{\,\mathrm{\overline{\deg }}\,}}h_j,{{\,\mathrm{\overline{\deg }}\,}}\hat{q}_{j} \right) \nonumber \\&\le \max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell },(2m_j+1)\max _{k\in K_j}{{\,\mathrm{\overline{\deg }}\,}}g_k\right) \nonumber \\&\le \max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell },\right. \nonumber \\&\left. \nonumber \qquad \left( 2\left( {C_m}{\mathsf c}_j^{\frac{3}{4}}{\bar{k}}^{-\frac{1}{3}}2^{4{\mathsf L}_j}( \deg h_j )^{\frac{8{\mathsf L}_j}{3}}\left( \frac{\min _{x\in S(\textbf{g})}h_j(x)}{\Vert h_j\Vert _\infty }\right) ^{-\frac{4{\mathsf L}_j+1}{3}}\right) +1\right) \max _{k\in K_j}{{\,\mathrm{\overline{\deg }}\,}}g_k\right) \\&\le \max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell },\right. \nonumber \\&\left. \qquad \left( {C_m}2^{4{\mathsf L}_j+1}{\mathsf c}^{\frac{3}{4}}{\bar{k}}^{-\frac{1}{3}}(\deg h_j)^{\frac{8{\mathsf L}_j}{3}}\left( \frac{3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)} \right) ^{\frac{4{\mathsf L}_j+1}{3}}+1\right) \max _{k\in K_j} {{\,\mathrm{\overline{\deg }}\,}}g_k\right) . \end{aligned}$$
(39)

The first inequality comes from (28), the second one from (26) and Lemma 18(ii.), the third one from Lemma 18(iii.), and the last one from (25), (32), and (26). Compare with Lemma 18(v.).

With those estimates under our belt, we now turn to showing that (34) is true. Using (38), (36), (37), as well as \({{\,\mathrm{\overline{\deg }}\,}}F_j\le {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_j\), we can start to estimate the right-hand side of (34) by

$$\begin{aligned}&(\max _{[-1,1]^n} {\hat{f}}_{j}-\min _{[-1,1]^n} {\hat{f}}_{j}) \;2|J_j|\pi ^2\left( \prod _{i\in J_j}(({{\,\mathrm{\overline{\deg }}\,}}F_j)_i+1)\right) \\&\qquad \quad \times \max _{I=(i_1,\dots ,i_n)\in {\mathcal I_{F_j}}}\left( 2^{w(I)/2}\max _{1\le k\le n}\frac{i_k^2}{(r_{j,k}+2)^2}\right) \\&\qquad \le {C_f}\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) 2^{3{\mathsf L}_j}{\bar{k}}{\mathsf c}_j\max (\deg p_j,|\mathcal {J}_j| D_{j,\ell },\dots ,|\mathcal {J}_\ell | D_{\ell ,\ell })^{2{\mathsf L}_j}\\&\qquad \quad \times \left( \frac{\varepsilon }{2(\ell +2)3^\ell \sum _i\Vert p_i\Vert _\infty }\right) ^{-{\mathsf L}_j}\\&\qquad \quad \times 2|J_j|\pi ^2\left( \max _{1\le i\le n}({{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_{j})_i+1\right) ^{|J_j|}\frac{2^{|J_j|/2}\max _{1\le i\le n}\left( {{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_{j}\right) ^2_{i}}{\min _{1\le k\le n}(r_{j,k}+2)^2}. \end{aligned}$$

Next, denote

$$\begin{aligned} m_{j,\ell }=\max (\deg p_j,|\mathcal {J}_j|D_{j,\ell },\dots ,|\mathcal {J}_\ell |D_{\ell ,\ell }), \end{aligned}$$

This will help us to reorganize and consolidate the terms. Use (39) to see that this is

$$\begin{aligned}\le & {} \varepsilon ^{-{\mathsf L}_j} {C_f}|J_j|\pi ^2\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}2^{4{\mathsf L}_j+\frac{|J_j|}{2}+1}(\ell +2)^{{\mathsf L}_j}{\bar{k}}{\mathsf c}_jm_{j,\ell }^{2{\mathsf L}_j}\\{} & {} \times \left( \max _{1\le i\le n}({{\,\mathrm{\overline{\deg }}\,}}{\hat{f}}_{j})_i+1\right) ^{|J_j|+2}\frac{1}{\min _{1\le k\le n}(r_{j,k}+2)^2}\\\le & {} \varepsilon ^{-{\mathsf L}_j} {C_f}|J_j|\pi ^2\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}2^{4{\mathsf L}_j+\frac{|J_j|}{2}+1}(\ell +2)^{{\mathsf L}_j}{\bar{k}}{\mathsf c}_jm_{j,\ell }^{2{\mathsf L}_j}\\{} & {} \times \left( \max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell },\right. \right. \\{} & {} \left( {C_m}2^{4{\mathsf L}_j+1}{\mathsf c}_j^{\frac{3}{4}}{\bar{k}}^{-\frac{1}{3}} ( \deg h_j )^{\frac{8{\mathsf L}_j}{3}}\left( \frac{3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)}\right) ^{\frac{4{\mathsf L}_j+1}{3}}+1\right) \\{} & {} \left. \left. \times \max _{k\in K_j}{{\,\mathrm{\overline{\deg }}\,}}g_k\right) +1\right) ^{|J_j|+2}\frac{1}{\min _{1\le k\le n}(r_{j,k}+2)^2}\\\le & {} \varepsilon ^{-{\mathsf L}_j} {C_f}|J_j|\pi ^2\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}2^{4{\mathsf L}_j+\frac{|J_j|}{2}+1}(\ell +2)^{{\mathsf L}_j}{\bar{k}}{\mathsf c}_jm_{j,\ell }^{2{\mathsf L}_j}\\{} & {} \times \left( \max \left( {{\,\mathrm{\overline{\deg }}\,}}p_j,\bar{D}_{j,\ell },\dots ,\bar{D}_{\ell ,\ell },\right. \right. \\{} & {} \left( {C_m}2^{4{\mathsf L}_j+1}{\mathsf c}_j^{\frac{3}{4}}{\bar{k}}^{-\frac{1}{3}} m_{j,\ell }^{\frac{8{\mathsf L}_j}{3}}\left( \frac{3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)}\right) ^{\frac{4{\mathsf L}_j+1}{3}}+1\right) \\{} & {} \left. \left. \times \max _{k\in K_j}{{\,\mathrm{\overline{\deg }}\,}}g_k\right) +1\right) ^{|J_j|+2}\frac{1}{\min _{1\le k\le n}(r_{j,k}+2)^2}. \end{aligned}$$

Now use (11) as well as \(\varepsilon _i-2\eta _i=\frac{\ell +6}{2\ell (\ell +2)}\varepsilon \) to see that the above is bounded by

$$\begin{aligned}&\varepsilon ^{-{\mathsf L}_j} {C_f}|J_j|\pi ^2\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}2^{4{\mathsf L}_j+\frac{|J_j|}{2}+1}(\ell +2)^{{\mathsf L}_j}{\bar{k}}{\mathsf c}_j\\&\quad \times \Big (\max \Biggl [\deg p_j,|\mathcal {J}_j|^2 \frac{2\ell (\ell +2)}{\ell +6}\frac{2C_{\textrm{Jac}}\sum _{k=j}^\ell {{\,\textrm{Lip}\,}}{\hat{f}}_k}{\varepsilon }+1,\dots ,\\&\qquad |\mathcal {J}_\ell |^2 \frac{2\ell (\ell +2)}{\ell +6}\frac{2C_{\textrm{Jac}}{{\,\textrm{Lip}\,}}{\hat{f}}_\ell }{\varepsilon }+1,\\&\qquad \quad \left( {C_m}2^{4{\mathsf L}_j+1}{\mathsf c}^{\frac{3}{4}}{\bar{k}}^{-\frac{1}{3}} \left( \frac{3^\ell \sum _i\Vert p_i\Vert _\infty }{\varepsilon /2(\ell +2)}\right) ^{\frac{4{\mathsf L}_j+1}{3}}+1\right) \max _{k\in K_j}{{\,\mathrm{\overline{\deg }}\,}}g_k\Biggr ]\\&\qquad \quad +1\Big )^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\frac{1}{\min _{1\le k\le n}(r_{j,k}+2)^2}\\&\le \varepsilon ^{-{\mathsf L}_j-\frac{4{\mathsf L}_j+1}{3}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})} {C_f}(C_{\textrm{Jac}}{C_m})^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})} |J_j|\pi ^2\left( 3^\ell \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}\\&\qquad \times 2^{4{\mathsf L}_j+\frac{|J_j|}{2}+1+(1+\frac{4{\mathsf L}_j+1}{3})(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3}) }(\ell +2)^{1+{\mathsf L}_j+\frac{4{\mathsf L}_j+1}{3}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}{\bar{k}}\\&\qquad \times {\mathsf c}_j^{1+\frac{3}{4}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\left( \sum _{i=j}^\ell |\mathcal {J}_i|^{2(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\right) \\&\qquad \times (3\deg p_j\sum _{i=1}^\ell {{\,\textrm{Lip}\,}}p_i)^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\\&\qquad \times (\max _{k\in K_j}\deg g_k+1)^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\frac{1}{\min _{1\le k\le n}(r_{j,k}+2)^2}. \end{aligned}$$

Finally use (24) and then (5) to get that the above is less than

$$\begin{aligned}{} & {} {\textbf{C}_{j}}\frac{\varepsilon ^{-{\mathsf L}_j-\frac{4{\mathsf L}_j+1}{3}(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}\left( \sum _i\Vert p_i\Vert _\infty \right) ^{{\mathsf L}_j+1}(\deg p_j\sum _{i=1}^\ell {{\,\textrm{Lip}\,}}p_i)^{(2{\mathsf L}_j+|J_j|+2)(1+\frac{8{\mathsf L}_j}{3})}}{\min _{1\le k\le n}(r_{j,k}+2)^2}\\{} & {} \quad \le \frac{\varepsilon }{4(\ell +2)}. \end{aligned}$$

This shows that (34) holds, and hence also (33), which proves the theorem. \(\square \)

Lemma 19

([2, Lemma 3.8]) Let \(J\subset \{1,\dots ,n\}\), and let \({\textbf{r}}=(r_1,\dots ,r_n)\) be a multi-index such that \(r_i>0\) only if \(i\in J\).

The quadratic module \(\mathcal Q_{{\textbf{r}}+\textbf{2},J}(1-\sum _{i\in J}x_i^2)\) contains the preordering \(\mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J})\),

$$\begin{aligned} \mathcal P_{{\textbf{r}},J}(\{1-x_i^2\}_{i\in J})\subseteq \mathcal Q_{{\textbf{r}}+\textbf{2},J}(1-\textstyle \sum _{i\in J}x_i^2). \end{aligned}$$

Proof

This follows from

$$\begin{aligned} 1\pm x_i=\frac{1}{2}(1-x_i^2+(1\pm x_i)^2)=\frac{1}{2}((1-\Vert x\Vert _J^2)+\sum _{\begin{array}{c} j\in J\\ j\ne i \end{array}}x_j^2+(1\pm x_i)^2) \end{aligned}$$

and

$$\begin{aligned} 1-x_i^2&=(1-x_i)(1+x_i)\\&=\frac{1}{4}(1-\Vert x\Vert _J^2)\left( 2\sum _{j\ne i}x_j^2+(1-x_i)^2+(1+x_i)^2\right) \\&\qquad +\frac{1}{4}\left( (1-\Vert x\Vert _J^2)^2+\left( \sum _{j\ne i}x_j^2\right) ^2+\sum _{j\ne i}x_j^2\left( (1-x_i)^2+(1+x_i)^2\right) +(1-x_i)^2(1+x_i)^2\right) . \end{aligned}$$

The increase of \({\textbf{r}}\) by \(\textbf{2}\) stems from the fact that \(\deg (1-x_i^2)=2\) while the degree of the right-hand side above is 4. \(\square \)
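As a quick sanity check of the last identity (our verification, not in [2]): at \(x=0\), the right-hand side equals

$$\begin{aligned} \frac{1}{4}\cdot 1\cdot (0+1+1)+\frac{1}{4}(1+0+0+1)=1=1-x_i^2, \end{aligned}$$

and at \(x_i=1\) with all other coordinates zero, every term on the right vanishes, matching \(1-x_i^2=0\).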

4.3 An asymptotic lemma

Lemma 20

For \(a,b,c,d,p,q>0\), with \(cq-ap\ne 0\), we have

$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{\displaystyle \log \left[ \left( {\begin{array}{c}a+b\varepsilon ^{-p}\\ b\varepsilon ^{-p}\end{array}}\right) \Bigg /\left( {\begin{array}{c}c+d\varepsilon ^{-q}\\ d\varepsilon ^{-q}\end{array}}\right) \right] }{(cq-ap)\log \varepsilon } =1. \end{aligned}$$

In other words, as \(\varepsilon \searrow 0\),

$$\begin{aligned} \frac{\left( {\begin{array}{c}a+b\varepsilon ^{-p}\\ b\varepsilon ^{-p}\end{array}}\right) }{\left( {\begin{array}{c}c+d\varepsilon ^{-q}\\ d\varepsilon ^{-q}\end{array}}\right) }\approx \varepsilon ^{cq-ap}, \end{aligned}$$

which tends to 0 when \(cq>ap\) and to \(+\infty \) when \(cq<ap\).
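For instance (a closed-form sanity check with \(a=b=c=d=1\), \(p=1\), \(q=2\), so that \(cq-ap=1\); take \(\varepsilon =1/N\) with \(N\in \mathbb {N}\) so that the entries are integers): since \(\left( {\begin{array}{c}1+N\\ N\end{array}}\right) =1+N\),

$$\begin{aligned} \frac{\left( {\begin{array}{c}1+\varepsilon ^{-1}\\ \varepsilon ^{-1}\end{array}}\right) }{\left( {\begin{array}{c}1+\varepsilon ^{-2}\\ \varepsilon ^{-2}\end{array}}\right) }=\frac{1+\varepsilon ^{-1}}{1+\varepsilon ^{-2}}\approx \varepsilon =\varepsilon ^{cq-ap}\qquad \text {as }\varepsilon \searrow 0. \end{aligned}$$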

Proof

Recall that the Stirling series can be used [19, p. 293–294] to see that, for all large \(n>0\),

$$\begin{aligned} \log (n!)=n\log n-n+\frac{1}{2}\log (2\pi n)+O(1/n). \end{aligned}$$
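(As a quick numerical check of the sign of the linear term, with \(\log \) the natural logarithm: for \(n=10\), \(\log (10!)=\log 3628800\approx 15.104\), while \(10\log 10-10+\tfrac{1}{2}\log (20\pi )\approx 23.026-10+2.070=15.096\), the gap being of order \(1/n\).)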

We use it to get that

$$\begin{aligned}&\frac{\log \left[ \left( {\begin{array}{c}a+b\varepsilon ^{-p}\\ b\varepsilon ^{-p}\end{array}}\right) \Bigg /\left( {\begin{array}{c}c+d\varepsilon ^{-q}\\ d\varepsilon ^{-q}\end{array}}\right) \right] }{(cq-ap)\log \varepsilon }\nonumber \\&\quad =\frac{\log \left[ \frac{(a+b\varepsilon ^{-p})!}{a!(b\varepsilon ^{-p})!}\Bigg /\frac{(c+d\varepsilon ^{-q})!}{c!(d\varepsilon ^{-q})!}\right] }{(cq-ap)\log \varepsilon }\nonumber \\&\quad =\frac{\log (a+b\varepsilon ^{-p})!-\log a!-\log (b\varepsilon ^{-p})!-\log (c+d\varepsilon ^{-q})!+\log c!+\log (d\varepsilon ^{-q})!}{(cq-ap)\log \varepsilon }\nonumber \\&\quad =\frac{1}{(cq-ap)\log \varepsilon }\Big [(a+b\varepsilon ^{-p})\log (a+b\varepsilon ^{-p}) -b\varepsilon ^{-p}\log (b\varepsilon ^{-p}) \end{aligned}$$
(40)
$$\begin{aligned}&\quad \qquad -(c+d\varepsilon ^{-q})\log (c+d\varepsilon ^{-q})+d\varepsilon ^{-q}\log (d\varepsilon ^{-q}) \end{aligned}$$
(41)
$$\begin{aligned}&\quad \qquad -(a+b\varepsilon ^{-p})+(b\varepsilon ^{-p})+(c+d\varepsilon ^{-q})-(d\varepsilon ^{-q}) \end{aligned}$$
(42)
$$\begin{aligned}&\quad \qquad +\frac{1}{2}\big (\log (2\pi (a+b\varepsilon ^{-p}))-\log (2\pi b\varepsilon ^{-p})-\log (2\pi (c+d\varepsilon ^{-q}))+\log (2\pi d\varepsilon ^{-q})\big ) \end{aligned}$$
(43)
$$\begin{aligned}&\quad \qquad +\log c!-\log a!+O(\varepsilon ^{p}+\varepsilon ^q)\Big ]. \end{aligned}$$
(44)

Notice that the terms in (42) (where the non-constant terms cancel out) and in (44) remain bounded, while the absolute value of the denominator tends to \(+\infty \), so these lines contribute nothing in the limit. Line (43) is

$$\begin{aligned}{} & {} \frac{1}{2}\frac{\log (2\pi (a+b\varepsilon ^{-p}))-\log (2\pi b\varepsilon ^{-p})-\log (2\pi (c+d\varepsilon ^{-q}))+\log (2\pi d\varepsilon ^{-q})}{(cq-ap)\log \varepsilon }\\{} & {} \qquad =\frac{\log (\frac{a+b\varepsilon ^{-p}}{b\varepsilon ^{-p}}\frac{d\varepsilon ^{-q}}{c+d\varepsilon ^{-q}})}{2(cq-ap)\log \varepsilon }=\frac{\log (\frac{a\varepsilon ^{p}+b}{b}\frac{d}{c\varepsilon ^{q}+d})}{2(cq-ap)\log \varepsilon } \end{aligned}$$

As \(\varepsilon \searrow 0\), the factors inside the logarithm tend to 1, so the numerator tends to 0, while the denominator tends to \(\pm \infty \), and the quotient tends to 0. Let us now show that the remaining two lines (40)–(41) together tend to 1 in the limit. Now,

$$\begin{aligned}{} & {} \lim _{\varepsilon \searrow 0} b\varepsilon ^{-p}\left( \log (a+b\varepsilon ^{-p})-\log (b\varepsilon ^{-p})\right) =a\quad \text {and} \\{} & {} \quad \lim _{\varepsilon \searrow 0} d\varepsilon ^{-q}\left( \log (c+d\varepsilon ^{-q})-\log (d\varepsilon ^{-q})\right) =c, \end{aligned}$$

so we get

$$\begin{aligned}&\lim _{\varepsilon \searrow 0}\frac{1}{(cq-ap)\log \varepsilon }\Big ((a+b\varepsilon ^{-p})\log (a+b\varepsilon ^{-p}) -b\varepsilon ^{-p}\log (b\varepsilon ^{-p})\\&\quad \qquad -(c+d\varepsilon ^{-q})\log (c+d\varepsilon ^{-q}) +d\varepsilon ^{-q}\log (d\varepsilon ^{-q})\Big )\\&\quad = \lim _{\varepsilon \searrow 0}\frac{ a-c + a\log (a+b\varepsilon ^{-p})-c\log (c+d\varepsilon ^{-q})}{(cq-ap)\log \varepsilon }\\&\quad =\lim _{\varepsilon \searrow 0}\frac{a\log (a+b\varepsilon ^{-p})-c\log (c+d\varepsilon ^{-q})}{(cq-ap)\log \varepsilon }. \end{aligned}$$

In this quotient, both the numerator and the denominator tend to \(\pm \infty \), so we can apply a version of l’Hôpital’s rule, which states that, if the limit of the quotient of their derivatives exists, then the original limit above equals that limit. Taking the limit of the quotient of the derivatives gives

$$\begin{aligned} \lim _{\varepsilon \searrow 0}\frac{\frac{cdq\varepsilon ^{-1-q}}{c+d\varepsilon ^{-q}} -\frac{abp\varepsilon ^{-1-p}}{a+b\varepsilon ^{-p}}}{(cq-ap)/\varepsilon }&=\lim _{\varepsilon \searrow 0} \frac{cdq\varepsilon ^{-q}(a+b\varepsilon ^{-p})-abp\varepsilon ^{-p}(c+d\varepsilon ^{-q})}{(cq-ap)(a+b\varepsilon ^{-p})(c+d\varepsilon ^{-q})}\\&=\lim _{\varepsilon \searrow 0} \frac{cdq(a\varepsilon ^{p}+b)-abp(c\varepsilon ^{q}+d)}{(cq-ap)(a\varepsilon ^{p}+b)(c\varepsilon ^{q}+d)}\\&=\frac{cdqb-abpd}{(cq-ap)bd}=1. \end{aligned}$$

\(\square \)