1 Introduction

Consider the problem of computing the global minimum:

$$\begin{aligned} f_{\min }:= \min _{{\mathbf {x}}\in \mathrm{B}^{n}} f({\mathbf {x}}) \end{aligned}$$
(1)

of a polynomial f of degree \(d \in \mathbb {N}\) over the hypercube \(\mathrm{B}^{n} := [-1,1]^n \subseteq \mathbb {R}^n\). The program (1) can be reformulated as finding the largest \(\lambda \in \mathbb {R}\) for which the function \(f - \lambda\) is nonnegative on \(\mathrm{B}^{n}\). That is, writing \({\mathcal {P}}(\mathrm{B}^{{n}}) \subseteq \mathbb {R}[{\mathbf {x}}]\) for the cone of all polynomials that are nonnegative on \(\mathrm{B}^{n}\), we have:

$$\begin{aligned} f_{\min }= \max \{ \lambda \in \mathbb {R}: f - \lambda \in {\mathcal {P}}(\mathrm{B}^{{n}}) \}. \end{aligned}$$
(2)

By replacing \({\mathcal {P}}(\mathrm{B}^{{n}})\) in (2) by a smaller subset of \(\mathbb {R}[{\mathbf {x}}]\) one may obtain lower bounds on \(f_{\min }\). One way of obtaining such subsets is based on the following description of \(\mathrm{B}^{n}\) as a semialgebraic set:

$$\begin{aligned} \mathrm{B}^{n} = \{ {\mathbf {x}}\in \mathbb {R}^n : g_{i}({\mathbf {x}}) := (1-x_i^2) \ge 0 \quad \forall i \in [n] \}. \end{aligned}$$
(3)

In light of this description, we see that the preordering \({Q}(\mathrm{B}^{n})_{r}\), truncated at degree \(r\), defined by:

$$Q({\text{B}}^{n} )_{r} : = \left\{ {\sum\limits_{{J \subseteq [n]}} {\sigma _{J} } g_{J} :\sigma _{J} \in \Sigma [{\mathbf{x}}],~\deg (\sigma _{J} g_{J} ) \le r} \right\}\quad \left( {g_{J} : = \prod\limits_{{j \in J}} {g_{j} } } \right),$$
(4)

satisfies \({Q}(\mathrm{B}^{n})_{r} \subseteq {\mathcal {P}}(\mathrm{B}^{{n}})\) for all \(r \in \mathbb {N}\). Here, \(\Sigma [{{\mathbf {x}}}]\) is the set of sum-of-squares polynomials (i.e., of the form \(p = p_1^2 + p_2^2 + \ldots + p_m^2\) for certain \(p_i \in \mathbb {R}[{\mathbf {x}}]\)). When no degree bounds are imposed (i.e., \(r=\infty\)) we obtain the full preordering \({Q}(\mathrm{B}^{n})\) generated by the polynomials \(g_i({\mathbf {x}})=1-x_i^2\) (\(i\in [n]\)), which coincides with the quadratic module generated by the products \(\prod _{i\in I}g_i({\mathbf {x}})\) (\(I\subseteq [n]\)). We thus obtain the following hierarchy of lower bounds for \(f_{\min }\), due to Lasserre [1]:

$$\begin{aligned} {f_{({r})}} := \max \{ \lambda \in \mathbb {R}: f - \lambda \in {Q}(\mathrm{B}^{n})_{r} \}. \end{aligned}$$
(5)

If the program (5) is feasible, its maximum is attained. By definition, we have \(f_{\min }\ge {f_{({r + 1})}} \ge {f_{({r})}}\) for all \(r \in \mathbb {N}\). Furthermore, we have \(\lim _{r \rightarrow \infty } {f_{({r})}} = f_{\min }\), which follows directly from the following special case of Schmüdgen’s Positivstellensatz.

Theorem 1

(Special case of Schmüdgen’s Positivstellensatz [2]) Let \(f \in {\mathcal {P}}(\mathrm{B}^{{n}})\) be a polynomial. Then for any \(\eta > 0\) there exists an \(r \in \mathbb {N}\) such that \(f + \eta \in {Q}(\mathrm{B}^{n})_{r}\).

1.1 Main result

We show that the lower bounds \({f_{({r})}}\) converge to the global minimum \(f_{\min }\) of \(f\) over \(\mathrm{B}^{n}\) at a rate in \(O(1/r^2)\). Alternatively, our result can be interpreted as a bound of the order \(O(1/\sqrt{\eta })\) on the degree r of a Schmüdgen-type positivity certificate for \(f+\eta\) when \(f \in {\mathcal {P}}(\mathrm{B}^{{n}})\).

Theorem 2

Let f be a polynomial of degree \(d \in \mathbb {N}\). Then there exists a constant \(C(n, d) > 0\), depending only on n and d, such that:

$$\begin{aligned} f_{\min }- {f_{({{(r+1)}n})}} \le \frac{C(n, d)}{r^2} \cdot (f_{\max }- f_{\min })\quad \text { for all } r \ge \pi d \sqrt{2n}. \end{aligned}$$
(6)

Furthermore, the constant \(C(n,d)\) may be chosen such that it depends either polynomially on n (for fixed d) or polynomially on d (for fixed n); see relation (20) for details.

Corollary 3

Let \(f\in {\mathcal {P}}(\mathrm{B}^{{n}})\) with degree d. Then, for any \(\eta >0\), we have:

$$\begin{aligned} f+\eta \in {Q}(\mathrm{B}^{n})_{{(r+1)}n} \quad \text { for all } r \ge \max \Big \{ \pi d \sqrt{2n},{1\over \sqrt{\eta }} \sqrt{C(n,d) (f_{\max }- f_{\min })}\Big \}, \end{aligned}$$

where \(C(n,d)\) is the constant from Theorem 2. Hence we have \(f+\eta \in {Q}(\mathrm{B}^{n})_{r}\) for \(r=O(1/\sqrt{\eta })\).

Proof

Let \(\eta >0\) and set \(C_f:= C(n,d) \cdot (f_{\max }- f_{\min })\). Pick an integer \(r \ge \max \{ \pi d \sqrt{2n}, \sqrt{C_f/\eta }\}\). Then we have:

$$\begin{aligned} f+\eta = \underbrace{f- {f_{({{(r+1)}n})}}}_{\in {Q}(\mathrm{B}^{n})_{{(r+1)}n}} + \big (\underbrace{{f_{({{(r+1)}n})}}-f_{\min }+{C_f\over r^2}}_{\ge 0 \text { by Theorem 2}}\big ) +\underbrace{ f_{\min }}_{\ge 0 } + \big ( \underbrace{\eta - {C_f\over r^2}}_{\ge 0}\big ), \end{aligned}$$

which shows \(f+\eta \in {Q}(\mathrm{B}^{n})_{{(r+1)}n}\). \(\square\)

1.2 Outline of the proof

Let \(f \in \mathbb {R}[{\mathbf {x}}]\) be a polynomial of degree d. To simplify our arguments and notation, we will work with the scaled function:

$$\begin{aligned} F:= \frac{f- f_{\min }}{f_{\max }- f_{\min }}, \end{aligned}$$

for which \(F_{\min }= 0\) and \(F_{\max }= 1\). Since the inequality (6) is invariant under a positive scaling of f and adding a constant, it indeed suffices to show the result for the function F.

The idea of the proof is as follows. Let \(\epsilon > 0\) and consider the polynomial \({\tilde{F}} := F+ \epsilon\). Let \(r\ge d\). Suppose that we are able to construct a (nonsingular) linear operator \({\mathbf {K}}_r : \mathbb {R}[{{\mathbf {x}}}]_r \rightarrow \mathbb {R}[{{\mathbf {x}}}]_r\) which has the following two properties:

$$\begin{aligned}&{\mathbf {K}}_r p \in {Q}(\mathrm{B}^{n})_{{(r+1)}n} \quad \text { for all } p \in {\mathcal {P}}(\mathrm{B}^{{n}})_{r}, \end{aligned}$$
(P1)
$$\begin{aligned}&\Vert {{\mathbf {K}}_r^{-1} {\tilde{F}} - {\tilde{F}}} \Vert _{\infty } := \max _{{\mathbf {x}}\in \mathrm{B}^{n}} |{\mathbf {K}}_r^{-1} {\tilde{F}}({\mathbf {x}}) - {\tilde{F}}({\mathbf {x}})| \le \epsilon . \end{aligned}$$
(P2)

Then, by (P2), we have \({\mathbf {K}}_r^{-1} \tilde{F}\in {\mathcal {P}}(\mathrm{B}^{{n}})_{r}\). Indeed, as F is nonnegative on \(\mathrm{B}^{n}\), \({\tilde{F}}({\mathbf {x}}) = F({\mathbf {x}}) + \epsilon\) is greater than or equal to \(\epsilon\) for all \({\mathbf {x}}\in \mathrm{B}^{n}\), and so (P2) tells us that after application of the operator \({\mathbf {K}}_r^{-1}\), the resulting polynomial \({\mathbf {K}}_r^{-1} {\tilde{F}}\) is nonnegative on \(\mathrm{B}^{n}\). Using (P1), we may then conclude that \(\tilde{F}= {\mathbf {K}}_r ({\mathbf {K}}_r^{-1} {\tilde{F}}) \in {Q}(\mathrm{B}^{n})_{{(r+1)}n}\). It follows that \(-\epsilon \le F_{({(r+1)}n)}\), i.e., \(F_{\min }-F_{({(r+1)}n)} \le \epsilon\), and thus \(f_{\min }- {f_{({{(r+1)}n})}} \le \epsilon \cdot (f_{\max }- f_{\min })\). We collect this in the next lemma for future reference.

Lemma 4

Assume that for some \(r \ge d\) and \(\epsilon > 0\) there exists a nonsingular operator \({\mathbf {K}}_r : \mathbb {R}[{{\mathbf {x}}}]_r \rightarrow \mathbb {R}[{{\mathbf {x}}}]_r\) which satisfies the properties (P1) and (P2). Then we have

$$\begin{aligned} f_{\min }- {f_{({{(r+1)}n})}} \le \epsilon \cdot (f_{\max }- f_{\min }). \end{aligned}$$

In what follows, we will construct such an operator \({\mathbf {K}}_r\) for each \(r \ge \pi d \sqrt{2n}\) and the parameter \(\epsilon := C(n,d) / r^2\), where the constant \(C(n,d)\) will be specified later. Our main Theorem 2 then follows after applying Lemma 4.

We make use of the polynomial kernel method for our construction: after choosing a suitable kernel \({K_r:\mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {R}}\), we define the linear operator \({{\mathbf {K}}_r:\mathbb {R}[{\mathbf {x}}]_r\rightarrow \mathbb {R}[{\mathbf {x}}]_r}\) via the integral transform:

$$\begin{aligned} {\mathbf {K}}_r p({\mathbf {x}}) := \int _{\mathrm{B}^{n}} K_r({\mathbf {x}}, {\mathbf {y}}) p({\mathbf {y}}) d\mu ({\mathbf {y}}) \quad (p \in \mathbb {R}[{\mathbf {x}}]_r). \end{aligned}$$

Here, \(\mu\) is the Chebyshev measure on \(\mathrm{B}^{n}\) as defined in (7) below. A good choice for the kernel \(K_r\) is a multivariate version (see Sect. 3.1) of the well-known Jackson kernel \(K^{\mathrm{ja}}_r\) of degree r (see Sect. 2.3). For this choice of kernel, the operator \({\mathbf {K}}_r\) naturally satisfies (P1) (see Sect. 3.2). Furthermore, it diagonalizes with respect to the basis of \(\mathbb {R}[{\mathbf {x}}]\) given by the (multivariate) Chebyshev polynomials (see Sect. 2.2). Property (P2) can then be verified by analyzing the eigenvalues of \({\mathbf {K}}_r\), which are closely related to the expansion of \(K^{\mathrm{ja}}_r\) in the basis of (univariate) Chebyshev polynomials (see Sect. 3.3). We end this section by illustrating our method of proof with a small example.

Example 5

Consider the polynomial \(f(x) = 1 - x^2 - x^3 + x^4\), which is nonnegative on \([-1,1]\). For \(r \in \mathbb {N}\), let \({\mathbf {K}}_r\) be the operator associated to the univariate Jackson kernel (11) of degree r, which satisfies (P1) (see Sect. 3.2). For \(\eta = 0.1\), we observe that applying \({\mathbf {K}}_7^{-1}\) to \(f + \eta\) yields a nonnegative function on \([-1, 1]\), whereas applying \({\mathbf {K}}_5^{-1}\) does not (see Fig. 1). Applying the arguments of Sect. 1.2, we may thus conclude that \(f + \eta \in {Q}(\mathrm{B}^{n})_{{8}}\), but not that \(f + \eta \in {Q}(\mathrm{B}^{n})_{{6}}\).

Fig. 1

The polynomial \(f(x) + \eta\) of Example 5 and its transformations under the inverse operators \({\mathbf {K}}_5^{-1}\) and \({\mathbf {K}}_7^{-1}\) associated to the Jackson kernels of degree 5 and 7
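The computation behind Example 5 can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than the authors' code: the Chebyshev coefficients are obtained with a Gauss-Chebyshev quadrature rule (a standard rule not discussed in the text), nonnegativity is only tested on a finite grid (which is evidence, not a proof), and all function names are hypothetical.

```python
import math

def cheb_T(k, x):
    """Evaluate T_k(x) for x in [-1, 1] via T_k(cos t) = cos(k t)."""
    return math.cos(k * math.acos(max(-1.0, min(1.0, x))))

def jackson_coefficients(r):
    """Return [lambda_0^r, ..., lambda_r^r] from (12), with lambda_0^r = 1."""
    theta = math.pi / (r + 2)
    lam = [1.0]
    for k in range(1, r + 1):
        lam.append(((r + 2 - k) * math.cos(k * theta)
                    + math.sin(k * theta) / math.sin(theta) * math.cos(theta)) / (r + 2))
    return lam

def chebyshev_coefficients(p, deg):
    """Coefficients c_k with p = sum_k c_k T_k, i.e. c_0 = p_0 and c_k = 2 p_k."""
    N = deg + 1  # N-point rule is exact up to degree 2N - 1 >= 2 * deg
    nodes = [math.cos((2 * j + 1) * math.pi / (2 * N)) for j in range(N)]
    coeffs = []
    for k in range(deg + 1):
        inner = sum(p(x) * cheb_T(k, x) for x in nodes) / N
        coeffs.append(inner if k == 0 else 2 * inner)
    return coeffs

def apply_inverse_operator(coeffs, r):
    """Apply K_r^{-1}: divide the coefficient of T_k by the eigenvalue lambda_k^r."""
    lam = jackson_coefficients(r)
    return [c / lam[k] for k, c in enumerate(coeffs)]

def min_on_grid(coeffs, m=2001):
    """Minimum of sum_k c_k T_k(x) over m equispaced points in [-1, 1]."""
    grid = (-1 + 2 * i / (m - 1) for i in range(m))
    return min(sum(c * cheb_T(k, x) for k, c in enumerate(coeffs)) for x in grid)

eta = 0.1
f_plus_eta = lambda x: 1 - x**2 - x**3 + x**4 + eta
c = chebyshev_coefficients(f_plus_eta, 4)
min_r5 = min_on_grid(apply_inverse_operator(c, 5))  # grid minimum of K_5^{-1}(f + eta)
min_r7 = min_on_grid(apply_inverse_operator(c, 7))  # grid minimum of K_7^{-1}(f + eta)
print(min_r5, min_r7)
```

The sign of the two grid minima is what distinguishes the cases \(r = 5\) and \(r = 7\) in the example.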

1.3 Related work

The polynomial kernel method, which forms the basis of our analysis, is widely used in functional approximation, see, e.g., [3]. In the present context, the method has already been employed for the analysis of the sum-of-squares hierarchy for optimization over the hypersphere \(S^{n-1}\) in [4] (where a rate in \(O(1/r^2)\) was shown as well) and for optimization over the binary cube \(\{-1, 1\}^n\) in [5]. There, the authors use kernels that are invariant under the symmetry of \(S^{n-1}\) and \(\{-1, 1\}^n\), respectively.

In [6], the polynomial kernel method, and the Jackson kernel in particular, were used to analyze the quality of a related Lasserre-type hierarchy of upper bounds on \(f_{\min }\) over \(\mathrm{B}^{n}=[-1, 1]^n\), where one searches for a density in the truncated preordering \({Q}(\mathrm{B}^{n})_{r}\) minimizing the expected value of f over \(\mathrm{B}^{n}\) (showing again a convergence rate in \(O(1/r^2)\)).

For a general compact semialgebraic set S, a polynomial f nonnegative on S and \(\eta >0\), existence of Schmüdgen-type certificates of positivity for \(f + \eta\) with degree bounds in \(O(1/\eta ^c)\) was shown in [7], where \(c > 0\) is a constant depending on S. This result uses different tools, including in particular a representation result for polynomial optimization over the simplex by Pólya [8] and the effective degree bounds by Powers and Reznick [9].

For the case of the hypercube, a degree bound in \(O(1/\eta )\) for Schmüdgen-type certificates is obtained in [10], thus showing that one can take \(c\le 1\) in the above-mentioned result of [7]. This result holds in fact for a weaker hierarchy of bounds obtained by restricting in (5) to decompositions of the polynomial \(f-\lambda\) involving factors \(\sigma _J\) that are nonnegative scalars (instead of sums of squares), also known as Handelman-type decompositions (thus replacing the preordering \({Q}(\mathrm{B}^{n})_{r}\) by its subset \(H_r\) of polynomials having a Handelman-type decomposition). The analysis in [10] relies on employing the Bernstein operator \({\mathbf {B}}_r\), which has the property of mapping a polynomial nonnegative over the hypercube to a polynomial in the set \(H_{rn}\subseteq {Q}(\mathrm{B}^{n})_{rn}\).

In this paper, we can show a further improvement by using a different type of kernel operator; namely we show that we can take the constant \(c \le 1/2\) in the special case \(S = [-1, 1]^n\).

2 Preliminaries

2.1 Notation

Throughout, \(\mathrm{B}^{n} := [-1, 1]^n \subseteq \mathbb {R}^n\) is the n-dimensional hypercube. We write \(\mathbb {R}[x]\) for the univariate polynomial ring, while reserving the bold-face notation \(\mathbb {R}[{\mathbf {x}}] = \mathbb {R}[x_1, x_2, \dots , x_n]\) to denote the ring of polynomials in n variables. Similarly, \(\Sigma [x] \subseteq \mathbb {R}[x]\) and \(\Sigma [{\mathbf {x}}] \subseteq \mathbb {R}[{\mathbf {x}}]\) denote the sets of univariate and n-variate sum-of-squares polynomials, respectively, consisting of all polynomials of the form \(p = p_1^2 + p_2^2 + \dots + p_m^2\) for certain polynomials \(p_1,\ldots ,p_m\) and \(m\in \mathbb {N}\). For a polynomial \(p \in \mathbb {R}[{\mathbf {x}}]\), we write \(p_{\min }, p_{\max }\) for its minimum and maximum over \(\mathrm{B}^{n}\), respectively, and \(\Vert {p} \Vert _{\infty } := \sup _{{\mathbf {x}}\in \mathrm{B}^{n}} |p({\mathbf {x}})|\) for its sup-norm on \(\mathrm{B}^{n}\).

2.2 Chebyshev polynomials

Let \(\mu\) be the normalized Chebyshev measure on \(\mathrm{B}^{n} = [-1, 1]^n\), defined by:

$$\begin{aligned} d\mu ({\mathbf {x}}) = \frac{dx_1}{\pi \sqrt{1 - x_1^2}} \ldots \frac{dx_n}{\pi \sqrt{1 - x_n^2}}. \end{aligned}$$
(7)

Note that \(\mu\) is a probability measure on \(\mathrm{B}^{n}\), meaning that \(\int _{\mathrm{B}^{n}}d\mu = 1\). We write \(\langle {\cdot }, {\cdot } \rangle _\mu\) for the corresponding inner product on \(\mathbb {R}[{\mathbf {x}}]\), given by:

$$\begin{aligned} \langle {f}, {g} \rangle _\mu := \int _{\mathrm{B}^{n}} f({\mathbf {x}}) g({\mathbf {x}}) d\mu ({\mathbf {x}}). \end{aligned}$$

For \(k \in \mathbb {N}\), let \(T_{k}\) be the univariate Chebyshev polynomial (see, e.g., [11]) of degree k, defined by:

$$\begin{aligned} T_{k}(\cos \theta ) := \cos (k\theta ) \quad (\theta \in \mathbb {R}). \end{aligned}$$

Note that \(|T_k(x)| \le 1\) for all \(x \in [-1, 1]\) and that \(T_0 = 1\). The Chebyshev polynomials satisfy the orthogonality relations:

$$\begin{aligned} \langle {T_a}, {T_b} \rangle _\mu = \int _{-1}^{1} T_{a}(x) T_{b}(x) d\mu (x) = {\left\{ \begin{array}{ll} 0 \quad a \ne b, \\ 1 \quad a=b=0, \\ \frac{1}{2} \quad a=b\ne 0. \end{array}\right. } \end{aligned}$$
(8)

A univariate polynomial p may therefore be expanded as:

$$\begin{aligned} p = p_0 + \sum _{k = 1}^{\deg (p)} 2 p_k T_k, \quad \text {where } p_k := \langle {T_k}, {p} \rangle _\mu . \end{aligned}$$
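The orthogonality relations (8) and the expansion above can be checked numerically. The following Python sketch is illustrative only: it assumes an N-point Gauss-Chebyshev quadrature rule with nodes \(\cos ((2j+1)\pi /(2N))\) and equal weights \(1/N\) (a standard rule that the text does not discuss), and all function names are hypothetical.

```python
import math

def cheb_T(k, x):
    """Evaluate T_k(x) for x in [-1, 1] via T_k(cos t) = cos(k t)."""
    return math.cos(k * math.acos(max(-1.0, min(1.0, x))))

def gauss_chebyshev_nodes(N):
    """Nodes of the N-point Gauss-Chebyshev rule; every node has weight 1/N
    with respect to the normalized Chebyshev measure."""
    return [math.cos((2 * j + 1) * math.pi / (2 * N)) for j in range(N)]

def inner_product(f, g, N):
    """Compute <f, g>_mu; exact when f * g is a polynomial of degree <= 2N - 1."""
    return sum(f(x) * g(x) for x in gauss_chebyshev_nodes(N)) / N

def chebyshev_coefficients(p, deg):
    """Return [p_0, ..., p_deg] with p_k = <T_k, p>_mu."""
    N = deg + 1  # T_k * p has degree at most 2 * deg <= 2N - 1
    return [inner_product(lambda x, k=k: cheb_T(k, x), p, N) for k in range(deg + 1)]

def expand(coeffs, x):
    """Evaluate p_0 + sum_{k >= 1} 2 p_k T_k(x)."""
    return coeffs[0] + sum(2 * c * cheb_T(k, x) for k, c in enumerate(coeffs) if k >= 1)

# Gram matrix of T_0, ..., T_3 with respect to <.,.>_mu, to be compared with (8).
gram = [[inner_product(lambda x, a=a: cheb_T(a, x),
                       lambda x, b=b: cheb_T(b, x), 4)
         for b in range(4)] for a in range(4)]

# Expand a sample cubic and compare the expansion with p at a test point.
p = lambda x: 2 * x**3 - x + 0.5
coeffs = chebyshev_coefficients(p, 3)
reconstruction_error = abs(expand(coeffs, 0.3) - p(0.3))
print(gram, coeffs, reconstruction_error)
```

The factor 2 in the expansion compensates for \(\langle T_k, T_k \rangle _\mu = 1/2\) when \(k \ne 0\).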

For \(\kappa \in \mathbb {N}^n\), we consider the multivariate Chebyshev polynomial \(T_{\kappa }\), defined by setting:

$$\begin{aligned} T_{\kappa }({\mathbf {x}}) := \prod _{i = 1}^n T_{\kappa _i}(x_i). \end{aligned}$$

The multivariate Chebyshev polynomials form a basis for \(\mathbb {R}[{\mathbf {x}}]\) and satisfy the orthogonality relations:

$$\begin{aligned} \langle {T_\alpha }, {T_\beta } \rangle _\mu = \int _{\mathrm{B}^{n}} T_{\alpha }({\mathbf {x}}) T_{\beta }({\mathbf {x}}) d\mu ({\mathbf {x}}) = {\left\{ \begin{array}{ll} 0 \quad &{}\alpha \ne \beta , \\ 1 \quad &{}\alpha =\beta =0, \\ 2^{-w(\alpha )} \quad &{}\alpha =\beta \ne 0. \end{array}\right. } \end{aligned}$$
(9)

Here, \(w(\alpha ) := |\{i \in [n] : \alpha _i \ne 0 \}|\) denotes the Hamming weight of \(\alpha \in \mathbb {N}^n\).

We use the notation \(\mathbb {N}^n_{d} \subseteq \mathbb {N}^n\) to denote the set of n-tuples \(\alpha \in \mathbb {N}^n\) with \(|\alpha |=\sum _{i=1}^n\alpha _i\le d\). As in the univariate case, we may expand any n-variate polynomial p as:

$$\begin{aligned} p = \sum _{\kappa \in \mathbb {N}^n_{\deg (p)}} 2^{w(\kappa )} p_\kappa T_\kappa , \quad \text {where } p_\kappa := \langle {T_\kappa }, {p} \rangle _\mu . \end{aligned}$$
(10)

2.3 The Jackson kernel

For \(r\in \mathbb {N}\) and for coefficients \(\lambda _{k}^{r} \in \mathbb {R}\) to be specified below in (12), consider the kernel \(K^{\mathrm{ja}}_r : \mathbb {R}\times \mathbb {R}\rightarrow \mathbb {R}\) given by:

$$\begin{aligned} K^{\mathrm{ja}}_r(x, y) := 1 + 2 \sum _{k=1}^r \lambda _{k}^{r} T_{k}(x) T_{k}(y). \end{aligned}$$
(11)

We associate a linear operator \({\mathbf {K}}^{\mathrm{ja}}_r : \mathbb {R}[{x}]_r \rightarrow \mathbb {R}[{x}]_r\) to this kernel by setting:

$$\begin{aligned} {\mathbf {K}}^{\mathrm{ja}}_r p(x) := \int _{-1}^{1} K^{\mathrm{ja}}_r(x, y) p(y) d\mu (y) \quad (p \in \mathbb {R}[x]_r). \end{aligned}$$

Using the orthogonality relations (8), and writing \(\lambda _0^r :=1\), we see that:

$$\begin{aligned} {\mathbf {K}}^{\mathrm{ja}}_r T_{k}(x) := \int _{-1}^1 K^{\mathrm{ja}}_r(x, y) T_{k}(y) d\mu (y) = \lambda ^r_k T_{k}(x) \quad (0 \le k \le r). \end{aligned}$$

In other words, \({\mathbf {K}}^{\mathrm{ja}}_r\) is a diagonal operator with respect to the Chebyshev basis of \(\mathbb {R}[{x}]_r\), and its eigenvalues are given by \(\lambda ^r_0 = 1, \lambda _{1}^{r}, \dots , \lambda _{r}^{r}\). In what follows, we set:

$$\begin{aligned} \lambda _{k}^{r} = \frac{1}{r+2}\big ((r+2-k) \cos (k \theta _r) + \frac{\sin (k\theta _r)}{\sin (\theta _r)} \cos (\theta _r) \big ) \quad (1 \le k \le r), \end{aligned}$$
(12)

with \(\theta _r = \frac{\pi }{r+2}\). We then obtain the so-called Jackson kernel (see, e.g., [3]). The following properties of the Jackson kernel are crucial to our analysis.

Proposition 6

For every \(d, r \in \mathbb {N}\) with \(d \le r\), we have:

(i) \(K^{\mathrm{ja}}_r(x, y) \ge 0\) for all \(x, y \in [-1, 1]\),

(ii) \(1 \ge \lambda _{k}^{r} > 0\) for all \(0 \le k \le r\), and

(iii) \(|1 - \lambda _{k}^{r}| = 1 - \lambda _{k}^{r} \le {\pi ^2 d^2\over (r+2)^2}\) for all \(0 \le k \le d\).

Proof

Nonnegativity of the Jackson kernel is a well-known fact, and is verified, e.g., in [6]. We check that the other properties (ii)-(iii) hold as well.

Second property (ii): Note that when \(k \le (r+2) / 2\), both terms of (12) are positive, and so certainly \(\lambda _{k}^{r} > 0\). So assume \((r+2) / 2 < k \le r\). Set \(h = r+2 - k\), so that \(k \theta _r = \pi - h \theta _r\), \(2\le h<(r+2)/2\), and

$$\begin{aligned} (r+2) \lambda _{k}^{r} = - h \cos ( h \theta _r) + \frac{\sin (h \theta _r)}{\sin (\theta _r)}\cos (\theta _r). \end{aligned}$$
(13)

It remains to show that the RHS of (13) is positive for all \(2 \le h < (r+2)/2\). Note that \(1> \cos (\theta _r) > 0\), \(\sin (\theta _r) > 0\) and that \(\sin (h \theta _r) \ge 0\) for all \(2 \le h < (r+2)/2\). We proceed by induction on h. For \(h = 2\), we compute:

$$\begin{aligned} \begin{aligned} - h \cos ( h \theta _r) + \frac{\sin (h \theta _r)}{\sin (\theta _r)}\cos (\theta _r)&= -2 (2\cos ^2(\theta _r) -1) + 2 \cos ^2(\theta _r) \\&= - 2 \cos ^2(\theta _r) + 2 > 0, \end{aligned} \end{aligned}$$
(14)

which settles the base of induction. For the induction step, let \(h \ge 2\) with \(h + 1 < (r+2)/2\) and assume that the claim holds for h. We compute:

$$\begin{aligned} -(h+1)&\cos ( (h+1) \theta _r) + \sin ((h+1)\theta _r)\frac{\cos (\theta _r)}{\sin (\theta _r)} \\&= -(h+1) \big (\cos (h \theta _r)\cos (\theta _r) - \sin (h \theta _r)\sin (\theta _r) \big ) \\&\quad + \big (\sin (h\theta _r)\cos (\theta _r) + \cos (h\theta _r) \sin (\theta _r)\big ) \frac{\cos (\theta _r)}{\sin (\theta _r)} \\&= - h \cos (h \theta _r) \cos (\theta _r) + (h+1) \sin (h\theta _r)\sin (\theta _r) + \frac{\sin (h \theta _r)}{\sin (\theta _r)}\cos ^2(\theta _r) \\&= \underbrace{\cos (\theta _r)}_{>0} \underbrace{\Big ( -h \cos (h\theta _r) + {\sin (h\theta _r)\over \sin (\theta _r)}\cos (\theta _r)\Big )}_{> 0 \text { by the induction assumption}} + (h+1)\underbrace{\sin (h\theta _r)\sin (\theta _r)}_{\ge 0}\\&> 0. \end{aligned}$$

We conclude that \(\lambda _{k}^{r} > 0\) for all \(k \in [r]\). To see that \(\lambda _{k}^{r} \le 1\), note that for all \(k \in \mathbb {N}\), \(T_{k}(x) \le 1\) for \(-1 \le x \le 1\) and \(T_{k}(1) = 1\). We can thus compute:

$$\begin{aligned} \lambda _{k}^{r} = \lambda _{k}^{r} T_{k}(1) = \int _{-1}^1 K^{\mathrm{ja}}_r(1, y) T_{k}(y) d\mu (y) \le \int _{-1}^1 K^{\mathrm{ja}}_r(1, y) d\mu (y) = \lambda _{0}^{r} = 1, \end{aligned}$$

making use of the nonnegativity of \(K^{\mathrm{ja}}_r(x, y)\) on \([-1,1]^2\) for the inequality.

Third property (iii): Using the expression of \(\lambda _{k}^{r}\) in (12) we have

$$\begin{aligned} 1-\lambda _{k}^{r}= 1- {r+2-k\over r+2}\cos (k\theta _r) - {1\over r+2}{\sin (k\theta _r)\cos (\theta _r)\over \sin (\theta _r)}. \end{aligned}$$

We now bound each trigonometric term using the fact that:

$$\begin{aligned} \cos (x)\ge 1-{1\over 2}x^2,\quad x-{1\over 6}x^3\le \sin (x)\le x\quad (x \ge 0).\end{aligned}$$
(15)

When \(k=1\), the expression (12) reduces to \(\lambda _{1}^{r} = \cos (\theta _r)\), and we immediately get:

$$\begin{aligned} 1-\lambda _{1}^{r}= 1-\cos (\theta _r) \le {1\over 2}\theta _r^2= {\pi ^2\over 2 (r+2)^2} \le {d^2\pi ^2\over (r+2)^2}. \end{aligned}$$

Assume now \(2\le k\le d\). Using (15) combined with \(\cos (\theta _r), \sin (\theta _r), \sin (k\theta _r)>0\) we obtain:

$$\begin{aligned} {\sin (k\theta _r)\cos (\theta _r)\over \sin (\theta _r)} \ge \big ( k\theta _r-{1\over 6}k^3\theta _r^3\big ) \big (1-{1\over 2}\theta _r^2\big ){1\over \theta _r} \ge k -{k\over 2}\theta _r^2\big (1+{k^2\over 3}\big ) \end{aligned}$$

and thus:

$$\begin{aligned} 1-\lambda _{k}^{r}&\le 1-{r+2-k\over r+2}\big (1-{k^2\theta _r^2\over 2}\big ) -{1\over r+2} \Big (k -{k\over 2}\theta _r^2\big (1+{k^2\over 3}\big )\Big )\\&=\underbrace{{r+2-k\over r+2}}_{\le 1} {k^2\theta _r^2\over 2} + \underbrace{{k\over 2(r+2)}}_{\le 1/2} \theta _r^2 \underbrace{\big (1+{k^2\over 3}\big )}_{\le {2\over 3}k^2 \text { if } k\ge 2}\\&\le k^2\theta _r^2 \le {d^2\pi ^2\over (r+2)^2}. \end{aligned}$$

This concludes the proof if \(k\ge 2\). \(\square\)
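The three properties of Proposition 6 lend themselves to a quick numerical sanity check. The Python sketch below is an illustration, not part of the proof: kernel nonnegativity is only tested on a finite grid and up to a small floating-point tolerance, property (iii) is tested in its strongest instance \(d = k\), and the function names are hypothetical.

```python
import math

def cheb_T(k, x):
    """Evaluate T_k(x) for x in [-1, 1] via T_k(cos t) = cos(k t)."""
    return math.cos(k * math.acos(max(-1.0, min(1.0, x))))

def jackson_coefficients(r):
    """Return [lambda_0^r, ..., lambda_r^r] from (12), with lambda_0^r = 1."""
    theta = math.pi / (r + 2)
    lam = [1.0]
    for k in range(1, r + 1):
        lam.append(((r + 2 - k) * math.cos(k * theta)
                    + math.sin(k * theta) / math.sin(theta) * math.cos(theta)) / (r + 2))
    return lam

def jackson_kernel(lam, x, y):
    """Evaluate the kernel (11) from its coefficient list lam."""
    return 1.0 + 2.0 * sum(lam[k] * cheb_T(k, x) * cheb_T(k, y)
                           for k in range(1, len(lam)))

def check_proposition_6(r, grid=41, tol=1e-9):
    """Return three booleans for properties (i), (ii), (iii) at degree r."""
    lam = jackson_coefficients(r)
    theta = math.pi / (r + 2)
    pts = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    # (i): nonnegativity on a grid; the kernel can vanish, hence the tolerance.
    kernel_ok = all(jackson_kernel(lam, x, y) >= -tol for x in pts for y in pts)
    # (ii): all eigenvalues lie in (0, 1].
    range_ok = all(0 < l <= 1 for l in lam)
    # (iii) with d = k: 1 - lambda_k^r <= (k * theta_r)^2.
    decay_ok = all(1 - lam[k] <= (k * theta) ** 2 + tol for k in range(r + 1))
    return kernel_ok, range_ok, decay_ok

results = {r: check_proposition_6(r) for r in range(1, 21)}
print(all(all(flags) for flags in results.values()))
```

For \(r = 1\) the kernel is \(1 + xy\), which vanishes at \((1, -1)\); this is why the grid test allows a small tolerance instead of requiring strict positivity.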

3 Proof of the main theorem

3.1 Construction of the linear operator \({\mathbf {K}}_r\)

As noted before, in order to prove Theorem 2 it suffices to construct a linear operator \({\mathbf {K}}_r : \mathbb {R}[{{\mathbf {x}}}]_r \rightarrow \mathbb {R}[{{\mathbf {x}}}]_r\) that is nonsingular and satisfies (P1) and (P2). For this purpose we define the multivariate Jackson kernel \(K_r : \mathbb {R}^n \times \mathbb {R}^n \rightarrow \mathbb {R}\) by setting:

$$\begin{aligned} K_r({\mathbf {x}}, {\mathbf {y}}) := \prod _{i = 1}^n K^{\mathrm{ja}}_r(x_i, y_i), \end{aligned}$$
(16)

where \(K^{\mathrm{ja}}_r\) is the (univariate) Jackson kernel from (11). Now let \({\mathbf {K}}_r\) be the corresponding kernel operator defined by:

$$\begin{aligned} {\mathbf {K}}_r p({\mathbf {x}}) = \int _{{\mathbf {y}}\in \mathrm{B}^{n}} K_r({\mathbf {x}},{\mathbf {y}}) p({\mathbf {y}}) d \mu ({\mathbf {y}}) \quad (p \in \mathbb {R}[{\mathbf {x}}]_r). \end{aligned}$$

The operator \({\mathbf {K}}_r\) is diagonal w.r.t. the (multivariate) Chebyshev basis, and its eigenvalues can be expressed in terms of the coefficients \(\lambda _{k}^{r}\) of the Jackson kernel, as the following lemma shows.

Lemma 7

The operator \({\mathbf {K}}_r\) is diagonal w.r.t. the Chebyshev basis for \(\mathbb {R}[{{\mathbf {x}}}]_r\), and its eigenvalues are given by:

$$\begin{aligned} \lambda _{\kappa }^{r} := \prod _{i = 1}^n \lambda _{\kappa _i}^{r} \quad (\kappa \in \mathbb {N}^n_r). \end{aligned}$$

Proof

For \(\kappa \in \mathbb {N}^n_r\), we see that:

$$\begin{aligned} {\mathbf {K}}_r T_{\kappa }({\mathbf {x}})&= \int _{{\mathbf {y}}\in \mathrm{B}^{n}} K_r({\mathbf {x}},{\mathbf {y}}) T_{\kappa }({\mathbf {y}}) d \mu ({\mathbf {y}}) \\&= \prod _{i = 1}^n \bigg (\int _{y_i \in [-1, 1]} K^{\mathrm{ja}}_r(x_i, y_i) T_{\kappa _i}(y_i) d \mu (y_i)\bigg ) =\prod _{i = 1}^n \lambda _{\kappa _i}^{r} T_{\kappa _i}(x_i) = \lambda _{\kappa }^{r} T_{\kappa }({\mathbf {x}}), \end{aligned}$$

as required. \(\square\)

It follows immediately from Proposition 6(ii) that \({\mathbf {K}}_r\) has only nonzero eigenvalues and thus is non-singular. We show that \({\mathbf {K}}_r\) further satisfies (P1) and (P2).
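Lemma 7 can be illustrated numerically in a small case. The Python sketch below evaluates \({\mathbf {K}}_r T_\kappa ({\mathbf {x}})\) for \(n = 2\) by a tensor-product Gauss-Chebyshev quadrature rule (an assumption of this sketch, not something used in the text) and compares it with \(\lambda _{\kappa }^{r} T_\kappa ({\mathbf {x}})\). The chosen values of r, \(\kappa\) and \({\mathbf {x}}\) are arbitrary, and the function names are hypothetical.

```python
import math
from itertools import product

def cheb_T(k, x):
    """Evaluate T_k(x) for x in [-1, 1] via T_k(cos t) = cos(k t)."""
    return math.cos(k * math.acos(max(-1.0, min(1.0, x))))

def jackson_coefficients(r):
    """Return [lambda_0^r, ..., lambda_r^r] from (12), with lambda_0^r = 1."""
    theta = math.pi / (r + 2)
    lam = [1.0]
    for k in range(1, r + 1):
        lam.append(((r + 2 - k) * math.cos(k * theta)
                    + math.sin(k * theta) / math.sin(theta) * math.cos(theta)) / (r + 2))
    return lam

def jackson_kernel(lam, x, y):
    """Evaluate the univariate kernel (11) from its coefficient list lam."""
    return 1.0 + 2.0 * sum(lam[k] * cheb_T(k, x) * cheb_T(k, y)
                           for k in range(1, len(lam)))

def apply_operator_to_T(kappa, x, r):
    """Evaluate (K_r T_kappa)(x) with the product kernel (16)."""
    lam = jackson_coefficients(r)
    N = r + 1  # the integrand has degree <= 2r in each y_i; the rule is exact up to 2N - 1
    nodes = [math.cos((2 * j + 1) * math.pi / (2 * N)) for j in range(N)]
    total = 0.0
    for y in product(nodes, repeat=len(x)):
        kernel = math.prod(jackson_kernel(lam, xi, yi) for xi, yi in zip(x, y))
        t_kappa = math.prod(cheb_T(k, yi) for k, yi in zip(kappa, y))
        total += kernel * t_kappa
    return total / N ** len(x)  # every tensor node has weight N^{-n}

r, kappa, x = 4, (1, 2), (0.3, -0.7)
lam = jackson_coefficients(r)
lhs = apply_operator_to_T(kappa, x, r)
rhs = (math.prod(lam[k] for k in kappa)
       * math.prod(cheb_T(k, xi) for k, xi in zip(kappa, x)))
print(lhs, rhs)
```

Both values should agree up to rounding, reflecting that \(T_\kappa\) is an eigenfunction of \({\mathbf {K}}_r\) with eigenvalue \(\lambda _{\kappa }^{r}\).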

3.2 Verification of property (P1)

Consider the following strengthening of Schmüdgen’s Positivstellensatz in the univariate case.

Theorem 8

(Fekete, Markov-Lukácz (see [12])) Let p be a univariate polynomial of degree r, and assume that \(p \ge 0\) on the interval \([-1, 1]\). Then p admits a representation of the form:

$$\begin{aligned} p(x) = \sigma _0(x) + \sigma _1(x) (1-x^2), \end{aligned}$$
(17)

where \(\sigma _0, \sigma _1 \in \Sigma [{x}]\) and \(\sigma _0\) and \(\sigma _1 \cdot (1-x^2)\) are of degree at most \(r+1\). In other words, in view of (4), we have \(p\in Q([-1, 1])_{{r+1}}\).

By Proposition 6(i), for any \(y \in [-1, 1]\), the polynomial \(x \mapsto K^{\mathrm{ja}}_r(x, y)\) is nonnegative on \([-1,1]\) and thus, by Theorem 8, it belongs to \(Q([-1, 1])_{{r+1}}\). This implies directly that the multivariate polynomial \({\mathbf {x}}\mapsto K_r({\mathbf {x}}, {\mathbf {y}}) = \prod _{i=1}^n K^{\mathrm{ja}}_r(x_i, y_i)\) belongs to \({Q}(\mathrm{B}^{n})_{{(r+1)}n}\) for all \({\mathbf {y}}\in [-1, 1]^n\).

Lemma 9

The operator \({\mathbf {K}}_r\) satisfies property (P1), that is, we have \({\mathbf {K}}_r p \in {Q}(\mathrm{B}^{n})_{{(r+1)}n}\) for all \(p \in {\mathcal {P}}(\mathrm{B}^{{n}})_{{r}}\).

Proof

One way to see this is as follows. Let \(\{{\mathbf {y}}_i: i\in [N]\} \subseteq \mathrm{B}^{n}\) and \(w_i>0\) (\(i\in [N]\)) form a quadrature rule for integration of degree 2r polynomials over \(\mathrm{B}^{n}\); that is, \(\int _{\mathrm{B}^{n}} p({\mathbf {x}})d\mu ({\mathbf {x}})=\sum _{i=1}^N w_ip({\mathbf {y}}_i)\) for any \(p\in \mathbb {R}[{\mathbf {x}}]_{2r}\). Then, for any \(p\in {\mathcal {P}}(\mathrm{B}^{{n}})_r\), we have \({\mathbf {K}}_r p({\mathbf {x}})=\sum _{i=1}^N K_r({\mathbf {x}},{\mathbf {y}}_i)p({\mathbf {y}}_i) w_i\) with \(p({\mathbf {y}}_i)w_i\ge 0\) for all i, which shows that \({\mathbf {K}}_r p\in {Q}(\mathrm{B}^{n})_{{(r+1)}n}\). \(\square\)

3.3 Verification of property (P2)

We may decompose the polynomial \({{\tilde{F}} = F + \epsilon }\) into the multivariate Chebyshev basis (10):

$$\begin{aligned} {\tilde{F}} = \epsilon + \sum _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa )} F_\kappa T_{\kappa }, \quad \text {where } F_{\kappa } = \langle {F}, {T_\kappa } \rangle _\mu . \end{aligned}$$

By Lemma 7, we then have:

$$\begin{aligned} \begin{aligned} \Vert {{\mathbf {K}}_r^{-1} {\tilde{F}} - {\tilde{F}}} \Vert _{\infty }&= \Vert {\sum _{\kappa \in \mathbb {N}^n_d} (1/\lambda _{\kappa }^{r}) 2^{w(\kappa )} F_\kappa T_{\kappa } - 2^{w(\kappa )}F_\kappa T_{\kappa }} \Vert _{\infty } \\&\le \sum _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa )} |F_\kappa | |1-1/\lambda _{\kappa }^{r}|, \end{aligned} \end{aligned}$$
(18)

making use of the fact that \(\lambda _{0}^{r} = 1\) and \(|T_{\kappa }({\mathbf {x}})| \le 1\) for all \({\mathbf {x}}\in \mathrm{B}^{n}\). It remains to analyze the expression at the right-hand side of (18). First, we bound the size of \(|F_\kappa |\) for \(\kappa \in \mathbb {N}^n\).

Lemma 10

We have \(|F_\kappa | = |\langle {F}, {T_{\kappa }} \rangle _\mu | \le 2^{-w(\kappa ) / 2}\) for all \(\kappa \in \mathbb {N}^n\).

Proof

Since \(\mu\) is a probability measure on \(\mathrm{B}^{n}\), we have \(\Vert {F} \Vert _{\mu } \le \Vert {F} \Vert _{\infty } \le 1\). Using the Cauchy-Schwarz inequality and (9), we then find:

$$\begin{aligned} |\langle {F}, {T_{\kappa }} \rangle _\mu | \le \Vert {F} \Vert _{\mu } \Vert {T_{\kappa }} \Vert _{\mu } \le \Vert {T_{\kappa }} \Vert _{\mu } = 2^{-w(\kappa ) / 2}. \end{aligned}$$

\(\square\)

To bound the parameter \(|1-1/\lambda _{\kappa }^{r}|\), we first prove a bound on \(|1-\lambda _{\kappa }^{r}|\), which we obtain by applying Bernoulli’s inequality.

Lemma 11

(Bernoulli’s inequality) For any \(x \in [0, 1]\) and \(t \ge 1\), we have:

$$\begin{aligned} 1 - (1-x)^t \le tx. \end{aligned}$$
(19)

Lemma 12

For any \(\kappa \in \mathbb {N}^n_d\) and \(r \ge \pi d\), we have:

$$\begin{aligned} |1 - \lambda _{\kappa }^{r}| \le \frac{n \pi ^2 d^2}{r^2}. \end{aligned}$$

Proof

By Proposition 6, we know that \(0 \le \gamma _k := (1 - \lambda _{k}^{r}) \le \pi ^2d^2 / r^2 \le 1\) for \(0 \le k \le d\). Writing \(\gamma := \max _{0 \le k \le d} \gamma _k\), we compute:

$$\begin{aligned} 1 - \lambda _{\kappa }^{r} = 1 - \prod _{i=1}^n \lambda _{\kappa _i}^{r} = 1 - \prod _{i=1}^n (1 - \gamma _{\kappa _i}) \le 1 - (1 - \gamma )^n \le n \gamma \le \frac{n \pi ^2 d^2}{r^2}, \end{aligned}$$

making use of (19) for the second to last inequality. \(\square\)

Lemma 13

Assuming that \(r \ge \pi d \sqrt{2n}\), we have for any \(\kappa \in \mathbb {N}^n_d\):

$$\begin{aligned} | 1 - 1/\lambda _{\kappa }^{r}| \le \frac{2n \pi ^2 d^2}{r^2}. \end{aligned}$$

Proof

Under the assumption, and using the previous lemma, we have \(| 1 - \lambda _{\kappa }^{r} | \le 1/2\), which implies that \(\lambda _{\kappa }^{r} \ge 1/2\). We may then bound:

$$\begin{aligned} |1 - 1/\lambda _{\kappa }^{r}| = | \frac{1 - \lambda _{\kappa }^{r}}{\lambda _{\kappa }^{r}}| \le 2 |1 - \lambda _{\kappa }^{r}| \le \frac{2n \pi ^2 d^2}{r^2}. \end{aligned}$$

\(\square\)
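The bounds of Lemma 12 and Lemma 13 can be compared with the actual eigenvalues in a small instance. The following Python sketch is illustrative only: it enumerates \(\mathbb {N}^n_d\) by brute force, takes r to be the smallest integer with \(r \ge \pi d \sqrt{2n}\) (assuming \(d \ge 1\)), and uses hypothetical function names.

```python
import math
from itertools import product

def jackson_coefficients(r):
    """Return [lambda_0^r, ..., lambda_r^r] from (12), with lambda_0^r = 1."""
    theta = math.pi / (r + 2)
    lam = [1.0]
    for k in range(1, r + 1):
        lam.append(((r + 2 - k) * math.cos(k * theta)
                    + math.sin(k * theta) / math.sin(theta) * math.cos(theta)) / (r + 2))
    return lam

def multi_indices(n, d):
    """All kappa in N^n with |kappa| <= d (brute-force enumeration)."""
    return [k for k in product(range(d + 1), repeat=n) if sum(k) <= d]

def check_lemmas_12_13(n, d):
    """Return r, the worst observed deviations, and the two theoretical bounds."""
    r = math.ceil(math.pi * d * math.sqrt(2 * n))
    lam = jackson_coefficients(r)
    bound12 = n * math.pi**2 * d**2 / r**2  # Lemma 12
    bound13 = 2 * bound12                    # Lemma 13
    worst12 = worst13 = 0.0
    for kappa in multi_indices(n, d):
        lam_kappa = math.prod(lam[k] for k in kappa)  # eigenvalue from Lemma 7
        worst12 = max(worst12, abs(1 - lam_kappa))
        worst13 = max(worst13, abs(1 - 1 / lam_kappa))
    return r, worst12, bound12, worst13, bound13

r, worst12, bound12, worst13, bound13 = check_lemmas_12_13(n=2, d=3)
print(r, worst12, bound12, worst13, bound13)
```

Comparing the observed deviations with the bounds gives a rough sense of how much slack the two lemmas leave in this instance.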

Putting things together and using (18), Lemma 10 and Lemma 13 we find that:

$$\begin{aligned} \Vert {{\mathbf {K}}_r^{-1} {\tilde{F}} - {\tilde{F}}} \Vert _{\infty }&\le \sum _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa )} |F_\kappa | |1-1/\lambda _{\kappa }^{r}| \\&\le \sum _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa ) / 2} \cdot \frac{2n \pi ^2 d^2}{r^2} \le |\mathbb {N}^n_d| \cdot \max _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa ) / 2} \cdot \frac{2{n}\pi ^2 d^2}{r^2}. \end{aligned}$$

Hence \({\mathbf {K}}_r\) satisfies (P2) with \(\epsilon =C(n,d)/r^2\), where:

$$\begin{aligned} C(n, d) := |\mathbb {N}^n_d| \cdot \max _{\kappa \in \mathbb {N}^n_d} 2^{w(\kappa ) / 2} \cdot 2{n} \pi ^2 d^2. \end{aligned}$$

In view of Lemma 4, we have thus proven Theorem 2. Finally, we can bound the constant \(C(n,d)\) in two ways. On the one hand, we have:

$$\begin{aligned} |\mathbb {N}^n_d|= {n+d\atopwithdelims ()n} = \prod _{i=1}^n {d+i\over i}\le (d+1)^n \text { and } \max _{\kappa \in \mathbb {N}^n_d} w(\kappa ) \le n, \end{aligned}$$

resulting in a polynomial dependence of \(C(n,d)\) on d for fixed n. On the other hand, we have:

$$\begin{aligned} |\mathbb {N}^n_d|={n+d\atopwithdelims ()d} \le (n+1)^d \text { and } \max _{\kappa \in \mathbb {N}^n_d} w(\kappa ) \le d, \end{aligned}$$

resulting in a polynomial dependence of \(C(n,d)\) on n for fixed d. Namely, we have:

$$\begin{aligned} C(n,d)\le 2\pi ^2 d^2 {n} 2^{n/2}(d+1)^n \quad \text { and } \quad C(n,d)\le 2\pi ^2 d^2{n} 2^{d/2} (n+1)^d. \end{aligned}$$
(20)
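To get a feeling for the size of the constant, the definition of \(C(n,d)\) and the two bounds in (20) can be evaluated directly. The Python sketch below is an illustration with hypothetical function names. It uses the observation that \(\max _{\kappa \in \mathbb {N}^n_d} w(\kappa ) = \min (n, d)\), which is consistent with the two bounds on \(w(\kappa )\) stated above, and it also evaluates the degree threshold of Corollary 3 for a given \(\eta\).

```python
import math

def constant_C(n, d):
    """C(n, d) = |N^n_d| * max_kappa 2^{w(kappa)/2} * 2 n pi^2 d^2,
    using max w(kappa) = min(n, d) (assumption of this sketch)."""
    return math.comb(n + d, n) * 2 ** (min(n, d) / 2) * 2 * n * math.pi**2 * d**2

def bounds_20(n, d):
    """The two upper bounds on C(n, d) stated in (20)."""
    common = 2 * math.pi**2 * d**2 * n
    bound_fixed_n = common * 2 ** (n / 2) * (d + 1) ** n  # polynomial in d for fixed n
    bound_fixed_d = common * 2 ** (d / 2) * (n + 1) ** d  # polynomial in n for fixed d
    return bound_fixed_n, bound_fixed_d

def degree_for_accuracy(n, d, eta, spread=1.0):
    """Smallest integer r permitted by Corollary 3 and the certificate degree (r+1)n.
    `spread` stands for f_max - f_min."""
    r = math.ceil(max(math.pi * d * math.sqrt(2 * n),
                      math.sqrt(constant_C(n, d) * spread / eta)))
    return r, (r + 1) * n

# Tabulate C(n, d) against both bounds for a few small (n, d).
table = {(n, d): (constant_C(n, d), *bounds_20(n, d))
         for n in range(1, 6) for d in range(1, 6)}
for (n, d), (c, b1, b2) in sorted(table.items()):
    print(n, d, round(c, 1), round(b1, 1), round(b2, 1))

r_needed, certificate_degree = degree_for_accuracy(n=2, d=3, eta=0.01)
print(r_needed, certificate_degree)
```

These values are worst-case guarantees; the degree that suffices for a particular polynomial may be much smaller.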

4 Concluding remarks

We have shown that the error of the degree r Lasserre-type bound (5) for the minimization of a polynomial over the hypercube \([-1,1]^n\) is of the order \(O(1/r^2)\) when using a sum-of-squares decomposition in the truncated preordering. Alternatively, if f is a polynomial nonnegative on \([-1, 1]^n\) and \(\eta > 0\), our result may be interpreted as showing a bound in \(O(1/\sqrt{\eta })\) on the degree of a Schmüdgen-type certificate of positivity for \(f+\eta\). The constant in our result can be chosen to depend polynomially on n (for fixed d) or polynomially on d (for fixed n).

4.1 The constant \(C(n,d)\)

A question left open in this work is whether it is possible to show Theorem 2 with a constant C(d) that only depends on the degree d of f, and not on the number of variables n (cf. (20)). This question is motivated by the fact that for the analysis of the analogous hierarchies for the unit sphere in [4] and for the boolean hypercube in [5] the existence of such a constant (depending only on d) was in fact shown.

4.2 Relation to recent developments

Recently, there has been growing interest in obtaining a sharper convergence analysis for various Lasserre-type hierarchies for the minimization of a polynomial f over a semialgebraic set \(S = \{{{\mathbf {x}}\in \mathbb {R}^n} : g_j({\mathbf {x}}) \ge 0 ~~ (j\in [m]) \}\). Our work thus contributes to this research area. We outline some recent developments.

We refer to the works [13, 14] (and further references therein) for the analysis of hierarchies of upper bounds (obtained by minimizing the expected value of f on S with respect to a sum-of-squares density).

The most commonly used hierarchies of lower bounds are defined in terms of sums-of-squares decompositions in the quadratic module of S, being the set of conic combinations of the form \(\sigma _0+\sum _{j=1}^m \sigma _jg_j\) with \(\sigma _j\in \Sigma [{\mathbf {x}}]\). Such decompositions are called Putinar-type certificates. In comparison, the preordering Q(S) also involves conic combinations of the products of the \(g_j\). In [15] a degree bound in \(O(\exp (\eta ^{-c}))\) is given for the quadratic module, where \(c>0\) is a constant depending on S.
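To make the difference between the two types of certificates concrete, consider the following small example, added here for illustration. Take \(S = \mathrm{B}^{2}\) with \(g_1 = 1-x_1^2\) and \(g_2 = 1-x_2^2\). A Putinar-type and a Schmüdgen-type certificate for a polynomial p then take the respective forms:

$$\begin{aligned} p = \sigma _0 + \sigma _1 g_1 + \sigma _2 g_2 \quad \text { and } \quad p = \sigma _{\emptyset } + \sigma _{\{1\}} g_1 + \sigma _{\{2\}} g_2 + \sigma _{\{1,2\}} g_1 g_2, \end{aligned}$$

where all multipliers lie in \(\Sigma [{\mathbf {x}}]\). In general, for m generators, a Putinar-type certificate involves \(m+1\) sum-of-squares multipliers, whereas a Schmüdgen-type certificate involves \(2^m\).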

In a recent work [16], Baldi & Mourrain are able to improve this result to obtain a bound with a polynomial dependency on \(\eta\). Roughly speaking, their method of proof relies on embedding the semialgebraic set S in a box \([-R, R]^n\) of large enough size \(R>0\), and then relating positivity certificates on S to those on \([-R, R]^n\). Our present result on \([-1, 1]^n\) then allows them to conclude their analysis. Their argument relies on the fact that the constant \(C(n,d)\) in Theorem 2 may be chosen to depend polynomially on the degree d of f. Such a dependence was not shown in the earlier work [10].

Note that it has been shown in [17] that the hierarchies of bounds based on Putinar-type representations have finite convergence for generic problems. However, and perhaps somewhat surprisingly, their convergence analysis (for general problems) has remained a challenging problem.

We also wish to note that a polynomial degree bound was shown already in [18] for a slightly different hierarchy, based on Putinar-Vasilescu type representations, which give a decomposition in the quadratic module after multiplying the polynomial \(f+\eta\) by a suitable power \((1+\sum _{i=1}^n x_i^2)^k\) (under some conditions).

4.3 Putinar vs. Schmüdgen on the hypercube

As mentioned, Putinar-type hierarchies (making use of the quadratic module) are more commonly applied in practice than the Schmüdgen-type hierarchy (making use of the preordering) that we consider in this paper. It is therefore natural to consider the status of convergence results for Putinar-type hierarchies on the hypercube \(\mathrm{B}^{n}\).

Magron [19] shows a degree bound in \(O(\exp (c \eta ^{-1}))\) for Putinar-type certificates of \(f + \eta\) on \(\mathrm{B}^{n}\), improving the general result of [15] in this special case (see Footnote 3). His result relies on the degree bound in \(O(\eta ^{-1})\) for Schmüdgen-type certificates on \(\mathrm{B}^{n}\) shown in [10]. Importantly, it is contingent on an unresolved conjecture also posed in [10]: for each even \(n \in \mathbb {N}\), the polynomial \(2^{-n} (1-x_1)(1-x_2) \ldots (1-x_n) + \eta\) lies in the quadratic module of \(\mathrm{B}^{n}\) truncated at degree n for \(\eta = \frac{1}{n(n+2)}\). This open question, which asks for an exact estimate of the constant that needs to be added to each generator of the preordering \(Q([-1,1]^n)\) in order to ensure membership in the quadratic module, remains interesting in itself.

In principle, our new degree bounds for Schmüdgen-type certificates on \(\mathrm{B}^{n}\) could (slightly) improve the result of Magron (which relies on the weaker bounds of [10]). However, such an improvement would still depend exponentially on \(1/\eta\), in addition to being contingent on a conjecture. Furthermore, it seems to us that it is in any case superseded by the new result of Baldi & Mourrain [16] mentioned above, which (when specialized to the hypercube) shows degree bounds for Putinar-type certificates with polynomial dependency on \(1/\eta\). It is an open question whether the degree bound in \(O(1/\sqrt{\eta })\) we have shown here for Schmüdgen-type certificates on \(\mathrm{B}^{n}\) may be extended to Putinar-type certificates.

Lastly, we wish to mention that error bounds for the Putinar-type Lasserre hierarchy on the hypercube \(\mathrm{B}^{n}\) were already provided in [20]. There, however, the author considers a regime where the order r of the relaxation is fixed, while the dimension n tends to infinity. His results are therefore not directly comparable to those of the present paper or to those discussed above.

4.4 Negative results

We have so far focused our discussion on positive results concerning sum-of-squares representations, that is, results that give upper bounds on the error of Lasserre’s bound (5) or, equivalently, on the required degree of Schmüdgen-type positivity certificates. In order to put these results in context, it would be interesting to have complementary negative results, that is, lower bounds on the error of the Lasserre hierarchy.

The only applicable negative result known to the authors is due to Stengle [21]. He considers the interval \([-1, 1] \subseteq \mathbb {R}\) with the semialgebraic description:

$$\begin{aligned}{}[-1, 1] = \{ x \in \mathbb {R}: (1-x^2)^3 \ge 0\}. \end{aligned}$$

Note that this description is different from the (more natural) description (3) that we have used in this paper. In particular, Theorem 8 does not apply to it. Writing \(Q((1-x^2)^3)_r\) for the corresponding (truncated) preordering, Stengle shows that

$$\begin{aligned} 1-x^2 + \eta \in Q((1-x^2)^3)_r \end{aligned}$$

only when \(r = \Omega (1/\sqrt{\eta })\). In other words, he shows for \(f(x) = 1-x^2\) that the Lasserre-type bound \({f_{({r})}}\) obtained by replacing \(Q(1-x^2)_r\) in (5) by \(Q((1-x^2)^3)_r\) satisfies:

$$\begin{aligned} f_{\min }- {f_{({r})}} = \Omega (1/r^2). \end{aligned}$$

On the one hand, it is remarkable that Stengle’s lower bound in \(\Omega (1/r^2)\) matches the upper bound in \(O(1/r^2)\) we show in this paper exactly. On the other hand, we emphasize that Stengle’s result relies heavily on the nonstandard description of \([-1, 1]\) as a semialgebraic set. We leave the question of proving negative results for the standard description (3) for future research.