An effective version of Schm\"udgen's Positivstellensatz for the hypercube

Let $S \subseteq \mathbb{R}^n$ be a compact semialgebraic set and let $f$ be a polynomial nonnegative on $S$. Schm\"udgen's Positivstellensatz then states that for any $\eta>0$, the nonnegativity of $f + \eta$ on $S$ can be certified by expressing $f + \eta$ as a conic combination of products of the polynomials that occur in the inequalities defining $S$, where the coefficients are (globally nonnegative) sum-of-squares polynomials. It does not, however, provide explicit bounds on the degree of the polynomials required for such an expression. We show that in the special case where $S = [-1, 1]^n$ is the hypercube, a Schm\"udgen-type certificate of nonnegativity exists involving only polynomials of degree $O(1 / \sqrt{\eta})$. This improves quadratically upon the previously best known estimate in $O(1/\eta)$. Our proof relies on an application of the polynomial kernel method, making use in particular of the Jackson kernel on the interval $[-1, 1]$.


Introduction
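Consider the problem of computing the global minimum
$$f_{\min} := \min_{x \in B^n} f(x) \tag{1}$$
of a polynomial $f$ of degree $d \in \mathbb{N}$ over the hypercube $B^n := [-1, 1]^n \subseteq \mathbb{R}^n$. The program (1) can be reformulated as finding the largest $\lambda \in \mathbb{R}$ for which the function $f - \lambda$ is nonnegative on $B^n$. That is, writing $P(B^n) \subseteq \mathbb{R}[\mathbf{x}]$ for the cone of all polynomials that are nonnegative on $B^n$, we have
$$f_{\min} = \max\{\lambda \in \mathbb{R} : f - \lambda \in P(B^n)\}. \tag{2}$$
By replacing $P(B^n)$ in (2) by a smaller subset of $\mathbb{R}[\mathbf{x}]$ one may obtain lower bounds on $f_{\min}$. One way of obtaining such subsets is based on the following description of $B^n$ as a semialgebraic set:
$$B^n = \{x \in \mathbb{R}^n : 1 - x_i^2 \ge 0 \ (i \in [n])\}. \tag{3}$$
In light of this description, we see that the preordering $Q(B^n)_r$, truncated at degree $r$, defined by$^1$
$$Q(B^n)_r := \Big\{ \sum_{J \subseteq [n]} \sigma_J \prod_{j \in J} (1 - x_j^2) \ : \ \sigma_J \in \Sigma[\mathbf{x}], \ \deg\Big(\sigma_J \prod_{j \in J} (1 - x_j^2)\Big) \le r \Big\}, \tag{4}$$
satisfies $Q(B^n)_r \subseteq P(B^n)$ for all $r \in \mathbb{N}$. Here, $\Sigma[\mathbf{x}]$ is the set of sum-of-squares polynomials (i.e., of the form $p = p_1^2 + p_2^2 + \dots + p_m^2$ for certain $p_i \in \mathbb{R}[\mathbf{x}]$). When no degree bounds are imposed (i.e., $r = \infty$) we obtain the full preordering $Q(B^n)$ generated by the polynomials $g_i(x) = 1 - x_i^2$ ($i \in [n]$), which coincides with the quadratic module generated by the products $\prod_{i \in I} g_i(x)$ ($I \subseteq [n]$). We thus obtain the following hierarchy of lower bounds for $f_{\min}$, due to Lasserre [6]:
$$f_{(r)} := \max\{\lambda \in \mathbb{R} : f - \lambda \in Q(B^n)_r\}. \tag{5}$$
If the program (5) is feasible, its maximum is attained. By definition, we have $f_{\min} \ge f_{(r+1)} \ge f_{(r)}$ for all $r \in \mathbb{N}$. Furthermore, we have $\lim_{r \to \infty} f_{(r)} = f_{\min}$, which follows directly from the following special case of Schmüdgen's Positivstellensatz.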
Theorem 1 (Special case of Schmüdgen's Positivstellensatz [18]). Let $f \in P(B^n)$ be a polynomial. Then for any $\eta > 0$ there exists an $r \in \mathbb{N}$ such that $f + \eta \in Q(B^n)_r$.

1.1. Main result.
We show that the lower bounds $f_{(r)}$ converge to the global minimum $f_{\min}$ of $f$ over $B^n$ at a rate in $O(1/r^2)$. Alternatively, our result can be interpreted as a bound of the order $O(1/\sqrt{\eta})$ on the degree $r$ of a Schmüdgen-type positivity certificate for $f + \eta$ when $f \in P(B^n)$.
Theorem 2. Let $f$ be a polynomial of degree $d \in \mathbb{N}$. Then there exists a constant $C(n, d) > 0$, depending only on $n$ and $d$, such that
$$f_{\min} - f_{((r+1)n)} \le \frac{C(n, d)}{r^2} \cdot (f_{\max} - f_{\min}) \quad \text{for all } r \ge \pi d \sqrt{2n}. \tag{6}$$
Furthermore, the constant $C(n, d)$ may be chosen such that it either depends polynomially on $n$ (for fixed $d$) or it depends polynomially on $d$ (for fixed $n$), see relation (20) for details.
1 Sometimes the index r is used in the literature to denote the truncation where all summands have degree at most 2r. For our treatment here it is more convenient to let r denote the truncation where all summands have degree at most r, the main reason being our use later of Theorem 8.

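1.2. Outline of the proof.
Let $f \in \mathbb{R}[\mathbf{x}]$ be a polynomial of degree $d$. To simplify our arguments and notation, we will work with the scaled function
$$F(x) := \frac{f(x) - f_{\min}}{f_{\max} - f_{\min}},$$
for which $F_{\min} = 0$ and $F_{\max} = 1$. Since the inequality (6) is invariant under a positive scaling of $f$ and adding a constant, it indeed suffices to show the result for the function $F$. The idea of the proof is as follows. Let $\varepsilon > 0$ and consider the polynomial $\tilde F := F + \varepsilon$. Let $r \ge d$. Suppose that we are able to construct a (nonsingular) linear operator $\mathbf{K}_r : \mathbb{R}[\mathbf{x}]_r \to \mathbb{R}[\mathbf{x}]_r$ which has the following two properties:
(P1) $\mathbf{K}_r p \in Q(B^n)_{(r+1)n}$ for all $p \in P(B^n)_r$,
(P2) $\|\mathbf{K}_r^{-1} \tilde F - \tilde F\|_\infty \le \varepsilon$.
Then, by (P2), we have $\mathbf{K}_r^{-1} \tilde F \in P(B^n)_r$. Indeed, as $F$ is nonnegative on $B^n$, $\tilde F(x) = F(x) + \varepsilon$ is greater than or equal to $\varepsilon$ for all $x \in B^n$, and so (P2) tells us that after application of the operator $\mathbf{K}_r^{-1}$, the resulting polynomial $\mathbf{K}_r^{-1} \tilde F$ is nonnegative on $B^n$. Using (P1), we may then conclude that $\tilde F = \mathbf{K}_r(\mathbf{K}_r^{-1} \tilde F) \in Q(B^n)_{(r+1)n}$. In terms of $f$, this means that $f_{\min} - f_{((r+1)n)} \le \varepsilon \cdot (f_{\max} - f_{\min})$. We collect this in the next lemma for future reference.
Lemma 4. Suppose that for some $r \ge d$ and $\varepsilon > 0$ there exists a nonsingular operator $\mathbf{K}_r : \mathbb{R}[\mathbf{x}]_r \to \mathbb{R}[\mathbf{x}]_r$ which satisfies the properties (P1) and (P2). Then we have $f_{\min} - f_{((r+1)n)} \le \varepsilon \cdot (f_{\max} - f_{\min})$.
In what follows, we will construct such an operator $\mathbf{K}_r$ for each $r \ge \pi d \sqrt{2n}$ and the parameter $\varepsilon := C(n, d)/r^2$, where the constant $C(n, d)$ will be specified later. Our main Theorem 2 then follows after applying Lemma 4.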
We make use of the polynomial kernel method for our construction: after choosing a suitable kernel $K_r : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, we define the linear operator $\mathbf{K}_r : \mathbb{R}[\mathbf{x}]_r \to \mathbb{R}[\mathbf{x}]_r$ via the integral transform
$$\mathbf{K}_r p(x) := \int_{B^n} K_r(x, y)\, p(y)\, d\mu(y).$$
Here, $\mu$ is the Chebyshev measure on $B^n$ as defined in (7) below. A good choice for the kernel $K_r$ is a multivariate version (see Section 3.1) of the well-known Jackson kernel $K^{\mathrm{ja}}_r$ of degree $r$ (see Section 2.3). For this choice of kernel, the operator $\mathbf{K}_r$ naturally satisfies (P1) (see Section 3.2). Furthermore, it diagonalizes with respect to the basis of $\mathbb{R}[\mathbf{x}]$ given by the (multivariate) Chebyshev polynomials (see Section 2.2). Property (P2) can then be verified by analyzing the eigenvalues of $\mathbf{K}_r$, which are closely related to the expansion of $K^{\mathrm{ja}}_r$ in the basis of (univariate) Chebyshev polynomials (see Section 3.3). We end this section by illustrating our method of proof with a small example.

Example 5.
Consider the polynomial $f(x) = 1 - x^2 - x^3 + x^4$, which is nonnegative on $[-1, 1]$. For $r \in \mathbb{N}$, let $\mathbf{K}_r$ be the operator associated to the univariate Jackson kernel (11) of degree $r$, which satisfies (P1) (see Section 3.2). For $\eta = 0.1$, we observe that applying $\mathbf{K}_7^{-1}$ to $f + \eta$ yields a nonnegative function on $[-1, 1]$, whereas applying $\mathbf{K}_5^{-1}$ does not (see Figure 1). Applying the arguments of Section 1.2, we may thus conclude that $f + \eta \in Q(B^n)_8$, but not that $f + \eta \in Q(B^n)_6$.

1.3. Related work.
The polynomial kernel method, which forms the basis of our analysis, is widely used in functional approximation, see, e.g., [21]. In the present context, the method has already been employed for the analysis of the sum-of-squares hierarchy for optimization over the hypersphere $S^{n-1}$ in [2] (where a rate in $O(1/r^2)$ was shown as well) and for optimization over the binary cube $\{-1, 1\}^n$ in [16]. There, the authors use kernels that are invariant under the symmetry of $S^{n-1}$ and $\{-1, 1\}^n$, respectively.
In [3], the polynomial kernel method, and the Jackson kernel in particular, were used to analyze the quality of a related Lasserre-type hierarchy of upper bounds on $f_{\min}$ over $B^n = [-1, 1]^n$, where one searches for a density in the truncated preordering $Q(B^n)_r$ minimizing the expected value of $f$ over $B^n$ (showing again a convergence rate in $O(1/r^2)$).
For a general compact semialgebraic set $S$, a polynomial $f$ nonnegative on $S$ and $\eta > 0$, existence of Schmüdgen-type certificates of positivity for $f + \eta$ with degree bounds in $O(1/\eta^c)$ was shown in [19], where $c > 0$ is a constant depending on $S$. This result uses different tools, including in particular a representation result for polynomial optimization over the simplex by Pólya [12] and the effective degree bounds by Powers and Reznick [14].
For the case of the hypercube, a degree bound in $O(1/\eta)$ for Schmüdgen-type certificates is obtained in [4], thus showing that one can take $c \le 1$ in the above-mentioned result of [19]. This result holds in fact for a weaker hierarchy of bounds obtained by restricting in (5) to decompositions of the polynomial $f - \lambda$ involving factors $\sigma_J$ that are nonnegative scalars (instead of sums of squares), also known as Handelman-type decompositions (thus replacing the preordering $Q(B^n)_r$ by its subset $H_r$ of polynomials having a Handelman-type decomposition). The analysis in [4] relies on employing the Bernstein operator $B_r$, which has the property of mapping a polynomial nonnegative over the hypercube to a polynomial in the set $H_{rn} \subseteq Q(B^n)_{rn}$.
In this paper, we show a further improvement by using a different type of kernel operator; namely, we show that one can take the constant $c \le 1/2$ in the special case $S = [-1, 1]^n$.

Preliminaries
We write $\mathbb{R}[x]$ for the univariate polynomial ring, while reserving the bold-face notation $\mathbb{R}[\mathbf{x}] = \mathbb{R}[x_1, x_2, \dots, x_n]$ to denote the ring of polynomials in $n$ variables.
Similarly, we let $\Sigma[x]$ and $\Sigma[\mathbf{x}]$ denote the sets of univariate and $n$-variate sum-of-squares polynomials, respectively, consisting of all polynomials of the form $p = p_1^2 + p_2^2 + \dots + p_m^2$ for certain polynomials $p_1, \dots, p_m$ and $m \in \mathbb{N}$. For a polynomial $p \in \mathbb{R}[\mathbf{x}]$, we write $p_{\min}$, $p_{\max}$ for its minimum and maximum over $B^n$, respectively, and $\|p\|_\infty := \sup_{x \in B^n} |p(x)|$ for its sup-norm on $B^n$.

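2.2. Chebyshev polynomials.
Let $\mu$ be the normalized Chebyshev measure on $B^n = [-1, 1]^n$, defined by
$$d\mu(x) = \prod_{i=1}^n \frac{dx_i}{\pi \sqrt{1 - x_i^2}}. \tag{7}$$
Note that $\mu$ is a probability measure on $B^n$, meaning that $\int_{B^n} d\mu = 1$. We write $\langle \cdot, \cdot \rangle_\mu$ for the corresponding inner product on $\mathbb{R}[\mathbf{x}]$, given by
$$\langle f, g \rangle_\mu := \int_{B^n} f(x)\, g(x)\, d\mu(x).$$
For $k \in \mathbb{N}$, let $T_k$ be the univariate Chebyshev polynomial (see, e.g., [20]) of degree $k$, defined by
$$T_k(\cos\theta) := \cos(k\theta) \quad (\theta \in \mathbb{R}).$$
Note that $|T_k(x)| \le 1$ for all $x \in [-1, 1]$ and that $T_0 = 1$. The Chebyshev polynomials satisfy the orthogonality relations
$$\langle T_a, T_b \rangle_\mu = \begin{cases} 0 & a \ne b, \\ 1 & a = b = 0, \\ \tfrac12 & a = b \ne 0. \end{cases} \tag{8}$$
A univariate polynomial $p$ may therefore be expanded as
$$p = p_0 + \sum_{k=1}^{\deg(p)} 2 p_k T_k, \quad \text{where } p_k := \langle T_k, p \rangle_\mu.$$
For $\kappa \in \mathbb{N}^n$, we consider the multivariate Chebyshev polynomial $T_\kappa$, defined by setting
$$T_\kappa(x) := \prod_{i=1}^n T_{\kappa_i}(x_i).$$
The multivariate Chebyshev polynomials form a basis for $\mathbb{R}[\mathbf{x}]$ and satisfy the orthogonality relations
$$\langle T_\alpha, T_\beta \rangle_\mu = \begin{cases} 0 & \alpha \ne \beta, \\ 2^{-w(\alpha)} & \alpha = \beta. \end{cases} \tag{9}$$
Here, $w(\alpha) := |\{i \in [n] : \alpha_i \ne 0\}|$ denotes the Hamming weight of $\alpha \in \mathbb{N}^n$. We use the notation $\mathbb{N}^n_d \subseteq \mathbb{N}^n$ to denote the set of $n$-tuples $\alpha \in \mathbb{N}^n$ with $|\alpha| = \sum_{i=1}^n \alpha_i \le d$. As in the univariate case, we may expand any $n$-variate polynomial $p$ of degree $d$ as
$$p = \sum_{\kappa \in \mathbb{N}^n_d} 2^{w(\kappa)} p_\kappa T_\kappa, \quad \text{where } p_\kappa := \langle p, T_\kappa \rangle_\mu. \tag{10}$$

2.3. The Jackson kernel.
For $r \in \mathbb{N}$ and for coefficients $\lambda^r_k \in \mathbb{R}$ to be specified below in (12), consider the kernel $K^{\mathrm{ja}}_r : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ given by
$$K^{\mathrm{ja}}_r(x, y) := 1 + 2 \sum_{k=1}^r \lambda^r_k T_k(x) T_k(y). \tag{11}$$
We associate a linear operator $\mathbf{K}^{\mathrm{ja}}_r : \mathbb{R}[x]_r \to \mathbb{R}[x]_r$ to this kernel by setting
$$\mathbf{K}^{\mathrm{ja}}_r p(x) := \int_{-1}^{1} K^{\mathrm{ja}}_r(x, y)\, p(y)\, d\mu(y).$$
Using the orthogonality relations (8), and writing $\lambda^r_0 := 1$, we see that
$$\mathbf{K}^{\mathrm{ja}}_r T_k = \lambda^r_k T_k \quad (0 \le k \le r).$$
In other words, $\mathbf{K}^{\mathrm{ja}}_r$ is a diagonal operator with respect to the Chebyshev basis of $\mathbb{R}[x]_r$, and its eigenvalues are given by $\lambda^r_0 = 1, \lambda^r_1, \dots, \lambda^r_r$. In what follows, we set
$$\lambda^r_k = \frac{1}{r+2} \Big( (r + 2 - k) \cos(k\theta_r) + \frac{\sin(k\theta_r)}{\sin(\theta_r)} \cos(\theta_r) \Big) \quad (1 \le k \le r), \tag{12}$$
with $\theta_r = \frac{\pi}{r+2}$. We then obtain the so-called Jackson kernel (see, e.g., [21]). The following properties of the Jackson kernel are crucial to our analysis.
Proposition 6. For every $d, r \in \mathbb{N}$ with $d \le r$, we have:
(i) $K^{\mathrm{ja}}_r(x, y) \ge 0$ for all $x, y \in [-1, 1]$,
(ii) $0 < \lambda^r_k \le 1$ for all $0 \le k \le r$,
(iii) $|1 - \lambda^r_k| = 1 - \lambda^r_k \le \pi^2 d^2 / r^2$ for all $1 \le k \le d$.
Proof. Nonnegativity of the Jackson kernel is a well-known fact, and is verified, e.g., in [3]. We check that the other properties (ii)-(iii) hold as well.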

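[…] $-\, h \cos(h\theta_r) + \frac{\sin(h\theta_r)}{\sin(\theta_r)} \cos(\theta_r) \ge 0$ by the induction assumption. We conclude that $\lambda^r_k > 0$ for all $k \in [r]$. To see that $\lambda^r_k \le 1$, note that for all $k \in \mathbb{N}$, $T_k(x) \le 1$ for $-1 \le x \le 1$ and $T_k(1) = 1$. We can thus compute:
$$\lambda^r_k = \mathbf{K}^{\mathrm{ja}}_r T_k(1) = \int_{-1}^{1} K^{\mathrm{ja}}_r(1, y)\, T_k(y)\, d\mu(y) \le \int_{-1}^{1} K^{\mathrm{ja}}_r(1, y)\, d\mu(y) = \mathbf{K}^{\mathrm{ja}}_r T_0(1) = 1,$$
making use of the nonnegativity of $K^{\mathrm{ja}}_r(x, y)$ on $[-1, 1]^2$ for the inequality.
Third property (iii): Using the expression of $\lambda^r_k$ in (12) we have We now bound each trigonometric term using the fact that: When $k = 1$ we immediately get: Assume now $2 \le k \le d$. Using (15) combined with $\cos(\theta_r), \sin(\theta_r), \sin(k\theta_r) > 0$ we obtain: and thus: This concludes the proof if $k \ge 2$.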
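The properties of the Jackson kernel can be sanity-checked numerically. The sketch below assumes the standard closed form of the Jackson coefficients with $\theta_r = \pi/(r+2)$ and the kernel $1 + 2\sum_k \lambda^r_k T_k(x) T_k(y)$; the bound $1 - \lambda^r_k \le \pi^2 d^2 / r^2$ tested at the end is our reading of property (iii), chosen because it matches the threshold $r \ge \pi d\sqrt{2n}$ used later, and should be treated as an assumption. The function names and the values of `r` and `d` are illustrative only. A grid check is of course not a proof.

```python
import numpy as np

def jackson_coefficients(r):
    """Return lambda^r_0, ..., lambda^r_r, with theta_r = pi / (r + 2)."""
    theta = np.pi / (r + 2)
    k = np.arange(r + 1)
    return ((r + 2 - k) * np.cos(k * theta)
            + np.sin(k * theta) / np.sin(theta) * np.cos(theta)) / (r + 2)

def jackson_kernel(r, x, y):
    """Evaluate 1 + 2 * sum_{k=1}^r lambda_k T_k(x) T_k(y) for x, y in [-1, 1]."""
    lam = jackson_coefficients(r)
    tx, ty = np.arccos(x), np.arccos(y)  # T_k(cos t) = cos(k t)
    return 1.0 + 2.0 * sum(lam[k] * np.cos(k * tx) * np.cos(k * ty)
                           for k in range(1, r + 1))

r, d = 12, 4  # illustrative values with d <= r
lam = jackson_coefficients(r)

# (i) nonnegativity of the kernel, checked on a grid of [-1, 1]^2
grid = np.linspace(-1.0, 1.0, 201)
X, Y = np.meshgrid(grid, grid)
kernel_min = jackson_kernel(r, X, Y).min()

# (ii) eigenvalues in (0, 1]; (iii) assumed form of the bound for 1 <= k <= d
eigs_in_range = bool(np.all((lam > 0) & (lam <= 1 + 1e-12)))
gap_bounded = bool(np.all(1 - lam[1:d + 1] <= np.pi**2 * d**2 / r**2))

print(kernel_min >= -1e-9, eigs_in_range, gap_bounded)
```

Note that the formula gives $\lambda^r_0 = 1$ and $\lambda^r_1 = \cos(\theta_r)$, so the smallest gap $1 - \lambda^r_1$ is of order $\theta_r^2 / 2$.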

Proof of the main theorem
3.1. Construction of the linear operator $\mathbf{K}_r$.
As noted before, in order to prove Theorem 2 it suffices to construct a linear operator $\mathbf{K}_r : \mathbb{R}[\mathbf{x}]_r \to \mathbb{R}[\mathbf{x}]_r$ that is nonsingular and satisfies (P1) and (P2). For this purpose we define the multivariate Jackson kernel $K_r : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ by setting
$$K_r(x, y) := \prod_{i=1}^n K^{\mathrm{ja}}_r(x_i, y_i),$$
where $K^{\mathrm{ja}}_r$ is the (univariate) Jackson kernel from (11). Now let $\mathbf{K}_r$ be the corresponding kernel operator defined by
$$\mathbf{K}_r p(x) := \int_{B^n} K_r(x, y)\, p(y)\, d\mu(y).$$
The operator $\mathbf{K}_r$ is diagonal w.r.t. the (multivariate) Chebyshev basis, and its eigenvalues can be expressed in terms of the coefficients $\lambda^r_k$ of the Jackson kernel, as the following lemma shows.
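Lemma 7. The operator $\mathbf{K}_r$ is diagonal w.r.t. the Chebyshev basis for $\mathbb{R}[\mathbf{x}]_r$, and its eigenvalues are given by
$$\lambda^r_\kappa := \prod_{i=1}^n \lambda^r_{\kappa_i} \quad (\kappa \in \mathbb{N}^n_r).$$
Proof. For $\kappa \in \mathbb{N}^n_r$, we see that
$$\mathbf{K}_r T_\kappa(x) = \int_{B^n} \prod_{i=1}^n K^{\mathrm{ja}}_r(x_i, y_i)\, T_{\kappa_i}(y_i)\, d\mu(y) = \prod_{i=1}^n \mathbf{K}^{\mathrm{ja}}_r T_{\kappa_i}(x_i) = \prod_{i=1}^n \lambda^r_{\kappa_i} T_{\kappa_i}(x_i) = \lambda^r_\kappa T_\kappa(x),$$
as required.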
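The diagonalization can be checked numerically in a small case. The sketch below takes $n = 2$ and evaluates the integral transform with a tensor Gauss-Chebyshev quadrature rule (NumPy's `chebgauss`, whose weights are rescaled by $1/\pi$ to match the normalized Chebyshev measure). It assumes the eigenvalue relation $\mathbf{K}_r T_\kappa = \big(\prod_i \lambda^r_{\kappa_i}\big) T_\kappa$, which is how we read Lemma 7, and the standard closed form of the Jackson coefficients. The helper names, the evaluation point and the choice $r = 6$, $\kappa = (2, 3)$ are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def jackson_coefficients(r):
    """Return lambda^r_0, ..., lambda^r_r, with theta_r = pi / (r + 2)."""
    theta = np.pi / (r + 2)
    k = np.arange(r + 1)
    return ((r + 2 - k) * np.cos(k * theta)
            + np.sin(k * theta) / np.sin(theta) * np.cos(theta)) / (r + 2)

def cheb_T(k, x):
    """Chebyshev polynomial T_k evaluated on [-1, 1]."""
    return np.cos(k * np.arccos(x))

def jackson_kernel(r, x, y):
    """Univariate Jackson kernel 1 + 2 * sum_k lambda_k T_k(x) T_k(y)."""
    lam = jackson_coefficients(r)
    return 1.0 + 2.0 * sum(lam[k] * cheb_T(k, x) * cheb_T(k, y)
                           for k in range(1, r + 1))

def apply_operator_2d(r, p, x):
    """Approximate (K_r p)(x) for n = 2 by tensor Gauss-Chebyshev quadrature.

    With r + 1 nodes per variable the rule is exact for integrands of
    degree at most 2r + 1 in each variable.
    """
    nodes, weights = C.chebgauss(r + 1)
    weights = weights / np.pi  # normalize to a probability measure
    Y1, Y2 = np.meshgrid(nodes, nodes, indexing="ij")
    W = np.outer(weights, weights)
    kern = jackson_kernel(r, x[0], Y1) * jackson_kernel(r, x[1], Y2)
    return float(np.sum(W * kern * p(Y1, Y2)))

r = 6
kappa = (2, 3)      # |kappa| = 5 <= r
x = (0.3, -0.7)     # arbitrary evaluation point in [-1, 1]^2

def T_kappa(y1, y2):
    return cheb_T(kappa[0], y1) * cheb_T(kappa[1], y2)

lam = jackson_coefficients(r)
lhs = apply_operator_2d(r, T_kappa, x)
rhs = lam[kappa[0]] * lam[kappa[1]] * T_kappa(*x)
print(np.isclose(lhs, rhs))
```

The same quadrature idea reappears in the verification of (P1), where the integral is replaced by a finite conic combination of kernel slices.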
It follows immediately from Proposition 6(ii) that K r has only nonzero eigenvalues and thus is non-singular. We show that K r further satisfies (P1) and (P2).
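Since the operator is diagonal with nonzero eigenvalues, applying its inverse amounts to dividing Chebyshev coefficients by those eigenvalues. The sketch below does this in the univariate case for the polynomial of Example 5, $f(x) = 1 - x^2 - x^3 + x^4$ with $\eta = 0.1$, and estimates the minimum of the result on a grid for $r = 5$ and $r = 7$. It assumes the standard closed form of the Jackson coefficients; the function names and the grid size are illustrative, and a grid minimum is only numerical evidence of (non)negativity.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def jackson_coefficients(r):
    """Return lambda^r_0, ..., lambda^r_r, with theta_r = pi / (r + 2)."""
    theta = np.pi / (r + 2)
    k = np.arange(r + 1)
    return ((r + 2 - k) * np.cos(k * theta)
            + np.sin(k * theta) / np.sin(theta) * np.cos(theta)) / (r + 2)

def apply_inverse(power_coeffs, r):
    """Chebyshev coefficients of K_r^{-1} p, for p given by its monomial
    coefficients (lowest degree first) with deg(p) <= r."""
    cheb = C.poly2cheb(power_coeffs)
    lam = jackson_coefficients(r)[:len(cheb)]
    return cheb / lam  # divide the T_k coefficient by the eigenvalue lambda_k

def min_on_interval(cheb_coeffs, num=20001):
    """Minimum of a Chebyshev series over a uniform grid of [-1, 1]."""
    xs = np.linspace(-1.0, 1.0, num)
    return float(C.chebval(xs, cheb_coeffs).min())

# f + eta = 1.1 - x^2 - x^3 + x^4
f_plus_eta = [1.1, 0.0, -1.0, -1.0, 1.0]

min_r7 = min_on_interval(apply_inverse(f_plus_eta, 7))
min_r5 = min_on_interval(apply_inverse(f_plus_eta, 5))
print(min_r7 > 0, min_r5 < 0)
```

A positive grid minimum for $r = 7$ and a negative one for $r = 5$ is consistent with the behaviour reported in Example 5.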

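3.2. Verification of property (P1).
Consider the following strengthening of Schmüdgen's Positivstellensatz in the univariate case.
Theorem 8. Let $p$ be a univariate polynomial of degree $r$ that is nonnegative on $[-1, 1]$. Then $p \in Q([-1, 1])_{r+1}$.
By Proposition 6(i), for any $y \in [-1, 1]$, the polynomial $x \mapsto K^{\mathrm{ja}}_r(x, y)$ is nonnegative on $[-1, 1]$ and thus, by Theorem 8, it belongs to $Q([-1, 1])_{r+1}$. This implies directly that the multivariate polynomial $x \mapsto K_r(x, y) = \prod_{i=1}^n K^{\mathrm{ja}}_r(x_i, y_i)$ belongs to $Q(B^n)_{(r+1)n}$ for all $y \in [-1, 1]^n$.
Lemma 9. The operator $\mathbf{K}_r$ satisfies property (P1), that is, we have $\mathbf{K}_r p \in Q(B^n)_{(r+1)n}$ for all $p \in P(B^n)_r$.
Proof. One way to see this is as follows. Let $\{y_i : i \in [N]\} \subseteq B^n$ and $w_i > 0$ ($i \in [N]$) form a quadrature rule for integration of degree $2r$ polynomials over $B^n$; that is, $\int_{B^n} q(y)\, d\mu(y) = \sum_{i=1}^N w_i\, q(y_i)$ for all such polynomials $q$. Then, for any $p \in P(B^n)_r$, we have $\mathbf{K}_r p(x) = \sum_{i=1}^N w_i\, p(y_i)\, K_r(x, y_i)$. As $w_i\, p(y_i) \ge 0$ for each $i$, this is a conic combination of the polynomials $K_r(\cdot, y_i) \in Q(B^n)_{(r+1)n}$, and so it lies in $Q(B^n)_{(r+1)n}$.

3.3. Verification of property (P2).
We may decompose the polynomial $\tilde F = F + \varepsilon$ into the multivariate Chebyshev basis (10):
$$\tilde F = \varepsilon + \sum_{\kappa \in \mathbb{N}^n_d} 2^{w(\kappa)} F_\kappa T_\kappa.$$
By Lemma 7, we then have:
$$\|\mathbf{K}_r^{-1} \tilde F - \tilde F\|_\infty = \Big\| \sum_{\kappa \in \mathbb{N}^n_d} 2^{w(\kappa)} F_\kappa \Big( \frac{1}{\lambda^r_\kappa} - 1 \Big) T_\kappa \Big\|_\infty \le \sum_{\kappa \in \mathbb{N}^n_d} 2^{w(\kappa)}\, |F_\kappa|\, \Big| 1 - \frac{1}{\lambda^r_\kappa} \Big|, \tag{18}$$
making use of the fact that $\lambda^r_0 = 1$ and $|T_\kappa(x)| \le 1$ for all $x \in B^n$. It remains to analyze the expression at the right-hand side of (18). First, we bound the size of $|F_\kappa|$ for $\kappa \in \mathbb{N}^n$.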
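Lemma 10. We have $|F_\kappa| \le 2^{-w(\kappa)/2}$ for all $\kappa \in \mathbb{N}^n$.
Proof. Since $\mu$ is a probability measure on $B^n$, we have $\|F\|_\mu \le \|F\|_\infty \le 1$. Using the Cauchy-Schwarz inequality and (9), we then find:
$$|F_\kappa| = |\langle F, T_\kappa \rangle_\mu| \le \|F\|_\mu\, \|T_\kappa\|_\mu \le 2^{-w(\kappa)/2}.$$
To bound the parameter $|1 - 1/\lambda^r_\kappa|$, we first prove a bound on $|1 - \lambda^r_\kappa|$, which we obtain by applying Bernoulli's inequality.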
Lemma 13. Assuming that $r \ge \pi d \sqrt{2n}$, we have: Proof. Under the assumption, and using the previous lemma, we have $|1 - \lambda^r_\kappa| \le 1/2$, which implies that $\lambda^r_\kappa \ge 1/2$. We may then bound:
$$\Big| 1 - \frac{1}{\lambda^r_\kappa} \Big| = \frac{|1 - \lambda^r_\kappa|}{\lambda^r_\kappa} \le 2\, |1 - \lambda^r_\kappa|.$$
Putting things together and using (18), Lemma 10 and Lemma 12 we find that: Hence $\mathbf{K}_r$ satisfies (P2) with $\varepsilon = C(n, d)/r^2$, where: In view of Lemma 4, we have thus proven Theorem 2. Finally, we can bound the constant $C(n, d)$ in two ways. On the one hand, we have: resulting in a polynomial dependence of $C(n, d)$ on $d$ for fixed $n$. On the other hand, we have: resulting in a polynomial dependence of $C(n, d)$ on $n$ for fixed $d$. Namely, we have:
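The eigenvalue estimates used for (P2) can be illustrated numerically. The sketch below enumerates all $\kappa \in \mathbb{N}^n_d$ for a small case, takes $r$ to be the smallest integer with $r \ge \pi d\sqrt{2n}$, and checks three facts on which the argument rests: the product inequality $1 - \prod_i \lambda^r_{\kappa_i} \le \sum_i (1 - \lambda^r_{\kappa_i})$ behind the use of Bernoulli's inequality, the bound $|1 - \lambda^r_\kappa| \le 1/2$, and the consequence $|1 - 1/\lambda^r_\kappa| \le 2\,|1 - \lambda^r_\kappa|$. It assumes the standard closed form of the Jackson coefficients and the product form of the multivariate eigenvalues; the names and the choice $n = 3$, $d = 4$ are hypothetical.

```python
import itertools
import math
import numpy as np

def jackson_coefficients(r):
    """Return lambda^r_0, ..., lambda^r_r, with theta_r = pi / (r + 2)."""
    theta = np.pi / (r + 2)
    k = np.arange(r + 1)
    return ((r + 2 - k) * np.cos(k * theta)
            + np.sin(k * theta) / np.sin(theta) * np.cos(theta)) / (r + 2)

def multi_indices(n, d):
    """All kappa in N^n with |kappa| <= d."""
    return [k for k in itertools.product(range(d + 1), repeat=n) if sum(k) <= d]

n, d = 3, 4
r = math.ceil(math.pi * d * math.sqrt(2 * n))  # smallest admissible r
lam = jackson_coefficients(r)
kappas = multi_indices(n, d)

# Multivariate eigenvalues lambda_kappa = prod_i lambda_{kappa_i}
lam_kappa = np.array([np.prod(lam[list(k)]) for k in kappas])
gap = np.abs(1.0 - lam_kappa)

# Sum of the univariate gaps, the right-hand side of the product inequality
sum_of_gaps = np.array([sum(1.0 - lam[i] for i in k) for k in kappas])

product_ineq_ok = bool(np.all(gap <= sum_of_gaps + 1e-12))
half_bound_ok = bool(np.all(gap <= 0.5))
inverse_ok = bool(np.all(np.abs(1.0 - 1.0 / lam_kappa) <= 2.0 * gap + 1e-12))

print(r, len(kappas), product_ineq_ok, half_bound_ok, inverse_ok)
```

The number of multi-indices enumerated here is $\binom{n+d}{d}$, which is the quantity that drives the growth of any constant obtained by summing over $\mathbb{N}^n_d$.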

Concluding remarks
We have shown that the error of the degree r Lasserre-type bound (5) (20)). This question is motivated by the fact that for the analysis of the analogous hierarchies for the unit sphere in [2] and for the boolean hypercube in [16] the existence of such a constant (depending only on d) was in fact shown.
Relation to recent developments. Recently, there has been growing interest in obtaining a sharper convergence analysis for various Lasserre-type hierarchies for the minimization of a polynomial $f$ over a semialgebraic set $S = \{x \in \mathbb{R}^n : g_j(x) \ge 0 \ (j \in [m])\}$. Our work thus contributes to this research area. We outline some recent developments.
We refer to the works [5,15] (and further references therein) for the analysis of hierarchies of upper bounds (obtained by minimizing the expected value of f on S with respect to a sum-of-squares density).
The most commonly used hierarchies of lower bounds are defined in terms of sums-of-squares decompositions in the quadratic module of $S$, being the set of conic combinations of the form $\sigma_0 + \sum_{j=1}^m \sigma_j g_j$ with $\sigma_j \in \Sigma[\mathbf{x}]$. Such decompositions are called Putinar-type certificates. In comparison, the preordering $Q(S)$ also involves conic combinations of the products of the $g_j$. In [11] a degree bound in $O(\exp(\eta^{-c}))$ is given for the quadratic module, where $c > 0$ is a constant depending on $S$.
In a recent work [1], Baldi & Mourrain are able to improve this result to obtain a bound with a polynomial dependency on $\eta$. Roughly speaking, their method of proof relies on embedding the semialgebraic set $S$ in a box $[-R, R]^n$ of large enough size $R > 0$, and then relating positivity certificates on $S$ to those on $[-R, R]^n$. Our present result on $[-1, 1]^n$ then allows them to conclude their analysis. Their argument relies on the fact that the constant $C(n, d)$ in Theorem 2 may be chosen to depend polynomially on the degree $d$ of $f$. Such a dependence was not shown in the earlier work [4].
Note that it has been shown in [10] that the hierarchies of bounds based on Putinar type representations have finite convergence for generic problems. However, and perhaps somewhat surprisingly, their convergence analysis (for general problems) has remained a challenging problem.
We also wish to note that a polynomial degree bound was shown already in [8] for a slightly different hierarchy, based on Putinar-Vasilescu type representations, which give a decomposition in the quadratic module after multiplying the polynomial $f + \eta$ by a suitable power $(1 + \sum_{i=1}^n x_i^2)^k$ (under some conditions).
Putinar vs. Schmüdgen on the hypercube. As mentioned, Putinar-type hierarchies (making use of the quadratic module) are more commonly applied in practice than the Schmüdgen-type hierarchy (making use of the preordering) that we consider in this paper. It is therefore natural to consider the status of convergence results for Putinar-type hierarchies on the hypercube $B^n$. Magron [7] shows a degree bound in $O(\exp(c\eta^{-1}))$ for Putinar-type certificates of $f + \eta$ on $B^n$, improving the general result of [11] in this special case. His result relies on the degree bound in $O(\eta^{-1})$ for Schmüdgen-type certificates on $B^n$ shown in [4]. Importantly, it is contingent on an unresolved conjecture also posed in [4]: For each $n \in \mathbb{N}$ even, the polynomial $2^{-n}(1 - x_1)(1 - x_2) \cdots (1 - x_n) + \eta$ lies in the quadratic module of $B^n$ truncated at degree $n$ for $\eta = \frac{1}{n(n+2)}$. This open question, which asks for an exact estimation of the constant that needs to be added to each generator of the preordering $Q([-1, 1]^n)$ in order to ensure membership in the quadratic module, remains interesting in itself.
In principle, our new degree bounds for Schmüdgen-type certificates on $B^n$ could (slightly) improve the result of Magron (which relies on the weaker bounds of [4]). However, such an improvement would still depend exponentially on $1/\eta$, in addition to being contingent on a conjecture. Furthermore, it seems to us that it is in any case superseded by the new result of Baldi & Mourrain [1] mentioned above, which (when specialized to the hypercube) shows degree bounds for Putinar-type certificates with polynomial dependency on $1/\eta$. It is an open question whether the degree bound in $O(1/\sqrt{\eta})$ we have shown here for Schmüdgen-type certificates on $B^n$ may be extended to Putinar-type certificates.
Lastly, we wish to mention that error bounds for the Putinar-type Lasserre hierarchy on the hypercube B n were already provided in [9]. There, however, the author considers a regime where the order r of the relaxation is fixed, while the dimension n tends to infinity. His results are therefore not directly comparable to those of the present paper or to those discussed above.
Negative results. We have so far focused our discussion on positive results concerning sum-of-squares representations, that is, results that give upper bounds on the error of Lasserre's bound (5) or, equivalently, on the required degree of Schmüdgen-type positivity certificates. In order to put these results in context, it would be interesting to have complementary negative results, thus giving lower bounds on the convergence rate of the Lasserre hierarchy.
The only applicable negative result known to the authors is due to Stengle [17]. He considers the interval $[-1, 1] \subseteq \mathbb{R}$ with the semialgebraic description
$$[-1, 1] = \{x \in \mathbb{R} : (1 - x^2)^3 \ge 0\}.$$
Note that this description is different from the (more natural) description (3) that we have used in this paper. In particular, Theorem 8 does not apply to it. Writing $Q((1 - x^2)^3)_r$ for the corresponding (truncated) preordering, Stengle shows for $f(x) = 1 - x^2$ that the Lasserre-type bound $f_{(r)}$ obtained by replacing $Q(1 - x^2)_r$ in (5) by $Q((1 - x^2)^3)_r$ satisfies $f_{\min} - f_{(r)} = \Omega(1/r^2)$.
On the one hand, it is remarkable that Stengle's lower bound in $\Omega(1/r^2)$ matches the upper bound in $O(1/r^2)$ we show in this paper exactly. On the other hand, we emphasize that Stengle's result relies heavily on the nonstandard description of $[-1, 1]$ as a semialgebraic set. We leave the question of proving negative results for the standard description (3) for future research.