Convergence rates for sums-of-squares hierarchies with correlative sparsity

This work derives upper bounds on the convergence rate of the moment-sum-of-squares hierarchy with correlative sparsity for the global minimization of polynomials on compact basic semialgebraic sets. The main conclusion is that both sparse hierarchies, based on the Schmüdgen and Putinar Positivstellensätze, enjoy a polynomial rate of convergence that depends on the size of the largest clique in the sparsity graph but not on the ambient dimension. Interestingly, the sparse bounds outperform the best currently available bounds for the dense hierarchy when the maximum clique size is sufficiently small compared to the ambient dimension and performance is measured by the running time an interior-point method requires to obtain a bound of a given accuracy on the global minimum.


Introduction
This work provides rates of convergence for the sums-of-squares hierarchy with correlative sparsity. For a positive n ∈ N, consider the polynomial optimization problem

f_min = min_{x ∈ S(g)} f(x),

where f is an element of the ring R[x] of polynomials in x = (x_1, ..., x_n), and S(g) is a compact basic semialgebraic set determined by a finite collection of polynomials g = {g_1, ..., g_k} by

S(g) = {x ∈ R^n : g_i(x) ≥ 0, i = 1, ..., k}.

An approach to attack this problem, first proposed by Lasserre [8] and Parrilo [18], is as follows. Imagine we knew that f(x) − λ could be written as

f(x) − λ = Σ_{j=0}^{k} σ_j(x) g_j(x)   or   f(x) − λ = Σ_{J ⊆ {1,...,k}} σ_J(x) Π_{j∈J} g_j(x),

with g_0(x) = 1 and the σ_j and σ_J being sum-of-squares (SOS) polynomials. Then the right-hand side of each of these equations would clearly be nonnegative on S(g), so we would know that f_min ≥ λ. By bounding the degree of the SOS polynomials, we obtain the following two hierarchies of lower bounds:

lb_q(f, r) = max{λ ∈ R : f − λ = Σ_{j=0}^{k} σ_j g_j, deg(σ_j g_j) ≤ 2r, σ_j ∈ Σ[x]},
lb_p(f, r) = max{λ ∈ R : f − λ = Σ_{J⊆{1,...,k}} σ_J Π_{j∈J} g_j, deg(σ_J Π_{j∈J} g_j) ≤ 2r, σ_J ∈ Σ[x]},

where Σ[x] is the convex cone of all sum-of-squares polynomials. These satisfy lb_q(f, r) ≤ lb_p(f, r) ≤ f_min. The lower bound lb_q(f, r) is associated to a so-called quadratic module certificate, while lb_p(f, r) corresponds to a preordering certificate; this terminology is justified by the definitions in Section 1.2. The well-known Putinar and Schmüdgen Positivstellensätze [19, 21], respectively, guarantee that these bounds converge to f_min as r → +∞, the former under the additional assumption that the associated quadratic module is Archimedean. Here we will prove sparse quantitative versions of these results.
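As a toy illustration of how such a certificate yields a lower bound (our own example, not taken from the paper), take n = 1, f(x) = x² + x, and S(g) = [−1, 1] with g_1(x) = 1 − x². Here f_min = −1/4, attained at x = −1/2, and f(x) + 1/4 = (x + 1/2)², so σ_0 = (x + 1/2)², σ_1 = 0 is a degree-2 certificate and lb_q(f, 1) = f_min. A minimal numerical check:

```python
# Toy check of a sum-of-squares certificate (illustrative example,
# not from the paper): f(x) = x^2 + x on S(g) = [-1, 1].
f = lambda x: x**2 + x
sigma0 = lambda x: (x + 0.5)**2    # SOS multiplier for g_0 = 1
lam = -0.25                        # candidate lower bound

# The identity f - lam = sigma0 holds, so f >= lam everywhere,
# in particular on S(g); verify it on a grid.
grid = [i / 500.0 - 1.0 for i in range(1001)]  # points in [-1, 1]
residual = max(abs(f(x) - lam - sigma0(x)) for x in grid)
assert residual < 1e-12                  # certificate identity holds
assert min(f(x) for x in grid) >= lam    # hence lam is a valid lower bound
print(residual, min(f(x) for x in grid))
```

The same mechanism, with degrees bounded by 2r, is what the semidefinite programming formulation of lb_q(f, r) searches over.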
Polynomial optimization schemes have generated substantial interest due to their abundant fields of application; see for example [11, 10]. The first proof of convergence, without a convergence rate, was given by Lasserre [8] using the Archimedean Positivstellensatz due to Putinar [19]. Eventually, rates of convergence were obtained; initially, in [16], these were logarithmic in the degree of the polynomials involved, and later on they were improved [3, 12, 23, 1] (using ideas of [20, 2, 17]) to polynomial rates; refer to Table 1. The crux of the argument used to obtain those rates is a bound on the deformation incurred by a polynomial strictly positive on the domain of interest as it passes through an integral operator that closely approximates the identity and is associated to a strictly positive polynomial kernel that is itself composed of sums of squares, similar to the Christoffel-Darboux and Jackson kernels (see Definition 9).
The techniques used to obtain these results generally involve linear operators on the space of polynomials (mostly Christoffel-Darboux kernel operators; see [23]) that are close to the identity and that, for positive polynomials, are easily (usually by construction) proved to output polynomials that are sums of squares and/or of their products with the functions in g. All of these results deal, however, with the dense case; for instance, for the preordering hierarchy on a general compact S(g), the rate O(1/r²) was obtained in [23].

Table 1: Known results on the asymptotic error of Lasserre's hierarchies of lower bounds; based in part on [23, Table 1]. The domain S(g) is assumed to be compact in all cases, ǫ ∈ [0, 1/2), c > 0; S^{n−1} is the unit sphere, B^n the unit ball, and Δ^n the standard simplex.
In this work, we treat the case where the problem possesses so-called correlative sparsity, in which each function g_i depends only on a certain subset of variables and the function f decomposes as a sum of functions depending only on these subsets of variables. This structure can be exploited to define sparse lower bounds that are cheaper to compute but possibly weaker. Nevertheless, these sparse lower bounds allow one to tackle large-scale polynomial optimization problems arising from various applications, including roundoff error bounds in computer arithmetic, quantum correlations, and robustness certification of deep networks; see the recent survey [13]. In [9], Lasserre proved that these sparse lower bounds converge as the degree of the SOS multipliers tends to infinity, provided the variable groups satisfy the so-called running intersection property (RIP). A shorter and more direct proof was provided in [4], and was adapted in [15] to obtain a sparse variant of Reznick's Positivstellensatz. In this work, we show polynomial rates of convergence for sparse hierarchies based on both the Schmüdgen and Putinar Positivstellensätze. Importantly, we obtain rates that depend only on the size of the largest clique in the sparsity graph rather than the overall ambient dimension. This allows the perhaps surprising conclusion that, asymptotically, the sparse hierarchy is more accurate than the dense hierarchy for a given computation time of an optimization method, provided that the size of the largest clique is no more than the square root of the ambient dimension. This assumes that the running time of the optimization method is governed by the size of the largest PSD block and the number of such blocks in the semidefinite programming reformulations of the dense and sparse SOS problems, which is the case for the interior point method as well as the most commonly used first-order methods.
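To make the correlative-sparsity structure concrete, here is a small self-contained sketch (our own illustration; the example and helper names are ours): the sparsity graph links two variables whenever they appear together in a term of f or in a common constraint g_i, and its maximal cliques give the variable groups J_j the sparse hierarchy works with. For a chain-structured objective such as f = Σ_i x_i x_{i+1}, the cliques are the consecutive pairs.

```python
from itertools import combinations

# Build the correlative sparsity graph from the variable index sets of the
# terms of f and of the constraints g_i, then list its maximal cliques by
# brute force (fine for this small illustrative example; cliques of at
# least two variables).
def sparsity_cliques(term_supports, n):
    edges = {frozenset(e) for s in term_supports for e in combinations(sorted(s), 2)}
    def is_clique(c):
        return all(frozenset(e) in edges for e in combinations(c, 2))
    cliques = [set(c) for size in range(2, n + 1)
               for c in combinations(range(n), size) if is_clique(c)]
    # keep only the maximal ones
    return [c for c in cliques if not any(c < d for d in cliques)]

# Chain example: f = x0*x1 + x1*x2 + x2*x3 + x3*x4 (each term couples
# consecutive variables); no constraint couples variables further apart.
supports = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
cliques = sparsity_cliques(supports, 5)
print(cliques)   # the cliques J_j are the consecutive pairs
```

For real problems one would use a chordal extension and a dedicated clique algorithm rather than brute force; the point here is only the structure being exploited.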
To the best of our knowledge, these are the first quantitative results of this kind. Our proof techniques rely on an adaptation of [4] and make heavy use of the recent results from [12] and [1]; they can thus be seen as a generalization of these works to the sparse setting.
The results will be detailed below in Section 1.2 and further discussed in Section 1.2.1, after a brief interlude to establish some notation in Section 1.1. Some machinery will be developed in Sections 2 and 3, regarding variants of the Jackson kernel and some approximation theory, respectively, and the proofs of the main theorems are presented in Section 4.

Notations
Denote by R the set of real numbers, by N the set of positive integers, and by N_0 = {0, 1, ...} the set of nonnegative integers. Denote by e_1, ..., e_n the vectors of the standard basis of the Euclidean space R^n.
For a Lipschitz continuous function f, we write Lip f for its Lipschitz constant; we take this to be at least 1 to simplify estimates below. A multi-index I = (i_1, ..., i_n) ∈ N_0^n is an n-tuple of nonnegative integers i_k, and its weight is denoted by |I| = i_1 + · · · + i_n. For a multi-index I = (i_1, ..., i_n) ∈ N_0^n and J ⊂ {1, ..., n}, we will write I ⊆ J to indicate that, for all 1 ≤ k ≤ n, if i_k > 0 then k ∈ J. Similarly, given a multi-index I ∈ N_0^n and a subset J ⊆ N_0, we let I_J be the multi-index whose k-th entry is i_k if k ∈ J and 0 if k ∉ J. For two multi-indices I and I′, we will write I ≤ I′ if the entrywise inequalities i_k ≤ i′_k hold for all 1 ≤ k ≤ n. We will distinguish two special multi-indices: 1 = (1, 1, ..., 1) and 2 = (2, 2, ..., 2).
We will denote x^I = x_1^{i_1} x_2^{i_2} · · · x_n^{i_n}. Also, we denote the Hamming weight of I ∈ N_0^n by w(I) = #{k : i_k ≠ 0}; in other words, w(I) is the number of nonzero entries of I.
We will denote the space of polynomials in n variables by R[x], and within this set we will distinguish the subspace R[x]_d of polynomials of total degree at most d. For a polynomial p(x) = Σ_I c_I x^I, we denote by deg p the vector whose i-th entry is the degree of p in x_i. Given a subset J ⊂ {1, ..., n}, we let R[x_J] denote the set of polynomials in the variables {x_j}_{j∈J}. For a multi-index r = (r_1, ..., r_n) ∈ N_0^n, we let R[x]_r denote the set of polynomials p such that, if p(x) = Σ_I c_I x^I for some real numbers c_I ∈ R, then for each I = (i_1, ..., i_n) with c_I ≠ 0 we also have I ≤ r. Given a set X, we will write X^n to denote the n-fold product X × · · · × X. We will denote by ‖·‖_∞ the supremum norm on [−1, 1]^n. The notation ⌈s⌉ stands for the least integer ≥ s.
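The multi-index conventions above can be summarized in a short sketch (helper names are our own; indices are 0-based in code while the text counts from 1):

```python
from math import prod

def weight(I):   # |I| = i_1 + ... + i_n
    return sum(I)

def hamming(I):  # w(I): number of nonzero entries of I
    return sum(1 for i in I if i != 0)

def restrict(I, J):  # I_J: zero out entries whose (0-based) index is not in J
    return tuple(i if k in J else 0 for k, i in enumerate(I))

def leq(I, Ip):  # entrywise comparison I <= I'
    return all(a <= b for a, b in zip(I, Ip))

def monomial(x, I):  # x^I = x_1^{i_1} * ... * x_n^{i_n}
    return prod(xk**ik for xk, ik in zip(x, I))

I = (2, 0, 1)
print(weight(I), hamming(I), restrict(I, {0, 2}), monomial((2.0, 5.0, 3.0), I))
```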

Results
Let Σ[x_J] denote the set of polynomials p that are sums of squares of polynomials in R[x_J], that is, of the form p = Σ_i q_i² with q_i ∈ R[x_J]. For convenience, denote also g_0 = 1. To the collection g and a multi-index r we associate the (variable- and degree-wise truncated) quadratic module

Q_{r,J}(g) = { Σ_{i=0}^{k} σ_i g_i : σ_i ∈ Σ[x_J], deg(σ_i g_i) ≤ r },

where the inequality deg(σ_i g_i) ≤ r is understood entrywise. Similarly, we have the (variable- and degree-wise truncated) preordering

P_{r,J}(g) = { Σ_{I ⊆ {1,...,k}} σ_I Π_{i∈I} g_i : σ_I ∈ Σ[x_J], deg(σ_I Π_{i∈I} g_i) ≤ r }.

Definition 1. A collection {J_1, ..., J_ℓ} of subsets of {1, ..., n} satisfies the running intersection property if for all 1 ≤ k ≤ ℓ − 1 we have J_{k+1} ∩ (J_1 ∪ · · · ∪ J_k) ⊆ J_s for some s ≤ k.
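A direct transcription of Definition 1 into a checker (our own sketch): the chain ordering {0,1}, {1,2}, {2,3} satisfies the running intersection property, while the reordering {0,1}, {2,3}, {1,2} does not, since {1,2} meets both earlier cliques but its intersection with their union is contained in neither.

```python
def has_rip(cliques):
    """Running intersection property (Definition 1): for each k >= 1,
    J_{k+1} ∩ (J_1 ∪ ... ∪ J_k) must be contained in some J_s, s <= k."""
    for k in range(1, len(cliques)):
        union = set().union(*cliques[:k])
        inter = cliques[k] & union
        if not any(inter <= cliques[s] for s in range(k)):
            return False
    return True

print(has_rip([{0, 1}, {1, 2}, {2, 3}]))   # chain ordering: True
print(has_rip([{0, 1}, {2, 3}, {1, 2}]))   # bad ordering: False
```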
Discussion. Solving the dense problem considered by [12] using the sum-of-squares hierarchy reduces to a semidefinite program whose largest PSD block has size \binom{n+r}{r}, which typical optimization methods (e.g., interior point or first order) can solve in an amount of time proportional to a power of \binom{n+r}{r}. The bounds we find in Theorem 2, in the case in which J_j is the largest of the sets J_1, ..., J_ℓ, give a bound for the complexity of the leading term as (the same power of) the analogous quantity with n replaced by |J_j|: the sparse SOS multipliers involve only the variables of one clique, and there are at most |J_j| values of i with r_{j,i} ≠ 0.
Thus, if the size of the largest clique is of the order of the square root of the ambient dimension n or smaller, the sparse bound outperforms the best available dense bound when performance is measured by the amount of time required by an optimization method to find a bound of a given accuracy ε.
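The accounting behind this comparison can be sketched numerically (the values of n, r, the clique size, and the cost exponent are our own illustrative choices): the dense SDP has one PSD block of size \binom{n+r}{r}, while the sparse one has ℓ blocks of size at most \binom{m+r}{r} for clique size m, and interior-point running time scales like a power of the largest block size.

```python
from math import comb

n, r = 100, 3    # ambient dimension and hierarchy level (illustrative)
m, ell = 10, 91  # clique size and clique count, e.g. windows {i,...,i+9}

dense_block = comb(n + r, r)    # largest PSD block, dense hierarchy
sparse_block = comb(m + r, r)   # largest PSD block, sparse hierarchy

# Interior-point cost is roughly a power (say 3, an assumption for this
# sketch) of the largest block size, times the number of blocks.
dense_cost = dense_block**3
sparse_cost = ell * sparse_block**3
print(dense_block, sparse_block, dense_cost / sparse_cost)
```

Here m = 10 = √n, and the sparse problem is already cheaper by several orders of magnitude; the text's threshold |J_j| ≲ √n balances this cheaper per-level cost against the weaker per-level rate.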
Proof of Proposition 3. By Lemma 16 we have, as ε ց 0, an asymptotic expression for the ratio of the two complexity estimates, and this tends to 0 if the sparsity of the polynomial p is such that n > |J_j|(|J_j| + 3).
Assume that S(g) ⊆ [−1, 1]^n and that there exist polynomials s_{j,i} ∈ R[x_{J_j}]_{r_j}, j = 1, ..., ℓ, i ∈ {0} ∪ K_j, such that the Archimedean conditions hold; that is to say, we assume that 1 − Σ_{i∈J_j} x_i² ∈ Q_{r_j,J_j}(g_{K_j}). Then there are constants C_j > 0, depending only on g and J_1, ..., J_ℓ, such that, if p ≥ ε > 0 on S(g), the desired sparse representation holds as long as, for all 1 ≤ j ≤ ℓ and 1 ≤ k ≤ n, the stated degree conditions are satisfied. The proof of the theorem can be found in Section 4.2.
Discussion. By the same arguments we used in the discussion at the end of the previous section, if we assume L_1 = · · · = L_k = 1, the bounds we find in Theorem 4 give a bound for the complexity of the leading term as (a power of) the analogous clique-size quantity. The assumption L_1 = · · · = L_k = 1 is realized, for example, when the so-called constraint qualification condition holds: at each point x ∈ S(g), all the active constraints g_{i_1}, ..., g_{i_l} (i.e., those satisfying g_{i_j}(x) = 0) have linearly independent gradients ∇g_{i_1}(x), ..., ∇g_{i_l}(x); this latter statement is proved in [1, Thm 2.11].
In this case, the implication is again that the sparse bound asymptotically outperforms the dense bound provided that the largest clique is sufficiently small.

Proof of Proposition 5. The claim follows from Lemma 16, arguing as in the proof of Proposition 3.
Organization of the paper. The proof of Theorem 2 can be seen as a variable-separated version of the proof in [12], which relies on the Jackson kernel. Therefore, in Section 2 we derive the suitable ingredients for sparse Jackson kernels, carefully taking each variable into account separately.
A strategy is also required to write a positive polynomial p that is known to be a sum of polynomials supported on the cliques as a sum of positive polynomials supported on the cliques; this is developed in Section 3. Section 4 gives the proofs of Theorems 2 and 4, together with the statement and proof of Lemma 16, which was used in the proofs of Propositions 3 and 5 above.

The sparse Jackson kernel
The measure µ_n on the box [−1, 1]^n is defined by

dµ_n(x) = Π_{i=1}^n dx_i / (π √(1 − x_i²)),

with associated inner product ⟨f, g⟩_{µ_n} = ∫_{[−1,1]^n} f g dµ_n and norm ‖f‖_{µ_n} = √(⟨f, f⟩_{µ_n}). For k = 0, 1, ..., we let T_k ∈ R[x] be the univariate Chebyshev polynomial of degree k, defined by T_k(cos θ) = cos(kθ). For a multi-index I = (i_1, ..., i_n), we let

T_I(x) = Π_{k=1}^n T_{i_k}(x_k)

be the multivariate Chebyshev polynomials, which satisfy the orthogonality relations recalled below. This means that, if we set all the nonzero numbers λ equal to 1, then K_Λ is the identity operator on the linear span of the corresponding T_I. Let Λ_r = {(λ^r_I, I) : I ≤ r}. Then K^Jac_r = K_{Λ_r} is the (r-adapted) Jackson kernel, and its associated linear operator will be denoted K^Jac_r.
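The orthogonality underlying these definitions can be checked numerically in one variable (a sketch with our own choice of quadrature): since T_k(cos θ) = cos kθ, the inner products ⟨T_j, T_k⟩ with respect to dx/(π√(1 − x²)) reduce to averages of cos jθ cos kθ over [0, π], which the midpoint rule at Chebyshev angles computes exactly for moderate degrees.

```python
from math import cos, pi

def cheb_inner(j, k, N=64):
    """<T_j, T_k> w.r.t. dx / (pi*sqrt(1-x^2)) on [-1, 1], computed via
    the substitution x = cos(theta) and an N-point midpoint rule in theta,
    which is exact for cos(j.)cos(k.) whenever j + k < 2N."""
    thetas = [(i + 0.5) * pi / N for i in range(N)]
    return sum(cos(j * t) * cos(k * t) for t in thetas) / N

assert abs(cheb_inner(0, 0) - 1.0) < 1e-12   # <T_0, T_0> = 1
assert abs(cheb_inner(3, 3) - 0.5) < 1e-12   # <T_k, T_k> = 1/2 = 2^{-w(I)}
assert abs(cheb_inner(2, 5)) < 1e-12         # distinct degrees: orthogonal
print(cheb_inner(0, 0), cheb_inner(3, 3), cheb_inner(2, 5))
```

In n variables the measure and the polynomials factor, which is why ⟨T_I, T_I⟩_{µ_n} = 2^{−w(I)}: each nonzero entry of I contributes a factor 1/2.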
Let us prove property P1. Take a finite subset {z_i}_i of [−1, 1]^n and a corresponding set of positive weights {w_i}_i ⊂ R giving a quadrature rule for the integration of polynomials q ∈ R[x_J]_r, so that ∫ q dµ_n = Σ_i w_i q(z_i). Then we have, for p as in the statement of P1, a representation with coefficients w_i p(z_i) ≥ 0, since, by Lemma 9(viii) and Theorem 8 below, the required positivity holds. Here, r = (r_1, ..., r_n) and I = (i_1, ..., i_n).
The rest of this section is devoted to results used in the proof of Theorem 6.
where Σ_d is the cone of sums of squares of polynomials of degree at most d.

Lemma 9. Let r ∈ N_0^n be a multi-index. The operator K^Jac_r defined above has the following properties. In particular, (vi): for I = (i_1, ..., i_n) and r = (r_1, ..., r_n) in N_0^n that verify (7), the corresponding bound holds. Let r = (r_1, ..., r_n) be a multi-index such that I ≤ r for all I ∈ I_p, and assume that (7) is verified for all such I. Then the stated conclusion holds.

Proof. Throughout, we follow [12].
Item (ii) is immediate from the definitions and (6), and item (i) follows from item (ii). Observe that item (ii) means that K^Jac_r is diagonal on R[x_J]_r, so in order to prove item (iii) it suffices to show that λ^r_I > 0 for all I ≤ r, I ⊆ J. This follows immediately from item (iv), which in turn follows from the definition of λ^r_I and [12, Proposition 6(ii)], which shows that 0 < λ^r_k ≤ 1 for all 0 ≤ k ≤ r. Similarly, by [12, Proposition 6(iii)] we obtain, with each γ_j of the order of 1/(r_j + 2)² and γ = max_j γ_j, the estimate of item (v), using Bernoulli's inequality [12, Lemma 11]. Using item (v), we can prove item (vi) as follows: condition (7) implies, by item (v), that |1 − λ^r_I| ≤ 1/2, and hence λ^r_I ≥ 1/2, leveraging item (v) again.
Let us show item (vii). From items (ii) and (iii), we have an explicit diagonal expression; plugging in the estimate from item (vi), we get the claimed bound, where we have also used an identity which follows from (6).

Sparse approximation theory
For 1 ≤ i ≤ n and a function f : [−1, 1]^n → R, we denote by Lip_i f the Lipschitz constant of f with respect to the variable x_i, the remaining variables being fixed.

Theorem 10. There is a constant C_Jac > 0 such that the following is true. Let f ∈ C^0([−1, 1]^n) be a Lipschitz function with variable-wise Lipschitz constants Lip_1 f, ..., Lip_n f. Then for each multi-index m there is a polynomial p with the approximation and degree properties stated below.

Proof. Jackson [5, p. 2-6] proved that there is a constant C > 0 such that, if g : R → R is Lipschitz and π-periodic with g(0) = g(π), then g admits a degree-m trigonometric approximation with error at most C Lip(g)/m; this is inequality (9). For a multivariate Lipschitz function g : R^n → R and a multi-index m = (m_1, ..., m_n) ∈ N^n, let L_n(g) denote the corresponding product Jackson approximation.
Then we have, using the triangle inequality and the single-variable inequality (9) at each step, the corresponding multivariate bound. The function Π_j (sin m_j θ_j / (m_j sin θ_j))^4 is a polynomial of degree m_j in cos θ_j (cf. [5, p. 3]). If we replace f with its Lipschitz extension to [−2, 2]^n and apply the results above to g(θ) = f(2 cos θ_1, ..., 2 cos θ_n), we get a polynomial L_n(g)(θ) in cos θ_1, ..., cos θ_n satisfying the above inequality. Thus the resulting p is a polynomial with deg p ≤ m that satisfies the claimed bound (cf. [5, p. 13-14]), since Lip_i g ≤ 2 Lip_i f. This proves the first statement, setting C_Jac = 2C. The last statements follow by linearity and monotonicity of L_n.

Lemma 11 (a version of [4, Lemma 3]). Let J_1, ..., J_ℓ be subsets of {1, ..., n} satisfying the running intersection property. Suppose f = f_1 + · · · + f_ℓ with f_j ∈ R[x_{J_j}] and f ≥ ε on S(g), with J̃_l as in (1), and where D̃_{j,m} is the multi-index whose k-th entry equals D_{j,m} if k ∈ J̃_j = J_j ∩ ∪_{i<j} J_i and 0 otherwise, and the maximum is taken entrywise.
Finally, we have the corresponding bounds on the Lipschitz constants.

Proof. In order to prove the result by induction, let us first consider the case ℓ = 2. In this case, ε = ǫ and ǫ > 2η. Assume that J_1 ∩ J_2 ≠ ∅. For a subset J ⊂ {1, ..., n}, let π_J denote the projection onto the variables with indices in J, that is, π_J(x) = (x_j)_{j∈J}. Define, for x ∈ [−1, 1]^{J_1∩J_2},

g(x) = min{ f_2(x, z) : z ∈ π_{J_2\J_1}(S(g)) } − ε/2.

The function g is Lipschitz continuous on [−1, 1]^{J_1∩J_2}. To see why, let x, x′ ∈ [−1, 1]^{J_1∩J_2} and pick z, z′ ∈ π_{J_2\J_1}(S(g)) ⊆ [−1, 1]^{J_2\J_1} minimizing f_2(x, z) and f_2(x′, z′), respectively. Then

g(x) − g(x′) ≤ f_2(x, z′) − f_2(x′, z′) ≤ Lip(f_2) |x − x′|,

where Lip(f_2) denotes the Lipschitz constant of f_2 on [−1, 1]^n. The function g also satisfies

f_1(x, y) + g(x) ≥ ε/2 on π_{J_1}(S(g))   and   f_2(x, z) − g(x) ≥ ε/2.

The second inequality follows from the definition of g, and the first one can be shown by taking (x, y, z) ∈ S(g) with x ∈ [−1, 1]^{J_1∩J_2}, y ∈ [−1, 1]^{J_1\J_2}, and z ∈ [−1, 1]^{J_2\J_1}, taking care to pick z only after x has been chosen, in such a way that the minimum in the definition of g is realized there, that is, g(x) = f_2(x, z) − ε/2 (this is possible by compactness of S(g) and continuity of f); then we have f_1(x, y) + g(x) = f(x, y, z) − ε/2 ≥ ε − ε/2 = ε/2. Set m_j = 0 for all other j, and m = (m_1, ..., m_n) = D̃_{2,2}. Then Theorem 10 gives a polynomial p_2 ∈ R[x_{J_1∩J_2}] such that ‖p_2 − g‖_∞ ≤ ε/2 − η. Let h_1 := f_1 + p_2 and h_2 := f_2 − p_2, so that f = h_1 + h_2, h_1 ≥ η and h_2 ≥ η on S(g), and h_j ∈ R[x_{J_j}]. The bound (11) follows from the definition of h_j and (12). Observe also that, by the last part of Theorem 10, Lip p_2 ≤ 2 Lip f_2. Finally, we obtain the claimed Lipschitz bounds for h_1 and h_2.

For the induction step, let ℓ ≥ 3 and argue as above with f_ℓ in the role of f_2 to obtain a polynomial p_ℓ, with h_ℓ := f_ℓ − p_ℓ ≥ η on S(g), deg p_ℓ = D̃_{ℓ,ℓ}, and Lip p_ℓ ≤ 2 Lip f_ℓ; analogously to (13), the polynomial p_ℓ is added to f_j for the largest j with J̃_ℓ ⊆ J_j (which must happen for some j, by the running intersection property; see Definition 1). The induction hypothesis applies to the polynomial f_1 + · · · + (f_j + p_ℓ) + · · · + f_{ℓ−1}. This means that there are polynomials h_1, ..., h_{ℓ−1} with the stated properties. Observe that the second index in each D̃_{k,ℓ} is ℓ because of the accumulation of Lipschitz constants resulting from the estimate (15).
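The key step in this proof, splitting off one clique via a shifted partial minimum, can be checked numerically on a toy instance (our own example, using the exact partial minimum g in place of its polynomial approximation p_2):

```python
# Toy check of the splitting trick behind Lemma 11 (example ours):
# f(x, y, z) = f1(x, y) + f2(y, z) with cliques J1 = {x, y}, J2 = {y, z},
# and f >= eps on the box [-1, 1]^3.  Define g(y) = min_z f2(y, z) - eps/2
# and split h1 = f1 + g, h2 = f2 - g; then h1 and h2 are each bounded
# below by eps/2 and depend only on their own clique of variables.
eps = 0.1
f1 = lambda x, y: x * y
f2 = lambda y, z: y * z + 2.1          # constant chosen so min f = eps

grid = [i / 10.0 - 1.0 for i in range(21)]  # grid on [-1, 1]
g = lambda y: min(f2(y, z) for z in grid) - eps / 2

h1 = lambda x, y: f1(x, y) + g(y)      # depends on clique J1 only
h2 = lambda y, z: f2(y, z) - g(y)      # depends on clique J2 only

min_h1 = min(h1(x, y) for x in grid for y in grid)
min_h2 = min(h2(y, z) for y in grid for z in grid)
assert min_h1 >= eps / 2 - 1e-12 and min_h2 >= eps / 2 - 1e-12
print(min_h1, min_h2)
```

The lemma replaces the (generally non-polynomial) function g by a Jackson approximation p_2 within ε/2 − η, which is exactly what costs the degree D̃_{2,2} and the factor-2 growth of the Lipschitz constants.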

Proof of Theorem 2
Theorem 2 will follow from Theorem 13, which presents a more detailed bound, together with the definitions of L, M, J.
Theorem 13. Let n > 0 and ℓ ≥ 2, and let r_1, r_2, ..., r_ℓ ∈ N^n, r_j = (r_{j,1}, ..., r_{j,n}), be nowhere-vanishing multi-indices. Let also J_1, ..., J_ℓ be subsets of {1, ..., n} satisfying the running intersection property. Let p = p_1 + p_2 + · · · + p_ℓ be a polynomial that is the sum of finitely many polynomials p_j; then the conclusion holds as long as, for all 1 ≤ j ≤ ℓ, the stated degree conditions are satisfied.

Proof of Theorem 13. Choose η as in (18) and apply Lemma 11 (with g = 0, so that S(g) = [−1, 1]^n), where D̃_l := (δ_{j∈J_l} D_l)_{j=1}^ℓ, with δ_{j∈J_l} equal to 1 if j ∈ J_l and 0 otherwise. Apply Corollary 7 to each of the polynomials H_j (recall that I_{H_j} is the set of multi-indices I = (i_1, ..., i_n) corresponding to exponents of x_1, ..., x_n in the terms appearing in H_j, and w(I) is the number of nonzero entries of I); when applying the corollary, note that (17) implies (7) in this case because, if I = (i_1, ..., i_n) ∈ I_{H_j}, then the required inequality holds by the definition of D̃_l. Observe that (20) also yields the corresponding membership. Note that we have deg H_j = deg h_j, I_{H_j} \ I_{h_j} = ∅, and I_{h_j} \ I_{H_j} ⊆ {(0, ..., 0)}, since the exponents of all terms in h_j and in H_j are the same, with the only possible exception of the constant term, which may appear in one of them and vanish in the other. Now, going back to our choice (18) of η and using (16), we obtain the required bound for all j ∈ {1, ..., ℓ}. Notice that, after separating two of the |J_j| + 2 factors in the product and removing the +1 term from them, we obtain

[max_{1≤k≤n} max(deg p_j, D̃_j, ..., D̃_ℓ)_k²] / [min_{1≤k≤n} (r_{j,k} + 2)²],

where we have used the definition of D̃_l, as well as the fact that each factor has been replaced by one that is smaller or equal, the original expression containing the maximum of them in each factor. Next, use deg H_j ≤ max(deg p_j, D̃_j, ..., D̃_ℓ) as well as w(I) ≤ |J_j| for every multi-index I in I_{H_j}. With this bound for η, together with the lower bound on min_{1≤k≤n} (r_{j,k} + 2), we conclude, by (19) and (21), that h_j ∈ P_{r,J_j}({1 − x_i²}_{i∈J_j}), and hence the theorem follows.

Proof of Theorem 4
Overview. For this proof, we will first use the sparse approximation theory developed in Section 3 to represent the sparse polynomial p as a sum of positive polynomials h_1 + · · · + h_ℓ, each of them depending on a clique of variables J_j. We then work with each of these polynomials h_j using the tools developed by Baldi-Mourrain [1] to write h_j = f̃_j + q̃_j, where q̃_j is by construction obviously an element of the corresponding quadratic module, and f̃_j is strictly positive on [−1, 1]^n. Thus Corollary 7 can be applied to f̃_j, which shows that it belongs to the preordering; one then argues (also following the ideas of [1]) that the preordering is contained in the quadratic module, hence giving that f̃_j is contained in the latter as well. In sum, this shows that h_j is in the quadratic module, which is what we want. Most of the heavy lifting goes into estimating the minimum of f̃_j to justify the application of Corollary 7.
Proof of Theorem 4. For each j = 1, ..., ℓ, pick C_j > 0 such that the following two bounds are satisfied; note that these depend only on g and J_1, ..., J_ℓ. Apply Lemma 11 to f = p, f_i = p_i, ǫ = 3ε/2ℓ, η = ε/2(ℓ + 2) to get polynomials h_1, ..., h_ℓ with the properties stated there. In the dense setting, Baldi-Mourrain [1] construct a family of single-variable polynomials providing useful approximation properties, which we have adapted to the (separated-variables) sparse setting and collected in Lemma 14. To state it, we set, for all j = 1, ..., ℓ, for (t_j, m_j) ∈ N × N, and for s_j > 0,

f_{j,s_j,t_j,m_j}(x) := h_j(x) − s_j q_{j,t_j,m_j}(x).
Let us give an idea of what these functions do. The single-variable polynomial h_{t_j,m_j} is of degree m_j and, roughly speaking, approximates the function that equals 1 on (−∞, 0) and 1/t_j elsewhere. Thus q_{j,t_j,m_j} almost vanishes (for large t_j) on S(g_{K_j}), and outside of this domain it is roughly a sum of multiples of the negative parts of the entries of g_{K_j}.
Item (ii) follows from deg h_{t_j,m_j} = m_j and the definition of q_{j,t_j,m_j}. The proofs of the other items can be found in the indicated sources. Take s_j, t_j, m_j, f̃_j, q̃_j for j = 1, ..., ℓ satisfying properties (i)-(v) collected in Lemma 14. Continuing with the proof of Theorem 4, since f̃_j ≥ ε/4(ℓ + 2) on [−1, 1]^n, we may apply Corollary 7 with p = F_j to get the desired representation as long as the degree conditions and (7) are verified. In this context, the condition (7) required in Corollary 7 is equivalent to the theorem's assumption (5); let us show how this works. First, use i_k ≤ (deg F_j)_k; now use equation (25), together with the last estimate from (24); next, use (10), where we have additionally used equation (22) and our assumption (5); this is precisely (7). We would next like to show that (32) holds. Let us first explain why this will be enough to prove the theorem. Once we have (32), by Lemma 15 f̃_j is also contained in Q_{r_j+2,J_j}(1 − Σ_{i∈J_j} x_i²), and it is our assumption (3) that 1 − Σ_{i∈J_j} x_i² belongs to Q_{r_j,J_j}(g_{K_j}). In other words, we have f̃_j ∈ Q_{r_j+2,J_j}(g_{K_j}).
By Lemma 14(ii), q̃_j also belongs to Q_{r_j+2,J_j}(g_{K_j}), so we can conclude that h_j = f̃_j + q̃_j ∈ Q_{r_j+2,J_j}(g_{K_j}), which is equivalent to the conclusion of the theorem.
Thus we need to prove (32). Let us explain why, in order to obtain this conclusion, we just need to show (33). If (33) were true, we would have the corresponding containment, so, in view of (30) and (34), we would indeed have f̃_j ∈ P_{r_j,J_j}({1 − x_i²}_{i∈J_j}), which is (32). Let us now collect some preliminary estimates that will help us prove (33). For I ∈ I_{F_j} we have w(I) ≤ |J_j|, so we estimate 2^{w(I)/2} ≤ 2^{|J_j|/2} (35). We also estimate the remaining terms similarly. Now we will estimate max_{[−1,1]^n} f̃_j − min_{[−1,1]^n} f̃_j from above. Using Lemma 14(i) and (iv), and then (24), (25) and (31), we obtain the required bound; for the last line, we have used the definition of D̃_{l,m} as in Lemma 11.
This shows that (33) holds, and hence also (32), which proves the theorem.
Lemma 15 ([1, Lemma 3.8]). Let J ⊂ {1, ..., n}, and let r = (r_1, ..., r_n) be a multi-index such that r_i > 0 only if i ∈ J. The quadratic module Q_{r+2,J}(1 − Σ_{i∈J} x_i²) contains the preordering P_{r,J}({1 − x_i²}_{i∈J}).

Proof. This follows from an explicit identity expressing products of the generators inside the quadratic module. The increase of 2 in r stems from the fact that deg(1 − x_i²) = 2 while the degree of the right-hand side above is 4.

−(c + dε^{−q}) log(c + dε^{−q}) + dε^{−q} log(dε^{−q})   (40)

In this quotient, both the numerator and the denominator tend to ±∞, so we can apply a version of l'Hôpital's rule, which states that, if the limit of the quotient of the derivatives exists, then the original limit above equals that limit. Taking the limit of the quotient of the derivatives gives the claimed value.

(See for example [25, §II.A.1].) For multi-indices I and I′, deg T_I = |I| and

⟨T_I, T_{I′}⟩_{µ_n} = 0 if I ≠ I′,   and   2^{−w(I)} if I = I′.

Thus p ∈ R[x]_d can be expanded as p = Σ_{|I|≤d} 2^{w(I)} ⟨p, T_I⟩_{µ_n} T_I. If we let, for a finite collection Λ ⊆ R × N_0^n of pairs (λ, I) of a real number λ and a multi-index I,

K_Λ(x, y) = Σ_{(λ,I)∈Λ} 2^{w(I)} λ T_I(x) T_I(y),   x, y ∈ R^n,

then, for any p ∈ R[x], we have the associated linear operator (K_Λ p)(x) = ∫ K_Λ(x, y) p(y) dµ_n(y).
The definition of f_{j,s_j,t_j,m_j} is engineered to obtain a polynomial that is almost equal to h_j on S(g_{K_j}) yet remains positive throughout [−1, 1]^n. Instead of going into the details of the construction, we record the properties we need in the following lemma.

Lemma 14 (a version of [1, Props. 2.13, 3.1, and 3.2, Lem. 3.5]). Assume (2) and the Archimedean conditions (3) are satisfied. Then for each j = 1, ..., ℓ there are values s_j, t_j, m_j of the parameters involved in definitions (26) and (27) such that the following holds, with the shorthands f̃_j and q̃_j. Additionally, we obtain the estimate

deg f̃_j ≤ max(deg h_j, deg q̃_j) ≤ max(deg p_j, D̃_{j,ℓ}, ..., D̃_{ℓ,ℓ}, (2m_j + 1) max_{k∈K_j} deg g_k).