Convergence rates of RLT and Lasserre-type hierarchies for the generalized moment problem over the simplex and the sphere

We consider the generalized moment problem (GMP) over the simplex and the sphere. This is a rich setting and it contains NP-hard problems as special cases, like constructing optimal cubature schemes and rational optimization. Using the Reformulation-Linearization Technique (RLT) and Lasserre-type hierarchies, relaxations of the problem are introduced and analyzed. For our analysis we assume throughout the existence of a dual optimal solution as well as strong duality. For the GMP over the simplex we prove a convergence rate of $O(1/r)$ for a linear programming, RLT-type hierarchy, where $r$ is the level of the hierarchy, using a quantitative version of P\'olya's Positivstellensatz. As an extension of a recent result by Fang and Fawzi [Math. Program., 2020, https://doi.org/10.1007/s10107-020-01537-7] we prove the Lasserre hierarchy of the GMP [Math. Program., Vol. 112, 65-92, 2008] over the sphere has a convergence rate of $O(1/r^2)$. Moreover, we show the introduced linear RLT-relaxation is a generalization of a hierarchy for minimizing forms of degree $d$ over the simplex, introduced by de Klerk, Laurent and Parrilo [J. Theoretical Computer Science, Vol. 361, 210-225, 2006].


Introduction
For a compact set $K \subset \mathbb{R}^n$ let $\mathcal{M}(K)$ denote the (infinite-dimensional) vector space of signed finite Borel measures with support contained in $K$. Let $[m] = \{1, \ldots, m\}$ for $m \in \mathbb{N}$. The generalized moment problem (GMP) is an optimization problem of the following form:
$$\mathrm{val} := \inf_{\mu \in \mathcal{M}(K)_+} \Big\{ \int_K f_0(x)\,d\mu(x) \;:\; \int_K f_i(x)\,d\mu(x) = b_i \ \ \forall i \in [m], \ \ \int_K d\mu(x) \le 1 \Big\}, \tag{1}$$
where $m \in \mathbb{N}$, $b_i \in \mathbb{R}$ for all $i \in [m]$, $\mathcal{M}(K)_+$ is the convex cone of positive finite Borel measures supported on $K$, and $f_0, f_1, \ldots, f_m$ are integrable over $K$ with respect to all $\mu \in \mathcal{M}(K)_+$. We will always assume the GMP (1) has a feasible solution, which implies that it has an optimal solution as well (see Theorem 1).
The constraint $\int_K d\mu(x) \le 1$ essentially means that we know an upper bound on the measure of $K$ for the optimal solution, since, in this case, we may scale the functions $f_i$ a priori to satisfy this condition.
The GMP is a conic linear optimization problem whose duality theory is well understood, see, e.g., [14]. A wide range of optimization problems can be modeled as an instance of the GMP. The list includes problems from optimization, probability, financial economics and optimal control, to name only a few, see, e.g., [9]. For polynomial data, i.e., $f_i \in \mathbb{R}[x]$ for all $i = 0, 1, \ldots, m$, and $K$ a basic closed semialgebraic set, Lasserre [8] introduced a monotone nondecreasing hierarchy of semidefinite programming (SDP) relaxations of (1). For a survey on SDP-based approximation hierarchies and their error analysis, see [3].
In this paper, we will consider the case where $K$ is the standard (probability) simplex
$$\Delta_{n-1} = \Big\{ x \in \mathbb{R}^n_+ : \sum_{i=1}^n x_i = 1 \Big\},$$
where $\mathbb{R}^n_+$ is the nonnegative orthant, or the Euclidean sphere
$$S^{n-1} = \{ x \in \mathbb{R}^n : \|x\|_2 = 1 \}.$$
Our main result is to establish a rate of convergence for the Lasserre hierarchy [8] for the GMP on the sphere, and for a related, RLT-type linear programming hierarchy for the GMP on the simplex. This RLT hierarchy is in fact a generalization of LP hierarchies for polynomial optimization on the simplex, as introduced by Bomze and De Klerk [2], and De Klerk, Laurent and Parrilo [4].
Outline of the paper. First we introduce some notation in Section 1.1. In Section 1.2 we review the duality theory of the GMP. A brief overview of possible applications of our setting is given in Section 1.3. For $K$ the simplex, we introduce a linear relaxation hierarchy in Section 2 and prove a convergence rate of $O(1/r)$. Section 3 contains the new convergence analysis of the Lasserre [9] SDP hierarchies of the GMP on the sphere. In Section 4 we take a mathematical view of how the optimal measure is obtained in the limit as the level of the hierarchies approaches infinity. In Section 5 we explain how our LP hierarchy is a generalization of an approximation hierarchy for the problem of minimizing a form of degree $d$ over the simplex, introduced by De Klerk, Laurent and Parrilo [4] and based on earlier results obtained by Bomze and De Klerk [2].

Notation
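Let $\mathbb{N} = \{0, 1, 2, \ldots\}$ denote the set of nonnegative integers, $\mathbb{N}_+ = \mathbb{N} \setminus \{0\}$, and $\mathbb{N}^n_t$ the set of sequences $\alpha \in \mathbb{N}^n$ for which $|\alpha| = \sum_{i=1}^n \alpha_i \le t$, for $t \in \mathbb{N}$. For $\alpha \in \mathbb{N}^n$, $x^\alpha$ denotes the monomial $x_1^{\alpha_1} \cdots x_n^{\alpha_n}$, and its degree is $|\alpha|$. The ring of multivariate polynomials in $n$ variables $x = (x_1, \ldots, x_n)$ is denoted by $\mathbb{R}[x]$, and $\mathbb{R}[x]_t$ is its subspace of polynomials of degree at most $t$. The (total) degree of a polynomial is the maximal degree of its appearing monomials. A monomial basis vector of order $t$ is given by
$$[x]_t = (1, x_1, \ldots, x_n, x_1^2, x_1 x_2, \ldots, x_n^t)^T.$$
Every $p \in \mathbb{R}[x]$ can be written as $p = \sum_{\alpha \in \mathbb{N}^n} p_\alpha x^\alpha$, where only finitely many $p_\alpha$ are non-zero. A polynomial $p \in \mathbb{R}[x]$ is a sum of squares (sos) if $p = \sum_{j=1}^k h_j^2$ for $h_j \in \mathbb{R}[x]$ and $k \ge 1$. The set of sos polynomials is denoted by $\Sigma[x]$ and the set of sos polynomials of degree at most $t$ is denoted by $\Sigma[x]_t$.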
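The index set $\mathbb{N}^n_t$ and the monomials $x^\alpha$ can be made concrete with a few lines of code. The following is a minimal sketch; the helper names `exponents_up_to` and `monomial` are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import comb

def exponents_up_to(n, t):
    """All alpha in N^n with |alpha| <= t, sorted by degree, then lexicographically."""
    alphas = [a for a in product(range(t + 1), repeat=n) if sum(a) <= t]
    return sorted(alphas, key=lambda a: (sum(a), a))

def monomial(x, alpha):
    """Evaluate x^alpha = x_1^alpha_1 * ... * x_n^alpha_n at the point x."""
    value = 1
    for xi, ai in zip(x, alpha):
        value *= xi ** ai
    return value

n, t = 3, 2
basis = exponents_up_to(n, t)
# The number of monomials of degree at most t in n variables is C(n + t, t).
print(len(basis), comb(n + t, t))
```

The length of `basis` is the length of the monomial basis vector of order $t$, which is the number of scalar variables per level in an LP or SDP relaxation indexed by monomials.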

Duality of the generalized problem of moments
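We shall briefly discuss the duality theory associated with the GMP. For this, let $C(K)$ denote the space of bounded continuous functions on $K$ endowed with the supremum norm $\|\cdot\|_\infty$. For two vector spaces $E$, $F$ of arbitrary dimension, a non-degenerate bilinear form $\langle \cdot, \cdot \rangle : E \times F \to \mathbb{R}$ is called a duality of $E$ and $F$. The spaces $\mathcal{M}(K)$ and $C(K)$ can be put in duality by defining $\langle \cdot, \cdot \rangle : C(K) \times \mathcal{M}(K) \to \mathbb{R}$ as
$$\langle f, \mu \rangle = \int_K f(x)\,d\mu(x). \tag{2}$$
Let again $f_0, f_1, \ldots, f_m$ be continuous functions on $K$ and $b_1, \ldots, b_m \in \mathbb{R}$. The dual of (1) is given by
$$\mathrm{val}' := \sup_{(y,t) \in \mathbb{R}^m \times \mathbb{R}_+} \Big\{ \sum_{i=1}^m b_i y_i - t \;:\; f_0(x) - \sum_{i=1}^m y_i f_i(x) + t \ge 0 \ \ \forall x \in K \Big\}. \tag{3}$$
Note that the dual problem (3) is always strictly feasible, due to the constraint $\int_K d\mu \le 1$ in the primal GMP (1).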
Weak duality holds for this pair of problems, meaning $\mathrm{val}' \le \mathrm{val}$. The difference $\mathrm{val} - \mathrm{val}'$ is called the duality gap.
In fact, the duality gap is always zero, as the next theorem shows. Note that a zero duality gap does not imply the existence of a dual optimal solution.
then the set of optimal solutions of (3) is nonempty and bounded.
As discussed in Lasserre [8], it is customary in the literature to assume that condition (4) holds, but in practice it may be a non-trivial task to check whether it does.We do stress, however, that condition (4) does hold for the applications discussed in the next subsection.

Applications
Polynomial and rational optimization. Consider the problem of minimizing a rational function over $K$:
$$p^* := \min_{x \in K} \frac{p(x)}{q(x)}, \tag{5}$$
where $p, q \in \mathbb{R}[x]$ are relatively prime and we may assume $q(x) > 0$ for all $x \in K$. Indeed, if $q$ changes sign on $K$, Jibetean and De Klerk [7, Corollary 1] showed that $p^* = -\infty$. We will in fact make the stronger assumption that $q(x) \ge 1$ on $K$, i.e., that we know a positive lower bound on the minimum of $q$ over $K$. The optimization problem (5) can be modeled as a GMP:
$$p^* = \inf_{\mu \in \mathcal{M}(K)_+} \Big\{ \int_K p(x)\,d\mu(x) \;:\; \int_K q(x)\,d\mu(x) = 1 \Big\}. \tag{6}$$
The inequality constraint $\int_K d\mu(x) \le 1$ is redundant if $q(x) \ge 1$ for all $x \in K$ and can be added to obtain a problem of form (1).
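To illustrate why the rational problem fits the GMP format, the sketch below assumes the usual normalization $\int_K q\,d\mu = 1$ for the moment reformulation and checks that the scaled Dirac measure $\delta_x / q(x)$ is feasible with objective value $p(x)/q(x)$. The polynomials `p` and `q` and the point `x` are hypothetical examples, not data from the paper.

```python
from fractions import Fraction as F

def p(x):
    return x[0] ** 2 - x[1]      # hypothetical numerator

def q(x):
    return 1 + x[0] * x[1]       # hypothetical denominator, q >= 1 on the simplex

def scaled_dirac(x):
    """The measure delta_x / q(x), stored as a list of (weight, atom) pairs."""
    return [(F(1) / q(x), x)]

def integrate(g, mu):
    """Integral of g with respect to a finite atomic measure mu."""
    return sum(w * g(atom) for w, atom in mu)

x = (F(1, 3), F(2, 3))           # a point of the simplex in R^2
mu = scaled_dirac(x)

normalization = integrate(q, mu)           # equals 1 by construction
objective = integrate(p, mu)               # equals p(x) / q(x)
total_mass = integrate(lambda z: 1, mu)    # equals 1 / q(x) <= 1 since q >= 1
print(normalization, objective, total_mass)
```

Because $q \ge 1$ on $K$, the total mass $1/q(x)$ never exceeds one, which is the sense in which the mass constraint is redundant.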
We emphasize that minimizing a quadratic polynomial over the simplex $\Delta_{n-1}$ is already NP-hard, since it contains the problem of computing the size $\alpha(G)$ of a maximum stable set of a graph $G$. Indeed, for a graph $G$ with adjacency matrix $A$, Motzkin and Straus [15] showed that
$$\frac{1}{\alpha(G)} = \min_{x \in \Delta_{n-1}} x^T (A + I) x,$$
where $I$ is the identity matrix, which is a quadratic polynomial optimization problem over the simplex.
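As a small sanity check of the Motzkin-Straus connection, the following sketch evaluates the quadratic form $x^T(A+I)x$ on the 5-cycle, whose stability number is 2. The graph, the grid resolution `N` and all function names are illustrative assumptions; the grid search is only a crude check on a finite subset of the simplex, not a proof.

```python
from fractions import Fraction as F
from itertools import combinations, product

def stability_number(n, edges):
    """Brute-force alpha(G): a largest vertex set that contains no edge."""
    for size in range(n, 0, -1):
        for S in combinations(range(n), size):
            if not any(u in S and v in S for u, v in edges):
                return size, S
    return 0, ()

def motzkin_straus_form(x, edges):
    """x^T (A + I) x, where A is the adjacency matrix given by `edges`."""
    return sum(xi * xi for xi in x) + 2 * sum(x[u] * x[v] for u, v in edges)

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]   # the 5-cycle
alpha, S = stability_number(n, edges)

# Uniform weights on a maximum stable set attain the value 1 / alpha(G).
x_star = [F(1, alpha) if i in S else F(0) for i in range(n)]
value_at_stable_set = motzkin_straus_form(x_star, edges)

# Minimum of the form over the grid {k / N} intersected with the simplex.
N = 6
grid_min = min(
    motzkin_straus_form([F(k, N) for k in ks], edges)
    for ks in product(range(N + 1), repeat=n) if sum(ks) == N
)
print(alpha, value_at_stable_set, grid_min)
```

Exact rational arithmetic is used so that the comparison with $1/\alpha(G)$ is not blurred by rounding.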
Similarly, deciding convexity of a homogeneous polynomial $f$ of degree 4 or higher is known to be NP-hard [1]. It can be modeled as a polynomial optimization problem over the sphere. A homogeneous polynomial $f$ is convex if and only if
$$\min_{(x,y) \in S^{2n-1}} y^T \nabla^2 f(x)\, y \ge 0,$$
which in turn can be cast as a GMP over the sphere.
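Consider the problem of multivariate numerical integration of a function $f$ over a set $K$ with respect to a given (reference) measure $\mu_0 \in \mathcal{M}(K)_+$. Loosely speaking, a cubature scheme consists of a set of nodes $x^{(\ell)} \in K$ and weights $w_\ell$, $\ell = 1, \ldots, N$, such that
$$\int_K f(x)\,d\mu_0(x) \approx \sum_{\ell=1}^N w_\ell f(x^{(\ell)}).$$
A possibility to mitigate the error in this scheme is to choose the weights and points such that the approximation is exact for polynomials up to some fixed degree. The problem of finding such weights and nodes can be cast as a GMP. Let $d \in \mathbb{N}$ and let $\beta \in \mathbb{N}^n$ be any vector such that $|\beta| > d$. Assume the reference measure $\mu_0$ is a probability measure, otherwise set $\mu_0 \leftarrow \mu_0/\mu_0(K)$. In the GMP given by
$$\mathrm{val} := \inf_{\mu \in \mathcal{M}(K)_+} \Big\{ \int_K x^\beta\,d\mu(x) \;:\; \int_K x^\alpha\,d\mu(x) = \int_K x^\alpha\,d\mu_0(x) \ \ \forall \alpha \in \mathbb{N}^n_d \Big\}, \tag{7}$$
the redundant constraint $\int_K d\mu(x) \le 1$ can be added to turn it into a GMP of form (1). The solution $\mu^*$ to (7) will be a finite atomic measure of the form
$$\mu^* = \sum_{\ell=1}^N w_\ell\, \delta_{x^{(\ell)}},$$
where $\delta_x$ denotes the Dirac measure at $x$, by Theorem 3. This result is known as Tchakaloff's theorem [16]. There is some freedom in the choice of the objective function; note, however, that it should be linearly independent of $\{x^\alpha\}$ for $\alpha \in \mathbb{N}^n_d$. Hence, our approach discussed in this paper may be applied to the problem of finding cubature rules for measures on the simplex or sphere.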
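The moment-matching conditions behind a cubature rule can be checked directly on a small example. The sketch below takes the uniform probability measure on the simplex in $\mathbb{R}^3$, whose moments are $\int x^\alpha d\mu_0 = (n-1)!\,\alpha!/(|\alpha|+n-1)!$, and tests the classical edge-midpoint rule with three equal weights. The rule and the helper names are illustrative choices and are not taken from the paper.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial

def simplex_moment(alpha):
    """Moment of x^alpha under the uniform probability measure on the simplex in R^n."""
    n = len(alpha)
    numerator = factorial(n - 1)
    for a in alpha:
        numerator *= factorial(a)
    return F(numerator, factorial(sum(alpha) + n - 1))

def rule_moment(alpha, nodes, weights):
    """Moment of x^alpha under the atomic measure sum_l w_l * delta_{x^(l)}."""
    total = F(0)
    for w, x in zip(weights, nodes):
        term = w
        for xi, ai in zip(x, alpha):
            term *= xi ** ai
        total += term
    return total

h = F(1, 2)
nodes = [(h, h, 0), (h, 0, h), (0, h, h)]   # edge midpoints of the triangle
weights = [F(1, 3)] * 3

def exact_up_to(d):
    """True if the rule reproduces all moments of degree at most d."""
    return all(rule_moment(a, nodes, weights) == simplex_moment(a)
               for a in product(range(d + 1), repeat=3) if sum(a) <= d)

print(exact_up_to(2), exact_up_to(3))
```

In the language of (7), the atomic measure defined by `nodes` and `weights` satisfies the equality constraints for $d = 2$ but not for $d = 3$.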

A linear relaxation hierarchy over the simplex
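Every measure $\mu \in \mathcal{M}(K)_+$ induces a linear functional $L : \mathbb{R}[x] \to \mathbb{R}$, $L(p) = \langle p, \mu \rangle$, that maps monomials to their respective moments. Thus, to an optimal solution $\mu^*$ of a GMP there is an associated linear functional $L^*$ such that $L^*(f_0) = \mathrm{val}$ and $L^*(f_i) = b_i$ for all $i \in [m]$, as well as $L^*(1) \le 1$. The idea of the relaxation we are about to introduce is to approximate the optimal solution by a sequence (hierarchy) of linear functionals $L^{(r)}$ that depend on $r = 1, 2, \ldots$. Let $K = \Delta_{n-1}$. For $i = 0, 1, \ldots, m$ let $f_i$ be a real homogeneous polynomial of degree $d$ and let $r \ge d$. Consider the following linear relaxation of (1):
$$\begin{aligned} f^{(r)}_{LP} := \min \ & L^{(r)}(f_0) \\ \text{s.t.} \ & L^{(r)} : \mathbb{R}[x]_r \to \mathbb{R} \ \text{linear}, \\ & L^{(r)}(f_i) = b_i \quad \forall i \in [m], \\ & L^{(r)}(1) \le 1, \\ & L^{(r)}(x^\alpha) \ge 0 \quad \forall \alpha \in \mathbb{N}^n_r, \\ & L^{(r)}\Big(x^\alpha \Big(1 - \sum_{i=1}^n x_i\Big)\Big) = 0 \quad \forall \alpha \in \mathbb{N}^n_{r-1}. \end{aligned} \tag{8}$$
Every feasible solution $\mu'$ to (1) provides an upper bound for (8) by setting $L^{(r)}(x^\alpha) = \langle x^\alpha, \mu' \rangle$. Hence, $f^{(r)}_{LP} \le \mathrm{val}$. To see that (8) is a linear program (LP), note that each $L^{(r)}(x^\alpha)$ can be replaced by a scalar variable $y_\alpha$. The second-to-last constraint reflects the necessary condition for a positive measure $\mu$ over the simplex:
$$\langle x^\alpha, \mu \rangle = \int_{\Delta_{n-1}} x^\alpha\,d\mu(x) \ge 0 \quad \forall \alpha \in \mathbb{N}^n.$$
The last constraint in (8) arises from the fact that $\sum_{i=1}^n x_i = 1$ for all $x \in \Delta_{n-1}$. Equivalently, defining the ideal $I = \big\langle 1 - \sum_{i=1}^n x_i \big\rangle$, the last constraint requires $L^{(r)}$ to vanish on the elements of $I$ of degree at most $r$.
We state two lemmas that will come in handy in our later analysis.
Lemma 1. Let $r, k \in \mathbb{N}$ with $k \le r$ and let $L^{(r)}$ be a feasible solution to the linear relaxation (8) for some $f_0, f_1, \ldots, f_m$. Then for all $x^\gamma$ with $\gamma \in \mathbb{N}^n$ and $|\gamma| \le r - k$ we have
$$L^{(r)}(x^\gamma) = L^{(r)}\Big(x^\gamma \Big(\sum_{i=1}^n x_i\Big)^k\Big).$$
Proof. The last equality constraint in the relaxation forces $L^{(r)}(x^\gamma) = L^{(r)}\big(x^\gamma \sum_{j=1}^n x_j\big)$ whenever $|\gamma| \le r - 1$. Therefore, noting that $x^{e_j} = x_j$, we have
$$L^{(r)}(x^\gamma) = \sum_{j=1}^n L^{(r)}(x^{\gamma + e_j}).$$
Hence, if $|\gamma| \le r - 2$,
$$L^{(r)}(x^\gamma) = \sum_{j=1}^n \sum_{i=1}^n L^{(r)}(x^{\gamma + e_j + e_i}).$$
Reiterating this procedure leads us to the desired outcome.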
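The structural constraints of the relaxation are easy to check numerically for the functional induced by an actual measure. The sketch below is an illustration under stated assumptions: it takes the constraints to be nonnegativity of $L(x^\alpha)$, $L(1) \le 1$, and $L(x^\alpha) = \sum_j L(x^{\alpha + e_j})$, and it tests the expanded form $L(x^\gamma) = \sum_{|\beta|=k} \frac{k!}{\beta!} L(x^{\gamma+\beta})$ obtained by iterating the last constraint. The atomic measure and all helper names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial

def exponents(n, degree, exact=False):
    """Exponent vectors alpha in N^n with |alpha| <= degree (or == degree if exact)."""
    return [a for a in product(range(degree + 1), repeat=n)
            if (sum(a) == degree if exact else sum(a) <= degree)]

def moment_functional(atoms):
    """L(x^alpha) = sum_l w_l * (x^(l))^alpha for a finite atomic measure."""
    def L(alpha):
        total = F(0)
        for w, x in atoms:
            term = w
            for xi, ai in zip(x, alpha):
                term *= xi ** ai
            total += term
        return total
    return L

def multinomial(beta):
    """k! / (beta_1! ... beta_n!) with k = |beta|."""
    c = factorial(sum(beta))
    for b in beta:
        c //= factorial(b)
    return c

def add(a, b):
    return tuple(i + j for i, j in zip(a, b))

n, r = 3, 4
# Two atoms on the simplex in R^3 with total mass 3/4 <= 1 (hypothetical data).
atoms = [(F(1, 2), (F(1, 3), F(2, 3), F(0))),
         (F(1, 4), (F(1, 5), F(1, 5), F(3, 5)))]
L = moment_functional(atoms)
unit = [tuple(int(i == j) for i in range(n)) for j in range(n)]

mass_ok = L((0,) * n) <= 1
nonneg_ok = all(L(a) >= 0 for a in exponents(n, r))
ideal_ok = all(L(a) == sum(L(add(a, e)) for e in unit)
               for a in exponents(n, r - 1))
lemma1_ok = all(
    L(g) == sum(multinomial(b) * L(add(g, b)) for b in exponents(n, k, exact=True))
    for k in range(r + 1) for g in exponents(n, r - k))
print(mass_ok, nonneg_ok, ideal_ok, lemma1_ok)
```

A functional that is merely feasible for the LP need not come from a measure; the point of the check is only that every measure on the simplex yields a feasible point, which is why the relaxation gives a lower bound.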
Lemma 2. Consider the GMP given in (1) and let $(y, t) \in \mathbb{R}^m \times \mathbb{R}_+$. Then the pair $(y, t)$ is dual optimal only if
Proof. The minimization problem
By Theorem 1 there is no duality gap and there exists a primal optimal solution $\mu^*$ to the GMP (1). Set $\nu = \mu^*/\mu^*(K)$.
Hence, $\nu$ is a probability measure and therefore a feasible solution to (9). We deduce
where the first inequality follows from the definition of the dual (3) of the GMP and the last equality from strong duality.
When we consider the case where $K = \Delta_{n-1}$, we may, without loss of generality, assume the $f_i$ to be homogeneous of the same degree for all $i = 0, 1, \ldots, m$. Indeed, let

Convergence analysis
The following theorem is a refinement of a result by Powers and Reznick [11], obtained by de Klerk, Laurent and Parrilo [4, Theorem 1.1]. It is a quantitative version of Pólya's Positivstellensatz (see, e.g., [12] for a survey), and it will be crucial in our analysis of the simplex case.
Then the polynomial We continue by stating and proving one of the main results of this paper.
Proof. By Theorem 1 there is no duality gap. Let $r > d(d-1)/2 + 1$ and let $L^{(r)}$ be an optimal solution to (8). Fix some $\varepsilon > 0$. Then,
where both inequalities follow from the fact that $L^{(r)}(1) \le 1$. By Lemma 2 we have
which is homogeneous as well and its minimum over the simplex is $\varepsilon$. Hence, by Theorem 4 for $k$ as in (11) we have
with $c_\beta > 0$ for all $\beta \in \mathbb{N}^n_{d+k}$. To determine the smallest integer $k$ for which the theorem holds we will bound $B(f)$. For this, set $y_0 = 1$ and $y_i = -\bar{y}_i$. We may rewrite $f$ as
Then,
where the last inequality follows from the fact that $L^{(r)}(x^\alpha) \ge 0$ for all $|\alpha| \le r$. One may bound $r$ as follows
, concluding the proof.
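The role of Pólya's Positivstellensatz in the analysis above can be illustrated on a toy example: for a form $f$ that is positive on the simplex, multiplying by a sufficiently high power of $\sum_i x_i$ yields a form with only positive coefficients. The sketch below computes the smallest such power by direct expansion for the hypothetical form $f(x) = x_1^2 - x_1 x_2 + x_2^2$. It does not evaluate the quantitative bound of the theorem, and the function names are illustrative.

```python
from math import comb

def times_sum_of_variables(poly, n):
    """Multiply a form {alpha: coefficient} by (x_1 + ... + x_n)."""
    out = {}
    for alpha, c in poly.items():
        for j in range(n):
            beta = alpha[:j] + (alpha[j] + 1,) + alpha[j + 1:]
            out[beta] = out.get(beta, 0) + c
    return out

def all_coefficients_positive(poly, n):
    """True if every monomial of the form's degree appears with a coefficient > 0."""
    degree = sum(next(iter(poly)))
    full_support = len(poly) == comb(degree + n - 1, n - 1)
    return full_support and all(c > 0 for c in poly.values())

def polya_exponent(poly, n, k_max=50):
    """Smallest k <= k_max such that (sum_i x_i)^k * poly has only positive coefficients."""
    current = dict(poly)
    for k in range(k_max + 1):
        if all_coefficients_positive(current, n):
            return k
        current = times_sum_of_variables(current, n)
    return None

# f(x) = x1^2 - x1*x2 + x2^2, which is positive on the simplex in R^2.
f = {(2, 0): 1, (1, 1): -1, (0, 2): 1}
k_min = polya_exponent(f, n=2)
print(k_min)
```

For this $f$ the expansion $(x_1 + x_2)^3 f = x_1^5 + 2x_1^4 x_2 + x_1^3 x_2^2 + x_1^2 x_2^3 + 2 x_1 x_2^4 + x_2^5$ is the first one without zero or negative coefficients.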

Lasserre hierarchy over the sphere
We now consider the GMP (1) over the sphere, i.e., we consider the case $K = S^{n-1}$. Additionally, we assume the $f_0, f_1, \ldots, f_m$ in (1) are homogeneous polynomials of even degree $2d$.
The Lasserre hierarchy [9] of semidefinite relaxations of the GMP (1) over the sphere is given by
where the $L^{(2r)}$ operator is now applied entry-wise to matrix-valued functions, where needed.
The following lemma enables us to use a quantitative Positivstellensatz by Fang and Fawzi [6] for positive polynomials on the sphere, to obtain a rate of convergence of the Lasserre hierarchy.
Proof. Let $\sigma \in \Sigma[x]_k$ be a sum of squares of degree $2k$. Then there exists $A \succeq 0$ such that $\sigma = [x]_k^T A\, [x]_k$. Let $\langle \cdot, \cdot \rangle$ denote the trace inner product. We have
$$L(\sigma) = L\big(\langle A, [x]_k [x]_k^T \rangle\big) = \big\langle A, L\big([x]_k [x]_k^T\big) \big\rangle \ge 0.$$
The quantitative Positivstellensatz by Fang and Fawzi [6] is as follows.
Theorem 6. [6, Theorem 3.8] Assume $f$ is a homogeneous polynomial of degree $2d$ such that $0 \le f(x) \le 1$ for all $x \in S^{n-1}$ and $d \le n$. There are constants
We may now use the theorem by Fang and Fawzi [6] and Lemma 3 to derive a rate of convergence for the Lasserre hierarchy [9] of the GMP on the sphere as follows.
Theorem 7. Let val be the optimal value of the GMP (1)
Let $(\bar{y}, t)$ be a dual optimal solution and let $f_{m+1}(x) := 1$ for every $x \in S^{n-1}$, set $\bar{y}_{m+1} = -t$ and set $y_0 = 1$ and $y = -\bar{y}$. Further, let
Proof. As in the proof of Theorem 5, Theorem 1 gives us strong duality. Let $r \ge C_d\, n$ and let $L^{(2r)}$ be an optimal solution to (13). Then by the same reasoning as in Theorem 5, r 2 and applying Theorem 6 we see that
where the last inequality follows from Lemma 3. Noting that max
we arrive at the result.
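The mechanism behind the sum-of-squares argument, namely that $L(\sigma)$ equals the trace inner product of the Gram matrix of $\sigma$ with the moment matrix $L([x]_k [x]_k^T)$, can be verified on a small instance. The sketch below uses $n = 2$, $k = 1$, an atomic measure on the unit circle with rational atoms, and a Gram matrix $A = B^T B$; all of these are hypothetical choices made for illustration.

```python
from fractions import Fraction as F

def basis(x):
    """Monomial basis vector [x]_1 = (1, x1, x2)."""
    return [F(1), x[0], x[1]]

def moment_matrix(atoms):
    """L([x]_1 [x]_1^T) for the functional induced by a finite atomic measure."""
    M = [[F(0)] * 3 for _ in range(3)]
    for w, x in atoms:
        v = basis(x)
        for i in range(3):
            for j in range(3):
                M[i][j] += w * v[i] * v[j]
    return M

# sigma(x) = (1 - x1)^2 + (x1 - 2*x2)^2 has Gram matrix A = B^T B, which is PSD.
B = [[1, -1, 0], [0, 1, -2]]
A = [[sum(row[i] * row[j] for row in B) for j in range(3)] for i in range(3)]

def sigma(x):
    v = basis(x)
    return sum(sum(b * vi for b, vi in zip(row, v)) ** 2 for row in B)

# Two atoms on the unit circle S^1 with weights 1/2 and 1/4.
atoms = [(F(1, 2), (F(3, 5), F(4, 5))), (F(1, 4), (F(0), F(-1)))]
M = moment_matrix(atoms)

# Trace inner product <A, M> versus direct integration of sigma.
L_sigma = sum(A[i][j] * M[i][j] for i in range(3) for j in range(3))
direct = sum(w * sigma(x) for w, x in atoms)
print(L_sigma, direct)
```

Both quantities agree and are nonnegative, as expected whenever the moment matrix is positive semidefinite.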

Limiting behavior of the hierarchies of linear operators
Consider the case when $K = \Delta_{n-1}$. When looking at the linear operators in the relaxation hierarchies (8) one would expect that in the limit, i.e., for $r \to \infty$, the operators $L^{(r)}(\cdot)$ behave like $\langle \cdot, \mu \rangle$ for some positive measure $\mu$. In the rest of this section we prove that this is in fact the case and we will define the limit in a meaningful way. Consider again the ideal
and let $\mathcal{L} = \{L : \mathbb{R}[x]/I \to \mathbb{R} : L \text{ fulfills conditions 1. and 2.}\}$ be the class of all linear operators that satisfy the conditions above. Note that for every $L \in \mathcal{L}$ the relation
Suppose $F : X \to Y$ is a linear operator between two normed vector spaces $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$, then the following are equivalent
Using Theorem 8 we can prove that the operators we consider are continuous in the limit.
Lemma 4. Every $L \in \mathcal{L}$ is continuous.
Proof. By Theorem 8 it suffices to show that every $L \in \mathcal{L}$ satisfies
Let $L^*$ be the optimizer of $\min L(f)$ s.t. $L \in \mathcal{L}$ and note that $L^*(f) = f_{\min}$ as an immediate consequence of Theorem 5. Hence, for all $L \in \mathcal{L}$ we have
Similarly, let $L'$ be the optimizer of $\max L(f)$ s.t. $L \in \mathcal{L}$.
By the same reasoning we have $L'(f) = f_{\max}$ and it follows that $|L(f)| \le \|f\|_\infty$ for all $L \in \mathcal{L}$. Hence one can set $M = 1$ and we see
This means we can employ the following theorem in the next step.
Theorem 9. (see, e.g., [10, Theorem 1.9.1]) Suppose that $M$ is a dense subspace of a normed space $X$, that $Y$ is a Banach space, and that $T_0 : M \to Y$ is a bounded linear operator. Then there is a unique continuous function $T : X \to Y$ that agrees with $T_0$ on $M$. This function $T$ is a bounded linear operator and $\|T\| = \|T_0\|$.
Now let $\bar{\mathcal{L}} = \{\bar{L} : C(\Delta_{n-1}) \to \mathbb{R} : \bar{L} \text{ is the continuous linear extension of some } L \in \mathcal{L}\}$.
To see this, note that the space $C(\Delta_{n-1})$ can be ordered by the convex cone $C(\Delta_{n-1})_+$. Now $L(f) \ge 0$ for all $f \in C(\Delta_{n-1})_+$ implies that $L \in (C(\Delta_{n-1})_+)^*$, i.e., the dual cone of $C(\Delta_{n-1})_+$, which is known to be the set of positive finite Borel measures on $\Delta_{n-1}$. Let $f$ be a homogeneous continuous function that is non-negative on the simplex and consider its Bernstein approximation of order $r$ given by
$$B_r f(x) = \sum_{|\alpha| = r} f\Big(\frac{\alpha}{r}\Big) \frac{r!}{\alpha!}\, x^\alpha, \qquad \alpha! := \alpha_1! \cdots \alpha_n!.$$
The approximation converges uniformly to $f$ as $r \to \infty$ since $f$ is continuous. Using Lemma 4 we see
Hence, it follows that $L(f) = \langle f, \mu \rangle$ for some positive measure $\mu$, such that $\mu(\Delta_{n-1}) \le 1$.
Remark 1. By the proof given above, it becomes clear that the continuous linear extension can in fact be defined in terms of the limit of the Bernstein approximation, i.e., define $L(f) := \lim_{r \to \infty} L(B_r f)$ for $f \in C(\Delta_{n-1})$ and $L \in \mathcal{L}$.
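The convergence of the Bernstein approximation used in Remark 1 can be observed on a simple example. The sketch below assumes the standard multivariate Bernstein operator on the simplex, $B_r f(x) = \sum_{|\alpha| = r} f(\alpha/r) \frac{r!}{\alpha!} x^\alpha$, and evaluates it for the hypothetical choice $f(x) = x_1^2$ on the simplex in $\mathbb{R}^2$, where the closed form $B_r f(x) = x_1^2 + x_1 x_2 / r$ is classical.

```python
from fractions import Fraction as F
from itertools import product
from math import factorial

def bernstein(f, r, x):
    """Evaluate the order-r Bernstein approximation of f at a simplex point x."""
    n = len(x)
    total = F(0)
    for alpha in product(range(r + 1), repeat=n):
        if sum(alpha) != r:
            continue
        coeff = factorial(r)
        for a in alpha:
            coeff //= factorial(a)          # multinomial coefficient r! / alpha!
        term = F(coeff) * f([F(a, r) for a in alpha])
        for xi, ai in zip(x, alpha):
            term *= xi ** ai
        total += term
    return total

f = lambda z: z[0] ** 2
x = (F(1, 3), F(2, 3))

# The error at x is x1 * x2 / r, which tends to zero as r grows.
errors = [bernstein(f, r, x) - f(x) for r in range(1, 8)]
print(errors)
```

Since the functionals in the class are bounded, applying one of them to $B_r f$ and letting $r \to \infty$ is a well-defined way of extending it from polynomials to continuous functions, which is the content of the remark.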
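For the sphere case, i.e., $K = S^{n-1}$, consider the following theorem.
Theorem 10. (see, e.g., [9, Theorem 3.8]) Let $y = (y_\alpha)_{\alpha \in \mathbb{N}^n} \subset \mathbb{R}^\infty$ be a given infinite real sequence, and let $L : \mathbb{R}[x] \to \mathbb{R}$ be the linear operator defined by
$$L(p) = \sum_{\alpha \in \mathbb{N}^n} p_\alpha y_\alpha \quad \text{for } p = \sum_{\alpha \in \mathbb{N}^n} p_\alpha x^\alpha.$$
The sequence $y$ has a finite Borel representing measure with support contained in $K$ if and only if
$$L(g_J\, p^2) \ge 0 \quad \text{for all } p \in \mathbb{R}[x] \text{ and all index sets } J,$$
where $g_J(x) = \prod_{j \in J} g_j(x)$.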
Now, let $L$ be a linear operator such that
Recall that as a semialgebraic set the sphere can be written as
$$S^{n-1} = \{ x \in \mathbb{R}^n : g_1(x) = 1 - \|x\|_2^2 \ge 0, \ g_2(x) = \|x\|_2^2 - 1 \ge 0 \}.$$
Then for $K = S^{n-1}$ every $L \in \mathcal{L}'$ satisfies all conditions of Theorem 10. To see this, note that the only possibilities for $J$ are $\emptyset$, $\{1\}$, $\{2\}$ and $\{1, 2\}$. Because of condition 3 we have that $L(\pm(1 - \|x\|_2^2)\,p) = 0$ for all $p \in \mathbb{R}[x]$, covering all cases except $J = \emptyset$. For $J = \emptyset$ the condition reduces to $L(p^2) \ge 0$, which holds for all $p \in \mathbb{R}[x]$ because of Lemma 3. Hence, every $L \in \mathcal{L}'$ has a representing measure whose support is contained in $S^{n-1}$.

Concluding remarks
In this last section we conclude by outlining the connection of our results to previous work.We show that -in the special case of polynomial optimization on the simplex -our RLT hierarchy reduces to one studied earlier by Bomze and De Klerk [2], and De Klerk, Laurent and Parrilo [4].
De Klerk, Laurent and Parrilo [4] introduced the following hierarchy for minimizing a homogeneous polynomial $p \in \mathbb{R}[x]$ of degree $d$ over the simplex.
Let $y$ be an optimal solution for the dual and define
$= L^{(r+d)}(1)$.
Hence, the constructed solution for the LP relaxation is feasible.Further,

Theorem 4.
Suppose $f \in \mathbb{R}[x]$ is a homogeneous polynomial of degree $d$ of the form $f(x) = \sum_{|\alpha| = d} f_\alpha x^\alpha$. Let $\varepsilon = \min_{x \in \Delta_{n-1}} f(x)$ and define

Lemma 3.
Let $L : \mathbb{R}[x]_{2k} \to \mathbb{R}$ be a linear operator and suppose $L\big([x]_k [x]_k^T\big) \succeq 0$, where the operator is applied entrywise to the matrix $[x]_k [x]_k^T$. Then, $L(\sigma) \ge 0$ for all $\sigma \in \Sigma[x]_k$.