1 Introduction

For a compact set \(K \subset {\mathbb {R}}^n\) let \({\mathcal {M}}(K)\) denote the (infinite-dimensional) vector space of signed finite Borel measures with support contained in K. Let \([m]=\{1, \ldots , m\}\) for \(m \in {\mathbb {N}}\). The generalized moment problem (GMP) is an optimization problem of the following form:

$$\begin{aligned} \text {val} := \inf _{\mu \in {\mathcal {M}}(K)_+}&\int _{K} f_0(\mathbf{x }) \mathrm {d}\mu (\mathbf{x }) \nonumber \\ \text {s.t. }&\int _{K} f_i(\mathbf{x }) \mathrm {d}\mu (\mathbf{x }) = b_i \; \forall i \in [m] \nonumber \\&\int _{K} \mathrm {d}\mu (\mathbf{x }) \le 1, \end{aligned}$$
(1)

where \(m \in {\mathbb {N}}, b_i \in {\mathbb {R}}\) for all \(i \in [m]\), \({\mathcal {M}}(K)_+\) is the convex cone of positive finite Borel measures supported on K, and \(f_0, f_1, \ldots , f_{m}\) are continuous on K. We will always assume the GMP (1) has a feasible solution, which implies that it has an optimal solution as well (see Theorem 1).

The constraint \(\int _{K} \mathrm {d}\mu (\mathbf{x }) \le 1\) essentially means that we know an a priori upper bound on the total mass of an optimal solution: if such a bound is known, we may rescale the functions \(f_i\) beforehand so that the condition holds with the bound 1.

The GMP is a conic linear optimization problem whose duality theory is well understood, see e.g. [18]. A wide range of optimization problems can be modeled as an instance of the GMP. The list includes problems from optimization, probability, financial economics and optimal control to name only a few, see e.g. [11].

For polynomial data, i.e. when all the \(f_i\)’s are polynomials (\(i \in \{ 0,1, \ldots , m\}\)), and the set K is a basic closed semialgebraic set, Lasserre [10] introduced a monotone nondecreasing hierarchy of semidefinite programming (SDP) relaxations of (1). For a survey on SDP approaches to the GMP with polynomial data and their error analysis, we refer to De Klerk and Laurent [4].

In this paper, we will consider the case where K is the standard (probability) simplex

$$\begin{aligned} \varDelta _{n-1} = \left\{ \mathbf{x } \in {\mathbb {R}}_+^n : x_1+ \cdots + x_n = 1 \right\} , \end{aligned}$$

where \({\mathbb {R}}^n_+\) is the nonnegative orthant, or the Euclidean sphere

$$\begin{aligned} {\mathcal {S}}^{n-1} = \left\{ \mathbf{x } \in {\mathbb {R}}^n : \Vert \mathbf{x }\Vert _2^2 = x_1^2 + \cdots + x_n^2 = 1 \right\} . \end{aligned}$$

Our main result is to establish a rate of convergence for the Lasserre hierarchy [10] for the GMP with polynomial data on the sphere, and for a related, RLT (reformulation-linearization technique)-type linear programming hierarchy for the GMP with polynomial data on the simplex. This RLT hierarchy is in fact a generalization of LP hierarchies for polynomial optimization on the simplex, as introduced by Bomze and De Klerk [2], and De Klerk et al. [5], and is closely related to the original work on RLT hierarchies by Sherali and Adams [19].

1.1 Outline of the paper

First we introduce some notation in Sect. 2.1. In Sect. 2.2 we review the duality theory of the GMP. A brief overview of possible applications of our setting is given in Sect. 2.3. For K the simplex, we introduce a linear relaxation hierarchy in Sect. 3 and prove a convergence rate of O(1/r). Section 4 contains the new convergence analysis of the Lasserre [11] SDP hierarchy for the GMP on the sphere. In Sect. 5 we take a mathematical view of how the optimal measure is obtained in the limit as the level of the hierarchies approaches infinity. In Sect. 6 we explain how our LP hierarchy generalizes an approximation hierarchy for the problem of minimizing a form of degree d over the simplex introduced by De Klerk et al. [5], based on earlier results obtained by Bomze and De Klerk [2].

2 Preliminaries

2.1 Notation

Let \({\mathbb {N}}= \{ 0, 1, 2, \ldots \}\) denote the set of nonnegative integers, \({\mathbb {N}}_+ = {\mathbb {N}} {\setminus } \{ 0 \}\) and \({\mathbb {N}}_t^n\) the set of sequences \(\alpha \in {\mathbb {N}}^n\) for which \(\vert \alpha \vert = \sum _{i = 1}^{n} \alpha _i \le t\) for \(t \in {\mathbb {N}}\). For \(\alpha \in {\mathbb {N}}^n\), \(\mathbf{x }^\alpha \) denotes the monomial \(x_1^{\alpha _1}\cdots x_n^{\alpha _n}\) and its degree is \(\vert \alpha \vert \). The ring of multivariate polynomials in n variables \(\mathbf{x }= (x_1, \ldots , x_n)\) is denoted by \({\mathbb {R}}[\mathbf{x }]= {\mathbb {R}}[x_1, \ldots , x_n]\) and \({\mathbb {R}}[\mathbf{x }]_t\) is its subspace of polynomials of degree at most t. The (total) degree of a polynomial is the maximal degree of its appearing monomials. A monomial basis vector of order t is given by

$$\begin{aligned}{}[\mathbf{x }]_t = (1, x_1 , \ldots , x_n , x_1^2 , x_1 x_2 , \ldots , x_{n-1} x_n , x_n^2, \ldots , x_1^t , \ldots , x_n^t )^T. \end{aligned}$$

Any polynomial \(p \in {\mathbb {R}}[\mathbf{x }]\) can be written as \(p = \sum _{\alpha \in {\mathbb {N}}^n} p_\alpha \mathbf{x }^\alpha \), where only finitely many \(p_\alpha \) are non-zero. A polynomial \(p\in {\mathbb {R}}[\mathbf{x }]\) is a sum of squares (sos) if \(p = \sum _{j=1}^{k} (h_j)^2\) for \(h_j \in {\mathbb {R}}[\mathbf{x }]\) and \(k \ge 1\). The set of sos polynomials is denoted by \(\varSigma [\mathbf{x }]\) and the set of sos polynomials of degree at most t is denoted by \(\varSigma [\mathbf{x }]_t\).

2.2 Duality of the generalized problem of moments

We shall briefly discuss the duality theory associated with the GMP (1). To this end, let \({\mathcal {C}}(K)\) denote the space of bounded continuous functions on K endowed with the supremum norm \(\Vert \cdot \Vert _\infty \). For two vector spaces E, F of arbitrary dimension, a non-degenerate bilinear form \(\langle \cdot , \cdot \rangle : E \times F \rightarrow {\mathbb {R}}\) is called a duality of E and F. The spaces \({\mathcal {M}}(K)\) and \({\mathcal {C}}(K)\) can be put in duality by defining \(\langle \cdot , \cdot \rangle : {\mathcal {C}}(K) \times {\mathcal {M}}(K) \rightarrow {\mathbb {R}}\) as

$$\begin{aligned} \langle f, \mu \rangle = \int _K f(\mathbf{x }) \mathrm {d}\mu (\mathbf{x }). \end{aligned}$$
(2)

Let \(f_0, f_1, \ldots , f_m\) be continuous functions on K and \(b_1, \ldots , b_m \in {\mathbb {R}}\). The dual of (1) is given by

$$\begin{aligned} \text {val}^\prime&= \sup _{(y,t) \in {\mathbb {R}}^m\times {\mathbb {R}}_+} \sum _{i = 1}^m y_i b_i - t \nonumber \\&\quad \text {s.t. } f_0(\mathbf{x })-\sum _{i=1}^m y_i f_i(\mathbf{x }) +t \ge 0 \quad \forall \; \mathbf{x } \in K. \end{aligned}$$
(3)

Note that the dual problem (3) is always strictly feasible, due to the constraint \(\int _{K} \mathrm {d}\mu \le 1\) in the primal GMP (1).

Weak duality holds for this pair of problems, meaning \(\text {val}^\prime \le \text {val}\). The difference \(\text {val}-\text {val}^\prime \) is called the duality gap. In fact, the duality gap is always zero, as the next theorem shows. Note that a zero duality gap does not imply the existence of a dual optimal solution.

Theorem 1

(see, e.g. [11, Theorem 1.3]) Assume problem (1) is feasible. Then it has an optimal solution (the \(\inf \) is attained), and \(\mathrm {val} = \mathrm {val}^\prime \).

We continue by recalling a sufficient condition for a dual optimal solution to exist.

Theorem 2

(see, e.g. [18, Proposition 2.8]) Suppose problem (1) is feasible. If

$$\begin{aligned} b \in \text {int} \left( \left\{ (\langle f_1, \mu \rangle , \ldots , \langle f_{m} , \mu \rangle ) : \mu \in {\mathcal {M}}(K)_+ \right\} \right) \end{aligned}$$
(4)

then the set of optimal solutions of (3) is nonempty and bounded.

As discussed in Lasserre [10], it is customary in the literature to assume that condition (4) holds, but in practice it may be a non-trivial task to check whether it does. We do stress, however, that condition (4) does hold for the applications discussed in the next subsection.

Another result worth mentioning is that if the GMP (1) has an optimal solution, it has one which is finite atomic.

Theorem 3

(see, e.g. [4, Theorem 3]) If the GMP (1) has an optimal solution, then it has one which is finite atomic with at most m atoms, i.e., of the form \(\mu ^*= \sum _{\ell = 1}^m \omega _\ell \delta _{ {\mathbf{x }}^{(\ell )}}\), where \(\omega _\ell \ge 0\), \( {\mathbf{x }}^{(\ell )} \in K\), and \(\delta _{ {\mathbf{x }}^{(\ell )}}\) denotes the Dirac measure supported at \( {\mathbf{x }}^{(\ell )}\) \((\ell \in [m])\).

2.3 Applications

We now review some examples of problems which can be formulated as a GMP with polynomial data, and discuss the special cases considered in this paper, namely when the set K is a simplex or sphere.

2.3.1 Polynomial and rational optimization

Consider the problem of minimizing a rational function over K:

$$\begin{aligned} p^*= \inf _{\mathbf{x } \in K} \frac{p(\mathbf{x })}{q(\mathbf{x })}, \end{aligned}$$
(5)

where \(p,q \in \mathbb {R[\mathbf{x }]}\) are relatively prime and we may assume \(q(\mathbf{x }) > 0\) for all \(\mathbf{x } \in K\). Indeed, if q changes sign on K, Jibetean and De Klerk [9, Corollary 1] showed that \(p^*= -\infty \). We will in fact make the stronger assumption that \(q(\mathbf{x }) \ge 1\) on K, i.e. that we know a positive lower bound on the minimum of q over K (after rescaling q, this bound may be taken to be 1). The optimization problem (5) can be modeled as a GMP:

$$\begin{aligned} \mathrm {val} = \inf _{\mu \in {\mathcal {M}}(K)_+} \left\{ \int _{K} p(\mathbf{x }) \mathrm {d}\mu (\mathbf{x }) : \int _{K} q(\mathbf{x })\mathrm {d}\mu (\mathbf{x }) =1 \right\} . \end{aligned}$$
(6)

The inequality constraint \(\int _{K}\mathrm {d}\mu (\mathbf{x }) \le 1\) is redundant if \(q(\mathbf{x })\ge 1\) for all \(\mathbf{x } \in K\), and it can therefore be added to obtain a problem of the form (1). Note that upon setting \(q(\mathbf{x }) = 1\) for all \(\mathbf{x } \in K\), problem (5) becomes a polynomial optimization problem.

We now consider the special case where K is the simplex. Motzkin and Straus [12] showed that the maximum stable set problem can be formulated as a quadratic polynomial optimization problem over the simplex. Indeed, for a graph G with adjacency matrix A,

$$\begin{aligned} \frac{1}{\alpha (G)} = \min _{\mathbf{x } \in \varDelta _{n-1}} \mathbf{x }^T(A + I)\mathbf{x }, \end{aligned}$$

where I is the identity matrix, and \(\alpha (G)\) is the stable set number (independence number) of G. This gives a quadratic polynomial optimization problem over the simplex, which may be written as the GMP (6) with \(p(\mathbf{x }) = \mathbf{x }^T(A + I)\mathbf{x }\) and \(q(\mathbf{x }) \equiv 1\).
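As a quick illustration of this formulation, the following Python sketch (an illustrative instance of ours, not part of the hierarchies developed below) estimates the Motzkin–Straus minimum for the 5-cycle \(C_5\), for which \(\alpha (C_5) = 2\); since the problem is non-convex, a local solver with random restarts is used.

```python
# Illustrative only: estimate min x^T (A + I) x over the simplex for C_5,
# whose exact value is 1/alpha(C_5) = 1/2. SLSQP is a local method for this
# non-convex problem, so we restart from random interior points.
import numpy as np
from scipy.optimize import minimize

n = 5
A = np.zeros((n, n))
for i in range(n):                       # adjacency matrix of the 5-cycle
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
Q = A + np.eye(n)

rng = np.random.default_rng(0)
best = np.inf
for _ in range(50):
    x0 = rng.dirichlet(np.ones(n))       # random starting point in the simplex
    res = minimize(lambda x: x @ Q @ x, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])
    best = min(best, res.fun)

print(best)                              # approximately 0.5
```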

To give an example for the special case when K is a sphere, recall that deciding convexity of a homogeneous polynomial f of degree 4 or higher is known to be NP-hard [1]. A homogeneous polynomial f is convex if and only if

$$\begin{aligned} \min _{(\mathbf{x },\mathbf{y }) \in {\mathcal {S}}^{2n-1}} \mathbf{y }^T \nabla ^2 f(\mathbf{x }) \mathbf{y } \ge 0, \end{aligned}$$

which can again be cast as a GMP over the sphere via (6).
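In the same spirit, one can probe this convexity criterion numerically; the sketch below (again an assumed toy instance, with a local solver standing in for the relaxations of Sect. 4) estimates \(\min _{(\mathbf{x },\mathbf{y }) \in {\mathcal {S}}^{2n-1}} \mathbf{y }^T \nabla ^2 f(\mathbf{x }) \mathbf{y }\) for the convex form \(f = x_1^4 + x_2^4\).

```python
# Illustrative only: numerically estimate min_{(x, y) on S^{2n-1}} y^T H_f(x) y
# for the (convex) form f = x1^4 + x2^4, whose Hessian is diag(12 x1^2, 12 x2^2).
import numpy as np
import sympy as sp
from scipy.optimize import minimize

n = 2
xs = sp.symbols(f"x0:{n}")
f = xs[0]**4 + xs[1]**4
H_num = sp.lambdify(xs, sp.hessian(f, xs), "numpy")

def objective(z):                        # z = (x, y) in R^{2n}
    x, y = z[:n], z[n:]
    return y @ np.array(H_num(*x), dtype=float) @ y

rng = np.random.default_rng(1)
best = np.inf
for _ in range(50):                      # random restarts on the sphere
    z0 = rng.normal(size=2 * n)
    z0 /= np.linalg.norm(z0)
    res = minimize(objective, z0, method="SLSQP",
                   constraints=[{"type": "eq", "fun": lambda z: z @ z - 1.0}])
    best = min(best, res.fun)

print(best)      # approximately 0 (up to solver tolerance): f is convex
```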

2.3.2 Polynomial cubature

Another application that goes beyond polynomial optimization is concerned with polynomial cubature rules, see e.g. [7, 21]. Let \(N \in {\mathbb {N}}\). Consider the problem of multivariate numerical integration of a function f over a set K with respect to a given (reference) measure \(\mu _0 \in {\mathcal {M}}(K)_+\). Loosely speaking, a cubature scheme consists of a set of nodes \(\mathbf{x }^{(\ell )} \in K\) and weights \(\omega _\ell \ge 0\) for \(\ell \in [N]\) such that

$$\begin{aligned} \int _{K} f(\mathbf{x }) \mathrm {d}\mu _0(\mathbf{x }) \approx \sum _{\ell = 1}^N\omega _\ell f\left( \mathbf{x }^{(\ell )}\right) . \end{aligned}$$

We call a rule consisting of nodes \(\mathbf{x }^{(\ell )}\) and weights \(\omega _\ell \) for \(\ell \in [N]\) a polynomial cubature scheme of degree d if it is exact for all polynomials of degree up to d. Finding polynomial cubature rules is NP-hard in general, see [3]. The problem of finding such weights and nodes can be cast as a GMP. Let \(d \in {\mathbb {N}}\) and let \(\beta \in {\mathbb {N}}^n\) be any multi-index such that \(\vert \beta \vert > d\). Assume the reference measure \(\mu _0\) is a probability measure; otherwise set \(\mu _0 \leftarrow \mu _0 / \mu _0(K)\). In the GMP given by

$$\begin{aligned} \mathrm {val}:= & {} \inf _{\mu \in {\mathcal {M}}(K)_+} \int _{K} \mathbf{x }^\beta \mathrm {d}\mu (\mathbf{x }) \nonumber \\&\text {s.t. } \int _{K} \mathbf{x }^\alpha \mathrm {d}\mu (\mathbf{x }) = \int _{K} \mathbf{x }^\alpha \mathrm {d}\mu _0(\mathbf{x }) \; \forall \alpha \in {\mathbb {N}}^n_d \end{aligned}$$
(7)

the redundant constraint \(\int _{K}\mathrm {d}\mu (\mathbf{x })\le 1\) can be added to turn it into a GMP of the form (1). The optimal solution \(\mu ^*\) to (7) will be of the form \(\mu ^*= \sum _{\ell = 1}^N \omega _\ell \delta _{ {\mathbf{x }}^{(\ell )}}\), where \(N \le \vert {\mathbb {N}}_d^n\vert = {{n+d}\atopwithdelims (){d}}\) by Theorem 3. This result is known as Tchakaloff’s theorem [20]. There is some freedom in the choice of the objective function; note, however, that it should be linearly independent of the monomials \(\mathbf{x }^\alpha \) with \(\alpha \in {\mathbb {N}}_d^n\).
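To illustrate, the following sketch sidesteps the moment machinery by a crude a priori discretization: candidate nodes are restricted to a rational grid in the simplex (an assumption of ours), and an LP then searches for nonnegative weights matching all moments of the uniform probability measure \(\mu _0\) on \(\varDelta _2\) up to degree d. When this restricted LP is feasible, any basic optimal solution is supported on few atoms, in the spirit of Theorem 3.

```python
# A crude discretization of (7), not the moment approach: restrict nodes to a
# rational grid in the 2-simplex (grid spacing is an assumption) and match all
# moments of the uniform probability measure mu_0 up to degree d by an LP.
import itertools
from math import factorial
import numpy as np
from scipy.optimize import linprog

n, d, grid = 3, 2, 6

def uniform_moment(alpha):
    # E_{mu_0}[x^alpha] for the uniform (Dirichlet(1,...,1)) measure on the simplex
    num = factorial(n - 1)
    for a in alpha:
        num *= factorial(a)
    return num / factorial(n - 1 + sum(alpha))

nodes = [np.array(c) / grid
         for c in itertools.product(range(grid + 1), repeat=n) if sum(c) == grid]
alphas = [a for a in itertools.product(range(d + 1), repeat=n) if sum(a) <= d]

A_eq = np.array([[np.prod(x ** np.array(a)) for x in nodes] for a in alphas])
b_eq = np.array([uniform_moment(a) for a in alphas])
beta = (d + 1,) + (0,) * (n - 1)              # objective monomial, |beta| > d
c = np.array([np.prod(x ** np.array(beta)) for x in nodes])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(nodes))
atoms = [(w, x) for w, x in zip(res.x, nodes) if w > 1e-9]
print(len(atoms))    # a valid degree-d scheme with few atoms
```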

In the special cases where K is a simplex or sphere, many cubature schemes are known, but this remains an active field of study. The interested reader is referred to the book [6] for more details.

3 A linear relaxation hierarchy over the simplex

In the remainder of the paper we will only deal with the GMP (1) with polynomial data, i.e. we assume in what follows that all \(f_i\)’s are polynomials (\(i \in \{0,\ldots ,m\}\)).

A moment sequence \((y_\alpha )_{\alpha \in {\mathbb {N}}^n} \subset {\mathbb {R}}\) of a measure \(\mu \in {\mathcal {M}}(K)\) is an infinite sequence such that

$$\begin{aligned} y_\alpha = \int _K \mathbf{x }^\alpha \mathrm {d}\mu (\mathbf{x }) \; \forall \alpha \in {\mathbb {N}}^n. \end{aligned}$$

Let \(L : \mathbb {R[\mathbf{x }]} \rightarrow {\mathbb {R}}\) be a linear operator

$$\begin{aligned} p(\mathbf{x }) = \sum _{\alpha \in {\mathbb {N}}^n} p_\alpha \mathbf{x }^\alpha \mapsto L(p) = \sum _{\alpha \in {\mathbb {N}}^n} p_\alpha y_\alpha \end{aligned}$$

that maps monomials to their respective moments. Thus, to an optimal solution \(\mu ^*\) of a GMP there is an associated linear functional \(L^*\) such that \(L^*(f_0) = \text {val}\), \(L^*(f_i)= b_i\) for all \(i \in [m]\), and \(L^*(1)\le 1\). The idea of the relaxation we are about to introduce is to approximate this optimal functional by a sequence (hierarchy) of linear functionals \(L^{(r)}\) depending on \( r = 1, 2, \ldots \). Let \(K = \varDelta _{n-1}\). For \(i = 0, 1, \ldots , m\) let \(f_i\) be a real homogeneous polynomial of degree d and let \(r \ge d\). Let \(L^{(r)}\) be an optimal solution of the following RLT-type relaxation of (1):

$$\begin{aligned} {\underline{f}}_{\text {LP}}^{(r)} = \min _{{\mathop {L\text { linear}}\limits ^{L:{\mathbb {R}}[\mathbf{x }]\rightarrow {\mathbb {R}}}}} \quad&L(f_0) \nonumber \\ \text {s.t.} \quad&L(f_i) = b_i \quad \forall \; i \in [m] \nonumber \\&L(1) \le 1 \nonumber \\&L(\mathbf{x }^\alpha ) \ge 0 \quad \forall \; \vert \alpha \vert \le r \nonumber \\&L(\mathbf{x }^\alpha ) = L \left( \mathbf{x }^{\alpha } \sum _{i=1}^{n} x_i \right) \quad \forall \;\vert \alpha \vert \le r-1. \end{aligned}$$
(8)

Every feasible solution \(\mu ^\prime \) to (1) yields a feasible solution to (8) by setting \(L(\mathbf{x }^\alpha ) = \langle \mathbf{x }^\alpha , \mu ^\prime \rangle \). Hence, \({\underline{f}}_{\text {LP}}^{(r)} \le \text {val} \). The second-to-last constraint reflects the following necessary condition for a positive measure \(\mu \) on the simplex:

$$\begin{aligned} \langle \mathbf{x }^\alpha , \mu \rangle = \int _{\varDelta _{n-1}} \mathbf{x }^\alpha \mathrm {d}\mu \ge 0 \quad \forall \alpha \in {\mathbb {N}}^n. \end{aligned}$$

The last constraint in (8) arises from the fact that

$$\begin{aligned} L(p)=L(q) \text { if } p(\mathbf{x }) = q(\mathbf{x }) \quad \forall \mathbf{x } \in \varDelta _{n-1}. \end{aligned}$$

Equivalently, defining the ideal \({\mathcal {I}} = \{ \mathbf{x } \mapsto p(\mathbf{x })\left( 1 - \sum _{i=1}^{n} x_i \right) \; : \; p \in {\mathbb {R}}[\mathbf{x }] \}\) we require

$$\begin{aligned} L(p)=L(q) \quad \text {whenever } p = q \mod \mathcal {I}, \end{aligned}$$

where \(p = q \mod \mathcal {I}\) means \(p(\mathbf{x }) = q(\mathbf{x }) +(1-\sum _{i = 1}^n x_i) h(\mathbf{x })\) for some \(h \in {\mathbb {R}}[\mathbf{x }]\).

Our formulation (8) is closely related to the RLT approach by Sherali and Adams [19], that was originally introduced for 0–1 mixed integer linear programming problems and subsequently extended for more general problems (but not to the GMP, to the best of the authors’ knowledge). In fact, for the special case of polynomial optimization, problem (8) is essentially a Sherali–Adams RLT approach. To see this, note that our linearization operator L corresponds to the approximation \(L(\mathbf{x }^\alpha ) \approx \langle \mathbf{x }^\alpha , \mu ^*\rangle \), where \(\mu ^*\) again denotes an optimal solution to the GMP (1). For the special case of polynomial optimization, we may assume that \(\mu ^*\) is a Dirac delta centered at an optimal solution, say \(\mathbf{x }^*\). In this case, \(L(\mathbf{x }^\alpha ) \approx \langle \mathbf{x }^\alpha , \mu ^*\rangle = {\mathbf{x }^*}^\alpha \), i.e. L corresponds to the type of linearization operator introduced by Sherali and Adams [19].
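To make the construction concrete, the following sketch assembles relaxation (8) for the GMP form of polynomial minimization over the simplex (so \(m = 1\), \(f_1 = (x_1 + \cdots + x_n)^d\) and \(b_1 = 1\)), instantiated with the Motzkin–Straus form of the 5-cycle from Sect. 2.3.1; the instance and the use of scipy are our illustrative choices.

```python
# A minimal sketch of relaxation (8): variables y_a = L(x^a) for |a| <= r,
# instance f_0 = x^T (A + I) x for the 5-cycle, f_1 = (x_1+...+x_n)^2, b_1 = 1.
import itertools
from math import factorial
import numpy as np
from scipy.optimize import linprog

n, d, r = 5, 2, 4

def multinom(a):
    out = factorial(sum(a))
    for k in a:
        out //= factorial(k)
    return out

alphas = [a for a in itertools.product(range(r + 1), repeat=n) if sum(a) <= r]
idx = {a: i for i, a in enumerate(alphas)}
N = len(alphas)

Q = np.eye(n)                                     # A + I for the 5-cycle
for i in range(n):
    Q[i, (i + 1) % n] = Q[(i + 1) % n, i] = 1.0

c = np.zeros(N)                                   # objective: L(f_0)
for i in range(n):
    for j in range(n):
        a = tuple(int(k == i) + int(k == j) for k in range(n))
        c[idx[a]] += Q[i, j]

A_eq, b_eq = [], []
row = np.zeros(N)                                 # L(f_1) = 1
for a in alphas:
    if sum(a) == d:
        row[idx[a]] = multinom(a)
A_eq.append(row); b_eq.append(1.0)

for a in alphas:                                  # L(x^a) = L(x^a * sum_i x_i)
    if sum(a) <= r - 1:
        row = np.zeros(N)
        row[idx[a]] = 1.0
        for i in range(n):
            row[idx[tuple(a[k] + int(k == i) for k in range(n))]] -= 1.0
        A_eq.append(row); b_eq.append(0.0)

A_ub = np.zeros((1, N)); A_ub[0, idx[(0,) * n]] = 1.0   # L(1) <= 1

res = linprog(c, A_ub=A_ub, b_ub=[1.0], A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * N)                   # L(x^a) >= 0
print(res.fun)    # a lower bound on val = 1/2 that improves as r grows
```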

We now state two lemmas that will come in handy in our later analysis.

Lemma 1

Let \(r,k \in {\mathbb {N}}\) with \(k\le r\) and let L be a feasible solution to the linear relaxation (8) for some \(f_0, f_1, \ldots , f_m\). Then for all \( {\mathbf{x }}^{\gamma }\) with \(\gamma \in {\mathbb {N}}^n \) and \(\vert \gamma \vert \le r-k\) we have

$$\begin{aligned} L\left( {\mathbf{x }}^\gamma \right) = L\left( {\mathbf{x }}^{\gamma } \left( \sum _{i=1}^{n} x_i \right) ^k \right) . \end{aligned}$$

Proof

The case \(k = 1\) is precisely the last constraint in (8). For \(k > 1\), write \(\left( \sum _{i=1}^{n} x_i \right) ^{k} = \left( \sum _{i=1}^{n} x_i \right) ^{k-1} \sum _{i=1}^{n} x_i \) and apply that constraint to every monomial of \( {\mathbf{x }}^{\gamma }\left( \sum _{i=1}^{n} x_i \right) ^{k-1}\), each of which has degree at most \(r-1\). The claim follows by induction on k.\(\square \)

Lemma 2

Consider the GMP given in (1) and let \((y,t) \in {\mathbb {R}}^m \times {\mathbb {R}}_+\). Then the pair \((y,t)\) is dual optimal only if

$$\begin{aligned} 0 = \min _{ {\mathbf{x }} \in K} \left( f_0( {\mathbf{x }}) - \sum _{i=1}^m y_i f_i( {\mathbf{x }})+t \right) . \end{aligned}$$

Proof

The proof is a direct consequence of the GPM duality theory, and is omitted here. \(\square \)

When we consider the case where \(K = \varDelta _{n-1}\), we may, without loss of generality, assume the \(f_i\) to be homogeneous of the same degree for all \(i = 0, 1, \ldots , m\). Indeed, let \(f(\mathbf{x }) = \sum _{j = 0}^d f_j(\mathbf{x })\), where \(\text {deg}(f_j)=j\). Then, \(g(\mathbf{x }) := \sum _{j=0}^d f_j(\mathbf{x })\left( \sum _{i=1}^n x_i \right) ^{d-j}\) is homogeneous of degree d and \(f(\mathbf{x })=g(\mathbf{x })\) for all \(\mathbf{x } \in \varDelta _{n-1}\).
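This homogenization is easy to carry out symbolically; a minimal sympy sketch, on an assumed toy polynomial, is given below.

```python
# Homogenization g = sum_j f_j * (x_1+...+x_n)^(d-j) on an assumed toy input.
import sympy as sp

x = sp.symbols("x1:4")
f = sp.Poly(x[0]**2 + 3*x[1] - 5, *x)     # degree d = 2, not homogeneous
d, s = f.total_degree(), sum(x)

g = sp.expand(sum(coeff * sp.prod([v**e for v, e in zip(x, mono)])
                  * s**(d - sum(mono))
                  for mono, coeff in f.terms()))

print(sp.Poly(g, *x).is_homogeneous)      # True
# f and g agree on the hyperplane x1 + x2 + x3 = 1 (hence on the simplex):
print(sp.simplify((g - f.as_expr()).subs(x[2], 1 - x[0] - x[1])))  # 0
```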

3.1 Convergence analysis

The following theorem is a refinement of a result by Powers and Reznick [15], obtained by De Klerk et al. [5, Theorem 1.1]. It is a quantitative version of Pólya’s Positivstellensatz (see, e.g. [17] for a survey), and it will be crucial in our analysis of the simplex case.

Theorem 4

Suppose \(f \in {\mathbb {R}}[ {\mathbf{x }}]\) is a homogeneous polynomial of degree d of the form \( f( {\mathbf{x }}) = \sum _{\vert \alpha \vert = d} f_{\alpha } {\mathbf{x }}^\alpha . \) Let \(\varepsilon = \min _{ {\mathbf{x }} \in \varDelta _{n-1}} f( {\mathbf{x }}) > 0\) and define

$$\begin{aligned} B(f)= \max _{\vert \alpha \vert = d} \frac{\alpha _1 ! \cdots \alpha _n ! }{d!} f_{\alpha }. \end{aligned}$$
(9)

Then the polynomial \((x_1 + \cdots + x_n)^k f( {\mathbf{x }}) \) has only positive coefficients if

$$\begin{aligned} k > \frac{d(d-1)}{2} \frac{B(f)}{\varepsilon }-d. \end{aligned}$$
(10)
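As a sanity check of Theorem 4, consider \(f = x^2 - xy + y^2\) with \(n = d = 2\): its minimum over \(\varDelta _1\) is \(\varepsilon = 1/4\) and \(B(f) = 1\), so (10) requires \(k > 2\). The sketch below confirms that \(k = 1, 2\) fail while \(k = 3\) succeeds.

```python
# Checking Theorem 4 on f = x^2 - xy + y^2: eps = 1/4, B(f) = 1, so (10)
# requires k > 2. We test which k make all coefficients of (x+y)^k f positive.
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 - x*y + y**2
for k in range(1, 5):
    p = sp.Poly(sp.expand((x + y)**k * f), x, y)
    coeffs = [p.coeff_monomial(x**i * y**(k + 2 - i)) for i in range(k + 3)]
    print(k, all(c > 0 for c in coeffs))
# prints False for k = 1, 2 and True for k = 3, 4, as predicted by (10)
```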

We continue by stating and proving one of the main results of this paper.

Theorem 5

Let val be the optimal value of the GMP (1) for input data \(K = \varDelta _{n-1}, f_0, f_1, \ldots , f_m \in {\mathbb {R}}[ {\mathbf{x }}]\) homogeneous of degree d and \(b_1, \ldots , b_m \in {\mathbb {R}}\). Assume there exists a dual optimal solution \(({\bar{y}},t)\) and let \(f_{m+1}( {\mathbf{x }}):=1\) for every \( {\mathbf{x }} \in \varDelta _{n-1}\) and set \({\bar{y}}_{m+1}=-t\). Then, setting \(y_0 = 1\) and \(y_i = -{\bar{y}}_i\) for \(i \in [m+1]\) we have

$$\begin{aligned} 0 \le \text {{ val}} - {\underline{f}}_{\text { {LP}}}^{(r)} \le \frac{\left( \sum _{i = 0}^{m+1} B(y_if_i) +t \right) d(d-1)}{2(r-1)-d(d-1)}, \end{aligned}$$
(11)

for \(B(\cdot )\) as in (9) and \(r > d(d-1)/2+1\).

Remark 1

The bound we give in Theorem 5 depends on the dual optimal solution \(({\bar{y}},t)\). We cannot, in general, bound the dual variables in terms of the problem data a priori, as they may become arbitrarily large. There are, however, cases in which one can bound them in terms of the problem data. An example of this case can be found in Sect. 6.

Proof

By Theorem 1 there is no duality gap. Let \(r > d(d-1)/2+1\) and let \(L^{(r)}\) be an optimal solution to (8). Fix some \(\varepsilon >0\). Then,

$$\begin{aligned} 0 \le \text {val}-{\underline{f}}_{\text {LP}}^{(r)}&= \text {val} - L^{(r)}\left( \sum _{i = 1}^m {\bar{y}}_if_i-t + f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t \right) \\&= \text {val} - \sum _{i=1}^{m}{\bar{y}}_i L^{(r)}(f_i)+tL^{(r)}(1) -L^{(r)}\left( f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t \right) \\&\le \text {val} - \sum _{i=1}^{m}{\bar{y}}_i b_i+t -L^{(r)} \left( f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t\right) \\&= -L^{(r)}\left( f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t \right) \\&= - L^{(r)}\left( f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t+\varepsilon \right) +\varepsilon L^{(r)}(1) \\&\le - L^{(r)}\left( f_0 - \sum _{i=1}^m {\bar{y}}_if_i +t+\varepsilon \right) +\varepsilon , \end{aligned}$$

where both inequalities follow from the fact that \(L^{(r)}(1)\le 1\). By Lemma 2 we have \( \min _{\mathbf{x } \in \varDelta _{n-1}} f_0(\mathbf{x }) - \sum _{i = 1}^{m+1} {\bar{y}}_if_i(\mathbf{x }) + \varepsilon = \varepsilon \). We assume wlog that \(f_0 - \sum _{i=1}^{m+1} {\bar{y}}_if_i\) is homogeneous of degree d. Define

$$\begin{aligned} f := f_0 - \sum _{i=1}^{m+1} {\bar{y}}_if_i + \varepsilon \left( \sum _{i=1}^{n} x_i \right) ^d, \end{aligned}$$

which is homogeneous as well and its minimum over the simplex is \(\varepsilon \). The aim now is to show that \(L^{(r)}(f) \ge 0\) for the appropriate choice of r and then bound r in terms of \(\varepsilon \). By Theorem 4 for k as in (10) we have

$$\begin{aligned} f(\mathbf{x }) \left( \sum _{i=1}^{n} x_i \right) ^k = \sum _{\vert \beta \vert = d+k}c_{\beta }\mathbf{x }^{\beta } \end{aligned}$$

with \(c_{\beta }> 0\) for all \(\beta \) with \(\vert \beta \vert = d+k\). To determine the smallest integer k for which the theorem holds we will first bound B(f). For this, set \(y_0 = 1\) and \(y_i = -{\bar{y}}_i\). We may rewrite f as

$$\begin{aligned} f&= \sum _{i = 0}^{m+1} y_i f_i +\varepsilon \left( \sum _{i=1}^{n} x_i \right) ^d \\&= \sum _{i=0}^{m+1} y_i f_i +\varepsilon \left( \sum _{\vert \alpha \vert = d}\left( {\begin{array}{c}d\\ \alpha _1 \cdots \alpha _n \end{array}}\right) x^\alpha \right) \\&= \sum _{\vert \alpha \vert = d} \left( \sum _{i=0}^{m+1} y_i f_{i,\alpha } +\varepsilon \left( {\begin{array}{c}d\\ \alpha _1 \cdots \alpha _n \end{array}}\right) \right) x^\alpha . \end{aligned}$$

Then,

$$\begin{aligned} B(f)&= \max _{ \alpha }\left[ \left( \sum _{i=0}^{m+1} y_i f_{i,\alpha } + \frac{d!}{\alpha _1 ! \cdots \alpha _n !} \varepsilon \right) \frac{\alpha _1 ! \cdots \alpha _n !}{d!} \right] \\&= \left( \max _{ \alpha } \left( \sum _{i=0}^{m+1} y_i f_{i,\alpha } \right) \frac{\alpha _1 ! \cdots \alpha _n !}{d!} \right) +\varepsilon \\&\le \sum _{i=0}^{m+1} \left( \max _{ \alpha } y_if_{i,\alpha } \frac{\alpha _1 ! \cdots \alpha _n !}{d!} \right) +\varepsilon \\&= \sum _{i=0}^{m+1} B(y_if_i)+\varepsilon . \end{aligned}$$

With this bound on B(f) we find that if r is large enough, i.e.,

$$\begin{aligned} r \ge \left\lceil \frac{d(d-1)}{2}\frac{\sum _{i=0}^{m+1} B(y_if_i)+\varepsilon }{\varepsilon } \right\rceil \ge \left\lceil \frac{d(d-1)}{2}\frac{B(f)}{\varepsilon } \right\rceil , \end{aligned}$$

it follows from Lemma 1 that

$$\begin{aligned} - L^{(r)}\left( f_0 - \sum _{i=1}^{m+1} {\bar{y}}_if_i + \varepsilon \right) + \varepsilon&= \varepsilon - L^{(r)}(f) \\&= \varepsilon - L^{(r)}\left( f \left( \sum _{i=1}^{n} x_i \right) ^k \right) \\&= \varepsilon - L^{(r)} \left( \sum _{\vert \beta \vert = k+d} c_{\beta } \mathbf{x }^{\beta } \right) \le \varepsilon , \end{aligned}$$

where the last inequality follows from the fact that \(L^{(r)}(\mathbf{x }^\alpha ) \ge 0\) for all \(\vert \alpha \vert \le r\). To find a bound on r in terms of \(\varepsilon \) we set

$$\begin{aligned} r = \left\lceil \frac{d(d-1)}{2}\frac{\sum _{i=0}^{m+1} B(y_if_i)+\varepsilon }{\varepsilon } \right\rceil . \end{aligned}$$

Then, one may bound r as follows

$$\begin{aligned} r-1&\le \frac{d(d-1)}{2}\left( \frac{\sum _{i =0}^{m+1} B(y_if_i)}{\varepsilon } + 1 \right) \\ \Leftrightarrow \varepsilon&\le \frac{ \sum _{i=0}^{m+1} B(y_if_i) d(d-1)}{2(r-1)-d(d-1)}, \end{aligned}$$

concluding the proof. \(\square \)

4 Lasserre hierarchy over the sphere

We now consider the GMP (1) over the sphere, i.e. we consider the case \(K= {\mathcal {S}}^{n-1}\). Additionally, we assume the \(f_0, f_1, \ldots , f_m\) in (1) are homogeneous polynomials of even degree 2d.

The Lasserre hierarchy [11] of semidefinite relaxations of the GMP (1) over the sphere is given by

$$\begin{aligned} {\underline{f}}_{\text {SDP}}^{(2r)} = \min _{{\mathop {L\text { linear}}\limits ^{L:{\mathbb {R}}[\mathbf{x }]\rightarrow {\mathbb {R}}}}}&\; L(f_0) \nonumber \\ \text {s.t.} \quad&L(f_i ) = b_i \quad \forall i \in [m] \nonumber \\&L(1) \le 1 \nonumber \\&L\left( [\mathbf{x }]_{r} [\mathbf{x }]_{r}^T\right) \succeq 0 \quad \nonumber \\&L(\mathbf{x }^\alpha ) = L \left( \mathbf{x }^{\alpha } \Vert \mathbf{x }\Vert _2^2 \right) \quad \forall \; \vert \alpha \vert \le 2r-2, \end{aligned}$$
(12)

where the operator L is now applied entry-wise to matrix-valued functions where needed, and an optimal solution is denoted by \(L^{(2r)}\).
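A compact way to prototype (12) is via a modeling package such as cvxpy (our choice here, together with a toy instance where \(f_0 = x_1^2 - x_2^2\), \(f_1 = \Vert \mathbf{x }\Vert _2^2\) and \(b_1 = 1\), so that val \(= -1\)); the moment matrix is encoded as a PSD matrix variable tied entrywise to the moment vector.

```python
# Sketch of (12) with cvxpy (illustrative modeling choice; needs an SDP solver
# such as SCS, which typically ships alongside cvxpy). Instance:
# f_0 = x1^2 - x2^2, f_1 = ||x||^2, b_1 = 1, so val = -1 on S^1.
import itertools
import cvxpy as cp

n, r = 2, 2                                  # degree 2d = 2, order 2r = 4

alphas = [a for a in itertools.product(range(2*r + 1), repeat=n)
          if sum(a) <= 2*r]
idx = {a: i for i, a in enumerate(alphas)}
y = cp.Variable(len(alphas))

basis = [a for a in alphas if sum(a) <= r]   # monomials indexing [x]_r
k = len(basis)
S = cp.Variable((k, k), PSD=True)            # plays the role of L([x]_r [x]_r^T)

cons = [y[idx[(0,) * n]] <= 1]               # L(1) <= 1
for i, a in enumerate(basis):                # tie S to the moment vector y
    for j, b in enumerate(basis):
        cons.append(S[i, j] == y[idx[tuple(p + q for p, q in zip(a, b))]])
for a in alphas:                             # L(x^a) = L(x^a ||x||^2)
    if sum(a) <= 2*r - 2:
        cons.append(y[idx[a]] == sum(
            y[idx[tuple(a[m2] + 2*(m2 == i) for m2 in range(n))]]
            for i in range(n)))
cons.append(y[idx[(2, 0)]] + y[idx[(0, 2)]] == 1)   # L(f_1) = 1

prob = cp.Problem(cp.Minimize(y[idx[(2, 0)]] - y[idx[(0, 2)]]), cons)
prob.solve()
print(prob.value)                            # close to -1
```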

The following lemma enables us to use a quantitative Positivstellensatz by Fang and Fawzi [8] for positive polynomials on the sphere, to obtain a rate of convergence of the Lasserre hierarchy. It is a folklore result and certainly known to be true; however, we did not find a suitable reference. Hence, we give a short proof for completeness.

Lemma 3

Let \(L : {\mathbb {R}}[ {\mathbf{x }}]_{2k} \rightarrow {\mathbb {R}}\) be a linear operator and suppose \(L\left( [ {\mathbf{x }}]_k [ {\mathbf{x }}]_k^T \right) \succeq 0\), where the operator is applied entrywise to the matrix \([ {\mathbf{x }}]_k [ {\mathbf{x }}]_k^T\). Then, \(L(\sigma ) \ge 0\) for all \(\sigma \in \varSigma [ {\mathbf{x }}]_k\).

Proof

Let \(\sigma \in \varSigma [\mathbf{x }]_k\) be a sum of squares of degree 2k. Then there exists \(A \succeq 0\) such that \(\sigma = [\mathbf{x }]_k^TA[\mathbf{x }]_k\). Let \(\langle \cdot , \cdot \rangle \) denote the trace inner product. We have

$$\begin{aligned} L(\sigma ) = L\left( [\mathbf{x }]_k^T A [\mathbf{x }]_k \right) =\sum _{i,j} A_{i,j} L\left( ([\mathbf{x }]_k)_i( [\mathbf{x }]_k)_j \right) = \langle A, L\left( [\mathbf{x }]_k [\mathbf{x }]_k^T \right) \rangle \ge 0, \end{aligned}$$

since both A and \(L\left( [\mathbf{x }]_k [\mathbf{x }]_k^T \right) \) are psd. \(\square \)

The quantitative Positivstellensatz by Fang and Fawzi [8] is as follows.

Theorem 6

[8, Theorem 3.8] Assume f is a homogeneous polynomial of degree 2d such that \(0 \le f( {\mathbf{x }}) \le 1\) for all \( {\mathbf{x }} \in {\mathcal {S}}^{n-1}\) and \(d \le n\). There are constants \(C_d, C_d^\prime \) that depend only on d such that if \(r \ge C_d n\) then

$$\begin{aligned} f + C_d^\prime (d/r)^2= \sigma ( {\mathbf{x }}) + (1-\Vert {\mathbf{x }}\Vert _2^2)h( {\mathbf{x }}) \end{aligned}$$

for \(\sigma ( {\mathbf{x }}) \in \varSigma [ {\mathbf{x }}]_r\) and \(h \in {\mathbb {R}}[ {\mathbf{x }}]_{2r-2}\).

We may now use the theorem by Fang and Fawzi [8] and Lemma 3 to derive a rate of convergence for the Lasserre hierarchy [11] of the GMP on the sphere as follows.

Theorem 7

Let \(\mathrm {val}\) be the optimal value of the GMP (1) for input data \(K = {\mathcal {S}}^{n-1}, f_0, f_1, \ldots , f_m \in {\mathbb {R}}[ {\mathbf{x }}]\) homogeneous of even degree 2d, \(b_1, \ldots , b_m \in {\mathbb {R}}\) and \(d \le n\). Let \(({\bar{y}},t)\) be a dual optimal solution and let \(f_{m+1}( {\mathbf{x }}):=1\) for every \( {\mathbf{x }} \in {\mathcal {S}}^{n-1}\), set \({\bar{y}}_{m+1}=-t\) and set \(y_0 = 1\) and \(y = -{\bar{y}}\). Further, let \(f^{i,y_i}_{\max } = \max _{ {\mathbf{x }}\in {\mathcal {S}}^{n-1}} y_if_i( {\mathbf{x }})\). There exist constants \(C_d, C_d^\prime \), only dependent on d, such that if \(r \ge C_d n\) we have

$$\begin{aligned} 0 \le \mathrm {val}-{\underline{f}}_{\mathrm {SDP}}^{(2r)} \le \frac{C_d^\prime d^2 \sum _{i=0}^{m+1} f^{i,y_i}_{\max }}{r^2}. \end{aligned}$$

Proof

The proof is similar to that of Theorem 5, essentially the only difference being that Lemma 3 is used, and we omit the details.\(\square \)

5 Limiting behavior of the hierarchies of linear operators

The purpose of this section is to show that the limit functionals of the introduced hierarchies correspond to measures, in the sense that each is the Riesz functional of an optimal solution of the corresponding GMP. In the following we will define the limit of the optimal solutions \(L^{(r)}\) of the introduced hierarchies in a meaningful way and prove that the corresponding moment sequences have a representing measure.

5.1 The simplex case

Consider the case when \(K = \varDelta _{n-1}\). When looking at the linear operators in the relaxation hierarchies (8) one would expect that in the limit, i.e. for \(r \rightarrow \infty \), the operators \(L^{(r)}(\cdot )\) behave like \(\langle \cdot , \mu \rangle \) for some positive measure \(\mu \). In the rest of this section we prove that this is in fact the case and we will define the limit in a meaningful way. Consider again the ideal \({\mathcal {I}} = \{ \mathbf{x } \mapsto p(\mathbf{x })\left( 1- \sum _{i=1}^{n}x_i \right) : p \in \mathbb {R[\mathbf{x }]} \}\) and let \({\bar{L}} : \mathbb {R[\mathbf{x }]} / {\mathcal {I}} \rightarrow {\mathbb {R}}\) be a linear operator such that

  1. \({\bar{L}}(\mathbf{x }^\alpha ) \ge 0\) for all \(\alpha \in {\mathbb {N}}^n\),

  2. \({\bar{L}}(1)\le 1\),

and let

$$\begin{aligned} {\mathcal {L}} = \{ {\bar{L}} : {\mathbb {R}}[\mathbf{x }]/{\mathcal {I}} \rightarrow {\mathbb {R}} : {\bar{L}} \text { fulfills conditions } 1. \text { and } 2. \} \end{aligned}$$

be the class of all linear operators that satisfy the conditions above. Note that for every \({\bar{L}} \in {\mathcal {L}}\) the relation

$$\begin{aligned} {\bar{L}}\left( \left( 1-\sum _{i=1}^{n}x_i\right) \mathbf{x }^\alpha \right) = 0 \text { for all } \alpha \in {\mathbb {N}}^n \end{aligned}$$

trivially holds. Equipped with the norm \(\Vert f\Vert = \sup _{\mathbf{x } \in \varDelta _{n-1}} \vert f(\mathbf{x }) \vert \), the quotient \((\mathbb {R[\mathbf{x }]}/{\mathcal {I}}, \Vert \cdot \Vert )\) is a normed vector space.

Theorem 8

(see, e.g. [13, Theorem 1.4.2]) Suppose \(F: X \rightarrow Y\) is a linear operator between two normed vector spaces \((X, \Vert \cdot \Vert _X)\) and \((Y, \Vert \cdot \Vert _Y)\). Then the following are equivalent:

  1. F is continuous,

  2. \(\Vert Fx\Vert _Y \le M \Vert x\Vert _X\) for all \(x \in X\) and some \(M\in {\mathbb {R}}\).

Using Theorem 8 we can prove that the operators in \({\mathcal {L}}\) are continuous.

Lemma 4

Every \({\bar{L}} \in {\mathcal {L}}\) is continuous.

Proof

By Theorem 8 it suffices to show that every \({\bar{L}} \in {\mathcal {L}}\) satisfies

$$\begin{aligned} \vert {\bar{L}}(f)\vert \le M \Vert f\Vert = M \sup _{\mathbf{x } \in \varDelta _{n-1}} \vert f\vert \end{aligned}$$

for all \(f \in \mathbb {R[\mathbf{x }]}/{\mathcal {I}}\) and some \(M \in {\mathbb {R}}\). To this end, let \(f \in {\mathbb {R}}[\mathbf{x }]/{\mathcal {I}}\), recall \(\Vert f\Vert = \sup _{\mathbf{x } \in \varDelta _{n-1}} \vert f(\mathbf{x }) \vert \), and set

$$\begin{aligned} f_{\min } = \min _{x \in \varDelta _{n-1}} f(\mathbf{x }) \ge -\Vert f\Vert \text { and } f_{\max } = \max _{x \in \varDelta _{n-1}} f(\mathbf{x }) \le \Vert f\Vert . \end{aligned}$$

Let \({\bar{L}}^*\) be the optimizer of

$$\begin{aligned} \min {\bar{L}}(f) \text { s.t. } {\bar{L}} \in {\mathcal {L}}. \end{aligned}$$

Then \({\bar{L}}^*(f) \ge - \Vert f\Vert \). To see this, suppose first that \(f_{\min } \ge 0\); then \({\bar{L}}^*(f) \ge 0 \ge - \Vert f\Vert \) by Theorem 5. If \(f_{\min } < 0\), consider \({\bar{L}}^*(f-f_{\min })\ge 0\), so that \({\bar{L}}^*(f) \ge f_{\min } {\bar{L}}^*(1) \ge f_{\min } \ge - \Vert f\Vert \), using \({\bar{L}}^*(1) \le 1\). Hence, for all \({\bar{L}} \in {\mathcal {L}}\) we have

$$\begin{aligned} {\bar{L}}(f) \ge {\bar{L}}^*(f) \ge -\Vert f\Vert . \end{aligned}$$

Similarly, let \({\bar{L}}^\prime \) be the optimizer of

$$\begin{aligned} \max {\bar{L}}(f) \text { s.t. } {\bar{L}} \in {\mathcal {L}}. \end{aligned}$$

By the same reasoning we have \({\bar{L}}^\prime (f) \le \Vert f\Vert \). Hence one can set \(M=1\), and for every \({\bar{L}} \in {\mathcal {L}}\) we see

$$\begin{aligned} \vert {\bar{L}}(f)\vert \le \Vert f\Vert . \end{aligned}$$

\(\square \)

The set \(\mathbb {R[\mathbf{x }]} / {\mathcal {I}}\) is dense in \({\mathcal {C}}(\varDelta _{n-1})\) by the Stone–Weierstrass theorem. This means we can employ the following theorem in the next step.

Theorem 9

(see, e.g. [13, Theorem 1.9.1]) Suppose that M is a dense subspace of a normed space X, that Y is a Banach space, and that \(T_0 : M \rightarrow Y\) is a bounded linear operator. Then there is a unique continuous function \(T : X \rightarrow Y\) that agrees with \(T_0\) on M. This function T, called a continuous linear extension of \(T_0\), is a bounded linear operator and \(\Vert T\Vert = \Vert T_0\Vert \).

Now let

$$\begin{aligned} {\mathcal {T}} = \left\{ T : {\mathcal {C}}(\varDelta _{n-1}) \rightarrow {\mathbb {R}} : T \text { is the continuous linear extension of some } {\bar{L}} \in {\mathcal {L}} \right\} . \end{aligned}$$

Proposition 1

Let \(T \in {\mathcal {T}}\) and \(f \in {\mathcal {C}}(\varDelta _{n-1})\). Then

$$\begin{aligned} T(f) = \int _{\varDelta _{n-1}} f( {\mathbf{x }})\mathrm {d}\mu ( {\mathbf{x }}) \end{aligned}$$

for some positive measure \(\mu \) supported on \(\varDelta _{n-1}\), satisfying \(\mu (\varDelta _{n-1}) \le 1\).

Proof

It is sufficient to show \(T(f) \ge 0\) for all \(f \in {\mathcal {C}}(\varDelta _{n-1})_+ = \{f \in {\mathcal {C}}(\varDelta _{n-1}) : f(\mathbf{x }) \ge 0 \; \forall \mathbf{x } \in \varDelta _{n-1} \}\). To see this, note that the space \({\mathcal {C}}(\varDelta _{n-1})\) can be ordered by the convex cone \({\mathcal {C}}(\varDelta _{n-1})_+\). Now \(T(f) \ge 0\) for all \(f \in {\mathcal {C}}(\varDelta _{n-1})_+\) implies that \(T \in \left( {\mathcal {C}}(\varDelta _{n-1})_+ \right) ^*\), i.e. T lies in the dual cone of \({\mathcal {C}}(\varDelta _{n-1})_+\), which, by the Riesz representation theorem, is the set of positive finite Borel measures on \(\varDelta _{n-1}\). So let f be a continuous function that is non-negative on the simplex and consider its Bernstein approximation of order r given by

$$\begin{aligned} {\mathcal {B}}_f^r(\mathbf{x }) = \sum _{\begin{array}{c} \alpha \in {\mathbb {N}}_r^n \\ \vert \alpha \vert = r \end{array}} f\left( \frac{\alpha }{r}\right) {{r}\atopwithdelims (){\alpha }}\mathbf{x }^\alpha . \end{aligned}$$

The approximation converges uniformly to f as \(r \rightarrow \infty \) since f is continuous. Using Lemma 4 we see

$$\begin{aligned} T(f)&= T( \lim _{r \rightarrow \infty } {\mathcal {B}}_f^r ) \\&\overset{T\text { cont.}}{=} \lim _{r \rightarrow \infty } T({\mathcal {B}}_f^r) \\&= \lim _{r \rightarrow \infty } \sum \limits _{\begin{array}{c} \alpha \in {\mathbb {N}}_r^n \\ \vert \alpha \vert = r \end{array}} \underbrace{ f\left( \frac{\alpha _1}{r}, \ldots , \frac{\alpha _n}{r}\right) }_{\ge 0} \underbrace{ \left( {\begin{array}{c}r\\ \alpha \end{array}}\right) }_{\ge 0} \underbrace{ T(\mathbf{x }^\alpha )}_{\ge 0} \ge 0. \end{aligned}$$

Hence, it follows that \(T(f) = \langle f, \mu \rangle \) for some positive measure \(\mu \) with \(\mu (\varDelta _{n-1}) = T(1) \le 1\). \(\square \)

Remark 2

By the proof given above, it becomes clear that the continuous linear extension can in fact be defined in terms of the limit of the Bernstein approximation, i.e., define \(T(f) := \lim _{r \rightarrow \infty } {\bar{L}}({\mathcal {B}}_f^r)\) for \(f \in {\mathcal {C}}(\varDelta _{n-1})\) and \({\bar{L}} \in \mathcal {{L}}\).
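For intuition, the following sketch evaluates the Bernstein approximation of (the restriction to \(\varDelta _2\) of) an assumed continuous test function at a fixed point, illustrating the uniform convergence used in the proof of Proposition 1.

```python
# Evaluating the Bernstein approximation B_f^r at a point of the 2-simplex
# for an assumed continuous test function f.
import itertools
from math import factorial
import numpy as np

def multinom(r, alpha):
    out = factorial(r)
    for a in alpha:
        out //= factorial(a)
    return out

def bernstein(f, r, x):
    total = 0.0
    for alpha in itertools.product(range(r + 1), repeat=len(x)):
        if sum(alpha) == r:
            total += (f(np.array(alpha) / r) * multinom(r, alpha)
                      * np.prod(np.array(x) ** np.array(alpha)))
    return total

f = lambda x: np.exp(x[0]) * x[1]
x = np.array([0.2, 0.5, 0.3])
for r in (5, 20, 80):
    print(r, abs(bernstein(f, r, x) - f(x)))   # the error shrinks as r grows
```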

5.2 The sphere case

For the sphere case, i.e. \(K = {\mathcal {S}}^{n-1}\), consider the following theorem.

Theorem 10

(see, e.g. [11, Theorem 3.8]) Let \(\mathbf{y } = (y_\alpha )_{\alpha \in {\mathbb {N}}^n} \subset {\mathbb {R}}^\infty \) be a given infinite real sequence, \({\bar{L}} : {\mathbb {R}}[ {\mathbf{x }}] \rightarrow {\mathbb {R}}\) be the linear operator defined by

$$\begin{aligned} p( {\mathbf{x }}) = \sum _{\alpha \in {\mathbb {N}}^n} p_\alpha {\mathbf{x }}^\alpha \mapsto {\bar{L}}(p) = \sum _{\alpha \in {\mathbb {N}}^n} p_\alpha y_\alpha , \end{aligned}$$

and let \(K = \{ {\mathbf{x }} \in {\mathbb {R}}^n : g_1( {\mathbf{x }})\ge 0, \ldots , g_m( {\mathbf{x }})\ge 0 \}\). The sequence \(\mathbf{y }\) has a finite Borel representing measure with support contained in K if and only if

$$\begin{aligned} {\bar{L}}(f^2g_J) \ge 0 \; \forall J \subseteq \{1, \ldots , m\} \text { and } f \in {\mathbb {R}}[ {\mathbf{x }}], \end{aligned}$$

where \( g_J( {\mathbf{x }}) = \prod _{j \in J} g_j ( {\mathbf{x }}). \)

Now, let \({\bar{L}}\) be a linear operator such that

  1. \({\bar{L}}(1)\le 1\),

  2. \({\bar{L}}([\mathbf{x }]_t[\mathbf{x }]_t^T) \succeq 0 \;\forall t \in {\mathbb {N}}\),

  3. \({\bar{L}}(\mathbf{x }^\alpha ) = {\bar{L}}(\mathbf{x }^\alpha \Vert \mathbf{x }\Vert _2^2) \; \forall \alpha \in {\mathbb {N}}^n\),

and let \({\mathcal {L}}^\prime = \{ {\bar{L}} : {\mathbb {R}}[\mathbf{x }] \rightarrow {\mathbb {R}} : {\bar{L}} \text { satisfies } 1.\text {--}3. \}\). Recall that as a semialgebraic set the sphere can be written as \({\mathcal {S}}^{n-1} = \{ \mathbf{x } \in {\mathbb {R}}^n : g_1(\mathbf{x }) := 1- \Vert \mathbf{x }\Vert _2^2 \ge 0,\; g_2(\mathbf{x }): = \Vert \mathbf{x }\Vert _2^2-1 \ge 0 \}\). Then for \(K = {\mathcal {S}}^{n-1}\) every \({\bar{L}} \in {\mathcal {L}}^\prime \) satisfies all conditions of Theorem 10. To see this, note that the only possibilities for J are \(\emptyset , \{1\}, \{2\}\) and \(\{1,2\}\). Because of condition 3 we have \({\bar{L}}(\pm (1-\Vert \mathbf{x }\Vert _2^2)p)=0\) for all \(p \in \mathbb {R[\mathbf{x }]}\), covering all cases except \(J = \emptyset \). For \(J = \emptyset \) the condition reduces to \({\bar{L}}(p^2)\ge 0\), which holds for all \(p \in \mathbb {R[\mathbf{x }]}\) because of condition 2 and Lemma 3. Hence, every \({\bar{L}} \in {\mathcal {L}}^\prime \) has a representing measure whose support is contained in \({\mathcal {S}}^{n-1}\).

6 Concluding remarks

In this last section we conclude by outlining the connection of our results to previous work. We show that—in the special case of polynomial optimization on the simplex—our RLT hierarchy reduces to one studied earlier by Bomze and De Klerk [2], and De Klerk et al. [5].

De Klerk et al. [5] introduced the following hierarchy for minimizing a homogeneous polynomial \(p \in {\mathbb {R}}[x]\) of degree d over the simplex.

$$\begin{aligned} \begin{aligned} p^{(r)} = \max \lambda \text { s.t. }&\text { the polynomial } \left( \sum _{i=1}^n x_i \right) ^r\left( p(\mathbf{x })-\lambda \left( \sum _{i=1}^n x_i \right) ^d \right) \\&\text { has only nonneg. coefficients.} \end{aligned} \end{aligned}$$
(13)

It was proved that \(\lim _{r \rightarrow \infty } p^{(r)} = p_{\min } = \min _{\mathbf{x } \in \varDelta _{n-1}} p(\mathbf{x })\). The LP hierarchy introduced in Sect. 3 of this paper is a generalization of the hierarchy (13), in the sense made precise in the following theorem.

Theorem 11

For a homogeneous polynomial \(p \in {\mathbb {R}}[ {\mathbf{x }}]\) of degree d, let \({\underline{f}}_{\text { LP}}^{(r+d)}\) be the optimal value of the LP relaxation (8) of the problem

$$\begin{aligned} \min _{ {\mathbf{x }} \in \varDelta _{n-1}}p( {\mathbf{x }}) = \text { val} = \inf _{\mu \in {\mathcal {M}}(\varDelta _{n-1})_+} \left\{ \int _{\varDelta _{n-1}}p( {\mathbf{x }}) \mathrm {d}\mu ( {\mathbf{x }}) : \int _{\varDelta _{n-1}}\mathrm {d}\mu ( {\mathbf{x }}) = 1 \right\} \end{aligned}$$

for some \(r \in {\mathbb {N}}\). Then,

$$\begin{aligned} p^{(r)} = {\underline{f}}_{\text { LP}}^{(r+d)}. \end{aligned}$$

Proof

The proof is straightforward, and omitted for the sake of brevity.\(\square \)
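Although we omit the proof, the identity is easy to explore computationally. Since every coefficient constraint in (13) is linear in \(\lambda \), one obtains the closed form \(p^{(r)} = \min _{\vert \beta \vert = r+d}\, c_\beta \big / {{r+d}\atopwithdelims (){\beta }}\), where \(c_\beta \) is the coefficient of \(\mathbf{x }^\beta \) in \((x_1+\cdots +x_n)^r p\) and the denominator is the multinomial coefficient; the sketch below evaluates it for the Motzkin–Straus form of \(C_5\) (our recurring toy instance).

```python
# Closed form for (13): p^(r) = min over |beta| = r + d of
# coeff((x_1+...+x_n)^r p, beta) / multinom(r + d; beta).
import itertools
from math import factorial
import sympy as sp

n, d = 5, 2
x = sp.symbols(f"x1:{n + 1}")
p = sum(x[i] * x[j] for i in range(n) for j in range(n)
        if i == j or abs(i - j) in (1, n - 1))      # x^T (A + I) x for C_5
s = sum(x)

def multinom(beta):
    out = factorial(sum(beta))
    for b in beta:
        out //= factorial(b)
    return out

for r in range(4):
    q = sp.Poly(sp.expand(s**r * p), *x)
    val = min(q.coeff_monomial(sp.prod([v**b for v, b in zip(x, beta)]))
              / multinom(beta)
              for beta in itertools.product(range(r + d + 1), repeat=n)
              if sum(beta) == r + d)
    print(r, val)      # nondecreasing lower bounds converging to p_min = 1/2
```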

As has been noted before, the estimate of Theorem 5 depends on the dual variables. While it is in general not possible to get rid of these variables in the estimate, there are cases in which we can. In the following we present an example of such a case.

Example 1

Consider the case of polynomial optimization over the simplex. Let \(f \in {\mathbb {R}}[\mathbf{x }]\) be of degree d and set

$$\begin{aligned} f_{\min } = \min _{\mathbf{x } \in \varDelta _{n-1}} f(\mathbf{x }), \end{aligned}$$

and analogously define \(f_{\max }\). We can cast this as a GMP of type (1)

$$\begin{aligned} f_{\min } = \inf _{\mu \in {\mathcal {M}}(\varDelta _{n-1})_+} \left\{ \int _{\varDelta _{n-1}}f(\mathbf{x })\mathrm {d}\mu : \int _{\varDelta _{n-1}} \mathrm {d}\mu = 1,\; \int _{\varDelta _{n-1}} \mathrm {d}\mu \le 1 \right\} . \end{aligned}$$

A dual optimal solution is in this case given by \((y^*, t^*) = (f_{\min }, 0)\). Noting that in the estimate we set \(y_0 = 1\), our estimate (11) becomes

$$\begin{aligned} f_{\min } - {\underline{f}}_{\text {LP}}^{(r+d)} \le \frac{d(d-1)}{2(r+d-1)-d(d-1)}(B(f)-f_{\min }) \end{aligned}$$

and applying the inequality

$$\begin{aligned} B(f)-f_{\min } \le {{2d-1}\atopwithdelims (){d}}d^d\left( f_{\max }-f_{\min }\right) , \end{aligned}$$

shown in [5, Theorem 2.2], we find

$$\begin{aligned} f_{\min } - {\underline{f}}_{\mathrm {LP}}^{(r+d)} \le \frac{d(d-1)}{2(r+d-1)-d(d-1)}{{2d-1}\atopwithdelims (){d}}d^d\left( f_{\max }-f_{\min }\right) . \end{aligned}$$

This is essentially the same result as was obtained in [5, Theorem 1.3]. The presented example highlights the fact that results for convergence rates of the GMP may not be as clean as for simpler problems like polynomial optimization, even though the tools that are used to obtain these results are the same. This, of course, is due to the fact that the GMP is much more complicated in general.
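For completeness, the quantities in this example are straightforward to evaluate; the sketch below computes B(f) from (9) for an assumed bivariate quadratic and tabulates the right-hand side of the resulting estimate for a few values of r.

```python
# B(f) from (9) and the final estimate of Example 1 for an assumed quadratic.
from math import factorial
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**2 - x1*x2 + x2**2               # d = 2, f_min = 1/4 on the simplex
d, f_min = 2, sp.Rational(1, 4)

B = max(sp.Rational(factorial(a[0]) * factorial(a[1]), factorial(d)) * c
        for a, c in sp.Poly(f, x1, x2).terms())
print(B)                                 # B(f) = 1

for r in (5, 10, 100):                   # right-hand side of the estimate
    print(r, sp.Rational(d*(d - 1), 2*(r + d - 1) - d*(d - 1)) * (B - f_min))
```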

Moreover, we would like to emphasize that the conceptual tools of this paper are not limited to the cases treated here. In fact, given a quantitative version of a Positivstellensatz, it is possible to perform a convergence analysis of the kind proposed in this paper, as long as the nature of the relaxation hierarchy, i.e. linear or semidefinite, is coherent with the positivity certificate given by the Positivstellensatz. For example, for more general sets K there is a (much weaker) quantitative Positivstellensatz available, due to Nie and Schweighofer [14]. This result can be used to bound the rate of convergence of the GMP for more general sets. We chose to discuss the simplex and the sphere because strong Positivstellensätze are available in these cases, and to expose the fact that the relaxation must be in line with the certificate. For the sphere case, one could also use the following Positivstellensatz by Reznick.

Theorem 12

(cf. Theorem 3.12 in Reznick [16]) Assume f is a homogeneous polynomial of degree 2d such that \(0 \le f( {\mathbf{x }}) \le 1\) for all \( {\mathbf{x }} \in {\mathcal {S}}^{n-1}\). Then one has

$$\begin{aligned} f( {\mathbf{x }}) +\frac{d(d-1)n}{r\log 2}= \sigma ( {\mathbf{x }}) + (1-\Vert {\mathbf{x }}\Vert _2^2) h( {\mathbf{x }}) \end{aligned}$$

for some \(\sigma \in \varSigma [ {\mathbf{x }}]_{2(r+d)}\) and \(h \in {\mathbb {R}}[ {\mathbf{x }}]_{2(r+d)-2}\).

By using this theorem instead of Theorem 6, one obtains a convergence result with fewer assumptions than the one presented in Theorem 7, but at the cost of a worse convergence rate. In particular, one may avoid the assumption \(d \le n\) of Theorem 6 by using the result by Reznick, leading to a convergence rate of O(1/r) on the sphere (as opposed to the \(O(1/r^2)\) rate in Theorem 7).