Near-optimal analysis of Lasserre’s univariate measure-based bounds for multivariate polynomial optimization

We consider a hierarchy of upper approximations for the minimization of a polynomial $f$ over a compact set $K \subseteq \mathbb{R}^n$ proposed recently by Lasserre (arXiv:1907.097784, 2019). This hierarchy relies on using the push-forward measure of the Lebesgue measure on $K$ by the polynomial $f$ and involves univariate sums of squares of polynomials with growing degrees $2r$. Hence it is weaker, but cheaper to compute, than an earlier hierarchy by Lasserre (SIAM Journal on Optimization 21(3), 864–885, 2011), which uses multivariate sums of squares. We show that this new hierarchy converges to the global minimum of $f$ at a rate in $O(\log^2 r / r^2)$ whenever $K$ satisfies a mild geometric condition, which holds, e.g., for convex bodies and for compact semialgebraic sets with dense interior. As an application, this rate of convergence also applies to the stronger hierarchy based on multivariate sums of squares, which improves and extends earlier convergence results to a wider class of compact sets.
Furthermore, we show that our analysis is near-optimal by proving a lower bound on the convergence rate in $\Omega(1/r^2)$ for a class of polynomials on $K = [-1,1]$, obtained by exploiting a connection to orthogonal polynomials.


Introduction
Consider the problem of finding the minimum value taken by an $n$-variate polynomial $f \in \mathbb{R}[x]$ over a compact set $K \subseteq \mathbb{R}^n$, i.e., computing the parameter:

$$f_{\min} = \min_{x \in K} f(x). \tag{1}$$

Throughout we also set $f_{\max} = \max_{x \in K} f(x)$. Computing the parameter $f_{\min}$ (or $f_{\max}$) is a hard problem in general, including for instance the maximum stable set problem as a special case. For a general reference on polynomial optimization and its applications, we refer, e.g., to [14,16].

If we fix a Borel measure $\lambda$ with support $K$, problem (1) may be reformulated as minimizing the integral $\int_K f(x)\sigma(x)\,d\lambda(x)$ over all sum-of-squares polynomials $\sigma \in \Sigma[x]$ that provide a probability density on $K$ with respect to the measure $\lambda$. By bounding the degree of $\sigma$, we obtain the following hierarchy of upper bounds on $f_{\min}$ proposed by Lasserre [15]:

$$f^{(r)} = \min_{\sigma \in \Sigma[x]_r} \Big\{ \int_K f(x)\sigma(x)\,d\lambda(x) : \int_K \sigma(x)\,d\lambda(x) = 1 \Big\}. \tag{2}$$

Here $\Sigma[x]$ denotes the set of polynomials that can be written as a sum of squares of polynomials and we set $\Sigma[x]_r = \Sigma[x] \cap \mathbb{R}[x]_{2r}$. Since sums of squares of polynomials can be expressed using semidefinite programming, for any fixed $r \in \mathbb{N}$ the parameter $f^{(r)}$ can be computed efficiently by semidefinite programming or, even simpler, as the smallest eigenvalue of an appropriate matrix of size $\binom{n+r}{r}$ ([15], see also [5]).

Recently, Lasserre [17] introduced new, weaker but more economical, upper bounds on $f_{\min}$ that are based on a univariate approach to the problem. For this purpose, he considers the push-forward measure $\lambda_f$ of $\lambda$ by $f$, which is defined by

$$\lambda_f(B) = \lambda(f^{-1}(B)) \quad \text{for any Borel set } B \subseteq \mathbb{R}. \tag{3}$$

Note that for any measurable function $g : \mathbb{R} \to \mathbb{R}$, we thus have

$$\int_K g(f(x))\,d\lambda(x) = \int_{f(K)} g(t)\,d\lambda_f(t). \tag{4}$$

We then can define the following hierarchy of upper bounds on $f_{\min}$:

$$f^{(r)}_{\mathrm{pfm}} = \min_{s \in \Sigma[t]_r} \Big\{ \int_{f(K)} t\,s(t)\,d\lambda_f(t) : \int_{f(K)} s(t)\,d\lambda_f(t) = 1 \Big\} = \min_{s \in \Sigma[t]_r} \Big\{ \int_K f(x)\,s(f(x))\,d\lambda(x) : \int_K s(f(x))\,d\lambda(x) = 1 \Big\}. \tag{5}$$

The difference with the parameter $f^{(r)}$ is that we now restrict the search to univariate sums of squares $s \in \Sigma[t]_r$, which we then evaluate at the polynomial $f$, leading to the multivariate sum of squares $\sigma_{\mathrm{pfm}} := s \circ f \in \Sigma[x]_{rd}$ if $f$ has degree $d$. Therefore we have the inequality

$$f_{\min} \le f^{(rd)} \le f^{(r)}_{\mathrm{pfm}}. \tag{6}$$

Again, the parameter $f^{(r)}_{\mathrm{pfm}}$ can be computed efficiently for any fixed $r$. But now it can be computed as the smallest eigenvalue of an appropriate matrix of much smaller size $r + 1$ (see (8) below). Asymptotic convergence of the parameters $f^{(r)}_{\mathrm{pfm}}$ to $f_{\min}$ is shown in [17], but no quantitative results are given there. In this paper, we are interested in analyzing the convergence rate of the parameters $f^{(r)}_{\mathrm{pfm}}$ to the global minimum $f_{\min}$ in terms of the degree $r$.
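To make the role of the push-forward measure concrete: by the change of variables for $\lambda_f$, its moments are the integrals $\int_K f(x)^j\,d\lambda(x)$, and these are the only data about $f$ and $K$ that enter the univariate bound. The following Python sketch computes them exactly in the special case $K = [-1,1]^n$ with the Lebesgue measure. The dictionary representation of polynomials and the helper names (`poly_mul`, `integrate_box`, `pushforward_moments`) are illustrative choices made here, not taken from [17].

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return out

def integrate_box(p):
    """Integrate a polynomial over [-1, 1]^n (Lebesgue measure)."""
    total = Fraction(0)
    for exps, coeff in p.items():
        term = Fraction(coeff)
        for a in exps:
            # int_{-1}^{1} x^a dx equals 2/(a+1) for even a and 0 for odd a
            term *= Fraction(2, a + 1) if a % 2 == 0 else 0
        total += term
    return total

def pushforward_moments(f, n, count):
    """Return [int_K f(x)^j dx for j = 0, ..., count-1] with K = [-1, 1]^n."""
    power = {(0,) * n: Fraction(1)}  # f^0 = 1
    moments = []
    for _ in range(count):
        moments.append(integrate_box(power))
        power = poly_mul(power, f)
    return moments

# f(x) = x^2 on [-1, 1]
square = {(2,): Fraction(1)}
moments_square = pushforward_moments(square, 1, 4)

# f(x, y) = x*y on [-1, 1]^2
product = {(1, 1): Fraction(1)}
moments_product = pushforward_moments(product, 2, 3)
```

For a polynomial with many terms, the number of monomials in $f^j$ grows quickly with $j$, so this exact approach is only meant for small examples.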

Previous work
In what follows we always consider for $\lambda$ the Lebesgue measure on $K$ (unless specified otherwise). Several results exist on the convergence rate of the parameters $f^{(r)}$ to the global minimum $f_{\min}$, depending on the set $K$. The best rates in $O(1/r^2)$ were shown in [5,6,23] when $K$ belongs to special classes of convex bodies, including the hypercube $[-1,1]^n$, the ball $B^n$, the sphere $S^{n-1}$, the standard simplex $\Delta^n$ and compact sets that are locally 'ball-like'. Furthermore, it was shown in [5] that this analysis is best possible in general (already for $K = [-1,1]$ and $f(x) = x$). The starting point for each of these results is a connection between the parameters $f^{(r)}$ and the smallest roots of certain orthogonal polynomials (see [5, Sect. 2] and the short recap below).
In [23], a rate in $O(\log^2 r / r^2)$ was shown for general convex bodies $K$, as well as a rate in $O(\log r / r)$ for general compact sets $K$ that satisfy a minor geometric condition (a strengthening of Assumption 1 below). There the analysis relied on constructing explicit sum-of-squares densities that approximate well the Dirac delta function at a global minimizer of $f$, making use of the so-called 'needle' polynomials from [12]. An improved rate in $O(\log^k r / r^k)$ was shown in [23, Theorem 14] when the partial derivatives of $f$ up to degree $k - 1$ vanish at one of its global minimizers on $K$.
When $K$ is a convex body, a convergence rate in $O(1/r)$ had been shown earlier in [4], by exploiting a link to simulated annealing. There the authors considered sum-of-squares densities of (roughly) the form $\sigma(x) = s(f(x))$, where the univariate polynomial $s$ is the truncated Taylor expansion of the exponential $e^{-t/T}$. Hence this specific choice of $s$ (or $\sigma$) provides an upper bound not only for the parameter $f^{(rd)}$ (as exploited in [4]) but also for the parameter $f^{(r)}_{\mathrm{pfm}}$, and thus the result of [4] gives directly

$$f^{(r)}_{\mathrm{pfm}} - f_{\min} = O(1/r).$$

The result above gives a first quantitative analysis of the parameters $f^{(r)}_{\mathrm{pfm}}$ for convex bodies. In this paper we improve this result in two directions. First, we sharpen the analysis and show the stronger convergence rate $O(\log^2 r / r^2)$. Second, we show that this analysis applies to a large class of compact sets (those satisfying Assumption 1), which includes all semialgebraic sets that have a dense interior.
We also mention briefly another hierarchy of bounds when $K$ is a semialgebraic set, of the form

$$K = \{x \in \mathbb{R}^n : g_1(x) \ge 0, \ldots, g_m(x) \ge 0\}.$$

Then lower bounds for the minimum of $f$ over $K$ can be obtained as

$$f_{(r)} = \sup\Big\{\lambda \in \mathbb{R} : f - \lambda = \sum_{j=0}^m s_j g_j \ \text{ for some } s_j \in \Sigma[x] \text{ with } \deg(s_j g_j) \le 2r\Big\},$$

where we set $g_0 = 1$. This hierarchy has been widely studied in the literature (see, e.g., [14,16] and references therein). Asymptotic convergence to $f_{\min}$ holds when the semialgebraic set $K$ satisfies the Archimedean condition (which implies $K$ is compact) [13] and relies on the positivity certificate of Putinar [20]. (The Archimedean condition requires existence of $R > 0$ such that $R - \sum_{i=1}^n x_i^2$ lies in the quadratic module generated by the $g_j$'s, consisting of the polynomials $\sum_j s_j g_j$ for some sum-of-squares polynomials $s_j$.) The question arises naturally of analyzing the quality of the bounds $f_{(r)}$. A convergence rate in $O(1/(\log(r/c))^{1/c})$ is shown in [18], where $c$ is a constant depending only on $K$. If in the definition of the bounds $f_{(r)}$ we allow decompositions in the preordering, which consists of the polynomials $\sum_{J \subseteq [m]} \sigma_J \prod_{j \in J} g_j$ with $\sigma_J$ sum-of-squares polynomials, then, based on Schmüdgen's positivity certificate [21], asymptotic convergence holds for any compact $K$ and a stronger convergence rate in $O(1/r^c)$ was shown in [22] (where $c$ again depends only on $K$). When allowing decompositions in the preordering, a stronger convergence rate in $O(1/r)$ was shown for special sets like the simplex (in [1]) and the hypercube (in [2]). (See also [7] for an overview.) For the minimization of a homogeneous polynomial $f$ over the unit sphere, an improved convergence rate in $O(1/r^2)$ for the bounds $f_{(r)}$ was shown recently in [11] (improving the earlier rate in $O(1/r)$ from [9]). It turns out that this analysis relies (implicitly) on the convergence rate of the upper bounds $f^{(r)}$ for a special class of polynomials. This indicates there are intimate links between the upper and lower bounds $f^{(r)}$ and $f_{(r)}$, which forms an additional motivation for better understanding the upper bounds $f^{(r)}$.
Showing an improved convergence analysis for the bounds $f_{(r)}$ for broader classes of semialgebraic sets remains an important research question.

New results
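The main contribution of this paper is the following bound on the convergence rate of the parameter $f^{(r)}_{\mathrm{pfm}}$ that holds whenever $K$ satisfies a minor geometric condition.
Theorem 1 Let $K \subseteq \mathbb{R}^n$ be a compact connected set satisfying Assumption 1. Then we have $f^{(r)}_{\mathrm{pfm}} - f_{\min} = O(\log^2 r / r^2)$.
In view of (6), we immediately get the following corollary, extending the rate in $O(\log^2 r / r^2)$, shown in [23] for convex bodies, to all connected compact sets $K$ satisfying Assumption 1.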

Corollary 1
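Let $K \subseteq \mathbb{R}^n$ be a compact connected set satisfying Assumption 1. Then we have $f^{(r)} - f_{\min} = O(\log^2 r / r^2)$.
In light of the following special case of [5, Corollary 3.2], our result on the convergence rate of $f^{(r)}_{\mathrm{pfm}}$ is best possible in general, up to the log-factor.
Theorem 2 Let $K = [-1,1]$ and $f(x) = x$. Then $f^{(r)}_{\mathrm{pfm}} - f_{\min} = f^{(r)} - f_{\min} = \Omega(1/r^2)$.
As an additional result, we extend the lower bound $\Omega(1/r^2)$ on the error range $f^{(r)}_{\mathrm{pfm}} - f_{\min}$ to the polynomials $f(x) = x^{2k}$ on $K = [-1,1]$.
Theorem 3 Let $K = [-1,1]$ and $f(x) = x^{2k}$ for an integer $k \ge 1$. Then $f^{(r)}_{\mathrm{pfm}} - f_{\min} = \Omega(1/r^2)$.
Since the bounds $f^{(r)}$ converge at a rate in $O(\log^{2k} r / r^{2k})$ for these polynomials (by [23, Theorem 14]), we thus show a large separation between the asymptotic quality of the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$ for this class of functions.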

Approach and discussion
As already mentioned above, a crucial ingredient in the analysis of the parameters $f^{(r)}$ for special compact sets like the hypercube $[-1,1]^n$, the ball, the sphere, or the simplex, is the analysis in the univariate case when $K = [-1,1]$ (equipped with the Lebesgue measure or, more generally, allowing a weight of Jacobi type) and the special polynomial $f(x) = x$. Let $\{p_i : i \in \mathbb{N}\}$ denote the orthonormal polynomials (with $p_i$ of degree $i$) with respect to the inner product $\langle \cdot, \cdot \rangle_\lambda$ given by

$$\langle p_i, p_j \rangle_\lambda = \int_{-1}^{1} p_i(x)\,p_j(x)\,d\lambda(x) = \delta_{ij}. \tag{7}$$

Then, as is shown in [5], the parameter $f^{(r)}$ coincides with the smallest eigenvalue of the (truncated) moment matrix $M_{\lambda,r}$ of $\lambda$, which is defined as

$$M_{\lambda,r} = \Big( \int_{-1}^{1} x\,p_i(x)\,p_j(x)\,d\lambda(x) \Big)_{0 \le i,j \le r}. \tag{8}$$

A classical result on orthogonal polynomials (cf., e.g., [24]) shows that the eigenvalues of $M_{\lambda,r}$ are given by the roots of $p_{r+1}$. Hence, the parameter $f^{(r)}$ is equal to the smallest root of $p_{r+1}$, the asymptotic behaviour of which is well understood and known to be in $-1 + \Theta(1/r^2)$ when $\lambda$ is a measure of Jacobi type ([5], see also Lemma 2 below).
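As a quick numerical illustration of this connection, take the Lebesgue measure on $[-1,1]$ and $f(x) = x$, for which the orthogonal polynomials are the Legendre polynomials $P_i$. The Python sketch below evaluates $P_{r+1}$ and its derivative through the three-term recurrence and locates the smallest root by Newton's method started at $-1$. Newton's method is a choice made for this sketch (for a polynomial with only real roots, the iterates started to the left of all roots increase towards the smallest one); it is not the approach of [5], and the function names are illustrative.

```python
import math

def legendre_and_derivative(n, x):
    """Return (P_n(x), P_n'(x)) using the three-term recurrence."""
    p_prev, p = 1.0, x      # P_0 and P_1
    d_prev, d = 0.0, 1.0    # their derivatives
    if n == 0:
        return p_prev, d_prev
    for m in range(1, n):
        # (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}, and its derivative in x
        p_next = ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
        d_next = ((2 * m + 1) * (p + x * d) - m * d_prev) / (m + 1)
        p_prev, p, d_prev, d = p, p_next, d, d_next
    return p, d

def smallest_legendre_root(n):
    """Smallest root of P_n (n >= 1), by Newton's method started at -1."""
    x = -1.0
    for _ in range(200):
        p, d = legendre_and_derivative(n, x)
        step = p / d
        x -= step
        if abs(step) < 1e-15:
            break
    return x

def upper_bound_identity(r):
    """f^(r) for f(x) = x on K = [-1, 1] with the Lebesgue measure."""
    return smallest_legendre_root(r + 1)

# Scaled error r^2 * (f^(r) - f_min), with f_min = -1, for a few values of r
scaled_errors = [(r, r * r * (upper_bound_identity(r) + 1.0)) for r in (10, 20, 50)]
```

The scaled errors should remain of constant order as $r$ grows, in line with the $-1 + \Theta(1/r^2)$ behaviour of the smallest root.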
Recall that $\lambda_f$ is the push-forward measure of $\lambda$ by $f$, as defined in (3), and let $\{p_{f,i} : i \in \mathbb{N}\}$ denote the corresponding orthonormal polynomials. In view of the above discussion, if we use the first (univariate) formulation of $f^{(r)}_{\mathrm{pfm}}$ in (5), we can immediately conclude that $f^{(r)}_{\mathrm{pfm}}$ is equal to the smallest eigenvalue of the matrix $M_{\lambda_f,r} = \big( \int_{f(K)} t\,p_{f,i}(t)\,p_{f,j}(t)\,d\lambda_f(t) \big)_{0 \le i,j \le r}$, and also to the smallest root of the orthogonal polynomial $p_{f,r+1}$. However, it is not clear how to exploit this connection in order to gain information about the convergence rate of the parameters $f^{(r)}_{\mathrm{pfm}}$, since the orthogonal polynomials $p_{f,i}$ are not known explicitly in general.
In this paper, we will go back to the idea of trying to find a good sum-of-squares polynomial approximation of the Dirac delta function. As in [23], we make use of the needle polynomials from [12] for this purpose. The difference with the approach in [23] is that we now work on the interval $[f_{\min}, f_{\max}]$; so we need an approximation of the Dirac delta function centered at $f_{\min}$, which is on the boundary of this interval. As is already noted in [12], this special setting allows for better approximations than would be available in general.

Outline
The rest of the paper is organized as follows. In Sect. 2 we give a proof of Theorem 1. Then, in Sect. 3, we prove Theorem 3. We provide some numerical examples that illustrate the practical behaviour of the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$ in Sect. 4. Finally, in Sect. 5, we give a brief discussion of the geometric Assumption 1 below and we show that it is satisfied by the compact semialgebraic sets with a dense interior.

Convergence analysis for the new hierarchy
We first state the precise geometric condition alluded to in Theorem 1.
Assumption 1 There exist positive constants $\epsilon_K, \eta_K > 0$ and $N \ge n$, such that, for all $x \in K$ and $0 < \delta \le \epsilon_K$, we have

$$\mathrm{vol}(B^n_\delta(x) \cap K) \ge \eta_K\,\delta^N\,\mathrm{vol}(B^n). \tag{9}$$

Here, $B^n_\delta(x)$ is the Euclidean ball centered at $x$ with radius $\delta$ and $B^n = B^n_1(0)$.
A slightly stronger version of Assumption 1 (requiring $N = n$) was introduced in [3], where it was used to give the first error analysis in $O(1/\sqrt{r})$ for the bounds $f^{(r)}$. The condition of [3] is satisfied, e.g., when $K$ is a convex body, or more generally when $K$ satisfies an interior cone condition, or when $K$ is star-shaped with respect to a ball (see also [3] for a more complete discussion). The weaker condition (9) is satisfied additionally by the compact semialgebraic sets that have a dense interior, which allows in particular that $K$ has certain types of cusps. We discuss Assumption 1 in more detail in Sect. 5 below.
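As a small worked example, the hypercube $K = [-1,1]^n$ satisfies the condition with $N = n$. The computation below reads (9) as the volume bound $\mathrm{vol}(B^n_\delta(x) \cap K) \ge \eta_K \delta^N \mathrm{vol}(B^n)$ for $0 < \delta \le \epsilon_K$, where $\epsilon_K$ is our name for the upper limit on $\delta$; the set $Q_\delta(x)$ and the signs $\sigma_i$ are notation introduced only for this sketch.

```latex
% Fix x \in K = [-1,1]^n and 0 < \delta \le 1.
% Choose signs \sigma_i \in \{-1,+1\} with \sigma_i x_i \le 0, so that moving
% from x_i in direction \sigma_i by at most 1 stays inside [-1,1].
\begin{align*}
Q_\delta(x) &:= \{\, x + z : \|z\| \le \delta,\ \sigma_i z_i \ge 0 \ (i = 1,\dots,n) \,\}
  \subseteq B^n_\delta(x) \cap K, \\
\mathrm{vol}\big(B^n_\delta(x) \cap K\big) &\ge \mathrm{vol}\big(Q_\delta(x)\big)
  = 2^{-n}\,\delta^n\,\mathrm{vol}(B^n).
\end{align*}
% Hence the condition holds with N = n, \eta_K = 2^{-n} and \epsilon_K = 1.
```

The argument only uses that an orthant-shaped piece of each small ball stays inside $K$, which is the kind of interior cone condition mentioned above.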
We show the following restatement of Theorem 1.
Theorem 4 Assume $K$ is connected, compact and satisfies the above geometric condition (9). Then there exists a constant $C$ (depending only on $n$, the Lipschitz constant of $f$ and $K$) such that

$$f^{(r)}_{\mathrm{pfm}} - f_{\min} \le C\,\frac{\log^2 r}{r^2} \quad \text{for all } r \text{ large enough.}$$

The rest of this section is devoted to the proof of Theorem 4. We will make the following assumptions in order to simplify notation in our arguments. Let $a$ be a global minimizer of $f$ in $K$. After applying a suitable translation (replacing $K$ by $K - a$ and the polynomial $f$ by the polynomial $x \mapsto f(x + a)$), we may assume that $a = 0$, that is, we may assume that the global minimum of $f$ over $K$ is attained at the origin. Furthermore, it suffices to work with the rescaled polynomial

$$F(x) = \frac{f(x) - f_{\min}}{f_{\max} - f_{\min}},$$

which satisfies $F(K) = [0,1]$, with $F_{\min} = 0$ and $F_{\max} = 1$. Indeed, one can easily check that

$$F^{(r)}_{\mathrm{pfm}} = \frac{f^{(r)}_{\mathrm{pfm}} - f_{\min}}{f_{\max} - f_{\min}}.$$

Then, for this polynomial $F$, we know that the support of the push-forward measure $\lambda_F$ is equal to $[0,1]$, and (5) gives

$$F^{(r)}_{\mathrm{pfm}} = \min_{s \in \Sigma[t]_r} \Big\{ \int_0^1 t\,s(t)\,d\lambda_F(t) : \int_0^1 s(t)\,d\lambda_F(t) = 1 \Big\}. \tag{10}$$

In order to analyze the bound $F^{(r)}_{\mathrm{pfm}}$, we follow a similar strategy to the one employed in [23] to analyze the bound $F^{(r)}$. Namely, we construct a univariate sum-of-squares polynomial $s$ which approximates well the Dirac delta centered at the origin on the interval $[0,1]$, making use of the so-called $\frac{1}{2}$-needle polynomials from [12].
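Lemma 1 [12] Let $h \in (0,1)$ be a scalar and let $r \in \mathbb{N}$. Then there exists a univariate polynomial $\nu_r^h \in \Sigma[t]_{2r}$ satisfying the following properties:

$$\nu_r^h(0) = 1, \qquad 0 \le \nu_r^h(t) \le 1 \ \text{ for all } t \in [0,1], \qquad \nu_r^h(t) \le 4e^{-\frac{1}{2} r \sqrt{h}} \ \text{ for all } t \in [h,1]. \tag{11}$$

We consider the sum-of-squares polynomial $s(t) := C\nu_r^h(t)$, where $h \in (0,1)$ will be chosen later, and $C$ is chosen so that $s$ is a density on $[0,1]$ with respect to the measure $\lambda_F$. That is, $C = 1 / \int_K \nu_r^h(F(x))\,dx$. As $s$ is a feasible solution to (10) (with $2r$ in place of $r$), we obtain

$$F^{(2r)}_{\mathrm{pfm}} \le \frac{\int_K F(x)\,\nu_r^h(F(x))\,dx}{\int_K \nu_r^h(F(x))\,dx}. \tag{12}$$

Our goal is thus to show that the ratio in (12) is in $O(\log^2 r / r^2)$. Define the set $K_h = \{x \in K : F(x) \le h\}$. We first work out the numerator of (12), which we split into two terms, depending on whether we integrate on $K_h$ or on its complement:

$$\int_K F(x)\,\nu_r^h(F(x))\,dx \le h \int_{K_h} \nu_r^h(F(x))\,dx + \int_{K \setminus K_h} \nu_r^h(F(x))\,dx.$$

Here we have upper bounded $F(x)$ by $h$ on $K_h$ and by 1 on $K \setminus K_h$. On the other hand, we can lower bound the denominator in (12) as follows:

$$\int_K \nu_r^h(F(x))\,dx \ge \int_{K_h} \nu_r^h(F(x))\,dx.$$

Combining the above two inequalities on numerator and denominator, we get

$$\frac{\int_K F(x)\,\nu_r^h(F(x))\,dx}{\int_K \nu_r^h(F(x))\,dx} \le h + \frac{\int_{K \setminus K_h} \nu_r^h(F(x))\,dx}{\int_{K_h} \nu_r^h(F(x))\,dx}.$$

Thus we only need to upper bound the second term above. We first work on the numerator. For any $x \in K \setminus K_h$ we have $F(x) > h$ and thus, using (11), we get

$$\int_{K \setminus K_h} \nu_r^h(F(x))\,dx \le 4e^{-\frac{1}{2} r \sqrt{h}}\,\lambda(K).$$

Next, we bound the denominator. In [23, Corollary 4], it is observed that $\nu_r^h(t) \ge \frac{1}{2}$ for all $t \in [0,\rho]$, where $\rho > 0$ is of order $1/r^2$. Note that $K \cap B^n_{\rho/C_F}(0) \subseteq K_\rho$, where $C_F$ denotes the Lipschitz constant of $F$ on $K$ (recall that $F(0) = 0$). By the geometric assumption (9) we have

$$\mathrm{vol}\big(K \cap B^n_{\rho/C_F}(0)\big) \ge \eta_K \Big(\frac{\rho}{C_F}\Big)^N \mathrm{vol}(B^n)$$

for all $r$ large enough such that $\rho/C_F \le \epsilon_K$. We can then lower bound the denominator as follows (provided $h \ge \rho$, so that $K_\rho \subseteq K_h$):

$$\int_{K_h} \nu_r^h(F(x))\,dx \ge \int_{K_\rho} \nu_r^h(F(x))\,dx \ge \frac{1}{2}\,\eta_K \Big(\frac{\rho}{C_F}\Big)^N \mathrm{vol}(B^n).$$

Combining the above inequalities, we obtain

$$\frac{\int_K F(x)\,\nu_r^h(F(x))\,dx}{\int_K \nu_r^h(F(x))\,dx} \le h + \frac{8\,e^{-\frac{1}{2} r \sqrt{h}}\,\lambda(K)\,C_F^N}{\eta_K\,\rho^N\,\mathrm{vol}(B^n)}.$$

If we now select $h = \big(4(N+1)\log r / r\big)^2$, we have $h \ge \rho$ and a straightforward computation shows that the ratio in (12) is in $O(\log^2 r / r^2)$.
Here, the constant in the big $O$ depends on $n$, $N$, $C_F$, $\eta_K$ and $\lambda(K)$. This concludes the proof of Theorem 4.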
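The final "straightforward computation" can be spelled out as follows. This is a sketch under stated assumptions rather than the authors' own derivation: we assume the needle polynomial is bounded by $4e^{-\frac{1}{2} r\sqrt{h}}$ on $[h,1]$, that $\rho = c/r^2$ for some constant $c > 0$, that the ratio to be bounded is at most $h$ plus a constant multiple of $e^{-\frac{1}{2} r\sqrt{h}}/\rho^N$, and that $h$ is chosen as $(4(N+1)\log r / r)^2$.

```latex
\begin{align*}
h = \Big(\frac{4(N+1)\log r}{r}\Big)^2
  &\;\Longrightarrow\; e^{-\frac{1}{2} r \sqrt{h}} = e^{-2(N+1)\log r} = r^{-2(N+1)}, \\
\rho = \frac{c}{r^2}
  &\;\Longrightarrow\; \frac{e^{-\frac{1}{2} r \sqrt{h}}}{\rho^N}
     = c^{-N}\, r^{2N}\, r^{-2(N+1)} = \frac{c^{-N}}{r^2}, \\
\text{hence}\quad
h + O\Big(\frac{e^{-\frac{1}{2} r \sqrt{h}}}{\rho^N}\Big)
  &= \frac{16(N+1)^2 \log^2 r}{r^2} + O\Big(\frac{1}{r^2}\Big)
   = O\Big(\frac{\log^2 r}{r^2}\Big).
\end{align*}
```

The choice of $h$ balances the two terms: a smaller $h$ would shrink the first term but make the exponential factor too weak to compensate for $\rho^{-N}$.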

Separation for a special class of polynomials
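In this section we consider in more detail the behaviour of the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$ for the class of polynomials $f(x) = x^{2k}$ (with $k \ge 1$ integer) on the interval $K = [-1,1]$. Then $f([-1,1]) = [0,1]$ and, by applying (6) to the polynomial $f(x) = x^{2k}$, we have the following inequality:

$$f^{(2kr)} \le f^{(r)}_{\mathrm{pfm}}. \tag{13}$$

Let $\lambda_k$ denote the push-forward measure of the Lebesgue measure on $[-1,1]$ by $f(x) = x^{2k}$, and let $p_{k,i}$ ($i \in \mathbb{N}$) denote the corresponding orthonormal polynomials (cf. (7)). Then, as shown in [5] and as recalled above, the parameter $f^{(r)}_{\mathrm{pfm}}$ is equal to the smallest root of the polynomial $p_{k,r+1}(t)$. As it turns out, here we can find explicitly the push-forward measure $\lambda_k$, which can be shown to be of Jacobi type. Hence, we have information about the corresponding orthogonal polynomials $p_{k,i}$, whose extremal roots are well understood. First we introduce the classical Jacobi polynomials (see, e.g., [24] for a general reference). For $a, b > -1$, the Jacobi polynomials $p^{a,b}_i$ ($i \in \mathbb{N}$) are the orthogonal polynomials with respect to the weight function $(1-x)^a(1+x)^b$ on $[-1,1]$. We let $\xi^{a,b}_i$ denote the smallest root of $p^{a,b}_i$.
Lemma 2 For any $a, b > -1$, the smallest root of the Jacobi polynomial $p^{a,b}_{r+1}$ satisfies

$$\xi^{a,b}_{r+1} = -1 + \Theta(1/r^2). \tag{14}$$

Proof A proof of this fact based on results in [8,10] is given in [5].
[Table 1 caption: In each case, $f_{\min}$ is the global minimum of $f$ on $[-1,1]^2$.]
Lemma 3 The push-forward measure $\lambda_k$ of the Lebesgue measure on $[-1,1]$ by $f(x) = x^{2k}$ is given by $d\lambda_k(t) = \frac{1}{k}\,t^{-1+\frac{1}{2k}}\,dt$ for $t \in [0,1]$. In particular, $\lambda_k$ is a measure of Jacobi type.
Proof It suffices to show the first claim, which follows by making a change of variables $t = x^{2k}$ so that we get, for any measurable function $g$,

$$\int_{-1}^{1} g(x^{2k})\,dx = 2\int_0^1 g(x^{2k})\,dx = \frac{1}{k}\int_0^1 g(t)\,t^{-1+\frac{1}{2k}}\,dt.$$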

Proof of Theorem 3 By applying the change of variables $x = 2t - 1$, we see that the Jacobi-type measure with weight $(1-x)^a(1+x)^b$ on $[-1,1]$ corresponds (up to scaling) to the measure with weight $(1-t)^a t^b$ on $[0,1]$,
and that (up to scaling) the orthogonal polynomials for the latter measure on $[0,1]$ are given by $t \mapsto p^{a,b}_i(2t - 1)$ for $i \in \mathbb{N}$. If we set $a = 0$ and $b = -1 + 1/2k$, then the measure obtained in this way on $[0,1]$ is precisely the push-forward measure $\lambda_k$ (see Lemma 3). Hence, we can conclude that (up to scaling) the orthogonal polynomials $p_{k,i}$ for $\lambda_k$ on $[0,1]$ are given by $p_{k,i}(t) = p^{a,b}_i(2t - 1)$ for each $i \in \mathbb{N}$. Therefore, the smallest root of $p_{k,r+1}(t)$ is equal to $(\xi^{a,b}_{r+1} + 1)/2 = \Theta(1/r^2)$ by (14). In particular, we can conclude that $f^{(r)}_{\mathrm{pfm}} = \Omega(1/r^2)$ for any $k \ge 1$.
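The lower bound can also be observed numerically without any explicit knowledge of Jacobi polynomials. The Python sketch below uses only the moments $\int_{-1}^{1} x^{2kj}\,dx = 2/(2kj+1)$ of the push-forward measure: it builds the monic orthogonal polynomial of degree $r+1$ by solving a Hankel linear system exactly, and then finds its smallest root with Newton's method started at $0$. Both the linear-system construction and the use of Newton's method are choices made for this illustration, and the function names are hypothetical; exact rational arithmetic is used because Hankel systems become ill-conditioned quickly.

```python
from fractions import Fraction

def solve_exact(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [vi - factor * vc for vi, vc in zip(aug[i], aug[col])]
    return [aug[i][n] for i in range(n)]

def monic_orthogonal_poly(moments, deg):
    """Coefficients c_0, ..., c_deg (c_deg = 1) of the monic polynomial of
    degree deg that is orthogonal to 1, t, ..., t^(deg-1)."""
    hankel = [[moments[i + j] for j in range(deg)] for i in range(deg)]
    rhs = [-moments[i + deg] for i in range(deg)]
    return solve_exact(hankel, rhs) + [Fraction(1)]

def smallest_root(coeffs, start):
    """Newton's method from a point to the left of all (real) roots."""
    c = [float(v) for v in coeffs]
    t = float(start)
    for _ in range(200):
        p = dp = 0.0
        for a in reversed(c):   # Horner scheme for p(t) and p'(t)
            dp = dp * t + p
            p = p * t + a
        step = p / dp
        t -= step
        if abs(step) < 1e-15:
            break
    return t

def pfm_bound(k, r):
    """Upper bound of order r for f(x) = x^(2k) on [-1, 1], as smallest root."""
    moments = [Fraction(2, 2 * k * j + 1) for j in range(2 * r + 2)]
    return smallest_root(monic_orthogonal_poly(moments, r + 1), 0.0)

# Scaled values r^2 * bound for k = 3 and small r
scaled = [(r, r * r * pfm_bound(3, r)) for r in range(1, 6)]
```

For $r = 0$ the bound is simply the mean value of $f$ on $[-1,1]$, namely $1/(2k+1)$, which gives a convenient sanity check.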

Numerical examples
In this section, we illustrate the practical behaviour of the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$.

Comparison of $f^{(r)}_{\mathrm{pfm}}$ and $f^{(r)}$ for polynomial test functions. First, we consider the polynomial test functions listed in Table 1. These are all well-known in optimization, and were already used to test the behaviour of the bounds $f^{(r)}$ in [3,23]. We compare the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$ by means of a ratio $\rho_r(f)$, defined so that values of $\rho_r(f)$ smaller than 1 indicate good performance of the bounds $f^{(r)}_{\mathrm{pfm}}$ in comparison to $f^{(r)}$. The results can be found in Fig. 3. Remarkably, it appears that the performance of the bound $f^{(r)}_{\mathrm{pfm}}$ is comparable to (or better than) the performance of $f^{(r)}$ in each instance, except for the Camel function. Additionally, we note that the performance of $f^{(r)}_{\mathrm{pfm}}$ for the Motzkin polynomial is comparatively much better on the unit ball than on the unit box. Figure 1 shows a plot of the Camel function, as well as the sum-of-squares densities corresponding to $f^{(6)}$ and $f^{(6)}_{\mathrm{pfm}}$ on the unit box. Note that while the density corresponding to $f^{(6)}$ resembles the Dirac delta function centered at the global minimizer $(0,0)$ of the Camel function, the density corresponding to $f^{(6)}_{\mathrm{pfm}}$ instead mirrors the Camel function itself.

Comparison of $f^{(r)}_{\mathrm{pfm}}$ and $f^{(r)}$ for the special class of polynomials $f(x) = x^{2k}$. Next, we consider the polynomials $f(x) = x^{2k}$ for $k \ge 1$ on the interval $[-1,1]$, which were treated in Sect. 3. In Fig. 4, the values of $\rho_r(f)$ are shown for $1 \le r \le 20$ and $1 \le k \le 5$. It can be seen that the performance of $f^{(r)}_{\mathrm{pfm}}$ is comparable to the performance of $f^{(r)}$ for $k = 1$ (indeed, in this case we have $f^{(r)}_{\mathrm{pfm}} = f^{(2r)}$), but it is much worse for $k > 1$, which matches our earlier findings (Theorem 3). In Fig. 2, the optimal sum-of-squares densities $\sigma$ (corresponding to $f^{(r)}$) and $\sigma_{\mathrm{pfm}}$ (corresponding to $f^{(r)}_{\mathrm{pfm}}$) are depicted for $k = 1, 3, 5$ and $r = 6$. Note that while the density $\sigma$ changes very little as we increase $k$, the density $\sigma_{\mathrm{pfm}}$ grows increasingly 'flat' around the minimizer 0 of $f$ (mirroring the behavior of $f$ itself). As such, the density $\sigma_{\mathrm{pfm}}$ is a comparatively much worse approximation of the Dirac delta function centered at 0 than $\sigma$. Note also that in this instance $f^{(r)} = f^{(r+1)}$ for even $r$, explaining the 'zig-zagging' behaviour of the ratio $\rho_r(f)$.

Comparison of $f^{(r)}_{\mathrm{pfm}}$ and $f^{(r)}$ for random instances of maximum cut. Finally, we consider some polynomial maximization problems on $[-1,1]^n$ coming from small instances of MaxCut. An instance of MaxCut with vertex set $[n]$ and edge weights $w_{ij} \ge 0$ can be written as a polynomial maximization problem $\max_{x \in [-1,1]^n} f(x)$ (15), where $f$ is a convex quadratic polynomial determined by the weights $w_{ij}$. Note that while $f$ is usually maximized over the discrete cube $\{-1,1\}^n$, the formulation (15) is equivalent as $f$ is convex. Following [15], we create our instances by setting $w_{ij} = 0$ with probability $p$, and sampling $w_{ij}$ uniformly from $[0,1]$ otherwise. In Table 2, we list values of $f^{(r)}_{\mathrm{pfm}}$ and $f^{(r)}$ for a few such random instances with $p = 1/2$ and $n = 8$. In each case, $f^{(r)}_{\mathrm{pfm}}$ provides a better bound than $f^{(r)}$. In Table 3, we list the average over 50 randomly generated instances of the ratio comparing $f^{(r)}_{\mathrm{pfm}}$ and $f^{(r)}$, for $r \le 4$ and $p \in \{1/4, 1/2, 3/4\}$. Although it seems $f^{(r)}_{\mathrm{pfm}}$ is more sensitive to changes in the density of the instances, we find again that it provides a better bound in general than $f^{(r)}$.
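The instance generation and the equivalence between the continuous and the discrete formulation can be sketched in Python as follows. The objective used here, $f(x) = \frac{1}{4}\sum_{i<j} w_{ij}(x_i - x_j)^2$, is an assumption of this sketch: it is a standard convex form whose value at a point of $\{-1,1\}^n$ equals the weight of the corresponding cut, but the exact scaling used in (15) may differ. The function names and the seed are illustrative.

```python
import itertools
import random

def random_maxcut_weights(n, p, rng):
    """w_ij = 0 with probability p, otherwise uniform on [0, 1] (i < j)."""
    return {(i, j): (0.0 if rng.random() < p else rng.uniform(0.0, 1.0))
            for i in range(n) for j in range(i + 1, n)}

def cut_polynomial(weights, x):
    """Assumed objective: f(x) = 1/4 * sum_{i<j} w_ij * (x_i - x_j)^2."""
    return 0.25 * sum(w * (x[i] - x[j]) ** 2 for (i, j), w in weights.items())

def max_over_vertices(weights, n):
    """Brute-force maximum of f over the discrete cube {-1, 1}^n."""
    return max(cut_polynomial(weights, v)
               for v in itertools.product((-1.0, 1.0), repeat=n))

rng = random.Random(0)
n = 6
weights = random_maxcut_weights(n, 0.5, rng)
f_max = max_over_vertices(weights, n)

# Convexity check: no sampled point of [-1, 1]^n beats the best vertex.
samples = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(1000)]
largest_sample_value = max(cut_polynomial(weights, x) for x in samples)
```

Brute force over the $2^n$ vertices is only feasible for small $n$; it serves here as a reference value against which bounds can be compared.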

On the geometric assumption
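As mentioned above, the condition (9) is a weaker version of a condition introduced in [3]. There, the authors demand that there exist constants $\eta_K, \epsilon_K > 0$ such that

$$\mathrm{vol}(B^n_\delta(x) \cap K) \ge \eta_K\,\delta^n\,\mathrm{vol}(B^n) \quad \text{for all } x \in K \text{ and } 0 < \delta \le \epsilon_K. \tag{16}$$

The difference is that the power of $\delta$ in (16) is fixed to be the dimension $n$ of $K$, whereas it is allowed to be an arbitrary $N \ge n$ in (9). Condition (9) is satisfied by a significantly larger class of sets $K$ than (16). In particular, as we will observe below, sets satisfying (9) may have polynomial cusps, whereas sets satisfying (16) may not have any cusps at all.

[Figure caption fragment: ... Table 1, computed on the unit box (left) and unit ball (right)]

Example 1 Consider the set $K$ with a polynomial cusp at the origin depicted in Fig. 5. This set $K$ satisfies (9) (with $N = 3$), but it does not satisfy (16). Indeed, for the point $0 \in K$, the volume of $B^n_\delta(0) \cap K$ decays faster than $\delta^n$ as $\delta \to 0$, and we conclude that (16) cannot be satisfied at $x = 0$. Note that the point 0 is indeed a polynomial cusp of the set $K$.

Example 2 Consider the set $K$ with an exponential cusp at the origin depicted in Fig. 5. This set $K$ does not satisfy (9) (and, as a consequence, does not satisfy (16)). Indeed, for the point $0 \in K$, the volume of $B^n_\delta(0) \cap K$ decays faster than any power of $\delta$: for any $N, \eta > 0$ fixed, we have $\mathrm{vol}(B^n_\delta(0) \cap K) < \eta\,\delta^N\,\mathrm{vol}(B^n)$ for all $\delta > 0$ small enough, and so (9) cannot be satisfied at $x = 0$. Note that the point 0 is an exponential cusp of $K$.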
It turns out that compact semialgebraic sets which have a dense interior (also known as fat sets) satisfy Assumption 1, as is shown essentially in [19].
Theorem 5 ([19], Theorem 6.4, see also Remark 6.5) Let $K \subseteq \mathbb{R}^n$ be a compact, fat semialgebraic set. Then there exist constants $\eta > 0$, $N \ge 1$ and a positive integer $d \in \mathbb{N}$ such that one may find a polynomial $h_x : \mathbb{R} \to \mathbb{R}^n$ of degree $d$ for each $x \in K$ satisfying: $h_x(0) = x$ and $\mathrm{dist}(h_x(t), \mathbb{R}^n \setminus K) \ge \eta\,t^N$ for all $t \in [0,1]$. Furthermore, the polynomials $h_x$ may be chosen such that $\|x - h_x(t)\| \le t$ for all $x \in K$, $t \in [0,1]$.
Corollary 2 Let $K \subseteq \mathbb{R}^n$ be a compact, fat semialgebraic set. Then $K$ satisfies Assumption 1.
Proof For $x \in K$, let $\eta$, $N$ and $h_x$ be as in Theorem 5. We may assume that $h_t := h_x(t) \in B^n_t(x)$ for all $t \in [0,1]$. For clarity, we write $B(y,a) := B^n_a(y)$ in the rest of the proof.

Conclusions
We have shown a convergence rate in $O(\log^2 r / r^2)$ for the approximations $f^{(r)}_{\mathrm{pfm}}$ of the minimum of a polynomial $f$ over a compact connected set $K$ satisfying the minor geometric assumption (9). Furthermore, we have shown that this analysis is near-optimal, in the sense that the asymptotic behaviour of the error range $f^{(r)}_{\mathrm{pfm}} - f_{\min}$ is in $\Omega(1/r^2)$ for the polynomials $f(x) = x^{2k}$ on $K = [-1,1]$. This latter result shows that although the worst-case guarantees on the convergence of the bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$ are very similar, a large separation may exist for certain polynomials (e.g., when $f(x) = x^{2k}$). Of course, it should be noted that the parameter $f^{(r)}_{\mathrm{pfm}}$ can be obtained via a much smaller eigenvalue computation than the parameter $f^{(r)}$, namely by computing the smallest eigenvalue of a matrix of size $r + 1$ for the former in comparison to a matrix of size $\binom{n+r}{r}$ for the latter. From a computational point of view, one should also observe that while the computation of $f^{(r)}_{\mathrm{pfm}}$ involves a smaller matrix, it requires knowledge of the moments $\int_K f^k\,d\lambda$ of powers of $f$ for $k \le 2r$. If $f$ has many terms, this computation can be demanding for large values of $r$. This has to be taken into account when comparing the computational burden of both bounds $f^{(r)}$ and $f^{(r)}_{\mathrm{pfm}}$. Lastly, as a surprising consequence of Theorem 1, we are able to extend the bound in $O(\log^2 r / r^2)$ on the convergence rate of $f^{(r)}$ to all compact connected sets $K$ satisfying the geometric condition (9), whereas it was previously only known for convex bodies [23]. In this sense, the arguments of Sect. 2 can be seen as a refinement (and simplification) of the ones given in [23].
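To give a feel for the difference in matrix sizes, the following short Python sketch tabulates $\binom{n+r}{r}$ (for $f^{(r)}$) against $r + 1$ (for $f^{(r)}_{\mathrm{pfm}}$). The function name `matrix_sizes` is illustrative, and the choice $n = 8$ simply mirrors the dimension used in the MaxCut experiments.

```python
from math import comb

def matrix_sizes(n, r):
    """Return (size for the multivariate bound, size for the univariate bound)."""
    return comb(n + r, r), r + 1

# Sizes for n = 8 variables and orders r = 1, ..., 4
sizes = {r: matrix_sizes(8, r) for r in range(1, 5)}
```

The table only compares eigenvalue problem sizes; it does not account for the cost of computing the moments of powers of $f$.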
As said above, the analysis in this paper is near-optimal: we can show an upper bound in $O(\log^2 r / r^2)$ and a lower bound in $\Omega(1/r^2)$ for a certain class of polynomials. Deciding what is the right regime and whether the log-factor can be avoided in the convergence analysis is the main research question left open by this work.
The log-factor arises from our analysis technique, based on using polynomial approximation by the needle polynomials. We had to use this analysis technique since the behaviour of the orthogonal polynomials for the push-forward measure $\lambda_f$ is not known for general $f$. On the other hand, our results may be interpreted as giving back some information for general push-forward measures $\lambda_f$ and their corresponding orthogonal polynomials $p_{f,i}$ on the interval $[f_{\min}, f_{\max}]$. Indeed, what our results imply is that for any polynomial $f$ and any compact connected $K$ satisfying (9), the asymptotic behaviour of the smallest root of $p_{f,r}$ is in $f_{\min} + O(\log^2 r / r^2)$.