1 Introduction

Consider the problem of computing the Lebesgue volume \(\lambda (\textbf{K})\) of a compact basic semi-algebraic set \(\textbf{K}\subset \mathbb {R}^n\). For simplicity of exposition we will restrict to the case where \(\textbf{K}\) is the smooth super-level set \(\{\textbf{x}: g(\textbf{x})\ge 0\}\subset \mathbb {R}^n\) of a single polynomial g.

If \(\textbf{K}\) is a convex body then several procedures are available; see e.g. exact deterministic methods for convex polytopes [1], or nondeterministic Hit-and-Run methods [19, 24] and the more recent [2, 3]. Even approximating \(\lambda (\textbf{K})\) by deterministic methods is still a hard problem, as explained in e.g. [3] and references therein. In full generality, with no specific assumption on \(\textbf{K}\) such as convexity, the only general method available is Monte Carlo: one samples N points according to the Lebesgue measure \(\lambda \) normalized on a simple set \(\textbf{B}\) (e.g. a box or an ellipsoid) that contains \(\textbf{K}\). If \(\rho _N\) is the proportion of points that fall into \(\textbf{K}\), then the random variable \(\rho _N\lambda (\textbf{B})\) provides a good estimator of \(\lambda (\textbf{K})\), with convergence guarantees as N increases. However, this estimator is nondeterministic and provides neither a lower nor an upper bound on \(\lambda (\textbf{K})\).
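As a minimal illustration of this Monte Carlo baseline, the sketch below estimates the area of the unit disk inside the box \([-1,1]^2\); the test set, box, and sample size are illustrative choices, not from the text:

```python
import random

def mc_volume(g, box, n_samples=200_000, seed=0):
    """Monte Carlo estimate of vol{x : g(x) >= 0} inside a bounding box.

    box is a list of (lo, hi) intervals; the estimate is the hit
    proportion rho_N times the box volume.  It is nondeterministic and
    yields neither a lower nor an upper bound, only a consistent estimator.
    """
    rng = random.Random(seed)
    vol_box = 1.0
    for lo, hi in box:
        vol_box *= hi - lo
    hits = sum(1 for _ in range(n_samples)
               if g([rng.uniform(lo, hi) for lo, hi in box]) >= 0)
    return hits / n_samples * vol_box

# Unit disk {x : 1 - x1^2 - x2^2 >= 0} in B = [-1,1]^2; true area is pi.
est = mc_volume(lambda x: 1 - x[0]**2 - x[1]**2, [(-1, 1), (-1, 1)])
```

With 200,000 samples the standard deviation of the estimate is below 0.01, so the result is close to \(\pi \) but carries no one-sided guarantee.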

When \(\textbf{K}\) is a compact basic semi-algebraic set, a deterministic numerical scheme described in [8] provides a sequence \((\tau _k)_{k\in \mathbb {N}}\subset \mathbb {R}\) of upper bounds that converges to \(\lambda (\textbf{K})\) as k increases. Briefly,

$$\begin{aligned} \lambda (\textbf{K})&=\inf _{p\in \mathbb {R}[\textbf{x}]}\left\{ \int p\,d\lambda :p\ge \mathbb {1}_\textbf{K}\hbox { on}\ \textbf{B}\right\} , \end{aligned}$$
(1)
$$\begin{aligned} \tau _k&=\inf _{p\in \mathbb {R}[\textbf{x}]_k}\!\left\{ \int p\,d\lambda :p\ge \mathbb {1}_\textbf{K}\hbox { on}\ \textbf{B}\right\} , \end{aligned}$$
(2)

with \(\textbf{x}\mapsto \mathbb {1}_\textbf{K}(\textbf{x})=1\) if \(\textbf{x}\in \textbf{K}\) and 0 otherwise. One can notice that minimizing sequences for (1) and (2) also minimize the \(L^1(\textbf{B},\lambda )\)-norm \(\Vert p - \mathbb {1}_\textbf{K}\Vert _1\) (with convergence to 0 in the case (1)). As the upper bound \(\tau _k> \lambda (\textbf{K})\) is obtained by restricting the search in (2) to polynomials of degree at most k, the infimum is attained and an optimal solution can be obtained by solving a semidefinite program. Of course, the size of the resulting semidefinite program increases with the degree k: this is the so-called Moment-SOS hierarchy; for more details the interested reader is referred to [8].

Also focusing on compact semi-algebraic sets, [11] proposes a symbolic method to compute the volume of \(\textbf{K}\) with absolute precision \(2^{-p}\), in time \(O(p(\log p)^{3+\varepsilon })\) for any \(\varepsilon >0\) as \(p\rightarrow \infty \). This is in sharp contrast with the approach considered here, which consists in approximating problem (1) with the sequence of problems (2) indexed by k. Indeed,

  • In [8], for any \(k\in \mathbb {N}\), \(\tau _k\) is guaranteed to be a converging upper bound for \(\lambda (\textbf{K})\), i.e., \(\tau _k - \lambda (\textbf{K})>0\), while [11] also guarantees convergence of the approximant to \(\lambda (\textbf{K})\) but gives no information on the sign of the difference between the two quantities.

  • [11] uses symbolic computations that can achieve arbitrary precision, while [8] uses numerical computations based on semidefinite programming, limited to floating-point arithmetic precision.

  • The approach of [11] can be used to approximate other quantities than the volume, namely real periods of algebraic surfaces. The approach of [8] was extended to approximate sets relevant in systems control, such as regions of attraction or maximal positively invariant sets (see e.g. [6]). In this context, the present contribution can help improve the Moment-SOS hierarchy for assessing the stability of polynomial differential systems.

When solving problem (2), a Gibbs phenomenon clearly takes place as one tries to approximate on \(\textbf{B}\), and from above, the discontinuous function \(\mathbb {1}_\textbf{K}\) by a polynomial of degree at most k. This makes the convergence of the upper bounds \(\tau _k\) very slow (even for problems of modest dimension). A trick was used in [8] to accelerate this convergence, but at the price of losing monotonicity of the resulting sequence.

In fact (1) is a dual of the following infinite-dimensional linear program (LP) on measures

$$\begin{aligned} \sup _{\mu }\,\{\mu (\textbf{K}):\mu \le \lambda ;\,\mu \in \mathcal {M}(\textbf{K})_+\} \end{aligned}$$
(3)

(where \(\mathcal {M}(\textbf{K})_+\) is the space of finite Borel measures on \(\textbf{K}\)). Its optimal value is also \(\lambda (\textbf{K})\) and is attained at the unique optimal solution \(\mu ^\star := \lambda _\textbf{K}=\mathbb {1}_\textbf{K}\lambda \) (the restriction of \(\lambda \) to \(\textbf{K}\)).

A simple but key observation. As one knows the unique optimal solution \(\mu ^\star =\lambda _{\textbf{K}}\) of (3), any constraint satisfied by \(\mu ^\star \) (in particular, linear constraints) can be included as a constraint on \(\mu \) in (3) without changing the optimal value and the optimal solution. While these constraints provide additional restrictions in (3), they translate into additional degrees of freedom in the dual (hence a relaxed version of (1)), and therefore better approximations when passing to the finite-dimensional relaxed version of (2). A first set of such linear constraints, experimented with in [14] and later in [15], resulted in drastic improvements, but with no clear rationale behind such improvements.

Contribution. The main message and result of this paper is that there is an appropriate set of additional linear constraints on \(\mu \) in (3) such that the resulting dual (a relaxed version of (1)) has an explicit continuous optimal solution with value \(\lambda (\textbf{K})\). These additional linear constraints (called Stokes constraints) come from an appropriate modelling of Stokes’ theorem for integration over \(\textbf{K}\), a refined version of that in [14]. Therefore the optimal continuous solution can be approximated efficiently by polynomials with no Gibbs phenomenon, by the hierarchy of semidefinite relaxations defined in [8] (adapted to these new linear constraints). Interestingly, the technique of proof and the construction of the optimal solution invoke results from the field of elliptic partial differential equations (PDE), namely a recent extension of standard Schauder estimates from Dirichlet problems to Neumann formulations.

Outline. In Sect. 2 we recall the primal-dual linear formulation of the volume problem, and we explain why the dual value is not attained, which results in a Gibbs phenomenon. In Sect. 3 we revisit the acceleration strategy based on Stokes’ theorem, with the aim of introducing in Sect. 4 a more general acceleration strategy and a new primal-dual linear formulation of the volume problem. Our main result, attainment of the dual value in this new formulation, is stated as Theorem 4.2 at the end of Sect. 4. The drastic improvement in the convergence to \(\lambda (\textbf{K})\) is illustrated on various simple examples.

2 Linear Reformulation of the Volume Problem

Consider a compact basic semi-algebraic set

$$\begin{aligned} \textbf{K}:=\{\textbf{x}\in \mathbb {R}^n:g(\textbf{x})\ge 0\} \end{aligned}$$

with \(g \in \mathbb {R}[\textbf{x}]\). We suppose that \(\textbf{K}\subset \textbf{B}\), where \(\textbf{B}\) is a compact basic semi-algebraic set for which we know the moments \(\int _\textbf{B}\textbf{x}^\textbf{k}\,\textrm{d}\textbf{x}\) of the Lebesgue measure \(\lambda _\textbf{B}\), where \(\textbf{x}^\textbf{k}:= x^{k_1}_{1} x^{k_2}_{2} \cdots x^{k_n}_{n}\) denotes the multivariate monomial with exponent \(\textbf{k}\in \mathbb {N}^n\). We assume that

$$\begin{aligned} \varvec{\Omega }:= \{\textbf{x}\in \mathbb {R}^n : g(\textbf{x}) > 0\} \end{aligned}$$

is a nonempty open set with closure

$$\begin{aligned} {\overline{\varvec{\Omega }}} = \textbf{K}, \end{aligned}$$

and that its boundary

$$\begin{aligned} \partial \varvec{\Omega }=\partial \textbf{K}=\textbf{K}\setminus \varvec{\Omega }\end{aligned}$$

is \(C^1\) in the sense that it is locally the graph of a continuously differentiable function. We want to compute the Lebesgue volume of \(\textbf{K}\), i.e., the mass of the Lebesgue measure \(\lambda _\textbf{K}\):

$$\begin{aligned} \lambda (\textbf{K}) := \int _{\textbf{K}} d\textbf{x}= \int _{\mathbb {R}^n} d\lambda _\textbf{K}(\textbf{x}). \end{aligned}$$
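For instance, when \(\textbf{B}=[-1,1]^n\) the required moments of \(\lambda _\textbf{B}\) factor coordinate-wise into univariate integrals; a small sketch (the choice of box is an assumption for illustration):

```python
def box_moment(k):
    """Moment int_B x^k dx of the Lebesgue measure on B = [-1,1]^n for a
    multi-index k: the integral factors as the product of the univariate
    integrals int_{-1}^{1} x^{k_i} dx, which is 0 when k_i is odd and
    2/(k_i + 1) when k_i is even."""
    m = 1.0
    for ki in k:
        m *= 0.0 if ki % 2 else 2.0 / (ki + 1)
    return m

# Example on [-1,1]^2: int_B x1^2 dx = (2/3) * 2 = 4/3.
m20 = box_moment((2, 0))
```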

If \(\textbf{X}\subset \mathbb {R}^n\) is a compact set, denote by \(\mathcal {M}(\textbf{X})\) the space of signed Borel measures on \(\textbf{X}\), which identifies with the topological dual of \(C^0(\textbf{X})\), the space of continuous functions on \(\textbf{X}\). Denote by \(\mathcal {M}(\textbf{X})_+\) the convex cone of non-negative Borel measures on \(\textbf{X}\), and by \(C^0(\textbf{X})_+\) the convex cone of non-negative continuous functions on \(\textbf{X}\).

In [8] a sequence of upper bounds converging to \(\lambda (\textbf{K})\) is obtained by applying the Moment-SOS hierarchy [12] (a family of finite-dimensional convex relaxations) to approximate as closely as desired the (primal) infinite-dimensional LP on measures:

$$\begin{aligned} \begin{aligned}&\max _{\mu }\mu (\textbf{K}) \\&\quad \text {s.t.}\quad \mu \in \mathcal {M}(\textbf{K})_+ \quad \text {and}\quad \lambda _\textbf{B}- \mu \in \mathcal {M}(\textbf{B})_+ \end{aligned} \end{aligned}$$
(4)

whose optimal value is \(\lambda (\textbf{K})\), attained for \(\mu ^\star := \lambda _\textbf{K}\). The LP (4) has an infinite-dimensional LP dual on continuous functions which reads:

$$\begin{aligned} \begin{aligned}&\inf _{w}\int _\textbf{B}w\,d\lambda \\&\quad \text {s.t.}\quad w \in C^0(\textbf{B})_+ \quad \text {and}\quad w\vert _\textbf{K}- 1 \in C^0(\textbf{K})_+. \end{aligned} \end{aligned}$$
(5)

Observe that (5) consists of approximating the discontinuous indicator function \(\mathbb {1}_\textbf{K}\) (equal to one on \(\textbf{K}\) and zero elsewhere) from above by continuous functions w, while minimizing the \(L^1(\textbf{B})\)-norm \(\Vert w-\mathbb {1}_{\textbf{K}}\Vert _1\). Clearly the infimum \(\lambda (\textbf{K})\) is not attained.

Since \(\textbf{K}\) is generated by a polynomial g, and measures on compact sets are uniquely determined by their moments, one may apply the Moment-SOS hierarchy [12] for solving (4). The moment relaxation of (4) consists of replacing \(\mu \) by finitely many of its moments \(\textbf{y}\), say up to degree \(d\in \mathbb {N}\). Then the cone of moments is relaxed by a linear slice of the semidefinite cone constructed from so-called moment and localizing matrices indexed by d, as defined in e.g. [12], and which defines a semidefinite program. Therefore the dual of this semidefinite program (i.e., the dual SOS-hierarchy) is a strengthening of (5) where

  (i) continuous functions w are replaced with polynomials of increasing degree d, and

  (ii) nonnegativity constraints are replaced with Putinar’s SOS-based certificates of positivity [18], which translate into semidefinite constraints on the coefficients of polynomials; again the interested reader is referred to [8, 12] for more details.

For each fixed degree d, a valid upper bound on \(\lambda (\textbf{K})\) is computed by solving a primal-dual pair of convex semidefinite programming problems (not described here). As proved in [8] by combining Stone–Weierstrass’ theorem and Putinar’s Positivstellensatz [18],

  (i) there is no duality gap between each primal semidefinite relaxation of the hierarchy and its dual, and

  (ii) the resulting sequence of upper bounds converges to \(\lambda (\textbf{K})\) as d increases.

The main drawback of this numerical scheme is its typically slow convergence, observed already for very simple univariate examples, see e.g. [8, Figs. 4.1 and 4.5]. The best available theoretical convergence speed estimates are also pessimistic, with an asymptotic rate of \(\log \log d\) [10]. Slow convergence is mostly due to the so-called Gibbs phenomenon, which is well known in numerical analysis [22, Chap. 9]. Indeed, as already mentioned, solving (5) numerically amounts to approximating the discontinuous function \(\mathbb {1}_\textbf{K}\) from above with polynomials of increasing degree, which generates oscillations and overshoots and slows down the convergence, see e.g. [8, Figs. 4.2, 4.4, 4.6, 4.7, 4.10, 4.12].

Example 1

Let \(\textbf{K}:=[0,1/2]\subset \textbf{B}:=[-1,1]\). In Fig. 1 are displayed the degree-10 and degree-20 polynomials w obtained by solving the dual of SOS strengthenings of problem (4). We can clearly see bumps, typical of a Gibbs phenomenon at points of discontinuity.

Fig. 1  Gibbs effect occurring when approximating from above with a polynomial of degree 10 (left, red curve) and 20 (right, red curve) the indicator function of an interval (black curve)
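Curves of this kind can be reproduced roughly, without semidefinite programming, by discretizing problem (2) for Example 1: impose \(p\ge \mathbb {1}_\textbf{K}\) only on a finite grid of \(\textbf{B}\) and minimize \(\int _\textbf{B}p\,d\lambda \) as a linear program. This is a heuristic sketch (grid size, degree, and coefficient bounds are arbitrary choices), not the certified SOS strengthening used in the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Discretized surrogate of problem (2) for Example 1: K = [0, 1/2],
# B = [-1, 1].  The constraint p >= 1_K is imposed only at grid points,
# so the optimal value only approximates tau_k and carries no certificate.
k = 10                                      # degree of p
xs = np.linspace(-1.0, 1.0, 401)            # grid on B
V = np.vander(xs, k + 1, increasing=True)   # V[i, j] = xs[i]**j

# Objective: int_B p dx = sum_j c_j * int_{-1}^{1} x^j dx.
obj = np.array([0.0 if j % 2 else 2.0 / (j + 1) for j in range(k + 1)])

# p(x) >= 1 at grid points in K, p(x) >= 0 at the other grid points;
# linprog expects A_ub @ c <= b_ub, hence the sign flips.  The large
# coefficient bounds are a numerical safeguard against unboundedness.
rhs = np.where((xs >= 0.0) & (xs <= 0.5), 1.0, 0.0)
res = linprog(obj, A_ub=-V, b_ub=-rhs, bounds=(-1e4, 1e4), method="highs")
upper_bound = res.fun    # compare with lambda(K) = 1/2
```

Plotting `V @ res.x` against `xs` exhibits the bumps near the discontinuities of \(\mathbb {1}_{[0,1/2]}\), and the value stays well above \(\lambda (\textbf{K})=1/2\) at this degree.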

An idea to bypass this limitation consists of adding certain linear constraints to the finite-dimensional semidefinite relaxations, to make their optimal values larger and so closer to the optimal value \(\lambda (\textbf{K})\). Such linear constraints must be chosen appropriately:

  (i) they must be redundant for the infinite-dimensional moment LP on measures (4), and

  (ii) they must become active for its finite-dimensional relaxations.

This is the heuristic proposed in [14] to accelerate the Moment-SOS hierarchy for evaluating transcendental integrals on semi-algebraic sets. These additional linear constraints on the moments \(\textbf{y}\) of \(\mu ^\star \) are obtained from an application of Stokes’ theorem for integration on \(\textbf{K}\), a classical result in differential geometry. It has been also observed experimentally that this heuristic accelerates significantly the convergence of the hierarchy in other applied contexts, e.g. in chance-constrained optimization problems [23].

3 Introducing Stokes Constraints

In this section we explain the heuristic introduced in [14] to accelerate convergence of the Moment-SOS hierarchy by adding linear constraints on the moments of \(\mu ^\star \). These linear constraints are obtained from a certain application of Stokes’ theorem for integration on \(\textbf{K}\).

3.1 Stokes’ Theorem and its Variants

Theorem 3.1

(Stokes’ theorem)    Let \(\varvec{\Omega }\subset \mathbb {R}^n\) be a \(C^1\) open set with closure \(\textbf{K}\). For any \((n-1)\)-differential form \(\omega \) on \({\textbf{K}}\), it holds

$$\begin{aligned} \displaystyle \int _{\partial {\varvec{\Omega }}} \omega = \int _{\varvec{\Omega }} d\omega . \end{aligned}$$

Corollary 3.2

In particular, for \(\textbf{u}\in C^1({\textbf{K}})^n\) and \(\omega (\textbf{x})=\textbf{u}(\textbf{x})\cdot \textbf{n}_{\varvec{\Omega }}(\textbf{x})\,d\sigma (\textbf{x})\), where the dot is the inner product, \(\sigma \) is the surface or Hausdorff measure on \(\partial \varvec{\Omega }\) and \(\textbf{n}_{\varvec{\Omega }}\) is the outward pointing normal to \(\partial \varvec{\Omega }\), we obtain the Gauss formula

$$\begin{aligned} \int _{\partial \varvec{\Omega }} \textbf{u}(\textbf{x}) \cdot \textbf{n}_{\varvec{\Omega }}(\textbf{x})\,d\sigma (\textbf{x}) = \int _{\varvec{\Omega }} \textrm{div}~\textbf{u}(\textbf{x})\,d\textbf{x}. \end{aligned}$$
(6)

With the choice \(\textbf{u}(\textbf{x}) := u(\textbf{x})\textbf{e}_i\) where \(u \in C^1({\textbf{K}})\) and \(\textbf{e}_i\) is the vector of \(\mathbb {R}^n\) with one at entry i and zeros elsewhere, for \(i=1,\ldots ,n\), we obtain the dual Gauss formula

$$\begin{aligned} \int _{\partial \varvec{\Omega }}u(\textbf{x})\textbf{n}_{\varvec{\Omega }}(\textbf{x})\,d\sigma (\textbf{x}) = \int _{\varvec{\Omega }} \mathop {\textrm{grad}}u(\textbf{x})\,d\textbf{x}. \end{aligned}$$
(7)

Proof

These are all particular cases of [9, Thm. 6.10.2]. \(\square \)

3.2 Original Stokes Constraints

Associated to a sequence \(\textbf{y}=(y_\textbf{k})_{\textbf{k}\in \mathbb {N}^n} \in \mathbb {R}^{\mathbb {N}^n}\), introduce the Riesz linear functional \(\textrm{L}_\textbf{y}:\mathbb {R}[\textbf{x}] \rightarrow \mathbb {R}\) which acts on a polynomial \(p := \sum _{\textbf{k}} p_\textbf{k}\textbf{x}^\textbf{k}\in \mathbb {R}[\textbf{x}]\) by \(\textrm{L}_\textbf{y}(p):=\sum _{\textbf{k}}p_\textbf{k}y_\textbf{k}\). Thus, if \(\textbf{y}\) is the sequence of moments of \(\lambda _\textbf{K}\), i.e., \(y_\textbf{k}:=\int _\textbf{K}\textbf{x}^\textbf{k}\,d\textbf{x}\) for all \(\textbf{k}\in \mathbb {N}^n\), then \(\textrm{L}_\textbf{y}(p) = \int _\textbf{K}p(\textbf{x})\,d\textbf{x}\) and by (7) with \(u(\textbf{x}) := \textbf{x}^\textbf{k}g(\textbf{x})\):

$$\begin{aligned} \textrm{L}_{\textbf{y}}(\mathop {\textrm{grad}}(\textbf{x}^\textbf{k}g))= \int _\textbf{K}\mathop {\textrm{grad}}(\textbf{x}^\textbf{k}g(\textbf{x}))\,d\textbf{x}=\int _{\partial \textbf{K}}\textbf{x}^\textbf{k}g(\textbf{x})\textbf{n}_\textbf{K}(\textbf{x})\,d\sigma (\textbf{x})=0, \end{aligned}$$

since by construction g vanishes on \(\partial \textbf{K}\). Thus while in the infinite-dimensional LP (4) one may add the linear constraints

$$\begin{aligned} \int _{\textbf{K}}\mathop {\textrm{grad}}(\textbf{x}^\textbf{k}g)\,d\mu =0\qquad \forall \,\textbf{k}\in \mathbb {N}^n, \end{aligned}$$

without changing its optimal value \(\lambda (\textbf{K})\), the inclusion of the linear moment constraints

$$\begin{aligned} \textrm{L}_\textbf{y}(\mathop {\textrm{grad}}(\textbf{x}^\textbf{k}g)) = 0,\quad \ \ \vert \textbf{k}\vert \le 2d+1-{\text {deg}}(g) \end{aligned}$$
(8)

in the moment relaxation with pseudo-moments \(\textbf{y}\) of degree at most d, will decrease the optimal value of the initial relaxation.

In practice, it was observed that adding constraints (8) dramatically speeds up the convergence of the Moment-SOS hierarchy, see e.g. [14, 23]. One main goal of this paper is to provide a qualitative mathematical rationale behind this phenomenon.
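The constraints (8) are easy to verify symbolically in one dimension; the sketch below assumes the defining polynomial \(g(x)=x(1/2-x)\) for \(\textbf{K}=[0,1/2]\) (a hypothetical choice consistent with Example 1):

```python
import sympy as sp

# Univariate check of the Stokes constraints (8): for K = [0, 1/2]
# described by g(x) = x*(1/2 - x), the integral over K of the
# derivative of x**k * g vanishes, because x**k * g vanishes at both
# boundary points of K.
x = sp.symbols('x')
g = x * (sp.Rational(1, 2) - x)
for k in range(5):
    integral = sp.integrate(sp.diff(x**k * g, x), (x, 0, sp.Rational(1, 2)))
    assert integral == 0
```

Each of these identities becomes a linear constraint \(\textrm{L}_\textbf{y}(\mathop {\textrm{grad}}(\textbf{x}^\textbf{k}g))=0\) on the pseudo-moments \(\textbf{y}\) in the relaxation.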

3.3 Infinite-Dimensional Stokes Constraints

In [21], Stokes constraints were formulated in the infinite-dimensional setting, and a dual formulation was obtained in the context of the volume problem. Choosing \(\textbf{u}= g \textbf{v}\) in (6), which vanishes on \(\partial \textbf{K}\) for arbitrary \(\textbf{v}\in C^1(\textbf{K})^n\), yields:

$$\begin{aligned} \int _\textbf{K}(\mathop {\textrm{grad}}g(\textbf{x})\cdot \textbf{v}(\textbf{x}) + g(\textbf{x})~ \textrm{div}~\textbf{v}(\textbf{x})) \,\textrm{d}\textbf{x}= \int _{\partial \textbf{K}}g\,\textbf{v}\cdot \textbf{n}_\textbf{K}\,d\sigma =0, \end{aligned}$$

which can be written equivalently (in the sense of distributions) as

$$\begin{aligned} (\mathop {\textrm{grad}}g)\lambda _\textbf{K}- \mathop {\textrm{grad}}(g\lambda _\textbf{K}) = 0. \end{aligned}$$

This allows to rewrite problem (4) as

$$\begin{aligned} \begin{aligned}&\max _\mu \mu (\textbf{K}) \\&\quad \text {s.t.}\quad \mu \in \mathcal {M}(\textbf{K})_+, \quad \lambda _\textbf{B}- \mu \in \mathcal {M}(\textbf{B})_+, \quad (\mathop {\textrm{grad}}g)\mu - \mathop {\textrm{grad}}(g \mu ) = 0, \end{aligned} \end{aligned}$$
(9)

without changing its optimal value \(\lambda (\textbf{K})\) attained at \(\mu ^\star = \lambda _\textbf{K}\). Using infinite-dimensional convex duality as in e.g. the proof of [6, Thm. 2], the dual of LP (9) reads

$$\begin{aligned} \begin{aligned}&\inf _{\textbf{v},w}\int _\textbf{B}w \,d\lambda \\&\quad \text {s.t.}\quad \textbf{v}\in C^1(\textbf{K})^n,\quad w \in C^0(\textbf{B})_+,\quad w\vert _\textbf{K}- \textrm{div}(g\textbf{v}) - 1 \in C^0(\textbf{K})_+. \end{aligned} \end{aligned}$$
(10)

Crucial observation. Notice that w in (10) is not required to approximate \(\mathbb {1}_\textbf{K}\) from above anymore. Instead, it should approximate \(1 + \textrm{div}(g\textbf{v})\) on \(\textbf{K}\) and 0 outside \(\textbf{K}\). Hence, provided that \(1+\textrm{div}(g\textbf{v}) = 0\) on \(\partial \textbf{K}\), w might be a continuous function for some well-chosen \(\textbf{v}\in C^1(\textbf{K})^n\), and therefore an optimal solution of (10) (i.e., the infimum is a minimum). As a result, the Gibbs phenomenon would disappear and convergence would be faster.

The issue is then to determine whether the infimum in (10) is attained or not. And if not, are there other special features of problem (10) that can be exploited to yield more efficient semidefinite relaxations?

4 New Stokes Constraints and Main Result

In the previous section, the Stokes constraint

$$\begin{aligned} \int _\textbf{K}(\textbf{v}(\textbf{x}) \cdot \mathop {\textrm{grad}}g(\textbf{x}) + g(\textbf{x})~ \textrm{div}~ \textbf{v}(\textbf{x}))\,d\mu (\textbf{x}) = 0 \end{aligned}$$

or equivalently (in the sense of distributions)

$$\begin{aligned} (\mathop {\textrm{grad}}g) \mu - \mathop {\textrm{grad}}(g \mu ) = 0 \end{aligned}$$
(11)

(with \(\mu \in \mathcal {M}(\textbf{K})_+\) being the Lebesgue measure on \(\textbf{K}\)) was obtained as a particular case of Stokes’ theorem with \(\textbf{u}=g\textbf{v}\) in (6). Instead, we can use a more general version with \(\textbf{u}\) not in factored form, and also use the fact that for all \(\textbf{x}\in \partial \textbf{K}\), \(0\ne \mathop {\textrm{grad}}g(\textbf{x})=-|{\mathop {\textrm{grad}}g(\textbf{x})}|\textbf{n}_\textbf{K}(\textbf{x})\) (here \(|\textbf{y}| := \sqrt{\textbf{y}\cdot \textbf{y}}\) is the n-dimensional Euclidean norm), to obtain

$$\begin{aligned} \int _{\textbf{K}}\textrm{div}~\textbf{u}(\textbf{x})\,d\mu (\textbf{x})=-\int _{\partial \textbf{K}} \textbf{u}(\textbf{x})\cdot \mathop {\textrm{grad}}g(\textbf{x})\,d\nu (\textbf{x}), \end{aligned}$$

or equivalently (in the sense of distributions)

$$\begin{aligned} \mathop {\textrm{grad}}\mu =(\mathop {\textrm{grad}}g)\nu , \end{aligned}$$
(12)

with \(\mu \in \mathcal {M}(\textbf{K})_+\) being the Lebesgue measure on \(\textbf{K}\) and \(\nu \in \mathcal {M}(\partial \textbf{K})_+\) being the measure having density \(1/|{\mathop {\textrm{grad}}g(\textbf{x})}|\) with respect to the \((n-1)\)-dimensional Hausdorff measure \(\sigma \) on \(\partial \textbf{K}\). The same linear equation was used in [15] to compute moments of the Hausdorff measure. In fact, (12) is a generalization of (11) in the following sense.

Lemma 4.1

If \(\nu \in \mathcal {M}(\partial \textbf{K})_+\) is such that \(\mu \in \mathcal {M}(\textbf{K})_+\) satisfies (12), then \(\mu \) also satisfies (11).

Proof

Equation (12) means that \(\int _{\textbf{K}}\textrm{div}~\textbf{u}(\textbf{x})\,d\mu (\textbf{x})+\int _{\partial \textbf{K}}\textbf{u}(\textbf{x})\cdot \mathop {\textrm{grad}}g(\textbf{x})\,d\nu (\textbf{x})=0\) for all \(\textbf{u}\in C^1(\textbf{K})^n\). In particular if \(\textbf{u}= g\textbf{v}\) for some \(\textbf{v}\in C^1(\textbf{K})^n\) then (12) reads

$$\begin{aligned} \int _\textbf{K}(\textbf{v}(\textbf{x})\cdot \mathop {\textrm{grad}}g(\textbf{x}) + g(\textbf{x})~\textrm{div}~ \textbf{v}(\textbf{x}))\,d\mu (\textbf{x}) = 0, \end{aligned}$$

which is precisely (11). \(\square \)

Hence we can incorporate linear constraints (12) on \(\mu \) and \(\nu \), to rewrite problem (4) as

$$\begin{aligned} \begin{aligned}&\max _{\mu ,\nu }\,\mu (\textbf{K}) \\&\quad \text {s.t.}\quad \mu \in \mathcal {M}(\textbf{K})_+,\quad \nu \in \mathcal {M}(\partial \textbf{K})_+,\quad \lambda _\textbf{B}- \mu \in \mathcal {M}(\textbf{B})_+,\\&\qquad \qquad (\mathop {\textrm{grad}}g)\nu - \mathop {\textrm{grad}}\mu =0 \end{aligned} \end{aligned}$$
(13)

without changing its optimal value \(\lambda (\textbf{K})\) attained at \(\mu ^\star = \lambda _\textbf{K}\) and \(\nu ^\star = \sigma /|{\mathop {\textrm{grad}}g}|\). Notice that LP (13) involves two measures \(\mu \) and \(\nu \) whereas LP (9) involves only one measure \(\mu \). Next, by convex duality as in e.g. the proof of [6, Thm. 2], the dual of (13) reads

$$\begin{aligned} \begin{aligned}&\inf _{\textbf{u},w}\int _\textbf{B}w \,d\lambda \\&\quad \text {s.t.}\quad \textbf{u}\in C^1(\textbf{K})^n,\quad w \in C^0(\textbf{B})_+,\quad w\vert _\textbf{K}- \textrm{div}~\textbf{u}- 1 \in C^0(\textbf{K})_+,\\&\qquad \qquad -(\textbf{u}\cdot \mathop {\textrm{grad}}g)|_{\partial \textbf{K}} \in C^0(\partial \textbf{K})_+ \end{aligned} \end{aligned}$$
(14)

Our main result states that the optimal value of the dual (14) is attained at some continuous function \((w,\textbf{u})\in C^0(\textbf{B})_+\times C^1(\textbf{K})^n\). Therefore, in contrast with problem (5), there is no Gibbs phenomenon at an optimal solution of the (finite-dimensional) semidefinite strengthening associated with (14).

Let \(\varvec{\Omega }_i\), \(i=1,\ldots ,N\), denote the connected components of \(\varvec{\Omega }\), and let

$$\begin{aligned} m_{\varvec{\Omega }_i}(g):=\frac{1}{\lambda (\varvec{\Omega }_i)}\int _{\varvec{\Omega }_i}g\,d\lambda . \end{aligned}$$

Theorem 4.2

In dual LP (14) the infimum is a minimum, attained at

$$\begin{aligned} w^\star (\textbf{x}) := g(\textbf{x})\sum _{i=1}^N\frac{\mathbb {1}_{\varvec{\Omega }_i}(\textbf{x})}{m_{\varvec{\Omega }_i}(g)},\quad \textbf{x}\in \textbf{B}, \end{aligned}$$

and \(\textbf{u}^\star (\textbf{x}) := \mathop {\textrm{grad}}u(\textbf{x})\), where u solves the Poisson PDE

$$\begin{aligned} {\left\{ \begin{array}{ll} -\Delta u(\textbf{x})=1 - w^\star (\textbf{x}),&{} \quad \textbf{x}\in \varvec{\Omega },\\ \partial _\textbf{n}u(\textbf{x})=0,&{} \quad \textbf{x}\in \partial \varvec{\Omega }.\end{array}\right. } \end{aligned}$$

Remark 1

The Moment-SOS hierarchy associated to LPs (13) and (14) yields upper bounds for the volume. Theorem 4.2 is designed for these LPs but it has a straightforward counterpart for lower bound volume computation, obtained by replacing \(\textbf{K}\) with \(\textbf{B}\setminus \varvec{\Omega }\) in the previous developments, i.e., computing upper bounds of \(\lambda (\textbf{B}\setminus \varvec{\Omega })\). However, two additional technicalities should then be considered:

  • This work only deals with semi-algebraic sets defined by a single polynomial; actually, it immediately generalizes to finite intersections of such semi-algebraic sets, as long as their boundaries do not intersect (i.e., here \(\textbf{K}\) should be included in the interior of \(\textbf{B}\)): the constraints on boundaries should just be split between the boundaries of the intersected sets.

  • This work heavily relies on the fact that the boundary of the considered set should be smooth; for this reason, computing lower bounds of the volume implies that one chooses a smooth bounding box \(\textbf{B}\) (typically a Euclidean ball, ellipsoid or \(\ell ^p\) ball), which rules out simple sets like the hypercube \([-1,1]^n\).

Upon taking into account these technicalities, Theorem 4.2 still holds, allowing one to deterministically compute upper and lower bounds for the volume, with arbitrary precision. Of course, in practice one is limited by the performance of state-of-the-art SDP solvers.

5 Proof of Main Result

Theorem 4.2 is proved in several steps as follows:

  • we show that the optimal dual solution satisfies a Poisson PDE;

  • we study the Poisson PDE on a union of connected domains;

  • we construct an explicit optimum for problem (14).

5.1 Equivalence to a Poisson PDE

Lemma 5.1

Problem (14) has an optimal solution iff there exist \(\textbf{u}\in C^{1}({\textbf{K}})^{n}\) and \(h \in C^{0}({\textbf{K}})_{+}\) solving

$$\begin{aligned} h&= 0 \qquad \quad \quad \,\,\text { on } \partial \varvec{\Omega }, \end{aligned}$$
(15a)
$$\begin{aligned} - \textrm{div}~\textbf{u}&=1-h \qquad \text { in } \varvec{\Omega }, \end{aligned}$$
(15b)
$$\begin{aligned} \textbf{u}\cdot \textbf{n}_{\varvec{\Omega }}&= 0 \qquad \qquad \,\,\text { on } \partial \varvec{\Omega }. \end{aligned}$$
(15c)

Proof

Let \((\textbf{u},h)\) solve (15). Using (15a), one can define

$$\begin{aligned} w(\textbf{x}) ={\left\{ \begin{array}{ll}h(\textbf{x}) &{} \text { if } \textbf{x}\in {\textbf{K}}, \\ 0 &{} \text { if } \textbf{x}\in \textbf{B}\setminus {\textbf{K}}. \end{array}\right. } \end{aligned}$$

Then \((\textbf{u},w)\) is feasible for (14) and one has

$$\begin{aligned} \int _\textbf{B}w \,d\lambda&= \int _{\varvec{\Omega }} h \,d\lambda \mathop {=}\limits ^{(15b)}\int _{\varvec{\Omega }} (1 + \textrm{div}~\textbf{u}) \, d\lambda \mathop {=}\limits ^{(6)} \lambda (\varvec{\Omega }) + \int _{\partial \varvec{\Omega }}\textbf{u}\cdot \textbf{n}_{\varvec{\Omega }} \,d\sigma \mathop {=}\limits ^{(15c)} \lambda (\varvec{\Omega }), \end{aligned}$$

so that \((\textbf{u},w)\) is optimal.

Conversely, let \((\textbf{u},w)\) be an optimal solution of problem (14). We know that \((\mu ^\star ,\nu ^\star )=(\lambda _{\varvec{\Omega }},\sigma /{|{\mathop {\textrm{grad}}g}|})\) is optimal for problem (13). Then, the KKT optimality conditions ensure complementary slackness:

$$\begin{aligned}&\int _{\varvec{\Omega }} (w|_{\varvec{\Omega }} - \textrm{div}~\textbf{u}- 1) \,d\lambda = 0, \end{aligned}$$
(16a)
$$\begin{aligned}&\int _{\partial \varvec{\Omega }} \textbf{u}\cdot \frac{\mathop {\textrm{grad}}g}{|{\mathop {\textrm{grad}}g}|} \,d\sigma = 0. \end{aligned}$$
(16b)

Since \(w|_{\varvec{\Omega }} - \textrm{div}~\textbf{u}- 1\) is nonnegative, (16a) yields (15b) with \(h:=w|_{\varvec{\Omega }}\). Likewise, since \(-(\textbf{u}\cdot \mathop {\textrm{grad}}g)|_{\partial \varvec{\Omega }}\) is nonnegative, (16b) yields (15c) and thus, using (6), it holds that \(\int _{\varvec{\Omega }}\textrm{div}~\textbf{u}\,d\lambda = 0\). Finally, (16a) yields \(\int _{\varvec{\Omega }} w \,d\lambda = \lambda (\varvec{\Omega }) = \int _\textbf{B}w\,d\lambda \) by optimality of w, so that \(\int _{\textbf{B}\setminus \varvec{\Omega }} w \,d\lambda = 0\) and, since w is nonnegative, \(w|_{\textbf{B}\setminus \varvec{\Omega }} = 0\). Continuity of w then allows us to conclude that \(w = 0\) on \(\partial \varvec{\Omega }\), which is exactly (15a). \(\square \)

From Lemma 5.1, existence of an optimum for (14) is then equivalent to existence of a solution to (15), which we rephrase as follows, defining \(f:=1-h\) and \(\textbf{u}=\mathop {\textrm{grad}}u\) with \(u\in C^2({\textbf{K}})\), and where \(\Delta u:=\text {div grad } u\) is the Laplacian of u, and \(\partial _\textbf{n}u := \mathop {\textrm{grad}}u \cdot \textbf{n}_{\varvec{\Omega }}\).

Lemma 5.2

If there exist \(u \in C^2({\textbf{K}})\) and \(f \in C^0({\textbf{K}})\) solving

$$\begin{aligned} -\Delta u&= f \;\text { in } \varvec{\Omega }, \end{aligned}$$
(17a)
$$\begin{aligned} \partial _\textbf{n}u&= 0 \;\text { on } \partial \varvec{\Omega }, \end{aligned}$$
(17b)
$$\begin{aligned} f&\le 1 \;\text { in } \varvec{\Omega }, \end{aligned}$$
(17c)
$$\begin{aligned} f&= 1 \; \text { on } \partial \varvec{\Omega }, \end{aligned}$$
(17d)

then problem (14) has an optimal solution.

This rephrasing is a Poisson PDE (17a) with Neumann boundary condition (17b), whose source term f is a parameter subject to constraints (17c) and (17d).

Remark 2

(loss of generality)    Looking for \(\textbf{u}\) in the form \(\textbf{u}= \mathop {\textrm{grad}}u\) makes us lose the equivalence. Indeed, while (14) and (15) are equivalent, existence of a solution to (17) is only a sufficient condition for existence of an optimum for (14), since (15) might only have solutions \(\textbf{u}\) that are not gradients.

Remark 3

(invariant set for gradient flow)    From a dynamical systems point of view, the constraint in (14) which states that the inner product of \(\textbf{u}=\mathop {\textrm{grad}}u\) with \(\mathop {\textrm{grad}}g\) is non-positive on \(\partial \varvec{\Omega }\), means that we are looking for a velocity field or control \(\textbf{u}\) in the form of the gradient of a potential u such that \({\textbf{K}}\) is an invariant set for the solutions \(t\in \mathbb {R}\mapsto \textbf{x}(t)\in \mathbb {R}^n\) of the Cauchy problem

$$\begin{aligned} {\dot{\textbf{x}}}(t) = -\mathop {\textrm{grad}}u(\textbf{x}(t)),\quad \textbf{x}(0) \in \textbf{B}\end{aligned}$$

after which it suffices to define \(h := 1 + \Delta u\) on \(\varvec{\Omega }\).
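For a concrete toy instance (our example, not from the paper): with \(\textbf{K}\) the unit disk, \(g(\textbf{x}) = 1-|\textbf{x}|^2\) and potential \(u(\textbf{x}) = |\textbf{x}|^2/2\), one has \(\mathop {\textrm{grad}}u\cdot \mathop {\textrm{grad}}g = -2|\textbf{x}|^2\le 0\), and the flow \({\dot{\textbf{x}}} = -\mathop {\textrm{grad}}u(\textbf{x}) = -\textbf{x}\) indeed leaves \(\textbf{K}\) invariant, as a short Python simulation confirms:

```python
import math

# Toy illustration of Remark 3 (our example, not from the paper): take
# K = unit disk with g(x) = 1 - |x|^2 and the potential u(x) = |x|^2 / 2,
# so that grad u(x) . grad g(x) = x . (-2x) = -2|x|^2 <= 0 on K.
# The gradient flow xdot = -grad u(x) = -x then leaves K invariant.

def flow(x0, dt=1e-2, steps=1000):
    """Explicit Euler integration of xdot = -grad u(x) = -x starting from x0."""
    x1, x2 = x0
    for _ in range(steps):
        x1 -= dt * x1
        x2 -= dt * x2
    return x1, x2

# starting on the boundary of K, the trajectory stays inside K
x1, x2 = flow((1.0, 0.0))
assert math.hypot(x1, x2) <= 1.0
```

Since the Euler update contracts the norm by a factor \(1-\delta t\) at each step, trajectories starting in \(\textbf{K}\) never leave it.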

5.2 Regular Solutions to the Poisson PDE

It remains to prove existence of solutions to problem (17). First, notice that PDE (17a) together with its boundary condition (17b) enforces an important constraint on the source term f, namely its mean must vanish:

$$\begin{aligned} \int _{\varvec{\Omega }} f \,d\lambda = 0. \end{aligned}$$
(18)

Indeed, if (fu) solves (17), then

$$\begin{aligned} \int _{\varvec{\Omega }} f \,d\lambda \,{\mathop {=}\limits ^{(17\textrm{a})}}\, - \int _{\varvec{\Omega }} \Delta u \,d\lambda \,{\mathop {=}\limits ^{(6)}}\, - \int _{\partial \varvec{\Omega }} \mathop {\textrm{grad}}u \cdot \textbf{n}_{\varvec{\Omega }} \,d\sigma \,{\mathop {=}\limits ^{(17\textrm{b})}}\, 0. \end{aligned}$$
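This compatibility condition is easy to check numerically on a toy example (ours, not from the paper): on the unit disk, \(u(\textbf{x})=(1-|\textbf{x}|^2)^2\) satisfies the Neumann condition (17b), since \(\mathop {\textrm{grad}}u = -4(1-|\textbf{x}|^2)\textbf{x}\) vanishes on the boundary, and the associated source term \(f=-\Delta u=8-16|\textbf{x}|^2\) (in dimension 2) indeed has zero mean:

```python
import math

# Sanity check of the compatibility condition (18) on a toy example
# (our example, not from the paper): Omega = unit disk, u(x) = (1-|x|^2)^2.
# Then grad u = -4(1-|x|^2) x vanishes on the boundary, so the Neumann
# condition (17b) holds, and f = -Delta u = 8 - 16|x|^2 in dimension 2.

def mean_of_f(n_r=10000):
    """Midpoint rule for int_Omega f dlambda in polar coordinates."""
    total = 0.0
    dr = 1.0 / n_r
    for i in range(n_r):
        r = (i + 0.5) * dr
        f = 8.0 - 16.0 * r * r
        total += f * 2.0 * math.pi * r * dr   # radial shell of area 2*pi*r*dr
    return total

assert abs(mean_of_f()) < 1e-6   # (18): the source term has zero mean
```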

Moreover, the following holds.

Lemma 5.3

(existence and regularity on a connected domain) Suppose that \(\varvec{\Omega }\) is connected. Let the source term f be Lipschitz continuous on \(\textbf{K}\) and have zero mean on \(\varvec{\Omega }\). Then there exists \(u \in C^{2}({\textbf{K}})\) satisfying (17a) and (17b).

Proof

This is a direct application of [17]: for \(\alpha \in (0,1)\), since \(\textbf{K}\) is bounded (let \(R>0\) be such that \(\textbf{K}\subset \{\textbf{x}\in \mathbb {R}^n:\Vert \textbf{x}\Vert \le R \}\)) and f is Lipschitz (let L be its Lipschitz constant on \(\textbf{K}\)), one has for \(\textbf{x},\textbf{y}\in \textbf{K}\) that

$$\begin{aligned} |f(\textbf{x}) - f(\textbf{y})| \le L \Vert \textbf{x}-\textbf{y}\Vert = L \Vert \textbf{x}-\textbf{y}\Vert ^{1-\alpha } \Vert \textbf{x}-\textbf{y}\Vert ^\alpha \le \underbrace{L (2R)^{1-\alpha }}_{< \infty } \Vert \textbf{x}-\textbf{y}\Vert ^\alpha , \end{aligned}$$

so that

$$\begin{aligned} f \in C^{0,\alpha }(\textbf{K}) := \biggl \{\varphi \in C^0(\textbf{K}) : \sup _{\textbf{x},\textbf{y}\in \textbf{K}} \frac{|\varphi (\textbf{x})-\varphi (\textbf{y})|}{\Vert \textbf{x}-\textbf{y}\Vert ^\alpha }<\infty \biggr \}, \end{aligned}$$

and [17] yields a solution

$$\begin{aligned} u \in C^{2,\alpha }(\textbf{K}) := \biggl \{\varphi \in C^2(\textbf{K}) : \sup _{\textbf{x},\textbf{y}\in \textbf{K}} \frac{\Vert \textrm{H}(\varphi )(\textbf{x})-\textrm{H}(\varphi )(\textbf{y})\Vert }{\Vert \textbf{x}-\textbf{y}\Vert ^\alpha } < \infty \biggr \} \end{aligned}$$

to the Poisson PDE (17a) with Neumann boundary condition (17b), where \(\textrm{H}(\varphi ) =({\partial ^2\varphi }/{\partial x_i\partial x_j})_{i,j}\) is the Hessian matrix of \(\varphi \). \(\square \)

Remark 4

Assuming that \(\partial \varvec{\Omega }\) is \(C^\infty \) instead of \(C^1\) is actually without loss of generality since \(\varvec{\Omega }\) is a semi-algebraic set: as soon as \(\partial \varvec{\Omega }\) is locally the graph of a \(C^1\) function, it is smooth.

In Lemma 5.3, we assumed that \(\varvec{\Omega }\) is connected, so that we could apply the results of [17]. To tackle non-connected sets, we recall that \(\varvec{\Omega }\) is a semi-algebraic set, hence it has a finite number of connected components \(\varvec{\Omega }_1,\dots ,\varvec{\Omega }_N\). Moreover, the regularity of \(\partial \varvec{\Omega }\) ensures that the \(\varvec{\Omega }_i\) have disjoint closures, so that we can trivially compute a solution \(u_i\) on each \(\varvec{\Omega }_i\) and glue the \(u_i\) together into a solution \(u :=\sum _{i=1}^N \mathbb {1}_{{{\overline{\varvec{\Omega }}}}_i}u_i\) on the whole \(\varvec{\Omega }\).

Remark 5

Tackling the non-connected case requires that f has zero mean on each connected component of \(\varvec{\Omega }\):

$$\begin{aligned} \int _{\varvec{\Omega }_i} f \, d\lambda = 0, \quad \forall \,i \in \{1,\ldots ,N\}. \end{aligned}$$

Remark 6

Lemma 5.3 automatically enforces \(-\Delta u =1\) on \(\partial \varvec{\Omega }\), which is crucial for the continuity of the optimization variable w.

5.3 Explicit Optimum for Volume Computation with Stokes Constraints

Our optimization problem does not only feature the Poisson PDE with Neumann condition: it also includes constraints (17c) and (17d) on the source term. Consequently, it remains to construct a Lipschitz continuous function f on \(\textbf{K}\) with zero integral over each connected component of \(\varvec{\Omega }\) that satisfies (17c) and (17d). We keep the notation of Sect. 5.2 and suggest the candidate

$$\begin{aligned} \textbf{x}\,\mapsto \,f(\textbf{x}) := 1 - g(\textbf{x})\sum _{i=1}^N\frac{\mathbb {1}_{\varvec{\Omega }_i}(\textbf{x})}{m_{\varvec{\Omega }_i}(g)}. \end{aligned}$$
(19)

By definition, \(g = 0\) on \(\partial \varvec{\Omega }\), so that (17d) automatically holds. Moreover, both g and \(\mathbb {1}_{\varvec{\Omega }_i}\) are nonnegative on \(\textbf{K}\), so that (17c) also holds. In terms of regularity, f is continuous and piecewise polynomial, hence Lipschitz continuous on \(\textbf{K}\). Finally, let \(i\in \{1,\ldots ,N\}\), so that \(\varvec{\Omega }_i\) is a connected component of \(\varvec{\Omega }\). Then, by definition, \(\partial \varvec{\Omega }_i\subset \partial \varvec{\Omega }\), and one has

$$\begin{aligned} \int _{\varvec{\Omega }_i} f \,d\lambda = \int _{\varvec{\Omega }_i} \left( 1 - g(\textbf{x})\sum _{j=1}^N\frac{\mathbb {1}_{\varvec{\Omega }_j}(\textbf{x})}{m_{\varvec{\Omega }_j}(g)}\right) \,\textrm{d}\textbf{x}= \lambda (\varvec{\Omega }_i) - \frac{1}{m_{\varvec{\Omega }_i}(g)} \int _{\varvec{\Omega }_i} g(\textbf{x}) \,\textrm{d}\textbf{x}=0, \end{aligned}$$

by definition of \(m_{\varvec{\Omega }_i}(g)\). This, together with Lemmata 5.2 and 5.3, concludes the proof of Theorem 4.2. Indeed, one can check that for the resulting \(w^\star (\textbf{x}) = g(\textbf{x})\sum _{i=1}^N{\mathbb {1}_{\varvec{\Omega }_i}(\textbf{x})}/{m_{\varvec{\Omega }_i}(g)}\),

$$\begin{aligned} \int _\textbf{B}w^\star \,d\lambda =\sum _{i=1}^N\frac{1}{m_{\varvec{\Omega }_i}(g)}\int _{\varvec{\Omega }_i}\!g\,d\lambda =\sum _{i=1}^N\lambda (\varvec{\Omega }_i)=\lambda (\varvec{\Omega })=\lambda (\textbf{K}). \end{aligned}$$
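To make the closing identity concrete, the following Python snippet (ours; it assumes that \(m_{\varvec{\Omega }}(g)\) denotes the mean value of g over \(\varvec{\Omega }\), consistently with the computation above) checks numerically that \(\int w^\star \,d\lambda =\lambda (\textbf{K})\) for a single Euclidean disk of radius R, where \(g(\textbf{x}) = R^2 - |\textbf{x}|^2\) and hence \(m_{\varvec{\Omega }}(g) = R^2/2\):

```python
import math

# Numerical check of the closing identity for a single Euclidean disk
# (assumption: m_Omega(g) denotes the mean value of g over Omega, so that
# int_Omega g dlambda = m_Omega(g) * lambda(Omega)).  Here Omega = K is the
# disk of radius R, g(x) = R^2 - |x|^2, hence m_Omega(g) = R^2 / 2 and
# w_star = g / m_Omega(g) on Omega.

R = 0.5
m = R * R / 2.0                       # mean value of g over the disk

def integral_w_star(n_r=10000):
    """Midpoint rule for int_Omega w_star dlambda in polar coordinates."""
    total, dr = 0.0, R / n_r
    for i in range(n_r):
        r = (i + 0.5) * dr
        w = (R * R - r * r) / m       # w_star at radius r
        total += w * 2.0 * math.pi * r * dr
    return total

assert abs(integral_w_star() - math.pi * R * R) < 1e-6   # = lambda(K)
```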

6 Examples

To illustrate how efficient the introduction of Stokes constraints can be for volume computation, we consider the simple setting where \(\textbf{K}\) is a Euclidean ball included in the unit Euclidean ball \(\textbf{B}\), as well as some basic variations around this case, where \(\textbf{K}\) is a non-Euclidean ball or a union of balls, or \(\textbf{B}\) is a non-Euclidean ball. Indeed, drastic improvements in convergence are observed. All numerical examples were processed on a standard laptop computer under the Matlab environment with the SOS parser of YALMIP [16], the moment parser GloptiPoly [7] and the semidefinite programming solver of MOSEK [4]. For the interested reader, the codes used to obtain the results presented in this section are available online: https://homepages.laas.fr/henrion/software/stokesvolume/

6.1 Practical Implementation

Following the Moment-SOS hierarchy methodology for volume computation described in [8], in the (finite-dimensional) degree-d semidefinite strengthening of dual problem (14), with the unit Euclidean ball as bounding box \(\textbf{B}\):

  • \(w\in \mathbb {R}[\textbf{x}]_{d}\) and \(\textbf{u}\in \mathbb {R}[\textbf{x}]^n_{d}\) are polynomials of degree at most d;

  • the positivity constraint \(w \in C^0(\textbf{B})_+\) is replaced with a Putinar certificate of positivity on \(\textbf{B}\), that is,

    $$\begin{aligned} w(\textbf{x})=\sigma _0(\textbf{x})+\sigma _1(\textbf{x})(1-|\textbf{x}|^2),\quad \forall \,\textbf{x}\in \mathbb {R}^n, \end{aligned}$$

    where \(\sigma _0\) (resp. \(\sigma _1\)) is an SOS polynomial of degree at most d (resp. \(d-2\));

  • the positivity constraint \(w\vert _\textbf{K}- \textrm{div}\,\textbf{u}- 1 \in C^0(\textbf{K})_+\) is replaced with a Putinar certificate of positivity on \(\textbf{K}\), that is,

    $$\begin{aligned} w(\textbf{x})-\textrm{div}~\textbf{u}(\textbf{x}) -1=\psi _0(\textbf{x})+\psi _1(\textbf{x})g(\textbf{x}),\quad \forall \,\textbf{x}\in \mathbb {R}^n, \end{aligned}$$

    where \(\psi _0\) (resp. \(\psi _1\)) is an SOS polynomial of degree at most d (resp. \(d-{\text {deg}}(g)\));

  • the positivity constraint \(-(\textbf{u}\cdot \mathop {\textrm{grad}}g)|_{\partial \textbf{K}} \in C^0(\partial \textbf{K})_+\) is replaced with a Putinar certificate of positivity on \(\partial \textbf{K}\), that is,

    $$\begin{aligned} - \textbf{u}(\textbf{x}) \cdot \mathop {\textrm{grad}}g(\textbf{x})=\eta _0(\textbf{x})+\eta _1(\textbf{x})g(\textbf{x}),\quad \forall \,\textbf{x}\in \mathbb {R}^n, \end{aligned}$$

    where \(\eta _0\) is an SOS polynomial of degree at most d and \(\eta _1\) is a polynomial of degree at most \(d-{\text {deg}}(g)\);

  • the linear criterion \(\int _{\textbf{B}} w \, d\lambda \) translates into a linear criterion on the vector of coefficients of w, as \(\int _{\textbf{B}}\textbf{x}^\alpha \,d\lambda \) is available in closed form.
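For the unit Euclidean ball, these moments are given by a classical closed-form formula (standard, though not stated in the text): all odd moments vanish, and \(\int _{\textbf{B}^n}\textbf{x}^{2\textbf{k}}\,d\textbf{x}=\prod _i\Gamma (k_i+1/2)/\Gamma (|\textbf{k}|+n/2+1)\). A minimal Python sketch:

```python
import math

# Closed-form Lebesgue moments of the unit Euclidean ball in R^n (classical
# formula, obtained e.g. from polar coordinates; not stated in the text):
# int_{B^n} x^(2k) dx = prod_i Gamma(k_i + 1/2) / Gamma(|k| + n/2 + 1),
# and the moment vanishes whenever some exponent is odd.

def ball_moment(alpha):
    """Moment int_{B^n} x^alpha dx for a multi-index alpha (tuple of ints)."""
    if any(a % 2 for a in alpha):
        return 0.0
    k = [a // 2 for a in alpha]
    n = len(alpha)
    num = math.prod(math.gamma(ki + 0.5) for ki in k)
    return num / math.gamma(sum(k) + n / 2.0 + 1.0)

assert abs(ball_moment((0, 0)) - math.pi) < 1e-12        # area of unit disk
assert abs(ball_moment((2, 0)) - math.pi / 4) < 1e-12    # int x1^2 over disk
assert ball_moment((1, 0)) == 0.0                        # odd moment
```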

The above identities define linear constraints on the coefficients of all the unknown polynomials. Stating that some of these polynomials must be SOS then translates into semidefinite constraints on their respective unknown Gram matrices. The resulting optimization problem is a semidefinite program, called the SOS strengthening of problem (14), which is in Lagrangian duality with the so-called moment relaxation of problem (13). By strong duality these two problems are equivalent, and we use SOS strengthenings and moment relaxations interchangeably; for more details the interested reader is referred to e.g. [8].

6.2 Bivariate Disk

Let us first illustrate Theorem 4.2 for computing the area of the disk \(\textbf{K}:=\{\textbf{x}\in \mathbb {R}^2 : g(\textbf{x}) = 1/4 - (x_1-1/2)^2 - x^2_2 \ge 0\}\) included in the unit disk \(\textbf{B}:=\{\textbf{x}\in \mathbb {R}^2 : 1 - x^2_1 - x^2_2\ge 0\}\).
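Before turning to the SOS bounds, a Monte-Carlo baseline (the estimator \(\rho _N\lambda (\textbf{B})\) recalled in the introduction; the sampling box \([-1,1]^2\supset \textbf{B}\) is our choice here) can be sketched as follows:

```python
import random

# Monte-Carlo baseline for the area of the disk K (our sketch of the
# estimator rho_N * lambda(box) recalled in the introduction; the sampling
# box [-1,1]^2 is our choice).  Here g(x) = 1/4 - (x1 - 1/2)^2 - x2^2.

def mc_area(n=200000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if 0.25 - (x1 - 0.5) ** 2 - x2 ** 2 >= 0:
            hits += 1
    return 4.0 * hits / n      # lambda(box) = 4
```

The estimate concentrates around \(\lambda (\textbf{K})=\pi /4\approx 0.7854\), but it is random and provides neither an upper nor a lower bound, in contrast with the deterministic bounds below.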

Fig. 2

Degree-16 polynomial approximations of the disk’s area obtained without Stokes constraints (left) and with Stokes constraints (right)

The degree \(d=16\) polynomial approximation w obtained by solving the SOS strengthening of linear problem (5) is represented at the left of Fig. 2. We can see bumps and ripples typical of a Gibbs phenomenon, since the polynomial should approximate from above the discontinuous indicator function \(\mathbb {1}_\textbf{K}\) as closely as possible. A rather loose upper bound of 1.1626 is obtained on the volume \(\lambda (\textbf{K})=\pi /4\approx 0.7854\).

In comparison, the degree \(d=16\) polynomial approximation w obtained by solving the SOS strengthening of linear problem (14) is represented at the right of Fig. 2. As expected from the proof of Theorem 4.2, the polynomial should approximate from above the continuous function \(g\,\mathbb {1}_\textbf{K}\,\lambda (\textbf{K})/\!\int _\textbf{K}g\,d\lambda \). The resulting polynomial approximation is smoother and yields a much improved upper bound of 0.7870.

6.3 Higher Dimensions

In Table 1 we report on the dramatic acceleration brought by Stokes constraints in the case of the Euclidean ball \(\textbf{K}:=\{\textbf{x}\in \mathbb {R}^3 :g(\textbf{x}) = (3/4)^2 - |\textbf{x}|^2 \ge 0\}\) of dimension \(n=3\) included in the unit ball \(\textbf{B}\). We specify the relative errors on the bounds obtained by solving moment relaxations with and without Stokes constraints, together with the computational times (in seconds), for a relaxation degree d ranging from 4 to 20. We observe that tight bounds are obtained already at low degrees with Stokes constraints, sharply contrasting with the loose bounds obtained without Stokes constraints. However, we see also that the inclusion of Stokes constraints has a computational price.

Table 1 Relative errors (\(\%\)) and computational times (in brackets, in seconds) for solving moment relaxations of increasing degrees d approximating the volume of a ball of dimension \(n=3\)

In Table 2 we report the relative errors on the bounds obtained with and without Stokes constraints, together with the computational times (in seconds), for a relaxation degree equal to \(d=10\) (left) resp. \(d=4\) (right) and for dimension n ranging from 1 to 5 (left) resp. from 6 to 10 (right). When \(d=10\) and \(n=5\) the semidefinite relaxation features 6006 pseudo-moments without Stokes constraints, and 12194 pseudo-moments with Stokes constraints. We see that introducing Stokes constraints incurs a computational cost, to be weighed against the expected gain in the quality of the bounds.

Table 2 Relative errors (\(\%\)) and computational times (in brackets in seconds) for solving the degree \(d=10\) and \(d=4\) moment relaxation approximating the volume of a ball of increasing dimensions n

Higher dimensional problems can be addressed only if the problem description has some sparsity structure, as explained in [21]. Also, depending on the geometry of the problem, and for larger values of the relaxation degree, alternative polynomial bases may be numerically preferable to the monomial basis which is used by default in Moment and SOS parsers (see [8, Fig. 4.5]).

6.4 Changing the Bounding Box

Choosing the unit Euclidean ball as our bounding box \(\textbf{B}\) is the easiest and most standard choice, but one could wonder what happens if we take another set, for example an \(\ell ^p\) ball for \(p > 2\). Let \(\textbf{B}^n_p := \bigl \{\textbf{x}\in \mathbb {R}^n : \Vert \textbf{x}\Vert _p^p= \sum _{i=1}^n |x_i|^p \le 1\bigr \}\) denote the unit \(\ell ^p\) ball in dimension n. We now compute the area of the bivariate disk \(\textbf{K}= \{\textbf{x}\in \mathbb {R}^2 : (3/4)^2 - x_1^2 - x_2^2 \ge 0\}\) included in the unit \(\ell ^p\) ball \(\textbf{B}= \textbf{B}^2_p\) for \(p = 2,4,6,8,10\). To that end, we use the closed-form formula for the Lebesgue moments on \(\textbf{B}^2_p\) (see Appendix A):

$$\begin{aligned} \int _{\textbf{B}^2_p} \textbf{x}^{\textbf{k}} \, d\textbf{x}&= 0, \qquad \forall \,\textbf{k}\in \mathbb {N}^n \setminus (2\mathbb {N})^n, \qquad \text {and}\\ \int _{\textbf{B}^2_p} \textbf{x}^{2\textbf{k}} \, d\textbf{x}&= \frac{2}{(1+|\textbf{k}|) p}\textrm{B}\biggl (\frac{1+2k_1}{p} , \frac{1+2k_2}{p}\biggr ), \end{aligned}$$

where \(\Gamma (x) :=\int _0^\infty \textrm{e}^{-t}t^{x-1}\,dt\) and \(\textrm{B}(x,y) :={\Gamma (x)\Gamma (y)}/{\Gamma (x+y)}\) are Euler's Gamma and Beta functions, which satisfy in particular \(\Gamma (1+x) = x\Gamma (x)\). We then perform the computations using Matlab's beta or gamma commands. Our numerical results are reported in Table 3.
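The displayed formula is straightforward to implement; the following Python sketch (ours) transcribes it and checks consistency with the Euclidean disk \(p=2\):

```python
import math

# Even Lebesgue moments of the unit l^p ball in R^2, transcribing the
# closed-form formula displayed above (odd moments vanish by symmetry).

def beta(x, y):
    """Euler's Beta function via Gamma."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def lp_ball_moment(k1, k2, p):
    """Moment int_{B^2_p} x1^(2 k1) x2^(2 k2) dx."""
    return 2.0 / ((1 + k1 + k2) * p) * beta((1 + 2 * k1) / p, (1 + 2 * k2) / p)

# consistency checks against the Euclidean disk (p = 2)
assert abs(lp_ball_moment(0, 0, 2) - math.pi) < 1e-12       # area of unit disk
assert abs(lp_ball_moment(1, 0, 2) - math.pi / 4) < 1e-12   # int x1^2 over disk
```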

Table 3 Relative errors (\(\%\)) and computational times (in brackets, in seconds) for solving the degree \(d=8\) and \(d = 16\) moment relaxations approximating the volume of a ball of dimension \(n=2\) embedded in an \(\ell ^p\) ball bounding box for \(p=2,4,6,8,10\)

Unsurprisingly, as in the case of the Euclidean bounding box, Stokes constraints drastically improve the accuracy of the moment relaxations. However, the number p has an influence on both accuracy (decreasing with p) and computational time (with a global tendency to slightly decrease with p) in the original as well as in the Stokes-augmented hierarchies. This can be explained by analyzing the influence of p on the SOS strengthenings described in Sect. 6.1: indeed, the only change is that \(1 - |\textbf{x}|^2\) is replaced with \(1 - \Vert \textbf{x}\Vert _p^p\) in the SOS representation of constraint \(w\in C^0(\textbf{B})_+\), so that \(\sigma _1\) now has degree at most \(d-p\) instead of \(d-2\). Consequently, with increasing p, the size of the Gram matrix of \(\sigma _1\) becomes smaller, slightly reducing the size of the corresponding SDP problem, which can result in a reduction of the computational time (although other factors impact the computational time, hence its non-monotonic dependence on p). Conversely, as the degree of \(\sigma _1\) is more limited, there is less freedom in the search for an optimal solution, hence a reduced accuracy of the SOS strengthening for a fixed degree d.

In terms of the efficiency of Stokes constraints when the bounding box is an \(\ell ^p\) ball, we highlight the fact that the loss in accuracy is less important in the Stokes-augmented hierarchy than in the standard formulation: Stokes constraints are somewhat more robust to the increase in the degree of the polynomial describing the bounding box. Regarding computational times, when they decrease with p, we observe that this decrease is more important with Stokes constraints than without: here again, Stokes constraints lead to a better behaved hierarchy.

6.5 Other Sets

So far we have only computed the volume of Euclidean balls. In order to further explore the influence of the input polynomials on the efficiency of Stokes constraints, as well as of the Moment-SOS hierarchy in general, we now switch to computing the volume of more sophisticated semi-algebraic sets in two dimensions: first, we proceed from the Euclidean ball to generic \(\ell ^p\) balls, as we did with the bounding box; second, we test the limits of the scheme by approximating the volume of a non-convex, non-connected double disk.

6.5.1 \(\ell ^4\) Disk

As discussed in Sect. 6.4, the degree of the polynomials involved in the Moment-SOS hierarchy has a direct influence on its accuracy, as a higher input degree means, for a fixed degree of the hierarchy, fewer degrees of freedom to optimize over. We now illustrate on one example the practical influence of the input degree on our scheme's accuracy, by computing the approximate volume of the \(\ell ^4\) disk:

$$\begin{aligned} \textbf{K}:= \biggl \{\textbf{x}\in \mathbb {R}^2 : \biggl (\frac{25}{72}\biggr )^{\!4} - x_1^4 - x_2^4 \ge 0\biggr \}. \end{aligned}$$

It is clear that one has \(\lambda (\textbf{K}) =(25/72)^2 \lambda (\textbf{B}_4^2)\) with, using formula (21) from Appendix A, \(\lambda (\textbf{B}_4^2) ={\Gamma (1/4)^2}/({2\sqrt{\pi }})\) so that \(\lambda (\textbf{K}) \approx 0.4471\). We implement the degree-16 SOS strengthenings corresponding to the standard and Stokes-augmented problems, and plot the resulting w in Fig. 3.
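The quoted value can be reproduced in a few lines of Python from the Gamma-function formula:

```python
import math

# Numerical evaluation of lambda(K) for the l^4 disk, using
# lambda(B_4^2) = Gamma(1/4)^2 / (2 sqrt(pi)) from Appendix A
# and lambda(K) = (25/72)^2 * lambda(B_4^2).
area_B42 = math.gamma(0.25) ** 2 / (2.0 * math.sqrt(math.pi))
area_K = (25.0 / 72.0) ** 2 * area_B42

assert abs(area_K - 0.4471) < 1e-3   # matches the value quoted in the text
```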

Fig. 3

Degree-16 polynomial approximations of the area of the \(\ell ^4\) disk obtained without Stokes constraints (left) and with Stokes constraints (right)

Again, the original SOS strengthening is flawed by a strong Gibbs phenomenon that introduces a large error in the volume approximation (we get a bound of 0.8511, i.e., a relative error of \(90\%\)), characterized by wide oscillations on the boundary of the \(\ell ^4\) disk \(\textbf{K}\). The Stokes-augmented version gives a tighter bound of 0.4653 (relative error \(4\%\), still more than for the Euclidean disk, but much less than without Stokes constraints). Moreover, an interesting feature appears here that was not visible on Fig. 2 in the case of the Euclidean disk: we observe small oscillations of w around 0 on \(\textbf{B}\setminus \textbf{K}\). This is to be expected, as w is a non-zero polynomial, so it cannot vanish on a set of positive Lebesgue measure. These observations confirm our prediction that the lower the degree of the involved polynomials, the more accurate the SDP relaxations. However, we are now going to show that other parameters should also be considered when discussing the accuracy of the Moment-SOS hierarchy for volume computation, such as the geometry of the considered set \(\textbf{K}\).

6.5.2 Disconnected Double Disk

We finally test our numerical scheme on a non-convex, non-connected semi-algebraic set:

$$\begin{aligned} \textbf{K}:= \biggl \{\textbf{x}\in \mathbb {R}^2 : \biggl (\frac{1}{16}- \biggl (x_1 - \frac{1}{2}\biggr )^{\!2} - x_2^2\biggr )\biggl (\biggl (x_1 + \frac{1}{2}\biggr )^{\!2} + x_2^2 - \frac{1}{16}\biggr ) \ge 0 \biggr \}. \end{aligned}$$
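As a sanity check (ours, independent of the SOS computations), a short Monte-Carlo run confirms that this product description carves out two disjoint disks of radius 1/4, with total area \(\pi /8\approx 0.3927\):

```python
import random

# Monte-Carlo check (our sanity check, not from the paper) that the product
# description of K above carves out two disjoint disks of radius 1/4,
# centered at (1/2, 0) and (-1/2, 0); sampling box [-1,1]^2 of area 4.

def in_K(x1, x2):
    a = 1.0 / 16.0 - (x1 - 0.5) ** 2 - x2 ** 2
    b = (x1 + 0.5) ** 2 + x2 ** 2 - 1.0 / 16.0
    return a * b >= 0.0            # membership test straight from the definition

def mc_double_disk(n=200000, seed=1):
    rng = random.Random(seed)
    hits = sum(in_K(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n))
    return 4.0 * hits / n
```

The estimate concentrates around \(\pi /8\approx 0.3927\), the value used as reference below.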
Fig. 4

Degree-16 polynomial approximations of the area of the double disk obtained without Stokes constraints (left) and with Stokes constraints (right)

As usual, in Fig. 4 we observe a Gibbs phenomenon in the standard volume approximation scheme (with a bound of 0.8551 instead of \(\pi /8\approx 0.3927\), i.e., a relative error of \(118\%\)), as well as wide oscillations near the boundary of \(\textbf{K}\). As for the Stokes-augmented scheme, we again get a better bound of 0.4671 (relative error \(19\%\), interestingly higher than in all the previous cases with Stokes constraints, but still much more accurate than without Stokes constraints). More striking here, even in the Stokes-augmented SOS strengthening, w is clearly seen to oscillate. However, this should not be mistaken for a consequence of the Gibbs phenomenon, as in this new formulation w is proved to approximate a Lipschitz continuous function. As a consequence of the Stone–Weierstrass theorem, these remaining oscillations are bound to ultimately vanish as the degree d goes to infinity, while in the case of the Gibbs phenomenon, the oscillations do not ultimately disappear (only their contribution to \(\int w\,d\lambda \) ultimately vanishes).

A possible explanation for this oscillatory phenomenon is that, despite the regularity of the optimizer in the infinite-dimensional problem (14), the SOS strengthenings are still very demanding for the polynomial w: indeed, it is required to be as close to 0 as possible outside \(\textbf{K}\) while being sufficiently large in \(\textbf{K}\) so that its integral is at least \(\lambda (\textbf{K})\). In the case of a non-connected \(\textbf{K}\), w is thus literally required to oscillate.

To conclude on this example, we highlight the fact that, in addition to the degree of the polynomial g defining \(\textbf{K}\), the geometry of \(\textbf{K}\) (typically: its number of connected components and how they are distributed in the bounding box \(\textbf{B}\)) plays a key role in the accuracy of the Moment-SOS hierarchy for computing its volume, both in the original and Stokes-augmented versions. Indeed, it is this geometry that is likely to generate (or, on the contrary, prevent) an oscillatory behavior in the approximating polynomial w, when one gets rid of the Gibbs phenomenon by complementing the hierarchy with Stokes constraints. This is particularly visible when comparing our examples in Sects. 6.5.1 and 6.5.2, where \(\textbf{K}\) is described by degree-4 polynomials, but the schemes are far more accurate in the convex case than in the disconnected case, especially when one adds Stokes constraints.

7 Conclusion

In this paper we proposed a new primal-dual infinite-dimensional linear formulation of the problem of computing the volume of a smooth semi-algebraic set generated by a single polynomial, generalizing the approach of [8] while still allowing the application of the Moment-SOS hierarchy. The new dual formulation contains redundant linear constraints arising from Stokes’ theorem, generalizing the heuristic of [14]. A striking property of this new formulation is that the dual value is attained, contrary to the original formulation. As a consequence, the corresponding dual SOS hierarchy does not suffer from the Gibbs phenomenon, thereby accelerating the convergence.

Numerical experiments (not reported here) reveal that the values obtained with the new Stokes constraints (with a general vector field) closely match the values obtained with the original Stokes constraints of [14] (with the generating polynomial factoring the vector field). It may then be expected that the original and new Stokes constraints are equivalent. However, at this stage we have not been able to prove equivalence.

The proof of dual attainment builds upon classical tools from linear PDE analysis, thereby building a new bridge between infinite-dimensional convex duality and PDE theory, in the context of the Moment-SOS hierarchy. We expect that these ideas can be exploited to prove regularity properties of linear reformulations of other problems in data science, beyond volume approximation. For example, it would be desirable to design Stokes constraints tailored to the infinite-dimensional linear reformulation of the region of attraction problem [6] or its sparse version [20].

In terms of practical implementation, while still observed with Stokes constraints, the dependence on the degree of the input polynomials, already discussed in [8], seems to be of less importance. However, the dependence on the geometry of \(\textbf{K}\) now seems to prevail, as Stokes constraints add information on this geometry; more precisely, the simpler the geometry, the more efficient the constraints: the smooth and convex case leads to the best increase in accuracy, but dual attainment still holds even on disconnected smooth sets. Also, experiments carried out in [14, 21] show that even in the non-smooth case, Stokes constraints drastically improve the accuracy of the volume computation scheme.