1 Introduction

In this study, we consider a method to obtain a good numerical integration formula over a probability measure. In other words, given a random variable X taking values in some abstract space, we want an accurate approximation of the expectation \(\mathrm {{E}}\left[ f(X)\right]\) for each integrand f.

If no conditions are imposed on the class of integrands (except that f(X) is integrable), the Monte Carlo method [11, 20] is expected to be the best choice. For i.i.d. copies \(X_1,\ldots , X_N, \ldots\) of X, the sample mean \(\frac{1}{N}\left( f(X_1)+\cdots +f(X_N)\right)\) is an unbiased estimator of \(\mathrm {E}\left[ f(X)\right]\) and, by the law of large numbers, converges to the expectation with probability 1. If we assume \(\mathrm {E}\left[ |f(X)|^2\right] <\infty\), the standard deviation of the Monte Carlo estimate is \(\mathrm {O}(\frac{1}{\sqrt{N}})\). Therefore, we can regard this as a numerical integration method with error \(\mathrm {O}(\frac{1}{\sqrt{N}})\).
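A minimal numpy sketch of this estimator is given below; the integrand \(f(x)=x^2\) and the standard normal law are our illustrative choices, not taken from the text.

```python
import numpy as np

# Plain Monte Carlo: the sample mean of f(X_1), ..., f(X_N) estimates
# E[f(X)].  Here X ~ N(0, 1) and f(x) = x**2, so E[f(X)] = 1 exactly;
# both choices are arbitrary and only illustrative.
rng = np.random.default_rng(42)

def mc_estimate(f, sampler, n):
    return f(sampler(n)).mean()

est = mc_estimate(lambda x: x ** 2, rng.standard_normal, 100_000)
# With Var[f(X)] = 2, the standard deviation of this estimator is
# sqrt(2 / N), which is about 0.0045 here -- the O(1/sqrt(N)) rate.
```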

However, in real applications, the integrand f often has good properties, such as smoothness, controlled asymptotic behavior, or decaying coefficients when expanded in a certain basis. These properties are not independent of each other; however, each may suit a different numerical integration method. For instance, if X is uniformly distributed and the integrands have a certain smoothness over the hypercube \([0, 1]^s\), then quasi-Monte Carlo (QMC) sampling works better than the standard Monte Carlo method [8]. Variants of Monte Carlo construction methods have also recently been considered by [13] and [1]. In particular, the authors of these two papers exploit determinantal point processes, which physically model distributions of mutually repelling particles.

If X takes values in some Euclidean space, and the integrand f can be well-approximated by polynomials, cubature (or quadrature) formulas are useful [7, 23, 26]. The simplest one is the approximation

$$\begin{aligned} \int _0^1f(x)\,\mathrm {d}x\simeq \frac{1}{6}f(0)+\frac{2}{3}f\left( \frac{1}{2}\right) +\frac{1}{6}f(1), \end{aligned}$$

which is called Simpson’s rule. This ‘\(\simeq\)’ becomes ‘\(=\)’ if f is a polynomial of degree at most three. In general, a cubature formula of degree t is composed of points \(x_1,\ldots , x_n\) and weights \(w_1,\ldots ,w_n>0\) satisfying

$$\begin{aligned} \mathrm {E}\left[ f(X)\right] =\sum _{i=1}^nw_if(x_i) \end{aligned}$$

for any polynomial f of degree at most t. The existence of such a formula is assured by Tchakaloff’s theorem (Theorem 1), but the proof is not constructive and, except for concrete cases, no construction is currently known. We are not obliged to choose polynomials as test functions, so we can consider generalized cubature formulas whose spaces and test functions are not necessarily Euclidean and polynomial. Such formulas are indeed worthy of consideration; for example, cubature formulas exist on the space of paths (cubature on Wiener space [18]). As QMC methods and other widely used numerical integration formulas, such as DE (double exponential) formulas [27], are limited to the integration of functions with a certain smoothness and domain, Monte Carlo integration is essentially the only method available in general settings. Accordingly, although it is difficult to choose proper test functions, general cubature formulas are valuable to consider.
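The degree-3 exactness of Simpson’s rule above can be checked directly on monomials; a minimal sketch in plain Python (the choice of monomial test integrands is ours):

```python
# Simpson's rule on [0, 1]: it reproduces the integral of x**k,
# namely 1 / (k + 1), exactly for k = 0, 1, 2, 3.
def simpson(f):
    return f(0.0) / 6 + 2 * f(0.5) / 3 + f(1.0) / 6

vals = [simpson(lambda x, k=k: x ** k) for k in range(4)]

# From degree 4 on, the rule is only approximate:
err_deg4 = abs(simpson(lambda x: x ** 4) - 1 / 5)
```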

Our contribution In this paper, we propose an algorithm to construct generalized cubature formulas that satisfy Tchakaloff’s bound by subsampling points from i.i.d. samples \(X_1, X_2, \ldots\) of X. Accordingly, the proposed method is referred to as Monte Carlo cubature construction. The construction is realized by finding a basic feasible solution of a linear programming (LP) problem, which provides the weights of a cubature formula. Although the proposed method is simple, it may fail if pathological samples \(X_1, X_2, \ldots\) appear. However, we show that the probability of such cases is zero. In other words, we show that one can construct a general cubature formula with probability 1 as long as an i.i.d. sequence \(X_1, X_2, \ldots\) of X and the expectations of the test functions are both given. This is the main theoretical result, presented as Theorem 11 and proven by techniques from discrete geometry.

In some cases, cubature formulas may exist with fewer points than the upper bound given by Tchakaloff’s theorem (\(n = t\) in the aforementioned one-dimensional example). For example, such cubature formulas have been found for particular probability measures on (particular-dimensional hyper)cubes, triangles, circles (spheres), and balls [7, 23]. However, our construction is rather general, as we place almost no restrictions on the probability measure, domain, and test functions.

Furthermore, we demonstrate that one can construct an approximate cubature formula even without knowing the expectations of the test functions. This construction of an approximate cubature formula is based on the usual Monte Carlo integration, but the data can be compressed via the construction of a cubature formula.

Although our generalized cubature construction may seem to have no concrete benefit of its own, an application to cubature on Wiener space has already been made [12], in which the authors succeed in constructing a cubature formula on Wiener space for parameters where no construction was previously known.

Finally, the contributions of this study are summarized as follows:

  • By considering the general situation, we show that one can always construct a cubature formula given appropriate information on the distribution and test functions.

  • Practically, our method has significantly contributed to the general construction of cubature on Wiener space given in [12], which is an efficient cubature formula on the space of paths but has been hard to construct.

Organization of the paper The rest of this paper is organized as follows: in Sect. 2, we briefly review cubature formulas, a subsampling technique, and theorems in discrete geometry; in Sect. 3, we present our main results; we give examples of applications and a simple numerical experiment in Sect. 4 to estimate the time complexity of our method; finally, we conclude the paper in Sect. 5.

2 Background

In this section, we outline related studies, including the classic formulation of cubature formulas (and its generalization) and theorems in discrete geometry that are used in the proofs of our results.

2.1 Cubature formula

Herein, we let \(\varOmega \subset \mathbb {R}^m\) be a Borel set and we consider a probability measure \(\mu\) on \(\varOmega\) (with Borel \(\sigma\)-field). Moreover, for an integer \(t\ge 0\), we denote by \(\mathscr {P}_t(\varOmega )\) the set of all real polynomials over \(\varOmega\) with degree t at most. A cubature formula of degree t is a set of points \(x_1,\ldots ,x_n\in \varOmega\) and positive weights \(w_1,\ldots ,w_n\) such that

$$\begin{aligned} \int _\varOmega f(x)\,\mathrm {d}\mu (x) =\sum _{j=1}^n w_jf(x_j) \end{aligned}$$
(1)

holds for any \(f\in \mathscr {P}_t(\varOmega )\). We have assumed that each \(f\in \mathscr {P}_t(\varOmega )\) is integrable with respect to \(\mu\), which is satisfied, for instance, if \(\varOmega\) is bounded or \(\mu\) has a rapidly decreasing density. Once such a formula is obtained, \(\sum _{j=1}^n w_jf(x_j)\) is a good approximation of \(\int _\varOmega f(x)\,\mathrm {d}\mu (x)\) for any function f that is well-approximated by polynomials. In this sense, a cubature formula works as roughly compressed data of \(\mu\); moreover, we call the discrete measure \(\sum _{j=1}^n w_j\delta _{x_j}\) a cubature.

The existence of such a formula is assured by [28] and [2], as outlined below.

Theorem 1

(Tchakaloff’s theorem) If every function in \(\mathscr {P}_t(\varOmega )\) is integrable with respect to \(\mu\); i.e.,

$$\begin{aligned} \int _\varOmega |f(x)|\,\mathrm {d}\mu (x)<\infty , \qquad f\in \mathscr {P}_t(\varOmega ) \end{aligned}$$

holds, then there exists an integer \(1\le n\le \dim \mathscr {P}_t(\varOmega )\), points \(x_1, \ldots , x_n\in {{\,\mathrm{supp}\,}}\mu \ (\subset \varOmega )\), and weights \(w_1,\ldots ,w_n>0\) that can be used to construct a cubature formula of degree t over \((\varOmega , \mu )\).

Note that \({{\,\mathrm{supp}\,}}\mu\) is the closed set defined by \({{\,\mathrm{supp}\,}}\mu :=\bigcap \{\varOmega {\setminus } O\mid O\text { is open}, \mu (O)=0\}\). Since a nonzero constant function belongs to \(\mathscr {P}_t(\varOmega )\), \(w_1+\cdots +w_n\) must be equal to 1.

This theorem can be understood in the context of discrete geometry. Indeed, if we define a vector-valued function \(\varvec{\varphi }:\mathbb {R}^m\rightarrow \mathbb {R}^{\dim \mathscr {P}_t(\varOmega )}\) with a monomial in each component, then constructing a cubature formula is equivalent to finding points \(\varvec{y}_1,\ldots , \varvec{y}_n\in {{\,\mathrm{Im}\,}}\varvec{\varphi }|_{{{\,\mathrm{supp}\,}}\mu }\) such that the convex hull of \(\{\varvec{y}_1,\ldots ,\varvec{y}_n\}\) contains the point \(\int _\varOmega \varvec{\varphi }(x)\,\mathrm {d}\mu (x)\in \mathbb {R}^{\dim \mathscr {P}_t(\varOmega )}\). For further arguments from this viewpoint, see Sect. 2.3 or [2].

Tchakaloff’s theorem only gives an upper bound on the number of points used in a cubature formula. There also exist several lower bounds for the number of points n. The Fisher-type bound is well known [26]:

Theorem 2

(Fisher-type bound) For any cubature formula of degree t over the measure space \((\varOmega , \mu )\), the number of sample points n satisfies \(n\ge \dim \mathscr {P}_{\lfloor t/2\rfloor }(\varOmega )\).

However, this study is concerned with finding at least one cubature formula for general settings; in other words, minimizing the number of sample points is not the objective of this paper. See [23] for further general theories and examples of the cubature.

2.2 Carathéodory–Tchakaloff subsampling

Constructing a cubature formula is generally not an easy task. However, when \(\mu\) is a product of low-dimensional measures (typically one-dimensional ones), we can easily construct a grid-type cubature formula. Although the proposition outlined below concerns the product of copies of the same low-dimensional measure, it generalizes to products of different low-dimensional measures.

Proposition 3

Let points \(x_1, \ldots , x_n\in \varOmega\) and weights \(w_1,\ldots , w_n>0\) form a cubature formula of degree t on \((\varOmega , \mu )\). Then, on the k-fold product measure space \((\varOmega ^{\otimes k}, \mu ^{\otimes k})\),

$$\begin{aligned} \sum _{(j_1,\ldots ,j_k)\in \{1,\ldots ,n\}^k} w_{j_1}\cdots w_{j_k} \delta _{(x_{j_1},\ldots ,x_{j_k})} \end{aligned}$$
(2)

is a cubature of degree t.

Proof

Any monomial \(f=f(z_1, \ldots , z_k)\) over \(\varOmega ^{\otimes k}\) with degree at most t can be written, using some monomials \(f_1, \ldots , f_k\) on \(\varOmega\), as \(f(z_1, \ldots , z_k) = \prod _{i=1}^k f_i(z_i)\). Since the degrees of the \(f_i\) sum to the degree of f, each \(f_i\) belongs to \(\mathscr {P}_t(\varOmega )\). Therefore, by the definition of a cubature formula of degree t, we obtain

$$\begin{aligned}&\int _{\varOmega ^{\otimes k}} f(z_1,\ldots ,z_k) \,\mathrm {d}\mu ^{\otimes k}(z_1,\ldots ,z_k)\\&\quad =\prod _{i=1}^k \int _\varOmega f_i(z_i) \,\mathrm {d}\mu (z_i)=\prod _{i=1}^k \sum _{j=1}^n w_jf_i(x_j)\\&\quad =\sum _{(j_1,\ldots ,j_k)\in \{1,\ldots ,n\}^k} w_{j_1}\cdots w_{j_k} f(x_{j_1},\ldots ,x_{j_k}). \end{aligned}$$

This means that (2) is indeed a cubature of degree t. \(\square\)
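As a sanity check of Proposition 3, one can build the grid cubature (2) from a one-dimensional rule; the plain-Python sketch below uses Simpson’s rule as the degree-3 base formula for the uniform measure on \([0,1]\) (our illustrative choice).

```python
import math
from itertools import product

# Degree-3 cubature for the uniform measure on [0, 1] (Simpson's rule)
pts = [0.0, 0.5, 1.0]
wts = [1 / 6, 2 / 3, 1 / 6]

def product_cubature(pts, wts, k):
    """Grid cubature (2) on the k-fold product: n**k points with
    weights w_{j1} * ... * w_{jk}."""
    for idx in product(range(len(pts)), repeat=k):
        yield tuple(pts[j] for j in idx), math.prod(wts[j] for j in idx)

# Degree-3 monomial z1**2 * z2 on [0,1]^2: exact integral (1/3)(1/2) = 1/6
approx = sum(w * z[0] ** 2 * z[1] for z, w in product_cubature(pts, wts, 2))
```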

This construction uses \(n^k\) points, which is likely to be much larger than the Tchakaloff upper bound \(\dim \mathscr {P}_t(\varOmega ^{\otimes k})\). We can reduce the number of points used in such a cubature formula by solving an LP problem. A detailed study in this direction (with generalization) is given by [29]. To explain the idea, we generalize the situation. Instead of polynomials, we can naturally take any finite sequence of basis functions to generalize the definition of a cubature formula given in the previous section. Indeed, if we know the class of functions that we need to integrate, there might be test functions better suited than polynomials. Accordingly, we adopt the definition outlined below.

Definition 4

Given a random variable X taking values in a measurable space \(\mathscr {X}\) and d test functions \(\varphi _1, \ldots , \varphi _d: \mathscr {X}\rightarrow \mathbb {R}\) satisfying the integrability condition \(\mathrm {E}\left[ |\varphi _i(X)|\right] <\infty\) for each \(i=1,\ldots , d\), a cubature with respect to X and \(\varphi _1,\ldots ,\varphi _d\) is a set of points \(x_1, \ldots , x_n\in \mathscr {X}\) and positive weights \(w_1, \ldots , w_n\) that satisfies

$$\begin{aligned} \mathrm {E}\left[ \varphi _i(X)\right] =\sum _{j=1}^n w_j\varphi _i(x_j),\qquad i=1,\ldots , d. \end{aligned}$$
(3)

If we write \(\varvec{\varphi }= (\varphi _1, \ldots , \varphi _d)^\top : \mathscr {X}\rightarrow \mathbb {R}^d\), (3) can be equivalently rewritten as

$$\begin{aligned} \mathrm {E}\left[ \varvec{\varphi }(X)\right] =\sum _{j=1}^nw_j\varvec{\varphi }(x_j) \end{aligned}$$
(4)

For this generalization of the cubature formula, we have the generalized Tchakaloff’s theorem [2, 23].

Theorem 5

(Generalized Tchakaloff’s theorem) Under the same setting as Definition 4, there exists a cubature formula with \(n\le d+1\). Moreover, if there exists a vector \(c\in \mathbb {R}^d\) such that \(c^\top \varvec{\varphi }(X)\) is essentially a nonzero constant, there exists a cubature formula satisfying \(n\le d\) and \(w_1+\cdots +w_n=1\).

We can take \(x_1,\ldots ,x_n\in \mathscr {X}\) so that \(\varvec{\varphi }(x_1),\ldots ,\varvec{\varphi }(x_n) \in {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\), where \(\mathrm {P}_{\varvec{\varphi }(X)}\) is the distribution of \(\varvec{\varphi }(X)\) over \(\mathbb {R}^d\); this corresponds to the condition \(x_1,\ldots ,x_n\in {{\,\mathrm{supp}\,}}\mu\) in Theorem 1. This theorem is essentially a direct consequence of Carathéodory’s theorem (Theorem 7).

Remark 1

We suppose that there is a probability space \((\varOmega , \mathscr {F}_\varOmega , \mathrm {P})\), that \(\mathscr {X}\) has a natural \(\sigma\)-field \(\mathscr {F}_\mathscr {X}\), and that the random variable X is a measurable map from \((\varOmega , \mathscr {F}_\varOmega )\) to \((\mathscr {X}, \mathscr {F}_\mathscr {X})\). In addition, we assume that the test function \(\varvec{\varphi }\) is a measurable map from \((\mathscr {X}, \mathscr {F}_\mathscr {X})\) to \((\mathbb {R}^d, \mathscr {B}(\mathbb {R}^d))\). Accordingly, \(\varvec{\varphi }(X)\) is a random variable on \(\mathbb {R}^d\) (a measurable map from \((\varOmega , \mathscr {F}_\varOmega )\) to \((\mathbb {R}^d, \mathscr {B}(\mathbb {R}^d))\)). Therefore, the support of its distribution

$$\begin{aligned} {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}=\{ x\in \mathbb {R}^d \mid \mathrm {P}(\varvec{\varphi }(X)\in A)>0 \text { for an arbitrary open neighborhood } A\text { of }x \} \end{aligned}$$

coincides with the smallest closed set B satisfying \(\mathrm {P}(\varvec{\varphi }(X)\in B)=1\).

We now introduce a technique called Carathéodory–Tchakaloff subsampling [21], which reduces the number of points in a discrete measure. It is briefly explained below, with the argument limited to probability measures.

Method 6

(Carathéodory–Tchakaloff subsampling) Here, we explain a method to obtain a compressed discrete measure \({\hat{\mu }}_{\mathrm {CT}}\) from a given discrete measure \({\hat{\mu }}\).

  1. (1)

    We have some discrete probability measure \({\hat{\mu }}\) on some space \(\mathscr {X}\), which can be written as

    $$\begin{aligned} {\hat{\mu }}=\sum _{x\in F} a_x \delta _{x} \end{aligned}$$

    for some finite set \(F\subset \mathscr {X}\) and \(a_x>0\). A typical case is that the measure \({\hat{\mu }}\) is already an approximation of some non-discrete measure.

  2. (2)

    Consider test functions \(\varphi _1,\ldots ,\varphi _d \in L^1(\mathscr {X}, {\hat{\mu }})\) that we want to integrate. For simplicity, we assume \(\varphi _1\equiv 1\). When \(\mathscr {X}\) is a Euclidean space, a natural choice for the \(\varphi _i\) is polynomials.

  3. (3)

    If |F| is much larger than d, we can reconstruct a discrete measure

    $$\begin{aligned} {\hat{\mu }}_{\mathrm {CT}}=\sum _{x\in G} b_x\delta _x \end{aligned}$$

    with \(G\subset F\), \(|G|\le d\), and \(b_x>0\), the existence of which is assured by Theorem 5. The obtained measure \({\hat{\mu }}_{\mathrm {CT}}\) agrees with \({\hat{\mu }}\) when integrating the test functions \(\varphi _1,\ldots ,\varphi _d\), so it can be regarded as a compression of \({\hat{\mu }}\).

[21] gives a brief survey of this method. The resampling method has recently been developed in different contexts, such as numerical quadrature and stochastic analysis [3, 14, 16, 22, 24]. To use this method, we must obtain \({\hat{\mu }}_{\mathrm {CT}}\) constructively. Indeed, this measure is obtained as a basic feasible solution of the following LP problem with a trivial objective function:

$$\begin{aligned} \begin{array}{rl} \text {minimize} &{} 0 \\ \text {subject to} &{} A\varvec{z}=\varvec{b},\ \varvec{z}\ge \varvec{0}, \end{array} \end{aligned}$$

where \(A\in \mathbb {R}^{d\times F}\) and \(\varvec{b}\in \mathbb {R}^{d}\) are defined as

$$\begin{aligned} A:=\left[ \varvec{\varphi }(x) \right] _{x\in F} \qquad \varvec{b}:=\int _\mathscr {X}\varvec{\varphi }(x) \,\mathrm {d}{\hat{\mu }}(x), \qquad \varvec{\varphi }:=(\varphi _1,\ldots ,\varphi _d)^\top . \end{aligned}$$

[29] compares the simplex method and the singular-value decomposition method for solving this problem. Note that this Carathéodory–Tchakaloff subsampling scheme is also applicable to reducing the number of points in (classical) cubature formulas, such as the one in Proposition 3.
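A minimal numpy sketch of the singular-value-decomposition route: starting from a feasible weight vector of \({\hat{\mu }}\), null vectors of A are used to drive weights to zero one at a time until at most d points remain, preserving \(A\varvec{z}=\varvec{b}\) throughout. The function name and the choice of quadratic test functions below are ours, for illustration only.

```python
import numpy as np

def caratheodory_reduce(A, w, tol=1e-10):
    """Turn a feasible w >= 0 of A w = b into a basic feasible solution
    supported on at most d = A.shape[0] points, keeping A w = b fixed."""
    w = np.asarray(w, float).copy()
    active = np.where(w > tol)[0]
    while len(active) > A.shape[0]:
        M = A[:, active]
        v = np.linalg.svd(M)[2][-1]        # a null vector of M (n > d)
        if not (v > tol).any():            # ensure some positive entry
            v = -v
        # largest step keeping all weights nonnegative; kills >= 1 weight
        t = np.min(w[active][v > tol] / v[v > tol])
        w[active] = np.maximum(w[active] - t * v, 0.0)
        active = active[w[active] > tol]
    return w

# Example: compress 100 uniform weights on random planar points while
# matching the d = 6 moments of the quadratic monomials (our choice).
rng = np.random.default_rng(3)
x = rng.uniform(size=(100, 2))
A = np.stack([np.ones(100), x[:, 0], x[:, 1],
              x[:, 0] ** 2, x[:, 0] * x[:, 1], x[:, 1] ** 2])
w0 = np.full(100, 1 / 100)
w = caratheodory_reduce(A, w0)             # at most 6 nonzero weights
```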

Remark 2

During the above subsampling, we do not exploit the entire information of the measure \({\hat{\mu }}\); rather, we only use the set F and the expectation vector \(\varvec{b}\). Therefore, the method is also applicable to the problem outlined below.

Given a finite set F, a vector-valued function \(\varvec{\varphi }:\mathscr {X}\rightarrow \mathbb {R}^d\), and a vector \(\varvec{b}\in \mathbb {R}^d\), find a discrete measure \({\hat{\mu }}_{\mathrm {CT}}=\sum _{x\in G}b_x\delta _x\) such that

  • G is a subset of F with at most d elements;

  • \(\int _\mathscr {X}\varvec{\varphi }(x)\,\mathrm {d}{\hat{\mu }}_{\mathrm {CT}}(x)=\varvec{b}\) holds.

In this formulation, the problem is feasible if, and only if, \(\varvec{b}\) is contained in the convex hull of the set \(\{\varvec{\varphi }(x)\mid x\in F\}\).

In this paper, we are primarily interested in the construction of cubature formulas when the larger discrete measure \({\hat{\mu }}\) is not given. Accordingly, we also use the terminology “Carathéodory–Tchakaloff subsampling” in the sense of Remark 2. The precise explanation of our problem formulation is given in Sect. 3.

2.3 Preliminaries from discrete geometry

Let us introduce some useful notions and assertions in discrete geometry (see [19] for details). For \(S\subset \mathbb {R}^d\), the convex hull of S (denoted by \({{\,\mathrm{conv}\,}}S\)) is defined as

$$\begin{aligned} {{\,\mathrm{conv}\,}}S:=\left\{ \sum _{i=1}^m w_ix_i \,\Bigg |\,m\ge 1,\ w_i>0,\ \sum _{i=1}^mw_i=1,\ x_i\in S\right\} . \end{aligned}$$

The following theorem is a well-known result originally outlined by [5].

Theorem 7

(Carathéodory’s Theorem) If \(S\subset \mathbb {R}^d\) and \(x\in {{\,\mathrm{conv}\,}}S\), then \(x\in {{\,\mathrm{conv}\,}}T\) for some \(T\subset S\) with \(|T|\le d+1\).

Tchakaloff’s theorem (Theorem 5) is essentially a straightforward consequence of this assertion. Indeed, we can show that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) under the same conditions as in Theorem 5, so there exist \(x_1,\ldots , x_n\in \mathscr {X}\) such that \(n\le d+1\) and \(\varvec{\varphi }(x_1),\ldots ,\varvec{\varphi }(x_n)\in {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\). Since whether \(x\in {{\,\mathrm{conv}\,}}T\) holds is invariant under affine transformations, if we additionally assume \(1\in \mathrm {span}\{\varphi _1,\ldots ,\varphi _d\}\), then we can reduce the dimension in Theorem 7 by one and, accordingly, a cubature formula can be obtained with \(n\le d\).

Consider point sets in \(\mathbb {R}^d\). For a set \(S\subset \mathbb {R}^d\), the affine hull of S is defined as

$$\begin{aligned} {{\,\mathrm{aff}\,}} S:=\left\{ \sum _{i=1}^m w_ix_i \,\Bigg |\,m\ge 1,\ w_i\in \mathbb {R},\ \sum _{i=1}^m w_i=1,\ x_i\in S \right\} . \end{aligned}$$

Notice that \({{\,\mathrm{aff}\,}} S\) is the smallest affine subspace (shifted linear subspace) including S (and \({{\,\mathrm{conv}\,}}S\)). Using the notion of affine hull, we can define the relative interior of \(S\subset \mathbb {R}^d\) as

$$\begin{aligned} {{\,\mathrm{ri}\,}}S :=\left\{ x\in S \,\Bigg |\,\exists \,\varepsilon >0\ \text {s.t.}\ y\in {{\,\mathrm{aff}\,}} S, \Vert y-x\Vert <\varepsilon \Longrightarrow y\in S \right\} , \end{aligned}$$

which is the interior of S under the relative topology of \({{\,\mathrm{aff}\,}} S\). The theorem outlined below is a generalization of Carathéodory’s theorem that uses the notion of relative interior.

Theorem 8

[4, 25] Let \(S\subset \mathbb {R}^d\) be a set such that \({{\,\mathrm{aff}\,}} S\) is a k-dimensional affine subspace of \(\mathbb {R}^d\). Then, for an arbitrary point \(x\in {{\,\mathrm{ri}\,}}{{\,\mathrm{conv}\,}}S\), there exists some \(T\subset S\) that satisfies \({{\,\mathrm{aff}\,}} T= {{\,\mathrm{aff}\,}} S\), \(x\in {{\,\mathrm{ri}\,}}{{\,\mathrm{conv}\,}}T\), and \(|T|\le 2k\).

The well-known result outlined below is useful. It can be found in standard textbooks on discrete geometry, convex analysis, and functional analysis.

Theorem 9

(separating hyperplane theorem [19]) Let A and B be convex subsets of \(\mathbb {R}^d\) satisfying \(A\cap B=\emptyset\). Then, there exists a hyperplane H such that A and B are included in different closed half-spaces defined by H. In other words, there exist a unit vector \(c\in \mathbb {R}^d\) and a real number \(z\in \mathbb {R}\) such that \(c^\top x \ge z\) holds for all \(x\in A\) and \(c^\top y \le z\) holds for all \(y\in B\).

3 Cubature construction problems

3.1 Problem setting

In this study, we consider two different problems, both of which involve a random variable X taking values in \(\mathscr {X}\) and a vector-valued function \(\varvec{\varphi }:\mathscr {X}\rightarrow \mathbb {R}^d\) with \(\mathrm {E}\left[ \Vert \varvec{\varphi }(X)\Vert \right] <\infty\). For simplicity, we additionally assume that the first component of \(\varvec{\varphi }\) is identically 1. For both problems, we assume we can sample i.i.d. copies \(X_1,X_2,\ldots\) of X.

  1. P1

    Exact cubature problem: Assuming we can calculate the exact value of \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\), find n \((\le d)\) points \(x_1,\ldots ,x_n\in \mathscr {X}\) and weights \(w_1,\ldots ,w_n>0\) such that

    $$\begin{aligned} \varvec{\varphi }(x_1),\ldots ,\varvec{\varphi }(x_n) \in {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)},\qquad \mathrm {E}\left[ \varvec{\varphi }(X)\right] =\sum _{i=1}^n w_i\varvec{\varphi }(x_i). \end{aligned}$$
  2. P2

    Approximate cubature problem: Without any knowledge of the exact value of \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) (except for \(\mathrm {E}\left[ \varphi _1(X)\right] =1\)), find n \((\le d)\) points \(x_1,\ldots ,x_n\in \mathscr {X}\) and weights \(w_1,\ldots ,w_n>0\) such that

    $$\begin{aligned} \varvec{\varphi }(x_1),\ldots ,\varvec{\varphi }(x_n) \in {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)},\qquad \mathrm {E}\left[ \varvec{\varphi }(X)\right] \simeq \sum _{i=1}^n w_i\varvec{\varphi }(x_i). \end{aligned}$$

(P1) is the usual (generalized) cubature construction problem. In (P2), since we do not know \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\), it is nearly impossible to find an exact cubature formula. However, Monte Carlo or QMC integration approximates the expectation \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) well without requiring any prior knowledge of it. Therefore, it is possible to construct an approximate cubature formula.

3.2 Monte Carlo approach to the exact cubature problem

To solve (P1), we can use the i.i.d. samples \(X_1,X_2,\ldots\) as candidates for the points of the cubature formula. If, for some N, we have

$$\begin{aligned} \mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}\{\varvec{\varphi }(X_1), \ldots , \varvec{\varphi }(X_N)\}, \end{aligned}$$

then we can construct a cubature formula using Carathéodory–Tchakaloff subsampling (Method 6).
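A sketch of the whole construction, assuming scipy is available: we take X uniform on \([0,1]\) with \(\varvec{\varphi }(x)=(1,x,x^2,x^3)^\top\), so \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] =(1,\tfrac{1}{2},\tfrac{1}{3},\tfrac{1}{4})^\top\) is known exactly, and enlarge the i.i.d. sample until the LP of Sect. 2.2 becomes feasible. The batch size 64 and the distribution are illustrative choices of ours.

```python
import numpy as np
from scipy.optimize import linprog

# (P1) for X ~ Uniform[0,1] and phi(x) = (1, x, x^2, x^3): the exact
# moment vector is b = (1, 1/2, 1/3, 1/4).
rng = np.random.default_rng(7)
b = np.array([1.0, 1 / 2, 1 / 3, 1 / 4])

samples = np.empty(0)
while True:
    samples = np.concatenate([samples, rng.uniform(size=64)])
    A = np.vander(samples, 4, increasing=True).T   # rows: 1, x, x^2, x^3
    # trivial objective; feasibility <=> b in conv{phi(X_1),...,phi(X_N)}
    res = linprog(np.zeros(len(samples)), A_eq=A, b_eq=b, method="highs")
    if res.status == 0:
        break

w = res.x                                  # weights (mostly zero)
support = samples[w > 1e-9]                # cubature points
weights = w[w > 1e-9]
# A simplex-type solver normally returns a basic feasible solution, so
# we expect at most d = 4 points in the support.
```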

Indeed, we can prove that such an N exists almost surely and that \(\mathrm {E}\left[ \min N\right] <\infty\) (\(\min N\) is Borel measurable because \(\{(x_1,\ldots , x_n)\in \mathbb {R}^{d\times n} \mid y\not \in {{\,\mathrm{conv}\,}}\{x_1,\ldots , x_n\}\}\) is an open subset of \(\mathbb {R}^{d\times n}\) for each \(y\in \mathbb {R}^d\)). This fact can be proved using the lemma outlined below.

Lemma 10

The expectation \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) belongs to the relative interior of \({{\,\mathrm{conv}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\).

Proof

Denote \({{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) by S. First, we show that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{aff}\,}} S\). Suppose not; then we can take a nonzero vector \(c\in \mathbb {R}^d\) such that \(c^\top x\) is constant for \(x\in S\) while \(c^\top \mathrm {E}\left[ \varvec{\varphi }(X)\right]\) takes a different value. This is absurd, since \(c^\top \mathrm {E}\left[ \varvec{\varphi }(X)\right]\) is the expectation of \(c^\top \varvec{\varphi }(X)\), which must equal the aforementioned constant. Therefore, we have \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{aff}\,}} S\).

Suppose \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \not \in {{\,\mathrm{ri}\,}}{{\,\mathrm{conv}\,}}S\). By translating if necessary, we can assume \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] =0\) and that \({{\,\mathrm{aff}\,}} S\) is a linear subspace of \(\mathbb {R}^d\). Then, for each \(n=1,2,\ldots\), \(A_n:=\{ x\in {{\,\mathrm{aff}\,}} S \mid \Vert x\Vert <1/n\}\) has a nonempty intersection with \(({{\,\mathrm{conv}\,}}S)^c\). Take a sequence \(a_1,a_2,\ldots\) such that \(a_n\in A_n{\setminus } {{\,\mathrm{conv}\,}}S\) for each n. Then, by the separation theorem, we have a sequence of unit vectors \(c_n\in {{\,\mathrm{aff}\,}} S\) that separate \(a_n\) from \({{\,\mathrm{conv}\,}}S\); i.e., they satisfy \(c_n^\top a_n \ge c_n^\top x\) for all \(x\in {{\,\mathrm{conv}\,}}S\). By the compactness of the set \(C:=\{c\in {{\,\mathrm{aff}\,}} S \mid \Vert c\Vert =1\}\), the sequence \(c_1,c_2,\ldots\) has a subsequence converging to some \(c\in C\). Because \(c_n^\top x\le c_n^\top a_n \le \Vert c_n\Vert \cdot \Vert a_n\Vert <1/n\) for each n and each \(x\in {{\,\mathrm{conv}\,}}S\), passing to the limit along this subsequence yields \(c^\top x \le 0\) for all \(x\in {{\,\mathrm{conv}\,}}S\). As we have assumed \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] =0\), the nonpositive random variable \(c^\top \varvec{\varphi }(X)\) has zero expectation, so \(c^\top \varvec{\varphi }(X)=0\) almost surely; hence \(S={{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) is included in the lower-dimensional subspace \(\{x\in {{\,\mathrm{aff}\,}} S \mid c^\top x=0\}\) (see Remark 1). This contradicts the definition of \({{\,\mathrm{aff}\,}} S\) as the smallest affine subspace including S. Therefore, we have \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{ri}\,}}{{\,\mathrm{conv}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\). \(\square\)

The theorem outlined below, which shows the existence of desired N, is the main result of this paper.

Theorem 11

Let \(X_1, X_2, \ldots\) be i.i.d. copies of X. Then, with probability 1, there exists a positive integer N such that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) is contained in \({{\,\mathrm{conv}\,}}\{\varvec{\varphi }(X_1),\ldots ,\varvec{\varphi }(X_N)\}.\)

Proof

Denote \({{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) by S. By Lemma 10 and Theorem 8, there exists a set \(T\subset S\) such that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{ri}\,}}{{\,\mathrm{conv}\,}}T\) and \(|T|\le 2d\). Then, there exists an \(r>0\) satisfying

$$\begin{aligned} B_r:=\{x\in {{\,\mathrm{aff}\,}} S \mid \Vert x-\mathrm {E}\left[ \varvec{\varphi }(X)\right] \Vert \le r\}\subset {{\,\mathrm{conv}\,}}T. \end{aligned}$$

If the elements of T are subscripted as \(T=\{x_1,\ldots , x_{|T|}\}\), we can regard \({{\,\mathrm{conv}\,}}T\) as the image of the mapping

$$\begin{aligned} \tau :\Delta ^{|T|}\rightarrow {{\,\mathrm{aff}\,}} S;\qquad (t_1, \ldots , t_{|T|})\mapsto t_1x_1+\cdots +t_{|T|}x_{|T|}, \end{aligned}$$

where \(\Delta ^{|T|}:=\{(t_1,\ldots ,t_{|T|})\in \mathbb {R}^{|T|}\mid t_1,\ldots ,t_{|T|}\ge 0,\ t_1+\cdots +t_{|T|}=1\}\).

We prove that, if points \(y_1, \ldots , y_{|T|}\in S\) satisfy \(\Vert y_i-x_i\Vert <r\) for each \(i=1,\ldots , |T|\), then \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}\{y_1,\ldots , y_{|T|}\}\) holds. If this were not true, there would exist points \(y_1,\ldots , y_{|T|}\) with \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \not \in {{\,\mathrm{conv}\,}}\{y_1,\ldots ,y_{|T|}\}\). Then, by the separation theorem, there exists a hyperplane \(H\subset {{\,\mathrm{aff}\,}} S\) passing through \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) such that all the points \(y_1,\ldots ,y_{|T|}\) are contained in one closed half-space (of \({{\,\mathrm{aff}\,}} S\)) made by H. We can now take a point \(z\in B_r\) such that \(\Vert z-\mathrm {E}\left[ \varvec{\varphi }(X)\right] \Vert =r\), \(z-\mathrm {E}\left[ \varvec{\varphi }(X)\right] \perp H\), and z lies on the other side of H from \(y_1,\ldots ,y_{|T|}\). Since \(B_r\subset {{\,\mathrm{conv}\,}}T\), we can take a weight \((t_1,\ldots ,t_{|T|})\in \tau ^{-1}(z)\) and, accordingly, we have

$$\begin{aligned} \Vert (t_1y_1+\cdots +t_{|T|}y_{|T|})-z\Vert \le t_1\Vert y_1-x_1\Vert +\cdots +t_{|T|}\Vert y_{|T|}-x_{|T|}\Vert < r. \end{aligned}$$

This means that \(t_1y_1+\cdots +t_{|T|}y_{|T|}\), a convex combination of \(y_1,\ldots ,y_{|T|}\), lies on the other side of H from \(y_1,\ldots ,y_{|T|}\), which is impossible. Therefore, the aforementioned assertion is true.

By the definition of \({{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\), we have \(\mathrm {P}(\varvec{\varphi }(X)\in S,\ \Vert \varvec{\varphi }(X)-x_i\Vert <r)>0\) for each \(i=1,\ldots , |T|\); hence, with probability 1, the i.i.d. sequence \(\varvec{\varphi }(X_1),\varvec{\varphi }(X_2),\ldots\) eventually visits each of the balls \(\{y\in S\mid \Vert y-x_i\Vert <r\}\). Thus, the assertion of the theorem follows. \(\square\)

As mentioned above, the proof also shows that \(\mathrm {E}\left[ \min N\right] <\infty\). There exists a concept called the (random) degree function that generalizes \(\min N\):

$$\begin{aligned} \deg (x; \mathrm {P}_X):=\min \{n \mid x\in {{\,\mathrm{conv}\,}}\{X_1, \ldots , X_n\}\}, \end{aligned}$$

where \(x\in \mathbb {R}^d\), \(\mathrm {P}_X\) is a probability law over \(\mathbb {R}^d\), and \(X_1,X_2,\ldots\) is an i.i.d. sequence following the law \(\mathrm {P}_X\). This function, and the related notion of depth, have been treated by [6] and [17]. Our interest is in \(\deg (\mathrm {E}\left[ X\right] ; \mathrm {P}_X)\), but existing results assume absolute continuity (and angular symmetry) of the law \(\mathrm {P}_X\). These are indeed strong assumptions; however, the results on depth functions may still be useful for analyzing our method on concrete distributions in future work.
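For a concrete distribution one can estimate \(\deg (\mathrm {E}\left[ X\right] ; \mathrm {P}_X)\) by simulation; the numpy sketch below uses the standard two-dimensional Gaussian (our choice), for which containment of the mean \(0\) in the convex hull can be tested cheaply via polar-angle gaps.

```python
import numpy as np

# deg(E[X]; P_X) for the standard 2-D Gaussian (E[X] = 0): in the plane,
# 0 lies in conv{X_1, ..., X_n} iff the polar angles of the points leave
# no gap larger than pi.
rng = np.random.default_rng(0)

def degree_2d(rng):
    pts = rng.standard_normal((3, 2))
    while True:
        ang = np.sort(np.arctan2(pts[:, 1], pts[:, 0]))
        gaps = np.diff(np.concatenate([ang, [ang[0] + 2 * np.pi]]))
        if gaps.max() < np.pi:
            return len(pts)
        pts = np.vstack([pts, rng.standard_normal((1, 2))])

mean_deg = np.mean([degree_2d(rng) for _ in range(200)])
# For angularly symmetric planar laws, a Wendel-type computation gives
# an expected degree of 5.
```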

Remark 3

An important note is that we do not need an exact sampler of X. Indeed, besides the fact that \(X_1, X_2,\ldots\) are i.i.d., we have used only the following two conditions in the above proof:

  • \({{\,\mathrm{aff}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X_i)} = {{\,\mathrm{aff}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) holds;

  • \({{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X_i)}\) is a dense subset of \({{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X_i)} \cup {{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\).

Although the first condition may be difficult to check in a general situation, the situation becomes much easier when \({{\,\mathrm{aff}\,}}{{\,\mathrm{supp}\,}}\mathrm {P}_{\varvec{\varphi }(X)}\) coincides with the full space \(\mathbb {R}^d\) (or with its quotient by a subspace determined in an obvious way by the shape of the function \(\varvec{\varphi }\)).

3.3 Approaches to the approximate cubature problem

Considering (P2), there are two cases, which are outlined below.

  (a) We are familiar with the distribution of X, but we do not know the exact value of \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\).

  (b) We are not familiar with X.

In (a), we assume that there exist known QMC formulas or some general discrete approximation of \(\mathrm {P}_X\). In this case, if \(\varvec{\varphi }\) is a good basis in some sense (e.g., for the space of functions we want to integrate), it is not necessary to use all of the information in the existing formulas. Mathematically speaking, from a given discrete approximation \({\hat{\mu }}\) of \(\mathrm {P}_X\), we can construct a Carathéodory–Tchakaloff subsampling \({\hat{\mu }}_{\mathrm {CT}}\) of \({\hat{\mu }}\). This is one of the usual applications of the subsampling method.

In (b), however, we do not have such a measure in advance. Rather, in such a situation, we generally have to carry out Monte Carlo integration every time we want to integrate some integrand. However, once we have a sufficiently large i.i.d. sample \(X_1,X_2,\ldots ,X_N\) of X, we can construct a subsampling \({\hat{\mu }}_{\mathrm {CT}}\) of the measure \({\hat{\mu }}=\frac{1}{N}\sum _{i=1}^N\delta _{X_i}\). This \({\hat{\mu }}_{\mathrm {CT}}\) is random but statistically satisfies

$$\begin{aligned}&\mathrm {E}\left[ \int _\mathscr {X}\varvec{\varphi }(x)\,\mathrm {d}{\hat{\mu }}_{\mathrm {CT}}(x)\right] =\mathrm {E}\left[ \int _\mathscr {X}\varvec{\varphi }(x)\,\mathrm {d}{\hat{\mu }}(x)\right] =\mathrm {E}\left[ \varvec{\varphi }(X)\right] ,\\&\quad \mathrm {E}\left[ \left| \int _\mathscr {X}\varvec{\varphi }(x)\,\mathrm {d}{\hat{\mu }}_{\mathrm {CT}}(x) -\mathrm {E}\left[ \varvec{\varphi }(X)\right] \right| ^2\right] \\&\quad =\mathrm {E}\left[ \left| \int _\mathscr {X}\varvec{\varphi }(x)\,\mathrm {d}{\hat{\mu }}(x) -\mathrm {E}\left[ \varvec{\varphi }(X)\right] \right| ^2\right] =\mathrm {O}\left( \frac{1}{N}\right) \end{aligned}$$

under the additional moment condition \(\mathrm {E}\left[ \Vert \varvec{\varphi }(X)\Vert ^2\right] <\infty\). The merit of constructing \({\hat{\mu }}_{\mathrm {CT}}\) is that each integrand needs to be evaluated at only d points, whereas standard Monte Carlo requires evaluation at all N points. Therefore, the Monte Carlo approach to (P2) is also effective.
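As an illustration, the subsampling step can be sketched as a small linear program. The following is only a toy sketch under assumed ingredients (SciPy's HiGHS simplex in place of any particular solver, and the monomials \(1, x, x^2, x^3\) on \([0,1)\) as hypothetical test functions): a basic feasible solution of the moment-matching LP is supported on at most d points.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

# Hypothetical test functions: the monomials 1, x, x^2, x^3 on [0, 1).
def phi(x):
    return np.stack([np.ones_like(x), x, x**2, x**3])

N = 1000
X = rng.random(N)            # i.i.d. sample of X ~ U[0, 1)
A = phi(X)                   # shape (4, N): one row per test function
b = A.mean(axis=1)           # moments of the empirical measure mu_hat

# Feasibility LP: weights w >= 0 with A w = b.  The uniform weights w_i = 1/N
# form one solution, so the LP is feasible; since the first row of A is all
# ones and b[0] = 1, the weights automatically sum to 1.  A basic feasible
# solution returned by the (dual) simplex method has at most 4 nonzero
# weights -- its support is the subsampled measure mu_CT.
res = linprog(c=np.zeros(N), A_eq=A, b_eq=b,
              bounds=(0, None), method="highs-ds")
support = np.flatnonzero(res.x > 1e-12)
print(res.status, len(support))   # status 0 (solved); support size <= 4
```

Any LP solver that returns vertex (basic) solutions can be substituted; an interior-point solver would certify feasibility but generally not produce a sparse support.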

4 Examples and experiments

In this section, we address several examples and give estimated bounds on the effective number of samples N needed to construct cubature formulas. Some of the examples admit known concrete cubature constructions; we include them not for the sake of cubature construction itself but to estimate N in general cases by analogy with concrete examples.

4.1 Uniform measures on a domain with symmetry

If we set the degree t to be 1 in the classic cubature problem, then the problem becomes simple: given a distribution of X on \(\varOmega \subset \mathbb {R}^d\), find weights \(\lambda _1,\ldots ,\lambda _n>0\) and points \(x_1, \ldots , x_n\in \varOmega\) such that \(\sum _{i=1}^n\lambda _i=1\) and \(\sum _{i=1}^n\lambda _ix_i=\mathrm {E}\left[ X\right]\). Note that in such a case the number of test functions is not d but \(d+1\).

Unit sphere One of the most natural examples is \(\varOmega =S^{d-1}\), the surface of the unit ball, with X the uniform random variable on \(\varOmega\). In this case, an explicit formula for the desired probability is known [30]:

$$\begin{aligned} \mathrm {P}\left( 0 \in {{\,\mathrm{conv}\,}}\{X_1,\ldots ,X_N\}\right) =2^{-N+1}\sum _{k=0}^{N-d-1}\left( {\begin{array}{c}N-1\\ k\end{array}}\right) , \end{aligned}$$

where \(X_1,\ldots ,X_N\) are i.i.d. copies of X. Then, for example, by letting \(N=2d\), the left-hand side probability becomes 1/2.
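The formula above is easy to evaluate numerically; the following sketch (plain Python, no external dependencies) checks that \(N=2d\) gives probability exactly 1/2 for several dimensions.

```python
from math import comb

def wendel(N, d):
    """P(0 in conv{X_1, ..., X_N}) for X_i i.i.d. uniform on S^{d-1} [30]."""
    # Sum over k = 0, ..., N - d - 1, divided by 2^{N-1}.
    return sum(comb(N - 1, k) for k in range(N - d)) / 2 ** (N - 1)

# With N = 2d the probability is exactly 1/2, whatever the dimension d,
# since the sum then covers exactly half of the binomial row of order N - 1.
print([wendel(2 * d, d) for d in (1, 2, 5, 10)])  # [0.5, 0.5, 0.5, 0.5]
```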

Vertices of a hypercube Another example is the uniform X on the discrete set \(\varOmega =\{(z_1,\ldots ,z_d)\in \mathbb {R}^d\mid z_i=\pm 1,\ i=1,\ldots ,d\}\). In this case, an asymptotic estimate by [10] is known:

$$\begin{aligned} \mathrm {P}\left( 0 \in {{\,\mathrm{conv}\,}}\{X_1,\ldots ,X_{2d+\lceil c\sqrt{2d}\rceil }\}\right) =\varPhi (c) + o(1),\quad d\rightarrow \infty , \end{aligned}$$

where c is an arbitrary fixed constant and \(\varPhi (x)=\int _{-\infty }^x (2\pi )^{-1/2}e^{-t^2/2}\,\mathrm {d}t\) is the cumulative distribution function of a standard Gaussian. Therefore, taking \(N=2d\) again yields the approximate probability 1/2.

In the above examples, taking N approximately 2d was sufficient. However, the next example shows that we cannot always take N linear in d.

Vertices of a simplex Let us consider the vertices of a d-dimensional simplex. For example, let \(e_1,\ldots ,e_d\) be the standard basis of \(\mathbb {R}^d\) and consider \(\varOmega =\{e_1,\ldots ,e_d, -(e_1+\cdots +e_d)\}\). Let X be the uniform random variable on the discrete set \(\varOmega\) (so that \(\mathrm {E}\left[ X\right] =0\)). Then, for \(X_1, \ldots , X_N\in \varOmega\), \(0\in {{\,\mathrm{conv}\,}}\{X_1,\ldots ,X_N\}\) is equivalent to \(\{X_1,\ldots ,X_N\}=\varOmega\). Therefore, the expectation of \(M=\min \{N\mid 0\in {{\,\mathrm{conv}\,}}\{X_1,\ldots ,X_N\}\}\) is the answer to the so-called coupon collector's problem [9] and is known explicitly as

$$\begin{aligned} \mathrm {E}\left[ M\right] =(d+1)\left( 1+\frac{1}{2}+\cdots +\frac{1}{d+1}\right) \sim (d+1)\log (d+1). \end{aligned}$$
(5)

Therefore, unlike the previous examples, we need samples of order at least \(d\log d\) to construct a cubature formula.
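The identity (5) can be checked against a direct simulation; the sketch below (plain Python; the trial count of 20000 is an arbitrary choice) compares the exact expectation with a Monte Carlo estimate.

```python
import random

def expected_M(d):
    """Exact coupon-collector value from (5): (d + 1) * H_{d+1}."""
    return (d + 1) * sum(1 / k for k in range(1, d + 2))

def simulate_M(d, rng):
    """Draw vertices of Omega uniformly until all d + 1 have appeared."""
    seen, n = set(), 0
    while len(seen) < d + 1:
        seen.add(rng.randrange(d + 1))
        n += 1
    return n

rng = random.Random(0)
d = 10
est = sum(simulate_M(d, rng) for _ in range(20000)) / 20000
print(round(expected_M(d), 2), round(est, 2))  # exact value is 33.22 for d = 10
```

Note that the exact value \(11 H_{11}\approx 33.2\) for \(d=10\) is still visibly above the leading-order term \(11\log 11\approx 26.4\); the equivalence in (5) is only asymptotic.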

4.2 Polynomials on a hypercube

To identify how many samples we should take in constructing cubature formulas, we conducted numerical experiments. For each pair \((s, m) \in \{1,2,3,4,5\}\times \{1,2,3,4,5\}\), we estimated the smallest value of N with

$$\begin{aligned} \mathrm {P}\left( \mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}\{\varvec{\varphi }(X_1),\ldots ,\varvec{\varphi }(X_N)\} \right) \ge \frac{1}{2}, \end{aligned}$$
(6)

where \(\varvec{\varphi }\) is a vector-valued function whose entries are all s-variate monomials of degree at most m, X is a uniform random variable on \([0, 1)^s\), and \(X_1, X_2,\ldots\) are i.i.d. copies of X.

In judging whether \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}\{\varvec{\varphi }(X_1),\ldots ,\varvec{\varphi }(X_n)\}\) holds given the values of \(X_1,\ldots ,X_n\), we used the LP solver glpsol from the GNU Linear Programming Kit (GLPK, version 4.65). Note that we do not have to construct a cubature explicitly to confirm that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right]\) is contained in the convex hull of the given vectors. Although we used the simplex method in the experiment, which also yields a cubature formula of the desired size, for the purpose of this section other means, such as the interior-point method, would suffice. To estimate the smallest N satisfying Eq. (6), we conducted a binary search on \([d_{s, m}, 10000]\), where \(d_{s, m}\) is the dimension of the vector \(\varvec{\varphi }\). During the binary search (with \(n\in [d_{s, m}, 10000]\)), we independently sampled \((X^i_1,\ldots ,X^i_n)\) 20 times (\(i=1,\ldots , 20\)); if there were at least 10 indices \(i\in \{1,\ldots ,20\}\) such that \(\mathrm {E}\left[ \varvec{\varphi }(X)\right] \in {{\,\mathrm{conv}\,}}\{\varvec{\varphi }(X^i_1),\ldots ,\varvec{\varphi }(X^i_n)\}\) held, n was judged to be larger than (or equal to) N; otherwise, it was judged to be smaller than N. The experimental results are shown in Table 1. They are not the exact values of N, as statistical and numerical errors exist (due to double-precision arithmetic); however, they serve as estimates of N.
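For readers who wish to reproduce a single membership test, the following sketch poses the same LP feasibility problem with SciPy's linprog in place of glpsol (an assumption of this sketch, not the setup used in the experiments), for \((s, m) = (2, 3)\), using the exact moments of the uniform law on \([0,1)^s\).

```python
import itertools
import numpy as np
from scipy.optimize import linprog

s, m = 2, 3                                  # dimension s and maximal degree m
exps = [a for a in itertools.product(range(m + 1), repeat=s) if sum(a) <= m]
d = len(exps)                                # d_{s,m}; here 10

def phi(x):
    """All s-variate monomials of degree at most m, evaluated at x."""
    return np.array([np.prod(x ** np.array(a)) for a in exps])

# Exact moments of the uniform law on [0, 1)^s: E[x^a] = prod_i 1/(a_i + 1).
b = np.array([np.prod([1.0 / (ai + 1) for ai in a]) for a in exps])

def in_hull(sample):
    """LP feasibility test: E[phi(X)] in conv{phi(X_1), ..., phi(X_n)}?"""
    A = np.column_stack([phi(x) for x in sample])
    # The row for the exponent (0, ..., 0) forces the weights to sum to 1.
    res = linprog(np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=(0, None), method="highs")
    return res.status == 0

rng = np.random.default_rng(0)
# Membership is likely at n = 5 * d_{s,m} and almost surely fails at n = 2.
print(in_hull(rng.random((5 * d, s))), in_hull(rng.random((2, s))))
```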

Table 1 Estimation of N for each \((s, m)\). In each entry, “\(A\ (B)\)” means \((\text {estimated }N) = A\) and \(d_{s, m}=B\)

In this polynomial case, N takes values of roughly \(2d_{s, m}\) to \(3d_{s, m}\). Accordingly, we can obtain a cubature formula at realistic computational cost, as long as \(d_{s, m}\), the number of test functions, is not too large.

4.3 Cubature on Wiener space

Cubature on Wiener space [18] is one of the most important examples in which we can make use of the generality of the approach in this paper. Given a stochastic process X following the Stratonovich stochastic differential equation

$$\begin{aligned} \,\mathrm {d}X_t = V_0(X_t)\,\mathrm {d}t + \sum _{i=1}^k V_i(X_t)\circ \mathrm {d}B^i_t, \end{aligned}$$

the problem of computing \(\mathrm {E}\left[ f(X_T)\right]\) efficiently is crucial in mathematical finance and PDE theory. As the randomness of X comes from the Brownian motion \(B=(B^1,\ldots ,B^k)\), \(\mathrm {E}\left[ f(X_T)\right]\) can be regarded as an expectation on the path space (called the Wiener space). Under certain regularity conditions on \(V_i\) and f, weights \(\lambda _1,\ldots ,\lambda _n>0\) and n \(\mathbb {R}^k\)-valued paths of bounded variation \(\omega _j=(\omega _j^1,\ldots ,\omega _j^k):[0, T]\rightarrow \mathbb {R}^k\) satisfying

$$\begin{aligned}&\sum _{j=1}^n\lambda _j \int _{0<t_1<\cdots<t_\ell<T} \,\mathrm {d}\omega _j^{i_1}(t_1)\cdots \,\mathrm {d}\omega _j^{i_\ell }(t_\ell )\nonumber \\&\quad =\mathrm {E}\left[ \int _{0<t_1<\cdots<t_\ell <T}\circ \mathrm {d}B^{i_1}_{t_1}\cdots \circ \mathrm {d}B^{i_\ell }_{t_\ell }\right] \end{aligned}$$
(7)

for every \((i_1,\ldots ,i_\ell )\in \{0,1,\ldots ,k\}^\ell\) with \(\ell + \#\{j\mid i_j=0\}\le m\), where \(\omega _j^0(t)=B^0_t=t\), is called a cubature formula on Wiener space of degree m and known to yield a good approxmation for \(\mathrm {E}\left[ f(X_T)\right]\). This can be regarded as a generalized cubature on a paths space with iterated integrals as test functions. Cubature on Wiener space of degree \(m\ge 5\) is known to outperform the usual Monte Carlo based methods such as the Euler-Maruyama method [15, 18].

The point here is that a concrete construction of weights and paths satisfying (7) is not known for general k and m. However, by applying this paper's approach, Hayakawa [12] recently proved that one can obtain cubature formulas on Wiener space in general via Monte Carlo sampling of piecewise linear paths and LP feasibility. Although we cannot generate i.i.d. samples of Brownian motion, the argument in Remark 3 enables the cubature construction by piecewise linear paths. Thanks to this example, our generalized cubature construction turns out not to be abstract nonsense.

In each experiment given in [12] for \((k, m)=(2,3),(3,3),(4,3),(2,5),(3,5),(2,7)\), where the corresponding Tchakaloff bounds are 20, 47, 94, 119, 516, 696, sampling \(4\times (\text {Tchakaloff bound})\) points was sufficient for constructing a cubature on Wiener space.

4.4 Summary

Within the range of small Tchakaloff bounds (i.e., numbers of test functions), setting N a few times larger than the Tchakaloff bound has been sufficient, both in the experiments for polynomials and iterated integrals and in the explicit calculations for the uniform distributions on the unit sphere and on the vertices of a hypercube.

However, as the estimate (5) indicates, even in a simple example with symmetry, N has to be taken as large as \(T\log T\), where T stands for the Tchakaloff bound. Therefore, characterizing N in terms of properties of the random variables, such as moments and dimension, is an important direction for future work.

5 Concluding remarks

In this study, we have shown that we can construct a general cubature formula of a probability measure if we have an i.i.d. sampler from the measure and if we know the exact mean values of the given test functions. We can construct a cubature formula by solving an LP problem given a sufficiently large i.i.d. sample. The proof of this fact is based on results from discrete geometry. We have also seen that our Monte Carlo approach is effective even if we do not know the mean values of the test functions. Indeed, we can construct an approximate cubature formula using points given by Monte Carlo sampling, a process that can be regarded as data compression (Carathéodory–Tchakaloff subsampling).

Through numerical experiments, we have empirically observed that the size of the LP problem to be solved need not be large compared with the number of test functions. However, as seen in (5), there is a counterexample to this empirical assertion, so estimating the necessary number of samples in general cases remains important.

Although we assumed that we can sample an exact i.i.d. sequence from the given probability measure, future work should consider the case where exact samplers are unavailable (typically when MCMC is used for sampling). Note also that Remark 3 moderately works as such a generalization, and it actually works in the construction of cubature on Wiener space.