Monte Carlo integration of $C^r$ functions with adaptive variance reduction: an asymptotic analysis

Plaskota, Leszek; Przybyłowicz, Paweł; Stȩpień, Łukasz

doi:10.1007/s10543-023-00972-0

Open access
Published: 22 May 2023

Volume 63, article number 32, (2023)

BIT Numerical Mathematics

Abstract

The theme of the present paper is numerical integration of $C^r$ functions using randomized methods. We consider variance reduction methods that consist of two steps. First, the initial interval is partitioned into subintervals and the integrand is approximated by a piecewise polynomial interpolant based on the obtained partition. Then a randomized approximation is applied to the difference between the integrand and its interpolant. The final approximation of the integral is the sum of both. The optimal convergence rate is already achieved by a uniform (nonadaptive) partition plus the crude Monte Carlo; however, special adaptive techniques can substantially lower the asymptotic factor depending on the integrand. The improvement can be huge in comparison to the nonadaptive method, especially for functions with rapidly varying $r$th derivatives, which has serious implications for practical computations. In addition, the proposed adaptive methods are easily implementable and well suited to automatic integration.

1 Introduction

Adaption is a useful tool to improve the performance of algorithms. The problems of numerical integration and the related $L^1$ approximation are no exception; see, e.g., [9] for a survey of theoretical results on the subject. If an underlying function possesses some singularities and is otherwise smooth, then adaption is necessary to localize the singular points and restore the convergence rate typical for smooth functions, see, e.g., [6, 12, 13, 14]. For functions that are smooth in the whole domain, adaptive algorithms usually do not offer a better convergence rate than nonadaptive algorithms; however, they can substantially lower the asymptotic constants. This is why adaptive quadratures are widely used for numerical integration, see, e.g., [1, 3, 7]. Their superiority over nonadaptive quadratures is rather obvious, but precise answers to the quantitative question of “how much adaption helps” are usually missing. This gap was partially filled by recent results of [2, 10, 11], where the best asymptotic constants of deterministic algorithms that use piecewise polynomial interpolation were determined for integration and $L^1$ approximation of $r$-times continuously differentiable functions $f:[a,b]\rightarrow {\mathbb {R}}.$ In this case, adaption relies on adjusting the partition of the interval [a, b] to the underlying function. While the convergence rate is of order $N^{-r},$ where N is the number of function evaluations used, it turns out that the asymptotic constant depends on f via the factor $(b-a)^r\big \Vert f^{(r)}\big \Vert _{L^1}$ for the uniform partition, and $\big \Vert f^{(r)}\big \Vert _{L^{1/(r+1)}}$ for the best adaptive partition.

In the current paper, we follow a line of thinking similar to that of the aforementioned papers. The difference is that we now want to carry out the analysis and obtain asymptotic constants for randomized algorithms.

Our goal is the numerical approximation of the integral

$$\begin{aligned} Sf=\int _a^bf(x)\,\mathrm dx. \end{aligned}$$

(1.1)

It is well known that for $f\in L^2(a,b)$ the crude Monte Carlo,

$$\begin{aligned} M_Nf=\frac{b-a}{N}\sum _{i=1}^N f(t_i),\quad \text{ where }\quad t_i{\mathop {\sim }\limits ^{iid}}U(a,b), \end{aligned}$$

(1.2)

returns an approximation with expectation ${\mathbb {E}}(M_Nf)=Sf$ and error (standard deviation)

$$\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-M_Nf\big )^2}=\frac{\sigma (f)}{\sqrt{N}},\quad \text{ where }\quad \sigma (f)^2=(b-a)S(f^2)-(Sf)^2. \end{aligned}$$

(1.3)
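The crude Monte Carlo rule (1.2) and the error formula (1.3) can be illustrated with a minimal sketch. It assumes Python's standard `random` module as the source of uniform samples; the function name `crude_monte_carlo` and the test integrand $f(x)=x^2$ on [0, 1] are illustrative choices, not taken from the paper.

```python
import math
import random

def crude_monte_carlo(f, a, b, n, rng):
    """Crude Monte Carlo (1.2): (b - a)/n * sum of f(t_i), with t_i iid U(a, b)."""
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n

# Illustrative integrand (an assumption): f(x) = x^2 on [0, 1], so Sf = 1/3.
n_samples = 20000
rng = random.Random(12345)
estimate = crude_monte_carlo(lambda x: x * x, 0.0, 1.0, n_samples, rng)

# By (1.3): sigma(f)^2 = (b - a) S(f^2) - (Sf)^2 = 1/5 - 1/9 for this integrand.
sigma = math.sqrt(1.0 / 5.0 - 1.0 / 9.0)
predicted_std = sigma / math.sqrt(n_samples)
```

The observed error `abs(estimate - 1/3)` should typically be of the size of `predicted_std`, which decays only like $N^{-1/2}$.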

Suppose that the function enjoys more smoothness, i.e.,

$$\begin{aligned} f\in C^r([a,b]). \end{aligned}$$

Then a much higher convergence rate $N^{-(r+1/2)}$ can be achieved using various techniques of variance reduction, see, e.g., [4]. One way is to apply a randomized approximation of the form

$$\begin{aligned} {\overline{M}}_{N,r}(f)=S(L_{m,r}f)+M_n(f-L_{m,r}f), \end{aligned}$$

(1.4)

where $L_{m,r}$ is the piecewise polynomial interpolation of f of degree $r-1$ using a partition of the interval [a, b] into m subintervals, $M_n$ is a Monte Carlo type algorithm using n random samples of f, and N is the total number of function evaluations used (for arguments chosen either deterministically or randomly). The optimal rate is already achieved for uniform (nonadaptive) partition and crude Monte Carlo. Then, see Theorem 2.1 with $\beta =0,$ the error asymptotically equals

$$\begin{aligned} c\,(b-a)^{r+1/2}\big \Vert f^{(r)}\big \Vert _{L^2(a,b)}\,N^{-(r+1/2)}, \end{aligned}$$

where c depends only on the choice of the interpolation points within subintervals. The main result of this paper relies on showing that with the help of adaption the asymptotic error of the methods (1.4) can be reduced to

$$\begin{aligned} c\,\big \Vert f^{(r)}\big \Vert _{L^{1/(r+1)}(a,b)}\,N^{-(r+1/2)}, \end{aligned}$$

(1.5)

see Theorems 3.1 and 4.1. Observe that the gain can be significant, especially when the derivative $f^{(r)}$ changes drastically. For instance, for $r=4,$ $[a,b]=[0,1],$ and $f(x)=1/(x+d),$ adaption is asymptotically roughly $5.7\cdot 10^{12}$ times better than nonadaption if $d=10^{-4},$ and $1.8\cdot 10^{29}$ times if $d=10^{-8}.$
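The order of magnitude of this gain can be checked with a short computation. The sketch below compares only the two norm factors, $(b-a)^{r+1/2}\Vert f^{(r)}\Vert _{L^2}$ and $\Vert f^{(r)}\Vert _{L^{1/(r+1)}},$ for $f(x)=1/(x+d)$ on [0, 1], using the closed forms of both integrals. It deliberately ignores the constants that depend on the interpolation points, so it is expected to match the quoted figures only in order of magnitude. The function name `norm_ratio` is hypothetical.

```python
import math

def norm_ratio(r, d):
    """Nonadaptive over adaptive norm factor for f(x) = 1/(x + d) on [0, 1].

    Here f^(r)(x) = (-1)^r r! (x + d)^(-(r+1)); the factor r! cancels in the ratio.
    """
    # Integral of (x + d)^(-(2r+2)) over [0, 1], i.e. the squared L^2 norm / (r!)^2.
    l2_squared = (d ** (-(2 * r + 1)) - (1.0 + d) ** (-(2 * r + 1))) / (2 * r + 1)
    # (Integral of (x + d)^(-1) over [0, 1])^(r+1), i.e. the L^{1/(r+1)} quasi-norm / r!.
    quasi_norm = math.log((1.0 + d) / d) ** (r + 1)
    return math.sqrt(l2_squared) / quasi_norm

ratio_small = norm_ratio(4, 1e-4)
ratio_tiny = norm_ratio(4, 1e-8)
```

Under these assumptions the two ratios fall in the ranges $10^{12}$ to $10^{13}$ and $10^{29}$ to $10^{30},$ respectively.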

We construct two randomized algorithms, denoted ${\overline{M}}_{N,r}^{\,*}$ and ${\overline{M}}_{N,r}^{\,**},$ that achieve the error (1.5). Although they start from different approaches, namely stratification versus importance sampling, in the limit they both reach essentially the same partition, in which the $L^1$ errors of Lagrange interpolation in all subintervals are equalized. However, numerical tests of Sect. 5 show that the algorithm ${\overline{M}}_{N,r}^{\,*}$ achieves the error (1.5) with some delay, which makes ${\overline{M}}_{N,r}^{\,**}$ the better choice in practical computations.

Other advantages of ${\overline{M}}_{N,r}^{\,**}$ are that it is easily implementable and, as shown in Sect. 6, it can be successfully used for automatic Monte Carlo integration.

Our analysis has so far been restricted to one-dimensional integrals. In future work it will be extended, and corresponding adaptive Monte Carlo algorithms will be constructed for multivariate integration, where randomization finds its major application. The current paper is the first step in this direction.

In the sequel, we use the following notation. For two functions of N we write $g_1(N)\lessapprox g_2(N)$ iff $\limsup _{N\rightarrow \infty }g_1(N)/g_2(N)\le 1,$ and we write $g_1(N)\approx g_2(N)$ iff $\lim _{N\rightarrow \infty }g_1(N)/g_2(N)=1.$ Similarly, for functions of $\varepsilon $ we write $h_1(\varepsilon )\lessapprox h_2(\varepsilon )$ iff $\limsup _{\varepsilon \rightarrow 0^+}h_1(\varepsilon )/h_2(\varepsilon )\le 1,$ and $h_1(\varepsilon )\approx h_2(\varepsilon )$ iff $\lim _{\varepsilon \rightarrow 0^+}h_1(\varepsilon )/h_2(\varepsilon )=1.$

2 Variance reduction using Lagrange interpolation

We first derive some general error estimates for the variance reduction algorithms of the form (1.4), where the standard Monte Carlo is applied for the error of piecewise Lagrange interpolation. Specifically, we divide the interval [a, b] into m subintervals using a partition $a=x_0<x_1<\cdots <x_m=b,$ and on each subinterval $[x_{j-1},x_j]$ we approximate f using Lagrange interpolation of degree $r-1$ with the interpolation points

$$\begin{aligned} x_{j,s}=x_{j-1}+z_s(x_j-x_{j-1}),\qquad 1\le s\le r, \end{aligned}$$

where

$$\begin{aligned} 0\le z_1<z_2<\cdots <z_r\le 1 \end{aligned}$$

(2.1)

are fixed (independent of the partition). Denote such an approximation by $L_{m,r}f.$ Then $f=L_{m,r}f+R_{m,r}f$ with $R_{m,r}f=f-L_{m,r}f.$ The integral Sf is finally approximated by

$$\begin{aligned} {\overline{M}}_{m,n,r}f\,=\,S(L_{m,r}f)+M_n(R_{m,r}f), \end{aligned}$$

where $M_n$ is the crude Monte Carlo (1.2). We obviously have ${\mathbb {E}}({\overline{M}}_{m,n,r}f)=Sf.$ Since

$$\begin{aligned} Sf-{\overline{M}}_{m,n,r}f\,=\,Sf-S(L_{m,r}f)-M_n(R_{m,r}f)\,=\,S(R_{m,r}f)-M_n(R_{m,r}f), \end{aligned}$$

by (1.3) we have

$$\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{m,n,r}f\big )^2\,=\, \frac{1}{n}\left( (b-a)S\big ((R_{m,r}f)^2\big )-\big (S(R_{m,r}f)\big )^2\right) . \end{aligned}$$

Note that

$$\begin{aligned} S\big ((R_{m,r}f)^2\big )\,=\,\int _a^b(f-L_{m,r}f)^2(x)\,\mathrm dx \,=\,\Vert f-L_{m,r}f\Vert _{L^2(a,b)}^2 \end{aligned}$$

is the squared $L^2$-error of the applied (piecewise) polynomial interpolation, while

$$\begin{aligned} S(R_{m,r}f)\,=\,\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\,=\,S(f)-S(L_{m,r}f) \end{aligned}$$

is the error of the quadrature ${\overline{Q}}_{m,r}f=S(L_{m,r}f).$
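The estimator ${\overline{M}}_{m,n,r}f=S(L_{m,r}f)+M_n(R_{m,r}f)$ can be sketched as follows for an equispaced partition. This is only an illustration under stated assumptions: all function names are hypothetical, the nodes $z_s$ are passed as a list, and shared endpoints are evaluated twice for simplicity, so the evaluation count is not the optimized one used in the analysis.

```python
import math
import random

def lagrange_weights(z):
    """Weights w_s = integral over [0, 1] of the s-th Lagrange basis polynomial."""
    weights = []
    for s, zs in enumerate(z):
        basis = [1.0]  # polynomial coefficients, lowest degree first
        for t, zt in enumerate(z):
            if t != s:
                # multiply basis by (x - zt) / (zs - zt)
                scale = 1.0 / (zs - zt)
                new = [0.0] * (len(basis) + 1)
                for k, c in enumerate(basis):
                    new[k] -= zt * scale * c
                    new[k + 1] += scale * c
                basis = new
        weights.append(sum(c / (k + 1) for k, c in enumerate(basis)))
    return weights

def lagrange_eval(z, values, x):
    """Evaluate the Lagrange interpolant through the points (z_s, values_s) at x."""
    total = 0.0
    for s, zs in enumerate(z):
        term = values[s]
        for t, zt in enumerate(z):
            if t != s:
                term *= (x - zt) / (zs - zt)
        total += term
    return total

def variance_reduced_mc(f, a, b, m, n, z, rng):
    """S(L f) + M_n(f - L f) on an equispaced partition with m subintervals."""
    h = (b - a) / m
    w = lagrange_weights(z)
    # Nodal values of f on every subinterval (local nodes z scaled to I_j).
    vals = [[f(a + (j + zs) * h) for zs in z] for j in range(m)]
    quad = sum(h * sum(ws * v for ws, v in zip(w, vj)) for vj in vals)
    resid = 0.0
    for _ in range(n):
        t = rng.uniform(a, b)
        j = min(int((t - a) / h), m - 1)   # index of the subinterval containing t
        local = (t - a) / h - j            # position of t inside it, in [0, 1]
        resid += f(t) - lagrange_eval(z, vals[j], local)
    return quad + (b - a) * resid / n

# Illustrative usage (an assumption): f = exp on [0, 1], linear interpolation (r = 2).
estimate = variance_reduced_mc(math.exp, 0.0, 1.0, 20, 100, [0.0, 1.0], random.Random(0))
```

For an integrand that is a polynomial of degree at most $r-1$ the residual vanishes and the sketch returns the integral up to rounding.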

From now on we assume that f is not a polynomial of degree smaller than or equal to $r-1,$ since otherwise ${\overline{M}}_{m,n,r}f=Sf.$ Define the polynomial

$$\begin{aligned} P(z)=(z-z_1)(z-z_2)\cdots (z-z_r). \end{aligned}$$

(2.2)

We first consider the interpolation error $\Vert f-L_{m,r}f\Vert _{L^2(a,b)}.$ Let

$$\begin{aligned} \alpha \,=\,\Vert P\Vert _{L^2(0,1)}=\bigg (\int _0^1|P(z)|^2\mathrm dz\bigg )^{1/2}. \end{aligned}$$

(2.3)

For each j, the local interpolation error equals

$$\begin{aligned} \big \Vert f-L_{m,r}f\big \Vert _{L^2(x_{j-1},x_j)}= & {} \bigg (\int _{x_{j-1}}^{x_j}\bigl |\,(x-x_{j,1})\cdots (x-x_{j,r})f[x_{j,1},\ldots ,x_{j,r},x]\,\bigr |^2\mathrm dx\bigg )^{1/2} \\= & {} \alpha \,h_j^{r+1/2}\,\frac{|f^{(r)}(\xi _j)|}{r!},\qquad \qquad \xi _j\in [x_{j-1},x_j]. \end{aligned}$$

Hence

$$\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^2(a,b)}\,=\,\frac{\alpha }{r!}\bigg (\sum _{j=1}^m h_j^{2r+1}\big |f^{(r)}(\xi _j)\big |^2\bigg )^{1/2}. \end{aligned}$$

In particular, for the equispaced partition, in which case $h_j=(b-a)/m,$ we have

$$\begin{aligned} \big \Vert f-L_{m,r}f\big \Vert _{L^2(a,b)}= & {} \frac{\alpha }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r \bigg (\frac{b-a}{m}\sum _{j=1}^m|f^{(r)}(\xi _j)|^2\bigg )^{1/2} \\\approx & {} \frac{\alpha }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r\,\big \Vert f^{(r)}\big \Vert _{L^2(a,b)} \qquad \text{ as }\quad m\rightarrow +\infty . \end{aligned}$$

Now, we consider the quadrature error $Sf-{\overline{Q}}_{m,r}f.$ Let

$$\begin{aligned} \beta \,=\,\int _0^1 P(z)\,\mathrm dz. \end{aligned}$$

(2.4)

The local integration errors equal

$$\begin{aligned}{} & {} {\int _{x_{j-1}}^{x_j}(f-L_{m,r}f)(x)\,\mathrm dx = \int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f[x_{j,1},\ldots ,x_{j,r},x]\,\mathrm dx}\\{} & {} \quad =\frac{1}{r!}\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f^{(r)}(\xi _j(x))\,\mathrm dx, \qquad \xi _j(x)\in [x_{j-1},x_j]. \end{aligned}$$

Choose arbitrary $\zeta _j\in [x_{j-1},x_j]$ for $1\le j\le m.$ Then

$$\begin{aligned}{} & {} \bigg |\frac{1}{r!}\,\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f^{(r)}(\xi _j(x))\,\mathrm dx\,\\{} & {} \qquad -\frac{f^{(r)}(\zeta _j)}{r!}\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})\,\mathrm dx\bigg |\\{} & {} \quad = \frac{1}{r!}\,\bigg |\int _{x_{j-1}}^{x_j}(x-x_{j,1})\cdots (x-x_{j,r})\left( f^{(r)}(\xi _j(x))-f^{(r)}(\zeta _j)\right) \, \mathrm dx\bigg |\;\\{} & {} \quad \le \;\omega (h_j)\,\frac{h_j^{r+1}}{r!}\,\Vert P\Vert _{L^1(0,1)}, \end{aligned}$$

where $\omega $ is the modulus of continuity of $f^{(r)}.$ We also have

$$\begin{aligned} \frac{f^{(r)}(\zeta _j)}{r!}\int _{x_{j-1}}^{x_j}(x-x_{j,1})\cdots (x-x_{j,r})\,\mathrm dx\,=\, \frac{\beta }{r!}\,h_j^{r+1}f^{(r)}(\zeta _j). \end{aligned}$$

Hence $Sf-{\overline{Q}}_{m,r}f\,=\,X_m\,+\,Y_m,$ where

$$\begin{aligned} X_m \,=\, \frac{\beta }{r!}\,\sum _{j=1}^mh_j^{r+1}f^{(r)}(\zeta _j)\qquad \text{ and }\qquad |Y_m| \,\le \, \frac{\Vert P\Vert _{L^1(0,1)}}{r!}\sum _{j=1}^m\omega (h_j)h_j^{r+1}. \end{aligned}$$

In particular, for the equispaced partition,

$$\begin{aligned} X_m= & {} \frac{\beta }{r!}\,(b-a)^r\bigg (\sum _{j=1}^m\frac{b-a}{m}f^{(r)}(\zeta _j)\bigg )\,m^{-r},\\ |Y_m|\le & {} \frac{\Vert P\Vert _{L^1(0,1)}}{r!}\;\omega \bigg (\frac{b-a}{m}\bigg )(b-a)^{r+1}m^{-r}. \end{aligned}$$

Suppose that $\beta \ne 0$ and $\int _a^bf^{(r)}(x)\,\mathrm dx\ne 0.$ Then $X_m\approx \frac{\beta }{r!}(b-a)^r\Big (\int _a^bf^{(r)}(x) \mathrm dx\Big )m^{-r}.$ Since $\omega (h)$ goes to zero as $h\rightarrow 0^+,$ the component $X_m$ dominates $Y_m$ as $m\rightarrow +\infty .$ Hence

$$\begin{aligned} Sf-{\overline{Q}}_{m,r}f \,\approx \, \frac{\beta }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r\,\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg ) \qquad \text{ as }\quad m\rightarrow +\infty . \end{aligned}$$

On the other hand, if $\beta =0$ or $\int _a^bf^{(r)}(x)\,\mathrm dx=0$ then the quadrature error converges to zero faster than $m^{-r},$ i.e.

$$\begin{aligned} \lim _{m\rightarrow +\infty }\big (Sf-{\overline{Q}}_{m,r}f\big )\,m^r\,=\,0. \end{aligned}$$

Note that $\beta =0$ if and only if the quadrature ${\overline{Q}}_{m,r}$ has the degree of exactness at least r, i.e., it is exact for all polynomials of degree r or less. Obviously, the maximal degree of exactness equals $2r-1.$
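The constants $\alpha$ and $\beta$ of (2.3) and (2.4) depend only on the nodes $z_1,\ldots ,z_r$ and can be computed exactly from the coefficients of P. The following sketch does this with plain coefficient lists; the helper names are hypothetical, and the two node sets used as examples (endpoints of [0, 1] and the two-point Gauss nodes) are illustrative choices.

```python
import math

def node_polynomial(z):
    """Coefficients (lowest degree first) of P(x) = (x - z_1)...(x - z_r)."""
    coeffs = [1.0]
    for zs in z:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] -= zs * c      # contribution of -z_s * x^k
            new[k + 1] += c       # contribution of x^(k+1)
        coeffs = new
    return coeffs

def integral01(coeffs):
    """Integral over [0, 1] of the polynomial with the given coefficients."""
    return sum(c / (k + 1) for k, c in enumerate(coeffs))

def alpha_beta(z):
    """Return (alpha, beta) = (||P||_{L^2(0,1)}, integral of P over [0, 1])."""
    p = node_polynomial(z)
    square = [0.0] * (2 * len(p) - 1)
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            square[i + j] += pi * pj
    return math.sqrt(integral01(square)), integral01(p)

# Endpoints z = (0, 1): P(x) = x^2 - x, so beta = -1/6 and alpha^2 = 1/30.
alpha_end, beta_end = alpha_beta([0.0, 1.0])

# Two-point Gauss nodes on [0, 1]: degree of exactness 3 = 2r - 1, hence beta = 0.
g = 1.0 / (2.0 * math.sqrt(3.0))
alpha_gauss, beta_gauss = alpha_beta([0.5 - g, 0.5 + g])
```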

We see that for the equidistant partition of the interval [a, b] the error $\big ({\mathbb {E}}(Sf-{\overline{M}}_{m,n,r}f)^2\big )^{1/2}$ is asymptotically proportional to

$$\begin{aligned} \phi (m,n)=n^{-1/2}m^{-r}, \end{aligned}$$

regardless of the choice of points $z_i$ in (2.1). Let us minimize $\phi (m,n)$ assuming the total number of points used is at most N. We have two cases depending on whether both endpoints of each subinterval are used in interpolation. If so, i.e., if $z_1=0$ and $z_r=1$ (in this case $r\ge 2$) then $N=(r-1)m+1+n.$ The optimal values are

$$\begin{aligned} m^*=\frac{2r(N-1)}{(r-1)(2r+1)},\qquad n^*=\frac{N-1}{2r+1}, \end{aligned}$$

(2.5)

for which

$$\begin{aligned} \phi (m^*,n^*)\,=\, \sqrt{2}\,\bigg (1-\frac{1}{r}\bigg )^r\bigg (\frac{r+1/2}{N}\bigg )^{r+1/2}. \end{aligned}$$

Otherwise we have $N=rm+n.$ The optimal values are

$$\begin{aligned} m^*=\frac{2N}{2r+1},\qquad n^*=\frac{N}{2r+1}, \end{aligned}$$

(2.6)

for which

$$\begin{aligned} \phi (m^*,n^*)\,=\,\sqrt{2}\,\bigg (\frac{r+1/2}{N}\bigg )^{r+1/2}. \end{aligned}$$
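The choices (2.5) and (2.6) translate directly into code. The sketch below returns the floored values of $m^*$ and $n^*$ for a given budget N; the function name `optimal_split` and the boolean flag are hypothetical, introduced only for this illustration.

```python
import math

def optimal_split(N, r, uses_both_endpoints):
    """Return (floor(m*), floor(n*)) according to (2.5) or (2.6)."""
    if uses_both_endpoints:
        # Case z_1 = 0 and z_r = 1, which requires r >= 2; budget N = (r-1) m + 1 + n.
        if r < 2:
            raise ValueError("both endpoints can be nodes only if r >= 2")
        m_star = 2.0 * r * (N - 1) / ((r - 1) * (2 * r + 1))
        n_star = (N - 1) / (2 * r + 1)
    else:
        # Otherwise the budget is N = r m + n.
        m_star = 2.0 * N / (2 * r + 1)
        n_star = N / (2 * r + 1)
    return math.floor(m_star), math.floor(n_star)

m_end, n_end = optimal_split(1001, 2, True)    # budget (r-1) m + 1 + n
m_int, n_int = optimal_split(1000, 2, False)   # budget r m + n
```

In both cases roughly a fraction $1/(2r+1)$ of the budget goes to random samples and the rest to interpolation nodes.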

Denote by ${\overline{M}}_{N,r}$ the corresponding algorithm with the equidistant partition, where for given N the values of n and m equal $\lfloor n^*\rfloor $ and $\lfloor m^*\rfloor ,$ respectively. Our analysis is summarized in the following theorem.

Theorem 2.1

We have as $N\rightarrow +\infty $ that

$$\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}_{N,r}f\big )^2}\;\approx \; c_r\,(b-a)^r\,C(P,f)\,N^{-(r+1/2)}, \end{aligned}$$

where

$$\begin{aligned} C(P,f)=\sqrt{\alpha ^2\,(b-a)\bigg (\int _a^b\big |f^{(r)}(x)\big |^2\mathrm dx\bigg )\,-\, \beta ^2\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg )^2}, \end{aligned}$$

$\alpha $ and $\beta $ are given by (2.3) and (2.4), and

$$\begin{aligned} c_r=\left\{ \begin{array}{ll}\sqrt{2}\,\big (1-\frac{1}{r}\big )^r\frac{(r+1/2)^{r+1/2}}{r!},&{} \quad \text{ if }\quad r\ge 2,\,z_1=0,\,z_r=1,\\ \ \sqrt{2}\,\frac{(r+1/2)^{r+1/2}}{r!},&{} \quad \text{ otherwise }.\end{array}\right. \end{aligned}$$

(2.7)

We add that the algorithm ${\overline{M}}_{N,r}$ is fully implementable since we assume that we have access to function evaluations at points from [a, b].

3 First adaptive algorithm

Now we add a stratification strategy to our algorithm of Theorem 2.1 to obtain an adaptive algorithm with a much better asymptotic constant. That is, we divide the initial interval [a, b] into k equal length subintervals $I_i, \ 1\le i \le k,$ and on each subinterval we apply the approximation of Theorem 2.1 with some $N_i,$ where

$$\begin{aligned} \sum _{i=1}^k N_i\le N. \end{aligned}$$

(3.1)

Denote such an approximation by ${\overline{M}}_{N,k,r}.$ (Note that ${\overline{M}}_{N,r}={\overline{M}}_{N,1,r}$.) Then, by Theorem 2.1, for fixed k we have as all $N_i\rightarrow +\infty $ that

$$\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}_{N,k,r}f\big )^2}\;\approx \;c_r h^r \bigg (\sum _{i=1}^k\frac{C_i^2}{N_i^{2r+1}}\bigg )^{1/2}, \end{aligned}$$

where

$$\begin{aligned} C_i=C_i(P,f)=\sqrt{\alpha ^2\,h\,\int _{I_i}\big |f^{(r)}(x)\big |^2\mathrm dx-\beta ^2\, \left( \int _{I_i}f^{(r)}(x)\,\mathrm dx\right) ^2},\qquad h=\frac{b-a}{k}. \end{aligned}$$

Minimization of $\psi (N_1,\ldots ,N_k)=\left( \sum _{i=1}^kC_i^2N_i^{-(2r+1)}\right) ^{1/2}$ with respect to (3.1) gives

$$\begin{aligned} N_i^*\,=\,\frac{C_i^{1/(r+1)}}{\sum _{j=1}^k C_j^{1/(r+1)}}\,N,\qquad 1\le i\le k, \end{aligned}$$

and then

$$\begin{aligned} \psi (N_1^*,\ldots ,N_k^*)= \bigg (\sum _{i=1}^k C_i^{1/(r+1)}\bigg )^{r+1}N^{-(r+1/2)}. \end{aligned}$$
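The allocation $N_i^*$ and the resulting value of $\psi$ can be verified numerically. The sketch assumes that all $C_i$ are strictly positive and treats the $N_i$ as real numbers; the function names and the sample values of $C_i$ are illustrative.

```python
import math

def optimal_allocation(C, N, r):
    """N_i^* proportional to C_i^(1/(r+1)), summing to N (all C_i assumed > 0)."""
    weights = [c ** (1.0 / (r + 1)) for c in C]
    total = sum(weights)
    return [w / total * N for w in weights]

def psi(C, Ns, r):
    """psi(N_1, ..., N_k) = (sum of C_i^2 N_i^(-(2r+1)))^(1/2)."""
    return math.sqrt(sum(c * c / n ** (2 * r + 1) for c, n in zip(C, Ns)))

# Illustrative data (an assumption): C = (1, 16, 81), r = 3, N = 600.
C = [1.0, 16.0, 81.0]
alloc = optimal_allocation(C, 600, 3)        # approximately [100, 200, 300]
psi_opt = psi(C, alloc, 3)
psi_equal = psi(C, [200.0, 200.0, 200.0], 3)
# Closed form: (sum of C_i^(1/(r+1)))^(r+1) * N^(-(r+1/2)).
closed_form = (1.0 + 2.0 + 3.0) ** 4 * 600.0 ** (-3.5)
```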

Let $\xi _i,\eta _i\in I_i$ satisfy $\int _{I_i}\big |f^{(r)}(x)\big |^2\mathrm dx=h\big |f^{(r)}(\xi _i)\big |^2$ and $\int _{I_i}f^{(r)}(x)\,\mathrm dx=hf^{(r)}(\eta _i).$ Then

$$\begin{aligned} C_i=h\sqrt{\alpha ^2|f^{(r)}(\xi _i)|^2-\beta ^2|f^{(r)}(\eta _i)|^2} \end{aligned}$$

and we have as $k\rightarrow +\infty $ that

$$\begin{aligned} \bigg (\sum _{i=1}^kC_i^{1/(r+1)}\bigg )^{r+1}= & {} h\,\bigg (\sum _{i=1}^k \big (\alpha ^2|f^{(r)}(\xi _i)|^2-\beta ^2|f^{(r)}(\eta _i)|^2\big )^\frac{1}{2(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h\,(\alpha ^2-\beta ^2)^{1/2}\bigg (\sum _{i=1}^k\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h^{-r}(\alpha ^2-\beta ^2)^{1/2}\bigg (\sum _{i=1}^kh\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h^{-r}(\alpha ^2-\beta ^2)^{1/2}\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}. \end{aligned}$$

It is clear that we have to take $N_i$ to be an integer and at least r, for instance

$$\begin{aligned} N_i=\left\lfloor N_i^*\left( 1-\frac{kr}{N}\right) +r\right\rfloor ,\qquad 1\le i\le k. \end{aligned}$$

Then the corresponding number $m_i$ of subintervals and number $n_i$ of random points in $I_i$ can be chosen as

$$\begin{aligned} m_i=\max \left( \lfloor m_i^*\rfloor ,1\right) ,\qquad n_i=\lfloor n_i^*\rfloor , \end{aligned}$$

where $m_i^*$ and $n_i^*$ are given by (2.5) and (2.6) with N replaced by $N_i.$
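The rounding rule for $N_i$ above keeps the total within the budget while guaranteeing at least r points per subinterval. A minimal sketch, with a hypothetical function name and assuming $kr\le N$:

```python
import math

def integer_allocation(N_star, N, r):
    """N_i = floor(N_i^* (1 - k r / N) + r) for real-valued N_i^* summing to N."""
    k = len(N_star)
    if k * r > N:
        raise ValueError("budget too small: need N >= k * r")
    shrink = 1.0 - k * r / N
    return [math.floor(ni * shrink + r) for ni in N_star]

# Illustrative usage (an assumption): three strata, r = 3, N = 600.
N_int = integer_allocation([100.0, 200.0, 300.0], 600, 3)
```

Before flooring, the values sum to exactly N, so after flooring the sum is at most N and more than $N-k.$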

Denote by ${\overline{M}}^{\,*}_{N,r}$ the above constructed approximation ${\overline{M}}_{N,k_N,r}$ with $k_N$ such that $k_N\rightarrow +\infty $ and $k_N/N\rightarrow 0$ as $N\rightarrow +\infty .$ For instance, $k_N=N^\kappa $ with $0<\kappa <1.$ Our analysis gives the following result.

Theorem 3.1

We have as $N\rightarrow +\infty $ that

$$\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}^{\,*}_{N,r}f\big )^2}\,\approx \, c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}. \end{aligned}$$

The asymptotic constant of the approximation ${\overline{M}}^{\,*}_{N,r}$ of Theorem 3.1 is never worse than that of ${\overline{M}}_{N,r}$ of Theorem 2.1. Indeed, comparing both constants we have

$$\begin{aligned}{} & {} c_r(b-a)^r\sqrt{\alpha ^2\,(b-a)\bigg (\int _a^b\big |f^{(r)}(x)\big |^2\mathrm dx\bigg )\,-\, \beta ^2\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg )^2}\\{} & {} \qquad \ge \;c_r\sqrt{\alpha ^2-\beta ^2}\,(b-a)^{r+1/2} \bigg (\int _a^b\big |f^{(r)}(x)\big |^2\mathrm dx\bigg )^{1/2}\\{} & {} \qquad \ge \,c_r\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}, \end{aligned}$$

where the first inequality follows from the Schwarz inequality and the second one from Hölder’s inequality for integrals. As shown in the introduction, the gain can be significant, especially when the derivative $f^{(r)}$ drastically changes.

The approximation ${\overline{M}}^{\,*}_{N,r}$ possesses good asymptotic properties, but is not feasible since we do not have direct access to the values $C_i.$ In a feasible implementation one can approximate $C_i$ using divided differences, i.e.,

$$\begin{aligned} {\widetilde{C}}_i=h\sqrt{\alpha ^2-\beta ^2}\,|d_i|\,r!\,\qquad \text{ where }\qquad d_i=f[x_{i,0},x_{i,1},\ldots ,x_{i,r}] \end{aligned}$$

(3.2)

and $x_{i,j}$ are arbitrary points from $I_i.$ Then

$$\begin{aligned} N_i^*=\frac{|d_i|^{1/(r+1)}}{\sum _{j=1}^{k_N}|d_j|^{1/(r+1)}}\,N. \end{aligned}$$

This works well for functions f for which the rth derivative does not vanish at any point in [a, b]. Indeed, then $f^{(r)}$ does not change its sign and, moreover, it is bounded away from zero. This means that

$$\begin{aligned} \lim _{N\rightarrow \infty }\,\max _{1\le i\le k_N}\,{C_i}/{{\widetilde{C}}_i}=1, \end{aligned}$$

which is enough for the asymptotic equality of Theorem 3.1 to hold true.
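The feasible allocation based on the divided differences (3.2) can be sketched as follows. The points $x_{i,j}$ are arbitrary in the text; equispaced points in $I_i$ are used here purely as an assumption, and the function names are hypothetical. The sketch also assumes that no $d_i$ vanishes.

```python
def divided_difference(f, xs):
    """f[x_0, ..., x_r] for distinct points xs, via the usual in-place table."""
    table = [f(x) for x in xs]
    n = len(xs)
    for level in range(1, n):
        for i in range(n - level):
            # table[i] becomes f[x_i, ..., x_{i+level}]
            table[i] = (table[i + 1] - table[i]) / (xs[i + level] - xs[i])
    return table[0]

def feasible_allocation(f, a, b, k, r, N):
    """N_i^* proportional to |d_i|^(1/(r+1)), with d_i computed on I_i."""
    h = (b - a) / k
    d = []
    for i in range(k):
        left = a + i * h
        xs = [left + j * h / r for j in range(r + 1)]  # assumed equispaced points
        d.append(abs(divided_difference(f, xs)))
    weights = [di ** (1.0 / (r + 1)) for di in d]
    total = sum(weights)
    return [w / total * N for w in weights]

# Illustrative usage (an assumption): f(x) = 1/(x + 0.01), r = 2, k = 4 strata.
alloc = feasible_allocation(lambda x: 1.0 / (x + 0.01), 0.0, 1.0, 4, 2, 400)
```

For this integrand the second derivative is largest near 0, so the first stratum receives the largest share of the budget.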

If $f^{(r)}$ is not bounded away from zero then we may have problems with proper approximations of $C_i$ in the intervals $I_i$ where $|f^{(r)}|$ assumes extremely small values or even vanishes. A possible and simple remedy is to choose a ‘small’ $\Delta >0$ and modify ${\widetilde{C}}_i$ as follows:

$$\begin{aligned} {\widetilde{C}}_i=\left\{ \begin{array}{rl} h\,\sqrt{\alpha ^2-\beta ^2}\,|d_i|\,r! &{}\;\,\text{ for }\;|d_i|r!\ge \Delta ,\\ h\,\alpha \,\Delta \,r! &{}\;\,\text{ for }\;|d_i|r!<\Delta .\end{array}\right. \end{aligned}$$

(3.3)

Then, letting $A_1=\big \{a\le x\le b:\,|f^{(r)}(x)|\ge \Delta \big \}$ and $A_2=[a,b]\setminus A_1,$ we have as $k\rightarrow +\infty $ that

Hence, the approximation of $C_i$ by (3.3) results in an algorithm whose error is approximately upper bounded by

$$\begin{aligned} c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}$$

where

$$\begin{aligned} \big |f_\Delta ^{(r)}(x)\big |=\max \bigg (\big |f^{(r)}(x)\big |, \sqrt{\tfrac{\alpha ^2}{\alpha ^2-\beta ^2}}\,\Delta \bigg ). \end{aligned}$$

(3.4)

We obviously have $\lim _{\Delta \rightarrow 0^+}\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx =\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx.$

A closer look at the deterministic part of ${\overline{M}}^{\,*}_{N,r}$ shows that the final partition of the interval [a, b] tends to equalize the $L^1$ errors in all of the m subintervals. As shown in [11], such a partition is optimal in the sense that it minimizes the asymptotic constant in the error $\Vert f-L_{m,r}f\Vert _{L^1(a,b)}$ among all possible piecewise Lagrange interpolations $L_{m,r}$. A disadvantage of the optimal partition is that it is not nested. This makes it necessary to start the computations from scratch when N is updated to a higher value. Also, a proper choice of the sequence $k_N=N^\kappa $ is problematic, especially when N is still relatively small. On one hand, the larger $\kappa $ the better the approximation of $C_i$ by ${\widetilde{C}}_i,$ but also the farther the partition is from the optimal one. On the other hand, the smaller $\kappa $ the closer the partition is to the optimal one, but also the worse the approximation of $C_i.$ This trade-off significantly affects the actual behavior of the algorithm, which can be seen in the numerical experiments of Sect. 5.

In the following section, we propose another adaptive approach leading to an easily implementable algorithm that produces nested partitions close to optimal and possesses asymptotic properties similar to those of ${\overline{M}}^{\,*}_{N,r}.$ As we shall see in Sect. 6, nested partitions are vital for automatic Monte Carlo integration.

4 Second adaptive algorithm

Consider a $\rho $-weighted integration problem

$$\begin{aligned} S_\rho f=\int _a^bf(x)\rho (x)\,\mathrm dx, \end{aligned}$$

where the function $\rho :[a,b]\rightarrow {\mathbb {R}}$ is integrable and positive a.e. and $\int _a^b\rho (x)\,\mathrm dx=1.$ The corresponding Monte Carlo algorithm is

$$\begin{aligned} M_{n,\rho }f=\frac{1}{n}\sum _{i=1}^nf(t_i),\qquad t_i{\mathop {\sim }\limits ^{iid}}\mu _\rho , \end{aligned}$$

where $\mu _\rho $ is the probability distribution on [a, b] with density $\rho .$ Then

$$\begin{aligned} {\mathbb {E}}(S_\rho f-M_{n,\rho }f)^2=\frac{1}{n}\big (S_\rho (f^2)-(S_\rho f)^2\big ). \end{aligned}$$

Now, the non-weighted integral (1.1) can be written as

$$\begin{aligned} S(f)=\int _a^b h(x)\rho (x)\,\mathrm dx=S_\rho (h), \quad \text{ where }\quad h(x)=\frac{f(x)}{\rho (x)}. \end{aligned}$$

Then

$$\begin{aligned} {\mathbb {E}}(Sf-M_{n,\rho }h)^2={\mathbb {E}}(S_\rho h-M_{n,\rho }h)^2 =\frac{1}{n}\big (S_\rho (h^2)-(S_\rho h)^2\big )=\frac{1}{n}\Big (S_\rho \big ((f/\rho )^2\big )-(Sf)^2\Big ). \end{aligned}$$
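The importance sampling estimator $M_{n,\rho }(f/\rho )$ and its variance formula can be illustrated with a small example. The density $\rho (x)=2x$ on [0, 1], the integrand $f(x)=x^2,$ and all function names are assumptions made for this sketch; samples from $\rho$ are drawn by inverting its distribution function.

```python
import math
import random

def importance_sampling(f, rho, sample_rho, n, rng):
    """M_{n,rho}(f / rho): average of f(t_i) / rho(t_i) with t_i drawn from rho."""
    total = 0.0
    for _ in range(n):
        t = sample_rho(rng)
        total += f(t) / rho(t)
    return total / n

def rho(x):
    return 2.0 * x                         # density on [0, 1], positive a.e.

def sample_rho(rng):
    # Inverse CDF of rho: F(x) = x^2; 1 - random() lies in (0, 1], so t > 0.
    return math.sqrt(1.0 - rng.random())

n_samples = 20000
estimate = importance_sampling(lambda x: x * x, rho, sample_rho, n_samples, random.Random(7))

# Variance per sample: S_rho((f/rho)^2) - (Sf)^2 = 1/8 - 1/9 = 1/72,
# compared with 1/5 - 1/9 = 4/45 for the crude Monte Carlo.
predicted_std = math.sqrt(1.0 / 72.0 / n_samples)
```

In this example the density roughly follows the shape of f, which is what reduces the variance.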

Let us go further and apply variance reduction,

$$\begin{aligned} {\overline{M}}_{n,\rho }f=S(Lf)+M_{n,\rho }\bigg (\frac{f-Lf}{\rho }\bigg ), \end{aligned}$$

(4.1)

where Lf is an approximation to f. Then

$$\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{n,\rho }f\big )^2= \frac{1}{n}\left( \int _a^b\frac{(f-Lf)^2(x)}{\rho (x)}\,\mathrm dx- \bigg (\int _a^b (f-Lf)(x)\,\mathrm dx\bigg )^2\right) . \end{aligned}$$

The question is how to choose L and $\rho $ to make the quantity

$$\begin{aligned} \int _a^b\frac{(f-Lf)^2(x)}{\rho (x)}\,\mathrm dx- \bigg (\int _a^b (f-Lf)(x)\,\mathrm dx\bigg )^2 \end{aligned}$$

as small as possible.

Observe that if

$$\begin{aligned} \rho (x)=\frac{|(f-Lf)(x)|}{\Vert f-Lf\Vert _{L^1(a,b)}} \end{aligned}$$

then

$$\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{n,\rho }f\big )^2= \frac{1}{n}\left( \Vert f-Lf\Vert _{L^1(a,b)}^2-\bigg (\int _a^b(f-Lf)(x)\mathrm dx\bigg )^2\right) \end{aligned}$$

and this error is even zero if $(f-Lf)(x)$ does not change its sign. This suggests the following algorithm.

Suppose that $Lf=L_{m,r}f$ is based on a partition of [a, b] such that the $L^1$ errors in all m subintervals $I_i$ have the same value, i.e.,

$$\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^1(I_i)}=\frac{1}{m}\,\Vert f-L_{m,r}f\Vert _{L^1(a,b)},\quad 1\le i\le m. \end{aligned}$$

(4.2)

Then we apply the variance reduction (4.1) with density

$$\begin{aligned} \rho (x)=\frac{1}{mh_i},\qquad x\in I_i,\quad 1\le i\le m, \end{aligned}$$

(4.3)

where $h_i$ is the length of $I_i.$ That is, for the corresponding probability measure $\mu _\rho $ we have $\mu _\rho (I_i)=\tfrac{1}{m}$ and the conditional distribution $\mu _\rho (\cdot |I_i)$ is uniform on $I_i.$
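Sampling from the density (4.3) therefore amounts to picking one of the m subintervals with equal probability and then a uniform point inside it. The sketch below does this for a partition given by its breakpoints and returns the weight $1/\rho (t)=mh_i$ alongside the sample; the function names and the example partition are hypothetical.

```python
import random

def sample_density(breaks, rng):
    """Draw t from the density (4.3) for the partition with the given breakpoints.

    Returns (i, t, weight), where i is the index of the chosen subinterval
    and weight = 1 / rho(t) = m * h_i.
    """
    m = len(breaks) - 1
    i = rng.randrange(m)                          # mu_rho(I_i) = 1/m
    t = rng.uniform(breaks[i], breaks[i + 1])     # uniform inside I_i
    return i, t, m * (breaks[i + 1] - breaks[i])

def weighted_mc(g, breaks, n, rng):
    """M_{n,rho}(g / rho); in (4.1) the role of g is played by f - Lf."""
    total = 0.0
    for _ in range(n):
        _, t, weight = sample_density(breaks, rng)
        total += weight * g(t)
    return total / n

# Illustrative usage (an assumption): a nonuniform partition of [0, 1] and g(x) = x.
breaks = [0.0, 0.1, 0.3, 1.0]
estimate = weighted_mc(lambda x: x, breaks, 20000, random.Random(3))
```

The weighted average is an unbiased estimate of $\int _a^b g(x)\,\mathrm dx$ for any partition.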

We now derive an error formula for such an approximation. Let $\gamma =\Vert P\Vert _{L^1(0,1)}=\int _0^1|P(z)|\,\mathrm dz.$ (Recall that P is given by (2.2).) We have

$$\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^1(I_i)}=\frac{\gamma }{r!}\,h_i^{r+1}\big |f^{(r)}(\xi _i)\big |\quad \text{ and }\quad \Vert f-L_{m,r}f\Vert _{L^2(I_i)}=\frac{\alpha }{r!}\,h_i^{r+1/2}\big |f^{(r)}(\zeta _i)\big | \end{aligned}$$

for some $\xi _i,\zeta _i\in I_i.$ Denoting

$$\begin{aligned} A=h_i^{r+1}\big |f^{(r)}(\xi _i)\big | \end{aligned}$$

(4.4)

(which is independent of i) we have as $m\rightarrow +\infty $ that

$$\begin{aligned}{} & {} {\bigg (\int _a^b\frac{(f-L_{m,r}f)^2(x)}{\rho (x)}\,\mathrm dx\bigg )^{1/2}\;=\; \bigg (m\sum _{i=1}^mh_i\int _{I_i}(f-L_{m,r}f)^2(x)\,\mathrm dx\bigg )^{1/2}} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^m h_i^{2r+2}\big |f^{(r)}(\zeta _i)\big |^{2}\bigg )^{1/2} \,\approx \,\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^m h_i^{2r+2}\big |f^{(r)}(\xi _i)\big |^2\bigg )^{1/2} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^mA^2\bigg )^{1/2}\,=\,\frac{\alpha }{r!}\,mA\,=\, \frac{\alpha }{r!}\,\big (mA^{1/(r+1)}\big )^{r+1}m^{-r} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (\sum _{i=1}^m h_i\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}m^{-r} \;\approx \;\frac{\alpha }{r!}\,\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}m^{-r}. \end{aligned}$$

To get an asymptotic formula for $\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx$ we use the analysis done in Sect. 2. If $\beta =0$ then the integral decreases faster than $m^{-r}.$ Let $\beta \ne 0.$ Then

$$\begin{aligned} \int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\;\approx \; \frac{\beta }{r!}\,\sum _{i=1}^m h_i^{r+1}f^{(r)}(\xi _i)=\frac{\beta }{r!}(m_+-m_-)A, \end{aligned}$$

where the points $\xi _i$ are as in (4.4), and $m_+$ and $m_-$ are the numbers of indices i for which $f^{(r)}(\xi _i)\ge 0$ and $f^{(r)}(\xi _i)<0,$ respectively. Let

$$\begin{aligned} D_+=\{x\in [a,b]:\,f^{(r)}(x)\ge 0\},\qquad D_-=\{x\in [a,b]:\,f^{(r)}(x)<0\}. \end{aligned}$$

Since $A\approx \Vert f^{(r)}\Vert _{L^{1/(r+1)}(a,b)}m^{-(r+1)}$ and $m_+A^{1/(r+1)}\approx \int _{D_+}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx,$ we have

$$\begin{aligned} m_+\approx m\,\frac{\int _{D_+}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}{\int _{D_+\cup D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx},\qquad m_-\approx m\,\frac{\int _{D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}{\int _{D_+\cup D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}. \end{aligned}$$

Thus

$$\begin{aligned} \int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\approx \frac{\beta }{r!} \bigg (\frac{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn} f^{(r)}(x)\,\mathrm dx}{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}\bigg )\Vert f^{(r)}\Vert _{L^{1/(r+1)}(a,b)}\,m^{-r} \end{aligned}$$

provided $\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn} f^{(r)}(x)\,\mathrm dx\ne 0,$ and otherwise the convergence is faster than $m^{-r}.$

Our analysis above shows that if m and n are chosen as in (2.5) and (2.6) then the error of the described algorithm asymptotically equals (as $N\rightarrow +\infty $)

$$\begin{aligned}{} & {} c_r\,\sqrt{\alpha ^2-\beta ^2 \bigg (\frac{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn}f^{(r)}(x)\,\mathrm dx}{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}\bigg )^2} \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} N^{-(r+1/2)}.\nonumber \\ \end{aligned}$$

(4.5)

The factor at $\beta ^2$ in (4.5) can be easily replaced by 1 with the help of stratified sampling. Indeed, instead of randomly sampling n times with density (4.3) on the whole interval [a, b], one can apply the same sampling strategy independently on k groups $G_j$ of subintervals. Each group consists of $s=m/k$ subintervals,

$$\begin{aligned} G_j=\bigcup _{\ell =1}^s I_{(j-1)s+\ell },\qquad 1\le j\le k, \end{aligned}$$

and the number of samples for each $G_j$ equals n/k. As in the algorithm ${\overline{M}}^{\,*}_{N,r},$ we combine $k=k_N$ and N in such a way that $k_N\rightarrow +\infty $ and $k_N/N\rightarrow 0$ as $N\rightarrow \infty .$ Then the total number of points used in each $G_j$ is $N_j=N/k.$ Denoting

$$\begin{aligned} C_j=\bigg (\int _{G_j}\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} =\bigg (\frac{1}{k}\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} \end{aligned}$$

and using the fact that the factor at $\beta ^2$ equals 1 if $f^{(r)}$ does not change its sign, the error of such an approximation asymptotically equals

$$\begin{aligned} c_r\,\sqrt{\alpha ^2-\beta ^2}\,\bigg (\sum _{j=1}^k\frac{C_j^2}{N_j^{2r+1}}\bigg )^{1/2} =c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}$$

as claimed. (Note that $N_j=N/k$ minimizes the sum $\sum _{j=1}^kC_j^2N_j^{-(2r+1)}$ subject to $\sum _{j=1}^kN_j=N$; compare with the analysis in Sect. 3.)

Thus we obtained exactly the same error formula as in Theorem 3.1 for ${\overline{M}}_{N,r}^{\,*}.$

It remains to show a feasible construction of a nested partition that is close to the one satisfying (4.2). To that end, we utilize the iterative method presented in [11], where the $L^p$ error of piecewise Lagrange interpolation is examined.

We first consider the case when

$$\begin{aligned} f^{(r)}>0\quad \text{ or }\quad f^{(r)}<0. \end{aligned}$$

(4.6)

In the following construction, we use a priority queue ${\mathscr {S}}$ whose elements are subintervals. For each subinterval $I_i$ of length $h_i,$ its priority is given as

$$\begin{aligned} p_f(I_i)\,=\,h_i^{r+1}|d_i|, \end{aligned}$$

where $d_i$ is the divided difference (3.2). In the following pseudocode, $\textrm{insert}({\mathscr {S}},I)$ and $I:=\mathrm {extract\_max}({\mathscr {S}})$ respectively insert an interval into ${\mathscr {S}}$ and extract from ${\mathscr {S}}$ an interval with the highest priority.

$\textbf{algorithm}\;\textrm{PARTITION}$

${\mathscr {S}}=\emptyset ;\;\textrm{insert}({\mathscr {S}},[a,b]);$

$\textbf{for}\; k=2:m$

$\quad [l,r]=\mathrm {extract\_max}({\mathscr {S}});$

$\quad c=(l+r)/2;$

$\quad \textrm{insert}({\mathscr {S}},[l,c]); \textrm{insert}({\mathscr {S}},[c,r]);$

$\textbf{endfor}$

After execution, the elements of ${\mathscr {S}}$ form a partition into m subintervals $I_i.$ Note that if the priority queue is implemented through a heap then the running time of $\textrm{PARTITION}$ is proportional to $m\log m.$
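The pseudocode above can be sketched in Python with a binary heap. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `divided_difference`, `priority` and `partition` are hypothetical, and the divided difference $d_i$ of (3.2) is assumed here to be the r-th order Newton divided difference on $r+1$ equispaced nodes of the subinterval.

```python
import heapq


def divided_difference(f, xs):
    """Return the Newton divided difference f[x_0, ..., x_r] over the nodes xs."""
    vals = [f(x) for x in xs]
    for level in range(1, len(xs)):
        for i in range(len(xs) - level):
            vals[i] = (vals[i + 1] - vals[i]) / (xs[i + level] - xs[i])
    return vals[0]


def priority(f, left, right, r):
    """Priority p_f(I) = h^(r+1) * |d|; d uses r+1 equispaced nodes (assumption)."""
    h = right - left
    xs = [left + h * i / r for i in range(r + 1)]
    return h ** (r + 1) * abs(divided_difference(f, xs))


def partition(f, a, b, r, m):
    """Split [a, b] into m subintervals by repeatedly halving the top-priority one."""
    # heapq is a min-heap, so priorities are stored negated to emulate extract_max.
    heap = [(-priority(f, a, b, r), a, b)]
    for _ in range(2, m + 1):
        _, left, right = heapq.heappop(heap)
        mid = (left + right) / 2
        heapq.heappush(heap, (-priority(f, left, mid, r), left, mid))
        heapq.heappush(heap, (-priority(f, mid, right, r), mid, right))
    return sorted((left, right) for _, left, right in heap)


# Example: the integrand 1/(x + 1e-4) varies rapidly near 0,
# so the partition should be much finer there than near 1.
intervals = partition(lambda x: 1.0 / (x + 1e-4), 0.0, 1.0, r=2, m=16)
```

Each loop iteration performs one extraction and two insertions, each of cost logarithmic in the heap size, which matches the $m\log m$ running time stated above.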

Denote by ${\overline{M}}^{\,**}_{N,r}$ the corresponding algorithm that uses the above nested partition and density (4.3), where N is related to the number m of subintervals and the number n of random samples as in (2.5) and (2.6). We want to see how much worse this algorithm is than the one using the (not nested) partition (4.2).

Let $A=(A_1,A_2,\ldots ,A_m)$ with

$$\begin{aligned} A_i=p_f(I_i)\,r!=h_i^{r+1}\big |f^{(r)}(\omega _i)\big |,\qquad \omega _i\in I_i, \end{aligned}$$

and $\Vert A\Vert _p=\big (\sum _{i=1}^mA_i^p\big )^{1/p}.$ For the corresponding piecewise Lagrange approximation $L_{m,r}f$ and density $\rho $ given by (4.3) we have

$$\begin{aligned}{} & {} {\int _a^b\frac{(f-L_{m,r}f)^2(x)}{\rho (x)}\,\mathrm dx \,-\,\bigg (\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\bigg )^2}\\{} & {} \quad \approx \,\frac{1}{(r!)^2}\bigg (\alpha ^2 m\sum _{i=1}^m A_i^2 -\beta ^2\bigg (\sum _{i=1}^mA_i\bigg )^2\,\bigg )\,=\,\frac{1}{(r!)^2} \left( \,\alpha ^2m\Vert A\Vert _2^2-\beta ^2\Vert A\Vert _1^2\,\right) . \end{aligned}$$

We also have $\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}\approx \big (\sum _{i=1}^mA_i^{1/(r+1)}\big )^{r+1}=\Vert A\Vert _{\frac{1}{r+1}}.$ Hence

$$\begin{aligned} \sqrt{{\mathbb {E}}(Sf-{\overline{M}}^{**}_{N,r}f)^2}\approx & {} \left( \,\alpha ^2m\Vert A\Vert _2^2-\beta ^2\Vert A\Vert _1^2\,\right) ^{1/2}n^{-1/2}m^{-r}\\\approx & {} K_{m,r}(A)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}$$

where

$$\begin{aligned} K_{m,r}(A)=\frac{\sqrt{\kappa _\alpha ^2\,m\,\Vert A\Vert _2^2 -\kappa _\beta ^2\,\Vert A\Vert _1^2}}{\Vert A\Vert _{\frac{1}{r+1}}}\,m^r, \qquad \kappa _\alpha =\frac{\alpha }{\sqrt{\alpha ^2-\beta ^2}}, \quad \kappa _\beta =\frac{\beta }{\sqrt{\alpha ^2-\beta ^2}}. \end{aligned}$$
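To get a feeling for the factor $K_{m,r}(A),$ the following minimal sketch evaluates it for a given vector A. The function name `k_factor` is hypothetical, $\alpha $ and $\beta $ are passed in as inputs, and the numerical values used in the example are placeholders rather than constants taken from the paper.

```python
import math


def k_factor(A, r, alpha, beta):
    """Evaluate K_{m,r}(A) for a list A of positive numbers (requires alpha > |beta|)."""
    m = len(A)
    denom = alpha ** 2 - beta ** 2
    kappa_alpha_sq = alpha ** 2 / denom
    kappa_beta_sq = beta ** 2 / denom
    norm2_sq = sum(a * a for a in A)            # ||A||_2^2
    norm1 = sum(A)                              # ||A||_1
    quasi_norm = sum(a ** (1.0 / (r + 1)) for a in A) ** (r + 1)  # ||A||_{1/(r+1)}
    radicand = kappa_alpha_sq * m * norm2_sq - kappa_beta_sq * norm1 ** 2
    return math.sqrt(radicand) / quasi_norm * m ** r


# Placeholder constants, chosen only so that alpha > |beta|.
k_uniform = k_factor([0.3] * 50, r=2, alpha=1.0, beta=0.5)
k_spread = k_factor([1.0, 8.0] * 25, r=2, alpha=1.0, beta=0.5)
```

When all $A_i$ are equal, the factor equals 1, which corresponds to the ideal partition; a vector whose entries differ by the factor $2^{r+1}$ gives a value larger than 1.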

Observe that halving an interval results in two subintervals whose priorities are asymptotically (as $m\rightarrow +\infty $) $2^{r+1}$ times smaller than the priority of the original interval. This means that $K_{m,r}(A)$ is asymptotically not larger than

$$\begin{aligned} K^*(r)\;=\;\limsup _{m\rightarrow \infty }\;\max \, \left\{ K_{m,r}(A):\;A=(A_1,\ldots ,A_m),\,\max _{1\le i,j\le m}\frac{A_i}{A_j}\le 2^{r+1}\,\right\} . \nonumber \\ \end{aligned}$$

(4.7)

Thus we obtained the following result.

Theorem 4.1

If the function f satisfies (4.6) then we have as $N\rightarrow +\infty $ that

$$\begin{aligned} \sqrt{{\mathbb {E}}(Sf-{\overline{M}}^{\,**}_{N,r}f)^2}\,\lessapprox \,K^*(r)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}$$

where $K^*(r)$ is given by (4.7).

We numerically calculated $K^*(r)$ in some special cases. For instance, if the points $z_i$ in (2.1) are equispaced, $z_i=(i-1)/(r-1),$ $1\le i\le r,$ then for $r=2,3,4,5,6$ we correspondingly have

$$\begin{aligned} K^*(r)\,=\,4.250,\;3.587,\;7.077,\;11.463,\;23.130, \end{aligned}$$

while for any nodes $z_i$ satisfying $\beta =\int _0^1(z-z_1)\cdots (z-z_r)\,\mathrm dz=0$ we have

$$\begin{aligned} K^*(r)\,=\,2.138,\;3.587,\;6.323,\;11.463,\;21.140. \end{aligned}$$

If f does not satisfy (4.6) then the algorithm ${\overline{M}}^{**}_{N,r}f$ may fail. Indeed, it may happen that $p_f(I_i)=0$ while $f^{(r)}\not =0$ in $I_i.$ Then this subinterval may never be further subdivided. In this case, we can repeat the same construction, but with the modified priority

$$\begin{aligned} p_f(I_i)\,=\,h_i^{r+1}\max \big (\,|d_i|,\,\Delta /r!\big ), \end{aligned}$$

where $\Delta >0.$ Then the error is asymptotically upper bounded by

$$\begin{aligned} K^*(r)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}$$

where $\big |f_\Delta ^{(r)}(x)\big |$ is given by (3.4).
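A small example may help to see why the safeguard is needed. The sketch below is an illustration only: the helper names are hypothetical, and $d_i$ is assumed to be the r-th order divided difference on $r+1$ equispaced nodes. For $f(x)=x^3$ on $[-1,1]$ with $r=2,$ the divided difference vanishes by symmetry although $f''(x)=6x$ is nonzero almost everywhere, so the original priority is zero; with $\Delta >0$ the interval keeps a positive priority.

```python
import math


def divided_difference(f, xs):
    """Return the Newton divided difference f[x_0, ..., x_r] over the nodes xs."""
    vals = [f(x) for x in xs]
    for level in range(1, len(xs)):
        for i in range(len(xs) - level):
            vals[i] = (vals[i + 1] - vals[i]) / (xs[i + level] - xs[i])
    return vals[0]


def modified_priority(f, left, right, r, delta):
    """Priority h^(r+1) * max(|d|, delta / r!); delta = 0 gives the original priority."""
    h = right - left
    xs = [left + h * i / r for i in range(r + 1)]  # equispaced nodes (assumption)
    d = abs(divided_difference(f, xs))
    return h ** (r + 1) * max(d, delta / math.factorial(r))


def cubic(x):
    return x ** 3


plain = modified_priority(cubic, -1.0, 1.0, r=2, delta=0.0)        # original priority
safeguarded = modified_priority(cubic, -1.0, 1.0, r=2, delta=1.0)  # with Delta = 1
```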

5 Numerical experiments

In this section, we present results of two numerical experiments that illustrate the performance of the nonadaptive Monte Carlo algorithm ${\overline{M}}_{N,r}$ and the adaptive algorithms ${\overline{M}}^{\,*}_{N,r}$ and ${\overline{M}}^{**}_{N,r}.$ Our test integral is

$$\begin{aligned} \int _0^1\frac{1}{x+10^{-4}}\,\mathrm dx. \end{aligned}$$

Since for $r\in {\mathbb {N}}$ we have $(-1)^r f^{(r)}>0,$ the parameter $\Delta $ is set to zero.

The three algorithms are verified for $r=2$ and $r=4$. In both cases, the interpolation nodes are equispaced, i.e., in (2.1) we take

$$\begin{aligned} z_i =\frac{i-1}{r-1},\qquad 1\le i \le r. \end{aligned}$$

In addition, for the first adaptive algorithm ${\overline{M}}_{N,r}^{\,*}$ we take $k_N = N^\kappa $ with $\kappa = 0.8.$ This exponent was chosen as a trade-off based on our discussion in Sect. 3 and on some empirical results. Also, for a fixed N we plot a single output instead of an estimate of the expected value. Therefore the error fluctuations are visible. For completeness, we also show the asymptotes corresponding to the theoretical errors from Theorems 2.1 and 3.1, and the upper bound from Theorem 4.1. The scale is logarithmic, $-\log _{10}(\textrm{error})$ versus $\log _{10}N.$

The results for $r=2$ are presented in Fig. 1.

As can be observed, both adaptive algorithms significantly outperform the nonadaptive MC; however, the correct asymptotic behaviour of the first adaptive algorithm is visible only for large N.

Similar conclusions can be drawn from the validation performed for $r=4,$ with all other parameters unchanged. We add that the results for N larger than $10^{4.8}$ are no longer illustrative, since the process is disturbed by a severe loss of significant digits when calculating divided differences in the partition part (Fig. 2).

Notably, both adaptive algorithms attain their asymptotic errors, but this is not the case for nonadaptive MC for which the output is not stable. Initially, the first adaptive algorithm does not leverage additional sampling since for all intervals $I_i$ we have $N_i/(2r+1)< 1.$ The Monte Carlo adjustments are visible only for $N \ge 10^3$ and the error tends to the theoretical asymptote.

In conclusion, the numerical experiments confirm our theoretical findings and, in particular, the superiority of the second adaptive algorithm ${\overline{M}}^{\,**}_{N,r}.$

6 Automatic Monte Carlo integration

We now use the results of Sect. 4 for automatic Monte Carlo integration. The goal is to construct an algorithm that for given $\varepsilon >0$ and $0<\delta <1$ returns an $\varepsilon $-approximation to the integral Sf with probability at least $1-\delta ,$ asymptotically as $\varepsilon \rightarrow 0^+.$ To that end, we shall use the approximation ${\overline{M}}_{N,r}^{\,**}f$ with N determined automatically depending on $\varepsilon $ and $\delta .$

Let $X_i$ for $1\le i\le n$ be independent copies of the random variable

$$\begin{aligned} X=S(f-L_{m,r}f)-\frac{(f-L_{m,r}f)(t)}{\rho (t)},\qquad t\sim \mu _{\rho }, \end{aligned}$$

where $L_{m,r},$ n and $\rho $ are as in ${\overline{M}}_{N,r}^{\,**}f.$ Then ${\mathbb {E}}(X)=0$ and

$$\begin{aligned} Sf-{\overline{M}}_{N,r}^{\,**}f=\frac{X_1+X_2+\cdots +X_n}{n}. \end{aligned}$$
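The structure of the estimator behind this identity can be sketched as follows. This is a simplified illustration and not the authors' code: the function name is hypothetical, the density $\rho $ is assumed to be piecewise constant and equal to $1/(m h_i)$ on $I_i$ (that is, a subinterval is drawn uniformly and then a point uniformly inside it), the interpolation nodes are equispaced with $r\ge 2,$ and the interpolant is stored through NumPy polynomial coefficients in a local variable.

```python
import math
import random

import numpy as np


def mc_with_interpolant(f, intervals, r, n, rng):
    """Return S(L f) plus the Monte Carlo estimate of S(f - L f) from n samples."""
    m = len(intervals)
    z = np.linspace(0.0, 1.0, r)  # equispaced nodes z_1, ..., z_r in [0, 1]
    polys = []
    quad = 0.0
    for left, right in intervals:
        h = right - left
        # Interpolate in the local variable u in [0, 1] (degree r - 1).
        coeffs = np.polyfit(z, [f(left + h * u) for u in z], r - 1)
        polys.append(coeffs)
        antider = np.polyint(coeffs)
        quad += h * (np.polyval(antider, 1.0) - np.polyval(antider, 0.0))
    adjust = 0.0
    for _ in range(n):
        # The integer part selects the subinterval, the fractional part the position.
        frac, idx = math.modf(rng.random() * m)
        i = int(idx)
        left, right = intervals[i]
        h = right - left
        residual = f(left + h * frac) - np.polyval(polys[i], frac)
        adjust += residual * m * h  # division by rho = 1 / (m * h)
    return quad + adjust / n


uniform = [(i / 8, (i + 1) / 8) for i in range(8)]
estimate = mc_with_interpolant(math.exp, uniform, r=2, n=1000, rng=random.Random(0))
exact_for_poly = mc_with_interpolant(lambda x: x * x, [(0.0, 0.5), (0.5, 1.0)],
                                     r=3, n=10, rng=random.Random(1))
```

In the first example the deterministic part alone is a composite trapezoid rule; the random part estimates the integral of the interpolation error. In the second example the integrand is a polynomial of degree less than r, so the interpolant is exact and the random part contributes only rounding noise.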

By Hoeffding’s inequality [5] we have

$$\begin{aligned} \textrm{Prob}\left( \big |Sf-{\overline{M}}_{N,r}^{\,**}f\big |>\varepsilon \right) \,\le \, 2\,\exp \left( \frac{-\varepsilon ^2n}{2\,B_m^2}\right) , \end{aligned}$$

where $B_m=\max _{a\le t\le b}|X(t)|.$ Hence we fail with probability at most $\delta $ if

$$\begin{aligned} \frac{\varepsilon ^2n}{2B_m^2}\,\ge \,\ln \frac{2}{\delta }. \end{aligned}$$

(6.1)
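Condition (6.1) can be solved for n directly. The following minimal sketch does this; the function name and the numerical values in the example are illustrative assumptions, and the bound $B_m$ is treated as a known input.

```python
import math


def hoeffding_sample_size(eps, delta, bound):
    """Smallest integer n with eps^2 * n / (2 * bound^2) >= ln(2 / delta)."""
    return math.ceil(2.0 * bound ** 2 * math.log(2.0 / delta) / eps ** 2)


# Illustrative values: error tolerance 1e-3, failure probability 0.05, B_m = 1e-2.
n_required = hoeffding_sample_size(eps=1e-3, delta=0.05, bound=1e-2)
failure_bound = 2.0 * math.exp(-(1e-3) ** 2 * n_required / (2.0 * (1e-2) ** 2))
```

Note that n grows only logarithmically in $1/\delta $ but quadratically in $B_m/\varepsilon ,$ which is why a sharp estimate of $B_m$ matters in what follows.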

Now we estimate $B_m.$ Let $\lambda =\Vert P\Vert _{L^\infty (0,1)}=\max _{0\le t\le 1}|P(t)|,$ and

$$\begin{aligned} {\mathscr {L}}_r(f)=\bigg (\int _a^b\big |f^{(r)}_\Delta (x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}, \end{aligned}$$

where $\Delta =0$ if $f^{(r)}>0$ or $f^{(r)}<0,$ and $\Delta >0$ otherwise. Let $A=(A_1,A_2,\ldots ,A_m)$ with

$$\begin{aligned} A_i=h_i^{r+1}\max _{x\in I_i}|f^{(r)}(x)|,\quad 1\le i\le m, \end{aligned}$$

where, as before, $\{I_i\}_{i=1}^m$ is the partition used by ${\overline{M}}_{N,r}^{\,**}f$ and $h_i$ is the length of $I_i.$ Since $\Vert A\Vert _\frac{1}{r+1}=\big (\sum _{i=1}^mA_i^{1/(r+1)}\big )^{r+1}\lessapprox {\mathscr {L}}_r(f),\;$ for $x\in I_i$ we have

$$\begin{aligned} \frac{\big |f(x)-L_{m,r}f(x)\big |}{\rho (x)}\,\le \,\frac{\lambda }{r!}\,m\,A_i\,\lessapprox \, \frac{\lambda }{r!}\,\bigg (\frac{m\,\Vert A\Vert _\infty }{\Vert A\Vert _\frac{1}{r+1}}\bigg ){\mathscr {L}}_r(f) \,\lessapprox \,2^{r+1}\frac{\lambda }{r!}\,{\mathscr {L}}_r(f)\,m^{-r}. \end{aligned}$$

We have the same upper bound for $S(f-L_{m,r}f)$ since, by the mean-value theorem,

$$\begin{aligned} S(f-L_{m,r}f)=\int _a^b\frac{(f-L_{m,r}f)(x)}{\rho (x)}\,\rho (x)\,\mathrm dx =\frac{(f-L_{m,r}f)(\xi )}{\rho (\xi )},\qquad \xi \in [a,b]. \end{aligned}$$

Hence

$$\begin{aligned} B_m\,\lessapprox \,2^{r+2}\frac{\lambda }{r!}\,{\mathscr {L}}_r(f)\,m^{-r}. \end{aligned}$$

Using the above inequality and the fact that $\sqrt{n}\,m^r\approx N^{r+1/2}/(c_rr!)$ with $c_r$ given by (2.7), we get

$$\begin{aligned} \frac{\varepsilon ^2n}{2B_m^2}\gtrapprox \left( \frac{\varepsilon \,N^{r+1/2}}{{\hat{c}}_r\,{\mathscr {L}}_r(f)}\right) ^{\!\!2},\quad \text{ where }\quad {\hat{c}}_r=2^{r+5/2}\lambda c_r. \end{aligned}$$

The last inequality and (6.1) imply that we fail to have error $\varepsilon $ with probability at most $\delta $ for

$$\begin{aligned} N\,\gtrapprox \,\left( {\hat{c}}_r\,{\mathscr {L}}_r(f)\,\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\right) ^{\frac{1}{r+1/2}}, \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}$$

(6.2)

Now the question is how to obtain the random approximation ${\overline{M}}_{N,r}^{\,**}f$ for N satisfying (6.2).

One possibility is as follows. We first execute the iteration $\textbf{for}$ in the algorithm $\textrm{PARTITION}$ of Sect. 4 for $k=2:m,$ where m satisfies $\lim _{\varepsilon \rightarrow 0^+}m\,\varepsilon ^{\frac{1}{r+1/2}}=0,$ e.g.,

$$\begin{aligned} m=\left\lfloor \bigg (\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\bigg )^\frac{1}{r+1}\right\rfloor . \end{aligned}$$

Let $\{I_i\}_{i=1}^m$ be the obtained partition. Then we replace ${\mathscr {L}}_r(f)$ in (6.2) by its asymptotic equivalent

$$\begin{aligned} \widetilde{{\mathscr {L}}}_r(f)=\bigg (\sum _{i=1}^{m}p_f(I_i)^\frac{1}{r+1}\bigg )^{r+1}, \end{aligned}$$

(6.3)

set

$$\begin{aligned} N_\varepsilon =\left\lfloor \left( {\hat{c}}_r\,\widetilde{{\mathscr {L}}}_r(f)\,\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\right) ^{\frac{1}{r+1/2}}\right\rfloor , \end{aligned}$$

(6.4)

and continue the iteration for $k=m+1:m_\varepsilon ,$ where $m_\varepsilon $ is the number of subintervals corresponding to $N_\varepsilon .$ Finally, we complete the algorithm by $n_\varepsilon $ random samples.
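Formulas (6.3) and (6.4) translate into a few lines of code. The sketch below is an illustration under assumptions: the function names are hypothetical, the list of priorities $p_f(I_i)$ is assumed to come from the first stage of PARTITION, and the constant ${\hat{c}}_r$ is passed in as an input (the value 1.0 in the example is a placeholder).

```python
import math


def l_tilde(priorities, r):
    """Estimate (6.3): the sum of p_f(I_i)^(1/(r+1)), raised to the power r+1."""
    return sum(p ** (1.0 / (r + 1)) for p in priorities) ** (r + 1)


def n_epsilon(priorities, r, eps, delta, c_hat):
    """Number of points N_eps from (6.4) for tolerance eps and failure level delta."""
    scale = c_hat * l_tilde(priorities, r) * math.sqrt(math.log(2.0 / delta)) / eps
    return math.floor(scale ** (1.0 / (r + 0.5)))


# Placeholder data: four subintervals with equal priorities and c_hat = 1.
example_priorities = [0.25] * 4
example_l = l_tilde(example_priorities, r=1)
example_n = n_epsilon(example_priorities, r=1, eps=0.4, delta=0.05, c_hat=1.0)
```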

Denote the final randomized approximation by ${\mathscr {A}}_{\varepsilon ,\delta }f.$ Then we have ${\mathscr {A}}_{\varepsilon ,\delta }f={\overline{M}}_{N_\varepsilon ,r}^{\,**}f$ and

$$\begin{aligned} \textrm{Prob}\big (\,\big |Sf-{\mathscr {A}}_{\varepsilon ,\delta }f|>\varepsilon \big )\,\lessapprox \,\delta , \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}$$

A disadvantage of the above algorithm is that it uses a priority queue and therefore its total running time is proportional to $N\log N.$ It turns out that by using recursion the running time can be reduced to be proportional to N.

A crucial component of the algorithm with the running time proportional to N is the following recursive procedure, in which ${\mathscr {S}}$ is a set of intervals.

$\textbf{procedure}\;\textrm{AUTO}\,(f,a,b,e)$

$\textbf{if}\;p_f([a,b])\le e$

$\quad \textrm{insert}({\mathscr {S}},[a,b]);$

$\textbf{else}$

$\quad c:=(a+b)/2;$

$\quad \textrm{AUTO}(f,a,c,e);$

$\quad \textrm{AUTO}(f,c,b,e);$

$\textbf{endif}$
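The recursive procedure can be sketched in Python as follows. As before, this is only an illustration: the helper names are hypothetical, the divided difference is assumed to use $r+1$ equispaced nodes, and the set ${\mathscr {S}}$ is represented by a list passed explicitly instead of a global variable.

```python
import math


def divided_difference(f, xs):
    """Return the Newton divided difference f[x_0, ..., x_r] over the nodes xs."""
    vals = [f(x) for x in xs]
    for level in range(1, len(xs)):
        for i in range(len(xs) - level):
            vals[i] = (vals[i + 1] - vals[i]) / (xs[i + level] - xs[i])
    return vals[0]


def priority(f, left, right, r):
    """Priority p_f(I) = h^(r+1) * |d|; d uses r+1 equispaced nodes (assumption)."""
    h = right - left
    xs = [left + h * i / r for i in range(r + 1)]
    return h ** (r + 1) * abs(divided_difference(f, xs))


def auto(f, a, b, e, r, out):
    """Halve [a, b] recursively until every piece has priority at most e."""
    if priority(f, a, b, r) <= e:
        out.append((a, b))
    else:
        c = (a + b) / 2
        auto(f, a, c, e, r, out)
        auto(f, c, b, e, r, out)


pieces = []
auto(math.exp, 0.0, 1.0, 1e-3, 2, pieces)
```

Since the left half is processed before the right half, the intervals are appended in increasing order, and each priority is evaluated exactly once. No priority queue is needed, which is the source of the linear running time.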

Similarly to ${\mathscr {A}}_{\varepsilon ,\delta },$ the algorithm consists of two steps. First $\textrm{AUTO}$ is run for $e=\varepsilon '$ satisfying $\varepsilon '\rightarrow 0^+$ and $\varepsilon /\varepsilon '\rightarrow 0^+$ as $\varepsilon \rightarrow 0^+,$ e.g.,

$$\begin{aligned} \varepsilon '=\varepsilon ^\kappa , \quad \text{ where }\quad 0<\kappa <1. \end{aligned}$$

Then ${\mathscr {L}}_r(f)$ in (6.2) is replaced by $\widetilde{{\mathscr {L}}}_r(f)$ given by (6.3), and $N_\varepsilon $ found from (6.4). The recursion is resumed with the target value $e=\varepsilon '',$ where

$$\begin{aligned} \varepsilon ''=\widetilde{{\mathscr {L}}}_r(f)\,m_\varepsilon ^{-(r+1)}. \end{aligned}$$

The algorithm is complemented by the corresponding $n_\varepsilon $ random samples.

Observe that the number $m''$ of subintervals in the final partition is asymptotically at least $m_\varepsilon .$ Indeed, for any function $g\in C^{r}([a,b])$ with $g^{(r)}(x)=\big |f_\Delta ^{(r)}(x)\big |$ we have ${\mathscr {L}}_r(g)={\mathscr {L}}_r(f)$ and

where the first inequality above follows from Proposition 2 of [11]. This implies

$$\begin{aligned} m''\,\gtrapprox \,\bigg (\frac{\widetilde{{\mathscr {L}}}_r(f)}{\varepsilon ''}\bigg )^\frac{1}{r+1}\approx \,m_\varepsilon , \end{aligned}$$

as claimed.

Denote the resulting approximation by ${\mathscr {A}}_{\varepsilon ,\delta }^*f.$ Observe that its running time is proportional to $N_\varepsilon $ since recursion can be implemented in linear time.

Theorem 6.1

We have

$$\begin{aligned} \textrm{Prob}\big (\,\big |Sf-{\mathscr {A}}_{\varepsilon ,\delta }^*f|>\varepsilon \big )\,\lessapprox \,\delta , \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}$$

Now we present outcomes of the second automatic procedure ${\mathscr {A}}_{\varepsilon ,\delta }^*$ for the test integral

$$\begin{aligned} \int _0^1\cos \bigg (\dfrac{100\,x}{x+10^{-4}}\bigg )\,\mathrm dx. \end{aligned}$$

(6.5)

Although the derivatives fluctuate and vanish many times in this case, we take $\Delta = 0.$ We compare the outcomes for $r=2$ and $r=4.$ In each case, we count the breaches (i.e., the runs in which the absolute error exceeds $\varepsilon = 10^{-3}$) among $K=10\,000$ independent executions of the code (Table 1). We also take $\varepsilon ' = \varepsilon ^{1/2}.$ We expect the empirical probability of a breach to be less than $\delta =0.05.$ For completeness, we also report the maximum error over all executions together with the obtained $N_\varepsilon .$

Table 1 Performance of the second automatic algorithm for the integral (6.5)

Note that in both cases we did not observe any breaches. The magnitude of the maximum errors indicates a serious overestimation of $N_\varepsilon ,$ but the results are satisfactory given the upper bound estimate of Theorem 6.1.

References

[1] Davis, P., Rabinowitz, P.: Methods of Numerical Integration, 2nd edn. Academic Press, New York (1984)
[2] Goćwin, M.: On optimal adaptive quadratures for automatic integration. BIT Numer. Math. 61, 411–439 (2021)
[3] Gonnet, P.: A review of error estimation in adaptive quadrature. ACM Comput. Surv. 44, 1–36 (2012)
[4] Heinrich, S.: Random approximation in numerical analysis. In: Bierstedt, K.D., et al. (eds.) Proceedings of the Functional Analysis Conference, Essen 1991, pp. 123–171. Marcel Dekker, New York (1993)
[5] Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963)
[6] Kacewicz, B., Przybyłowicz, P.: Complexity of the derivative-free solution of systems of IVPs with unknown singularity hypersurface. J. Complex. 31, 75–97 (2015)
[7] Lyness, J.N.: Guidelines for automatic quadrature routines. In: Freeman, C.V. (ed.) Information Processing 71, vol. 2, pp. 1351–1355. North-Holland Publ. (1972)
[8] Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis. Vol. 1349 of Lecture Notes in Math. Springer, Berlin (1988)
[9] Novak, E.: On the power of adaption. J. Complex. 12, 199–237 (1996)
[10] Plaskota, L.: Automatic integration using asymptotically optimal adaptive Simpson quadrature. Numer. Math. 131, 173–198 (2015)
[11] Plaskota, L., Samoraj, P.: Automatic approximation using asymptotically optimal adaptive interpolation. Numer. Algorithms 89, 277–302 (2022)
[12] Plaskota, L., Wasilkowski, G.W.: Adaption allows efficient integration of functions with unknown singularities. Numer. Math. 102, 123–144 (2005)
[13] Plaskota, L., Wasilkowski, G.W., Zhao, Y.: The power of adaption for approximating functions with singularities. Math. Comput. 77, 2309–2338 (2008)
[14] Przybyłowicz, P.: Adaptive Itô-Taylor algorithm can optimally approximate the Itô integrals of singular functions. J. Comput. Appl. Math. 235, 203–217 (2010)

Acknowledgements

The work of L. Plaskota and P. Przybyłowicz was partially supported by the National Science Centre, Poland, under project 2017/25/B/ST1/00945.

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. S. Banacha 2, 02-097, Warsaw, Poland
Leszek Plaskota
Faculty of Applied Mathematics, AGH University of Science and Technology, Al. A. Mickiewicza 30, 30-059, Kraków, Poland
Paweł Przybyłowicz & Łukasz Stȩpień

Corresponding author

Correspondence to Leszek Plaskota.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Additional information

Communicated by Stefano De Marchi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Below we present a crucial part of the code in the Python programming language, where all the algorithms were implemented. In addition, we provide relevant comments linked to particular fragments of the code.

(1)
The almost optimal partition main_nodes is computed outside this function in order to save computation time when the trajectories are computed subsequently. Moreover, the node_type argument lets users supply their own partitions, e.g. those based on Chebyshev polynomials of the second kind.
(2)
In order to minimize errors resulting from (possibly) adding relatively small adjustments to the estimated quadrature value, we use the Decimal library. It enables us to increase the precision of intermediate computations, which is currently set to 28 decimal digits.
(3)
In our case, the interpolating polynomial is based on an equidistant mesh including the endpoints of a subinterval $I_i$. Here np refers to the NumPy library.
(4)
Initializing the variables which control Monte Carlo adjustments for our quadrature. In particular, l stores the number of currently used random points, while we loop through the subintervals.
(5)
The program calculates all interpolation nodes in the interval $I_i.$ For that reason, the function optimalt_uniform is executed to provide distinct $z_1, \ldots , z_r \in [0,1].$
(6)
Depending on the value of r, different formulas for (nonadaptive, deterministic) quadrature $SL_{m,r}$ are leveraged.
(7)
Below, we calculate the Monte Carlo adjustment on interval $I_i.$
(8)
This code yields the random points. The MC_init function reports each of them in the form of a number from 0 to m. The integer part indicates the index i of the subinterval, while the fractional part gives the position within $I_i.$ Both are obtained using the math.modf function.
(9)
For stability reasons, the coefficients of the interpolating polynomial in the canonical basis are not stored. Therefore, the lagrange function is invoked separately for every point.
(10)
We decided to add $SL_{m,r}$ for each subinterval and then add the cumulative adjustments. Since the latter are usually much smaller than the quadrature values, this might result in neglecting the actual adjustment values. Please note that the Decimal library was also used to address this issue.
(11)
Ultimately, we add Monte Carlo result to the previous approximation.

As can be observed, the current solution enables users to supply their own interpolation meshes, increase the precision of computations, and extend the method to arbitrary regularity $r \in {\mathbb {N}}.$

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Plaskota, L., Przybyłowicz, P. & Stȩpień, Ł. Monte Carlo integration of $C^r$ functions with adaptive variance reduction: an asymptotic analysis. Bit Numer Math 63, 32 (2023). https://doi.org/10.1007/s10543-023-00972-0

Received: 24 November 2022
Accepted: 16 April 2023
Published: 22 May 2023
DOI: https://doi.org/10.1007/s10543-023-00972-0
