## 1 Introduction

Adaption is a useful tool to improve performance of algorithms. The problems of numerical integration and related to it $$L^1$$ approximation are not exceptions, see, e.g., [9] for a survey of theoretical results on the subject. If an underlying function possesses some singularities and is otherwise smooth, then using adaption is necessary to localize the singular points and restore the convergence rate typical for smooth functions, see, e.g., [6, 12,13,14]. For functions that are smooth in the whole domain, adaptive algorithms usually do not offer a better convergence rate than nonadaptive algorithms; however, they can essentially lower asymptotic constants. This is why adaptive quadratures are widely used for numerical integration, see, e.g., [1, 3, 7]. Their superiority over nonadaptive quadratures is rather obvious, but precise answers to the quantitative question of “how much adaption helps” are usually missing. This gap was partially filled by recent results of [2, 10, 11], where best asymptotic constants of deterministic algorithms that use piecewise polynomial interpolation were determined for integration and $$L^1$$ approximation of r-times continuously differentiable functions $$f:[a,b]\rightarrow {\mathbb {R}}.$$ In this case, adaption relies on adjusting the partition of the interval [ab] to the underlying function. While the convergence rate is of order $$N^{-r},$$ where N is the number of function evaluations used, it turns out that the asymptotic constant depends on f via the factor of $$(b-a)^r\big \Vert f^{(r)}\big \Vert _{L^1}$$ for uniform partition, and $$\big \Vert f^{(r)}\Vert _{L^{1/(r+1)}}$$ for best adaptive partition.

In the current paper, we present the line of thinking similar to that of the aforementioned papers. The difference is that now we want to carry out the analysis and obtain asymptotic constants for randomized algorithms.

Our goal is the numerical approximation of the integral

\begin{aligned} Sf=\int _a^bf(x)\,\mathrm dx. \end{aligned}
(1.1)

It is well known that for $$f\in L^2(a,b)$$ the crude Monte Carlo,

\begin{aligned} M_Nf=\frac{b-a}{N}\sum _{i=1}^N f(t_i),\quad \text{ where }\quad t_i{\mathop {\sim }\limits ^{iid}}U(a,b), \end{aligned}
(1.2)

returns an approximation with expectation $${\mathbb {E}}(M_Nf)=Sf$$ and error (standard deviation)

\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-M_Nf\big )^2}=\frac{\sigma (f)}{\sqrt{N}},\quad \text{ where }\quad \sigma (f)^2=(b-a)S(f^2)-(Sf)^2. \end{aligned}
(1.3)

Suppose that the function enjoys more smoothness, i.e.,

\begin{aligned} f\in C^r([a,b]). \end{aligned}

Then a much higher convergence rate $$N^{-(r+1/2)}$$ can be achieved using various techniques of variance reduction, see, e.g., [4]. One way is to apply a randomized approximation of the form

\begin{aligned} {\overline{M}}_{N,r}(f)=S(L_{m,r}f)+M_n(f-L_{m,r}f), \end{aligned}
(1.4)

where $$L_{m,r}$$ is the piecewise polynomial interpolation of f of degree $$r-1$$ using a partition of the interval [ab] into m subintervals, $$M_n$$ is a Monte Carlo type algorithm using n random samples of f,  and N is the total number of function evaluations used (for arguments chosen either deterministically or randomly). The optimal rate is already achieved for uniform (nonadaptive) partition and crude Monte Carlo. Then, see Theorem 2.1 with $$\beta =0,$$ the error asymptotically equals

\begin{aligned} c\,(b-a)^{r+1/2}\big \Vert f^{(r)}\big \Vert _{L^2(a,b)}\,N^{-(r+1/2)}, \end{aligned}

where c depends only on the choice of the interpolation points within subintervals. The main result of this paper relies on showing that with the help of adaption the asymptotic error of the methods (1.4) can be reduced to

\begin{aligned} c\,\big \Vert f^{(r)}\big \Vert _{L^{1/(r+1)}(a,b)}\,N^{-(r+1/2)}, \end{aligned}
(1.5)

see Theorems 3.1 and 4.1. Observe that the gain can be significant, especially when the derivative $$f^{(r)}$$ drastically changes. For instance, for $$r=4,$$ $$[a,b]=[0,1],$$ and $$f(x)=1/(x+d),$$ adaption is asymptotically roughly $$5.7*10^{12}$$ times better than nonadaption if $$d=10^{-4},$$ and $$1.8*10^{29}$$ times if $$d=10^{-8}.$$

We construct two randomized algorithms, denoted $${\overline{M}}_{N,r}^{\,*}$$ and $${\overline{M}}_{N,r}^{\,**},$$ that achieve the error (1.5). Although they use different initial approaches; namely, stratification versus importance sampling, in the limit they both reach essentially the same partition, such that the $$L^1$$ errors of Lagrange interpolation in all subintervals are equalized. However, numerical tests of Sect. 5 show that the algorithm $${\overline{M}}_{N,r}^{\,*}$$ achieves the error (1.5) with some delay, which makes $${\overline{M}}_{N,r}^{\,**}$$ worth recommending rather than $${\overline{M}}_{N,r}^{\,*}$$ in practical computations.

Other advantages of $${\overline{M}}_{N,r}^{\,**}$$ are that it is easily implementable and, as shown in Sect. 6, it can be successfully used for automatic Monte Carlo integration.

Our analysis has been so far restricted to one-dimensional integrals only. In a future work it will be extended and corresponding adaptive Monte Carlo algorithms will be constructed for multivariate integration, where randomization finds its major application. The current paper is the first step in this direction.

In the sequel, we use the following notation. For two functions of N we write $$g_1(N)\lessapprox g_2(N)$$ iff $$\limsup _{N\rightarrow \infty }g_1(N)/g_2(N)\le 1,$$ and we write $$g_1(N)\approx g_2(N)$$ iff $$\lim _{N\rightarrow \infty }g_1(N)/g_2(N)=1.$$ Similarly, for functions of $$\varepsilon$$ we write $$h_1(\varepsilon )\lessapprox h_2(\varepsilon )$$ iff $$\limsup _{\varepsilon \rightarrow 0^+}h_1(\varepsilon )/h_2(\varepsilon )\le 1,$$ and $$h_1(\varepsilon )\approx h_2(\varepsilon )$$ iff $$\lim _{\varepsilon \rightarrow 0^+}h_1(\varepsilon )/h_2(\varepsilon )=1.$$

## 2 Variance reduction using Lagrange interpolation

We first derive some general error estimates for the variance reduction algorithms of the form (1.4), where the standard Monte Carlo is applied for the error of piecewise Lagrange interpolation. Specifically, we divide the interval [ab] into m subintervals using a partition $$a=x_0<x_1<\cdots <x_m=b,$$ and on each subinterval $$[x_{j-1},x_j]$$ we approximate f using Lagrange interpolation of degree $$r-1$$ with the interpolation points

\begin{aligned} x_{j,s}=x_{j-1}+z_s(x_j-x_{j-1}),\qquad 1\le s\le r, \end{aligned}

where

\begin{aligned} 0\le z_1<z_2<\cdots <z_r\le 1 \end{aligned}
(2.1)

are fixed (independent of the partition). Denote such an approximation by $$L_{m,r}f.$$ Then $$f=L_{m,r}f+R_{m,r}f$$ with $$R_{m,r}f=f-L_{m,r}f.$$ The integral Sf is finally approximated by

\begin{aligned} {\overline{M}}_{m,n,r}f\,=\,S(L_{m,r}f)+M_n(R_{m,r}f), \end{aligned}

where $$M_n$$ is the crude Monte Carlo (1.2). We obviously have $${\mathbb {E}}({\overline{M}}_{m,n,r}f)=Sf.$$ Since

\begin{aligned} Sf-{\overline{M}}_{m,n,r}f\,=\,Sf-S(L_{m,r}f)-M_n(R_{m,r}f)\,=\,S(R_{m,r}f)-M_n(R_{m,r}f), \end{aligned}

by (1.3) we have

\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{m,n,r}f\big )^2\,=\, \frac{1}{n}\left( (b-a)S\big ((R_{m,r}f)^2\big )-\big (S(R_{m,r}f)\big )^2\right) . \end{aligned}

Note that

\begin{aligned} S\big ((R_{m,r}f)^2\big )\,=\,\int _a^b(f-L_{m,r}f)^2(x)\,\mathrm dx \,=\,\Vert f-L_{m,r}f\Vert _{L^2(a,b)}^2 \end{aligned}

is the squared $$L^2$$-error of the applied (piecewise) polynomial interpolation, while

\begin{aligned} S(R_{m,r}f)\,=\,\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\,=\,S(f)-S(L_{m,r}f) \end{aligned}

is the error of the quadrature $${\overline{Q}}_{m,r}f=S(L_{m,r}f).$$

From now on we assume that f is not a polynomial of degree smaller than or equal to $$r-1,$$ since otherwise $${\overline{M}}_{m,n,r}f=Sf.$$ Define the polynomial

\begin{aligned} P(z)=(z-z_1)(z-z_2)\cdots (z-z_r). \end{aligned}
(2.2)

We first consider the interpolation error $$\Vert f-L_{m,r}f\Vert _{L^2(a,b)}.$$ Let

\begin{aligned} \alpha \,=\,\Vert P\Vert _{L^2(0,1)}=\bigg (\int _0^1|P(z)|^2\mathrm dz\bigg )^{1/2}. \end{aligned}
(2.3)

For each j,  the local interpolation error equals

\begin{aligned} \big \Vert f-L_{m,r}f\big \Vert _{L^2(x_{j-1},x_j)}= & {} \bigg (\int _{x_{j-1}}^{x_j}\bigl |\,(x-x_{j,1})\cdots (x-x_{j,r})f[x_{j,1},\ldots ,x_{j,r},x]\,\bigr |^2\mathrm dx\bigg )^{1/2} \\= & {} \alpha \,h_j^{r+1/2}\,\frac{|f^{(r)}(\xi _j)|}{r!},\qquad \qquad \xi _j\in [x_{j-1},x_j]. \end{aligned}

Hence

\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^2(a,b)}\,=\,\frac{\alpha }{r!}\bigg (\sum _{j=1}^m h_j^{2r+1}\big |f^{(r)}(\xi _j)\big |^2\bigg )^{1/2}. \end{aligned}

In particular, for the equispaced partition, in which case $$h_j=(b-a)/m,$$ we have

\begin{aligned} \big \Vert f-L_{m,r}f\big \Vert _{L^2(a,b)}= & {} \frac{\alpha }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r \bigg (\frac{b-a}{m}\sum _{j=1}^m|f^{(r)}(\xi _j)|^2\bigg )^{1/2} \\\approx & {} \frac{\alpha }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r\,\big \Vert f^{(r)}\big \Vert _{L^2(a,b)} \qquad \text{ as }\quad m\rightarrow +\infty . \end{aligned}

Now, we consider the quadrature error $$Sf-{\overline{Q}}_{m,r}f.$$ Let

\begin{aligned} \beta \,=\,\int _0^1 P(z)\,\mathrm dz. \end{aligned}
(2.4)

The local integration errors equal

\begin{aligned}{} & {} {\int _{x_{j-1}}^{x_j}(f-L_{m,r}f)(x)\,\mathrm dx = \int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f[x_{j,1},\ldots ,x_{j,r},x]\,\mathrm dx}\\{} & {} \quad =\frac{1}{r!}\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f^{(r)}(\xi _j(x))\,\mathrm dx, \qquad \xi _j(x)\in [x_{j-1},x_j]. \end{aligned}

Choose arbitrary $$\zeta _j\in [x_{j-1},x_j]$$ for $$1\le j\le m.$$ Then

\begin{aligned}{} & {} \bigg |\frac{1}{r!}\,\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})f^{(r)}(\xi _j(x))\,\mathrm dx\,\\{} & {} \qquad -\frac{f^{(r)}(\zeta _j)}{r!}\int _{x_{j-1}}^{x_j} (x-x_{j,1})\cdots (x-x_{j,r})\,\mathrm dx\bigg |\\{} & {} \quad = \frac{1}{r!}\,\bigg |\int _{x_{j-1}}^{x_j}(x-x_{j,1})\cdots (x-x_{j,r})\left( f^{(r)}(\xi _j(x))-f^{(r)}(\zeta _j)\right) \, \mathrm dx\bigg |\;\\{} & {} \quad \le \;\omega (h_j)\,\frac{h_j^{r+1}}{r!}\,\Vert P\Vert _{L^1(0,1)}, \end{aligned}

where $$\omega$$ is the modulus of continuity of $$f^{(r)}.$$ We also have

\begin{aligned} \frac{f^{(r)}(\zeta _j)}{r!}\int _{x_{j-1}}^{x_j}(x-x_{j,1})\cdots (x-x_{j,r})\,\mathrm dx\,=\, \frac{\beta }{r!}\,h_j^{r+1}f^{(r)}(\zeta _j). \end{aligned}

Hence $$Sf-{\overline{Q}}_{m,r}f\,=\,X_m\,+\,Y_m,$$ where

\begin{aligned} X_m \,=\, \frac{\beta }{r!}\,\sum _{j=1}^mh_j^{r+1}f^{(r)}(\zeta _j)\qquad \text{ and }\qquad |Y_m| \,\le \, \frac{\Vert P\Vert _{L^1(0,1)}}{r!}\sum _{j=1}^m\omega (h_j)h_j^{r+1}. \end{aligned}

In particular, for the equispaced partition,

\begin{aligned} X_m= & {} \frac{\beta }{r!}\,(b-a)^r\bigg (\sum _{j=1}^m\frac{b-a}{m}f^{(r)}(\zeta _j)\bigg )\,m^{-r},\\ |Y_m|\le & {} \frac{\Vert P\Vert _{L^1(0,1)}}{r!}\;\omega \bigg (\frac{b-a}{m}\bigg )(b-a)^{r+1}m^{-r}. \end{aligned}

Suppose that $$\beta \ne 0$$ and $$\int _a^bf^{(r)}(x)\,\mathrm dx\ne 0.$$ Then $$X_m\approx \frac{\beta }{r!}(b-a)^r\Big (\int _a^bf^{(r)}(x) \mathrm dx\Big )m^{-r}.$$ Since $$\omega (h)$$ goes to zero as $$h\rightarrow 0^+,$$ the component $$X_m$$ dominates $$Y_m$$ as $$m\rightarrow +\infty .$$ Hence

\begin{aligned} Sf-{\overline{Q}}_{m,r}f \,\approx \, \frac{\beta }{r!}\,\bigg (\frac{b-a}{m}\bigg )^r\,\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg ) \qquad \text{ as }\quad m\rightarrow +\infty . \end{aligned}

On the other hand, if $$\beta =0$$ or $$\int _a^bf^{(r)}(x)\,\mathrm dx=0$$ then the quadrature error converges to zero faster than $$m^{-r},$$ i.e.

\begin{aligned} \lim _{m\rightarrow +\infty }\big (Sf-{\overline{Q}}_{m,r}f\big )\,m^r\,=\,0. \end{aligned}

Note that $$\beta =0$$ if and only if the quadrature $${\overline{Q}}_{m,r}$$ has the degree of exactness at least r,  i.e., it is exact for all polynomials of degree r or less. Obviously, the maximal degree of exactness equals $$2r-1.$$

We see that for the equidistant partition of the interval [ab] the error $$\big ({\mathbb {E}}(Sf-{\overline{M}}_{m,n,r}f)^2\big )^{1/2}$$ is asymptotically proportional to

\begin{aligned} \phi (m,n)=n^{-1/2}m^{-r}, \end{aligned}

regardless of the choice of points $$z_i$$ in (2.1). Let us minimize $$\phi (m,n)$$ assuming the total number of points used is at most N. We have two cases depending on whether both endpoints of each subinterval are used in interpolation. If so, i.e., if $$z_1=0$$ and $$z_r=1$$ (in this case $$r\ge 2$$) then $$N=(r-1)m+1+n.$$ The optimal values are

\begin{aligned} m^*=\frac{2r(N-1)}{(r-1)(2r+1)},\qquad n^*=\frac{N-1}{2r+1}, \end{aligned}
(2.5)

for which

\begin{aligned} \phi (m^*,n^*)\,=\, \sqrt{2}\,\bigg (1-\frac{1}{r}\bigg )^r\bigg (\frac{r+1/2}{N}\bigg )^{r+1/2}. \end{aligned}

Otherwise we have $$N=rm+n.$$ The optimal values are

\begin{aligned} m^*=\frac{2N}{2r+1},\qquad n^*=\frac{N}{2r+1}, \end{aligned}
(2.6)

for which

\begin{aligned} \phi (m^*,n^*)\,=\,\sqrt{2}\,\bigg (\frac{r+1/2}{N}\bigg )^{r+1/2}. \end{aligned}

Denote by $${\overline{M}}_{N,r}$$ the corresponding algorithm with the equidistant partition, where for given N the values of n and m equal correspondingly $$\lfloor n^*\rfloor$$ and $$\lfloor m^*\rfloor .$$ Our analysis is summarized in the following theorem.

### Theorem 2.1

We have as $$N\rightarrow +\infty$$ that

\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}_{N,r}f\big )^2}\;\approx \; c_r\,(b-a)^r\,C(P,f)\,N^{-(r+1/2)}, \end{aligned}

where

\begin{aligned} C(P,f)=\sqrt{\alpha ^2\,(b-a)\bigg (\int _a^b\big |f^{(r)}(x)\big |^2\mathrm dx\bigg )\,-\, \beta ^2\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg )^2}, \end{aligned}

$$\alpha$$ and $$\beta$$ are given by (2.3) and (2.4), and

\begin{aligned} c_r=\left\{ \begin{array}{ll}\sqrt{2}\,\big (1-\frac{1}{r}\big )^r\frac{(r+1/2)^{r+1/2}}{r!},&{} \quad \text{ if }\quad r\ge 2,\,z_1=0,\,z_r=1,\\ \ \sqrt{2}\,\frac{(r+1/2)^{r+1/2}}{r!},&{} \quad \text{ otherwise }.\end{array}\right. \end{aligned}
(2.7)

We add that the algorithm $${\overline{M}}_{N,r}$$ is fully implementable since we assume that we have access to function evaluations at points from [ab].

Now we add a stratification strategy to our algorithm of Theorem 2.1 to obtain an adaptive algorithm with a much better asymptotic constant. That is, we divide the initial interval [ab] into k equal length subintervals $$I_i, \ 1\le i \le k,$$ and on each subinterval we apply the approximation of Theorem 2.1 with some $$N_i,$$ where

\begin{aligned} \sum _{i=1}^k N_i\le N. \end{aligned}
(3.1)

Denote such an approximation by $${\overline{M}}_{N,k,r}.$$ (Note that $${\overline{M}}_{N,r}={\overline{M}}_{N,1,r}$$.) Then, by Theorem 2.1, for fixed k we have as all $$N_i\rightarrow +\infty$$ that

\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}_{N,k,r}f\big )^2}\;\approx \;c_r h^r \bigg (\sum _{i=1}^k\frac{C_i^2}{N_i^{2r+1}}\bigg )^{1/2}, \end{aligned}

where

\begin{aligned} C_i=C_i(P,f)=\sqrt{\alpha ^2\,h\,\int _{I_i}\big |f^{(r)}(x)\big |^2\mathrm dx-\beta ^2\, \left( \int _{I_i}f^{(r)}(x)\,\mathrm dx\right) ^2},\qquad h=\frac{b-a}{k}. \end{aligned}

Minimization of $$\psi (N_1,\ldots ,N_k)=\left( \sum _{i=1}^kC_i^2N_i^{-(2r+1)}\right) ^{1/2}$$ with respect to (3.1) gives

\begin{aligned} N_i^*\,=\,\frac{C_i^{1/(r+1)}}{\sum _{j=1}^k C_j^{1/(r+1)}}\,N,\qquad 1\le i\le k, \end{aligned}

and then

\begin{aligned} \psi (N_1^*,\ldots ,N_k^*)= \bigg (\sum _{i=1}^k C_i^{1/(r+1)}\bigg )^{r+1}N^{-(r+1/2)}. \end{aligned}

Let $$\xi _i,\eta _i\in I_i$$ satisfy $$\int _{I_i}\big |f^{(r)}(x)\big |^2\mathrm dx=h\big |f^{(r)}(\xi _i)\big |^2$$ and $$\int _{I_i}f^{(r)}(x)\,\mathrm dx=hf^{(r)}(\eta _i).$$ Then

\begin{aligned} C_i=h\sqrt{\alpha ^2|f^{(r)}(\xi _i)|^2-\beta ^2|f^{(r)}(\eta _i)|^2} \end{aligned}

and we have as $$k\rightarrow +\infty$$ that

\begin{aligned} \bigg (\sum _{i=1}^kC_i^{1/(r+1)}\bigg )^{r+1}= & {} h\,\bigg (\sum _{i=1}^k \big (\alpha ^2|f^{(r)}(\xi _i)|^2-\beta ^2|f^{(r)}(\eta _i)|^2\big )^\frac{1}{2(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h\,(\alpha ^2-\beta ^2)^{1/2}\bigg (\sum _{i=1}^k\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h^{-r}(\alpha ^2-\beta ^2)^{1/2}\bigg (\sum _{i=1}^kh\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}\nonumber \\\approx & {} h^{-r}(\alpha ^2-\beta ^2)^{1/2}\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}. \end{aligned}

It is clear that we have to take $$N_i$$ to be an integer and at least r,  for instance

\begin{aligned} N_i=\left\lfloor N_i^*\left( 1-\frac{kr}{N}\right) +r\right\rfloor ,\qquad 1\le i\le k. \end{aligned}

Then the corresponding number $$m_i$$ of subintervals and number $$n_i$$ of random points in $$I_i$$ can be chosen as

\begin{aligned} m_i=\max \left( \lfloor m_i^*\rfloor ,1\right) ,\qquad n_i=\lfloor n_i^*\rfloor , \end{aligned}

where $$m_i^*$$ and $$n_i^*$$ are given by (2.5) and (2.6) with N replaced by $$N_i.$$

Denote by $${\overline{M}}^{\,*}_{N,r}$$ the above constructed approximation $${\overline{M}}_{N,k_N,r}$$ with $$k_N$$ such that $$k_N\rightarrow +\infty$$ and $$k_N/N\rightarrow 0$$ as $$N\rightarrow +\infty .$$ For instance, $$k_N=N^\kappa$$ with $$0<\kappa <1.$$ Our analysis gives the following result.

### Theorem 3.1

We have as $$N\rightarrow +\infty$$ that

\begin{aligned} \sqrt{{\mathbb {E}}\big (Sf-{\overline{M}}^{\,*}_{N,r}f\big )^2}\,\approx \, c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}. \end{aligned}

The asymptotic constant of the approximation $${\overline{M}}^{\,*}_{N,r}$$ of Theorem 3.1 is never worse than that of $${\overline{M}}_{N,r}$$ of Theorem 2.1. Indeed, comparing both constants we have

\begin{aligned}{} & {} c_r(b-a)^r\sqrt{\alpha ^2\,(b-a)\bigg (\int _a^b\big |f^{(r)}(x)\big |^2\mathrm dx\bigg )\,-\, \beta ^2\bigg (\int _a^bf^{(r)}(x)\,\mathrm dx\bigg )^2}\\{} & {} \qquad \ge \;c_r\sqrt{\alpha ^2-\beta ^2}\,(b-a)^{r+1/2} \bigg (\int _a^b\big |f^{(r)}(x)\big |^2\bigg )^{1/2}\mathrm dx\\{} & {} \qquad \ge \,c_r\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}, \end{aligned}

where the first inequality follows from the Schwarz inequality and the second one from Hölder’s inequality for integrals. As shown in the introduction, the gain can be significant, especially when the derivative $$f^{(r)}$$ drastically changes.

The approximation $${\overline{M}}^{\,*}_{N,r}$$ possesses good asymptotic properties, but is not feasible since we do not have a direct access to the $$C_i$$s. In a feasible implementation one can approximate $$C_i$$ using divided differences, i.e.,

\begin{aligned} {\widetilde{C}}_i=h\sqrt{\alpha ^2-\beta ^2}\,|d_i|\,r!\,\qquad \text{ where }\qquad d_i=f[x_{i,0},x_{i,1},\ldots ,x_{i,r}] \end{aligned}
(3.2)

and $$x_{i,j}$$ are arbitrary points from $$I_i.$$ Then

\begin{aligned} N_i^*=\frac{|d_i|^{1/(r+1)}}{\sum _{j=1}^{k_N}|d_j|^{1/(r+1)}}\,N. \end{aligned}

This works well for functions f for which the rth derivative does not nullify at any point in [ab]. Indeed, then $$f^{(r)}$$ does not change its sign and, moreover, it is separated away from zero. This means that

\begin{aligned} \lim _{N\rightarrow \infty }\,\max _{1\le i\le k_N}\,{C_i}/{{\widetilde{C}}_i}=1, \end{aligned}

which is enough for the asymptotic equality of Theorem 3.1 to hold true.

If $$f^{(r)}$$ is not separated away from zero then we may have problems with proper approximations of $$C_i$$ in the intervals $$I_i$$ where $$|f^{(r)}|$$ assumes extremely small values or even zeros. A possible and simple remedy is to choose ‘small’ $$\Delta >0$$ and modify $${\widetilde{C}}_i$$ as follows:

\begin{aligned} {\widetilde{C}}_i=\left\{ \begin{array}{rl} h\,\sqrt{\alpha ^2-\beta ^2}\,|d_i|\,r! &{}\;\,\text{ for }\;|d_i|r!\ge \Delta ,\\ h\,\alpha \,\Delta \,r! &{}\;\,\text{ for }\;|d_i|r!<\Delta .\end{array}\right. \end{aligned}
(3.3)

Then, letting $$A_1=\big \{a\le x\le b:\,|f^{(r)}(x)|\ge \Delta \big \}$$ and $$A_2=[a,b]\setminus A_1,$$ we have as $$k\rightarrow +\infty$$ that

Hence, the approximation of $$C_i$$ by (3.3) results in an algorithm whose error is approximately upper bounded by

\begin{aligned} c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}

where

\begin{aligned} \big |f_\Delta ^{(r)}(x)\big |=\max \bigg (\big |f^{(r)}(x)\big |, \sqrt{\tfrac{\alpha ^2}{\alpha ^2-\beta ^2}}\,\Delta \bigg ). \end{aligned}
(3.4)

We obviously have $$\lim _{\Delta \rightarrow 0^+}\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx =\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx.$$

A closer look at the deterministic part of $${\overline{M}}^{\,*}_{N,r}$$ shows that the final partition of the interval [ab] tends to equalize the $$L^1$$ errors in all of the m subintervals. As shown in [11], such a partition is optimal in the sense that it minimizes the asymptotic constant in the error $$\Vert f-L_{m,r}f\Vert _{L^1(a,b)}$$ among all possible piecewise Langrange interpolations $$L_{m,r}$$. A disadvantage of the optimal partition is that it is not nested. This makes it necessary to start the computations from scratch when N is updated to a higher value. Also, a proper choice of the sequence $$k_N=N^\kappa$$ is problematic, especially when N is still relatively small. On one hand, the larger $$\kappa$$ the better the approximation of $$C_i$$ by $${\widetilde{C}}_i,$$ but also the more far away the partition from the optimal one. On the other hand, the smaller $$\kappa$$ the closer the partition to the optimal one, but also the worse the approximation of $$C_i.$$ This trade–off significantly affects the actual behavior of the algorithm, which can be seen in numerical experiments of Sect. 5.

In the following section, we propose another adaptive approach leading to an easily implementable algorithm that produces nested partitions close to optimal and possesses asymptotic properties similar to that of $${\overline{M}}^{\,*}_{N,r}.$$ As we shall see in Sect. 6, nested partitions are vital for automatic Monte Carlo integration.

Consider a $$\rho$$-weighted integration problem

\begin{aligned} S_\rho f=\int _a^bf(x)\rho (x)\,\mathrm dx, \end{aligned}

where the function $$\rho :[a,b]\rightarrow {\mathbb {R}}$$ is integrable and positive a.e. and $$\int _a^b\rho (x)\,\mathrm dx=1.$$ The corresponding Monte Carlo algorithm is

\begin{aligned} M_{n,\rho }f=\frac{1}{n}\sum _{i=1}^nf(t_i),\qquad t_i{\mathop {\sim }\limits ^{iid}}\mu _\rho , \end{aligned}

where $$\mu _\rho$$ is the probability distribution on [ab] with density $$\rho .$$ Then

\begin{aligned} {\mathbb {E}}(S_\rho f-M_{n,\rho }f)^2=\frac{1}{n}\big (S_\rho (f^2)-(S_\rho f)^2\big ). \end{aligned}

Now, the non-weighted integral (1.1) can be written as

\begin{aligned} S(f)=\int _a^b h(x)\rho (x)\,\mathrm dx=S_\rho (h), \quad \text{ where }\quad h(x)=\frac{f(x)}{\rho (x)}. \end{aligned}

Then

\begin{aligned} {\mathbb {E}}(Sf-M_{n,\rho }h)^2={\mathbb {E}}(S_\rho h-M_{n,\rho }h)^2 =\frac{1}{n}\big (S_\rho (h^2)-(S_\rho h)^2\big )=\frac{1}{n}\big (S_\rho (f/\rho )^2-(Sf)^2\big ). \end{aligned}

Let’s go further on and apply a variance reduction,

\begin{aligned} {\overline{M}}_{n,\rho }f=S(Lf)+M_{n,\rho }\bigg (\frac{f-Lf}{\rho }\bigg ), \end{aligned}
(4.1)

where Lf is an approximation to f. Then

\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{n,\rho }f\big )^2= \frac{1}{n}\left( \int _a^b\frac{(f-Lf)^2(x)}{\rho (x)}\,\mathrm dx- \bigg (\int _a^b (f-Lf)(x)\,\mathrm dx\bigg )^2\right) . \end{aligned}

The question is how to choose L and $$\rho$$ to make the quantity

\begin{aligned} \int _a^b\frac{(f-Lf)^2(x)}{\rho (x)}\,\mathrm dx- \bigg (\int _a^b (f-Lf)(x)\,\mathrm dx\bigg )^2 \end{aligned}

as small as possible.

Observe that if

\begin{aligned} \rho (x)=\frac{|(f-Lf)(x)|}{\Vert f-Lf\Vert _{L^1(a,b)}} \end{aligned}

then

\begin{aligned} {\mathbb {E}}\big (Sf-{\overline{M}}_{n,\rho }f\big )^2= \frac{1}{n}\left( \Vert f-Lf\Vert _{L^1(a,b)}^2-\bigg (\int _a^b(f-Lf)(x)\mathrm dx\bigg )^2\right) \end{aligned}

and this error is even zero if $$(f-Lf)(x)$$ does not change its sign. This suggests the following algorithm.

Suppose that $$Lf=L_{m,r}f$$ is based on a partition of [ab] such that the $$L^1$$ errors in all m subintervals $$I_i$$ have the same value, i.e.,

\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^1(I_i)}=\frac{1}{m}\,\Vert f-L_{m,r}f\Vert _{L^1(a,b)},\quad 1\le i\le m. \end{aligned}
(4.2)

Then we apply the variance reduction (4.1) with density

\begin{aligned} \rho (x)=\frac{1}{mh_i},\qquad x\in I_i,\quad 1\le i\le m, \end{aligned}
(4.3)

where $$h_i$$ is the length of $$I_i.$$ That is, for the corresponding probability measure $$\mu _\rho$$ we have $$\mu _\rho (I_i)=\tfrac{1}{m}$$ and the conditional distribution $$\mu _\rho (\cdot |I_i)$$ is uniform on $$I_i.$$

We now derive an error formula for such an approximation. Let $$\gamma =\Vert P\Vert _{L^1(a,b)}=\int _0^1|P(z)|\,\mathrm dz.$$ (Recall that P is given by (2.2).) We have

\begin{aligned} \Vert f-L_{m,r}f\Vert _{L^1(I_i)}=\frac{\gamma }{r!}\,h_i^{r+1}\big |f^{(r)}(\xi _i)\big |\quad \text{ and }\quad \Vert f-L_{m,r}f\Vert _{L^2(I_i)}=\frac{\alpha }{r!}\,h_i^{r+1/2}\big |f^{(r)}(\zeta _i)\big | \end{aligned}

for some $$\xi _i,\zeta _i\in I_i.$$ Denoting

\begin{aligned} A=h_i^{r+1}\big |f^{(r)}(\xi _i)\big | \end{aligned}
(4.4)

(which is independent of i) we have as $$m\rightarrow +\infty$$ that

\begin{aligned}{} & {} {\bigg (\int _a^b\frac{(f-L_{m,r}f)^2(x)}{\rho (x)}\,\mathrm dx\bigg )^{1/2}\;=\; \bigg (m\sum _{i=1}^mh_i\int _{I_i}(f-L_{m,r}f)^2(x)\,\mathrm dx\bigg )^{1/2}} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^m h_i^{2r+2}\big |f^{(r)}(\zeta _i)\big |^{2}\bigg )^{1/2} \,\approx \,\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^m h_i^{2r+2}\big |f^{(r)}(\xi _i)\big |^2\bigg )^{1/2} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (m\sum _{i=1}^mA^2\bigg )^{1/2}\,=\,\frac{\alpha }{r!}\,mA\,=\, \frac{\alpha }{r!}\,\big (mA^{1/(r+1)}\big )^{r+1}m^{-r} \\{} & {} \quad \;=\;\frac{\alpha }{r!}\,\bigg (\sum _{i=1}^m h_i\big |f^{(r)}(\xi _i)\big |^{1/(r+1)}\bigg )^{r+1}m^{-r} \;\approx \;\frac{\alpha }{r!}\,\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}m^{-r}. \end{aligned}

To get an asymptotic formula for $$\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx$$ we use the analysis done in Sect. 2. If $$\beta =0$$ then the integral decreases faster than $$m^{-r}.$$ Let $$\beta \ne 0.$$ Then

\begin{aligned} \int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\;\approx \; \frac{\beta }{r!}\,\sum _{i=1}^m h_i^{r+1}f^{(r)}(\xi _i)=\frac{\beta }{r!}(m_+-m_-)A, \end{aligned}

where $$\xi _i$$s are as in (4.4), and $$m_+$$ and $$m_-$$ are the numbers of indexes i for which $$f^{(r)}(\xi _i)\ge 0$$ and $$f^{(r)}(\xi _i)<0,$$ respectively. Let

\begin{aligned} D_+=\{x\in [a,b]:\,f^{(r)}(x)\ge 0\},\qquad D_-=\{x\in [a,b]:\,f^{(r)}(x)<0\}. \end{aligned}

Since $$A\approx \Vert f^{(r)}\Vert _{L^{1/(r+1)}(a,b)}m^{-(r+1)}$$ and $$m_+A^{1/(r+1)}\approx \int _{D_+}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx,$$ we have

\begin{aligned} m_+\approx \frac{\int _{D_+}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}{\int _{D_+\cup D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx},\qquad m_-\approx \frac{\int _{D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}{\int _{D_+\cup D_-}|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}. \end{aligned}

Thus

\begin{aligned} \int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\approx \frac{\beta }{r!} \bigg (\frac{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn} f^{(r)}(x)\,\mathrm dx}{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}\bigg )\Vert f^{(r)}\Vert _{L^{1/(r+1)}(a,b)}\,m^{-r} \end{aligned}

provided $$\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn} f^{(r)}(x)\,\mathrm dx\ne 0,$$ and otherwise the convergence is faster than $$m^{-r}.$$

Our analysis above shows that if m and n are chosen as in (2.5) and (2.6) then the error of the described algorithm asymptotically equals (as $$N\rightarrow +\infty$$)

\begin{aligned}{} & {} c_r\,\sqrt{\alpha ^2-\beta ^2 \bigg (\frac{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\textrm{sgn}f^{(r)}(x)\,\mathrm dx}{\int _a^b|f^{(r)}(x)|^{1/(r+1)}\mathrm dx}\bigg )^2} \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} N^{-(r+1/2)}.\nonumber \\ \end{aligned}
(4.5)

The factor at $$\beta ^2$$ in (4.5) can be easily replaced by 1 with the help of stratified sampling. Indeed, instead of randomly sampling n times with density (4.3) on the whole interval [ab],  one can apply the same sampling strategy independently on k groups $$G_j$$ of subintervals. Each group consists of $$s=m/k$$ subintervals,

\begin{aligned} G_j=\bigcup _{\ell =1}^s I_{(j-1)s+\ell },\qquad 1\le j\le k, \end{aligned}

and the number of samples for each $$G_j$$ equals n/k. As in the algorithm $${\overline{M}}^{\,*}_{N,r},$$ we combine $$k=k_N$$ and N in such a way that $$k_N\rightarrow +\infty$$ and $$k_N/N\rightarrow 0$$ as $$N\rightarrow \infty .$$ Then the total number of points used in each $$G_j$$ is $$N_j=N/k.$$ Denoting

\begin{aligned} C_j=\bigg (\int _{G_j}\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} =\bigg (\frac{1}{k}\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1} \end{aligned}

and using the fact that the factor at $$\beta ^2$$ equals 1 if $$f^{(r)}$$ does not change its sign, the error of such an approximation asymptotically equals

\begin{aligned} c_r\,\sqrt{\alpha ^2-\beta ^2}\,\bigg (\sum _{j=1}^k\frac{C_j^2}{N_j^{2r+1}}\bigg )^{1/2} =c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}

as claimed. (Note that $$N_j=N/k$$ minimize the sum $$\sum _{j=1}^kC_j^2N_j^{-(2r+1)}$$ with respect to $$\sum _{j=1}^kN_j=N;$$ compare with the analysis in Sect. 3.)

Thus we obtained exactly the same error formula as in Theorem 3.1 for $${\overline{M}}_{N,r}^{\,*}.$$

It remains to show a feasible construction of a nested partition that is close to the one satisfying (4.2). To that end, we utilize the iterative method presented in [11], where the $$L^p$$ error of piecewise Lagrange interpolation is examined.

We first consider the case when

\begin{aligned} f^{(r)}>0\quad \text{ or }\quad f^{(r)}<0. \end{aligned}
(4.6)

In the following construction, we use a priority queue $${\mathscr {S}}$$ whose elements are subintervals. For each subinterval $$I_i$$ of length $$h_i,$$ its priority is given as

\begin{aligned} p_f(I_i)\,=\,h_i^{r+1}|d_i|, \end{aligned}

where $$d_i$$ is the divided difference (3.2). In the following pseudocode, $$\textrm{insert}({\mathscr {S}},I)$$ and $$I:=\mathrm {extract\_max}({\mathscr {S}})$$ implement correspondingly the actions of inserting an interval to $${\mathscr {S}},$$ and extracting from $${\mathscr {S}}$$ an interval with the highest priority.

$$\textbf{algorithm}\;\textrm{PARTITION}$$

$${\mathscr {S}}=\emptyset ;\;\textrm{insert}({\mathscr {S}},[a,b]);$$

$$\textbf{for}\; k=2:m$$

$$\quad [l,r]=\mathrm {extract\_max}({\mathscr {S}});$$

$$\quad c=(l+r)/2;$$

$$\quad \textrm{insert}({\mathscr {S}},[l,c]); \textrm{insert}({\mathscr {S}},[c,r]);$$

$$\textbf{endfor}$$

After execution, the elements of $${\mathscr {S}}$$ form a partition into m subintervals $$I_i.$$ Note that if the priority queue is implemented through a heap then the running time of $$\textrm{PARTITION}$$ is proportional to $$m\log m.$$

Denote by $${\overline{M}}^{\,**}_{N,r}$$ the corresponding algorithm that uses the above nested partition and density (4.3), and N is related to the number m of subintervals and the number n of random samples as in (2.5) and (2.6). We want to see how much worse is this algorithm than that using the (not nested) partition (4.2).

Let $$A=(A_1,A_2,\ldots ,A_m)$$ with

\begin{aligned} A_i=p_f(I_i)\,r!=h_i^{r+1}\big |f^{(r)}(\omega _i)\big |,\qquad \omega _i\in I_i, \end{aligned}

and $$\Vert A\Vert _p=\big (\sum _{i=1}^mA_i^p\big )^{1/p}.$$ For the corresponding piecewise Lagrange approximation $$L_{m,r}f$$ and density $$\rho$$ given by (4.3) we have

\begin{aligned}{} & {} {\int _a^b\frac{(f-L_{m,r}f)^2(x)}{\rho (x)}\,\mathrm dx \,-\,\bigg (\int _a^b(f-L_{m,r}f)(x)\,\mathrm dx\bigg )^2}\\{} & {} \quad \approx \,\frac{1}{(r!)^2}\bigg (\alpha ^2 m\sum _{i=1}^m A_i^2 -\beta ^2\bigg (\sum _{i=1}^mA_i\bigg )^2\,\bigg )\,=\,\frac{1}{(r!)^2} \left( \,\alpha ^2m\Vert A\Vert _2^2-\beta ^2\Vert A\Vert _1^2\,\right) . \end{aligned}

We also have $$\bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}\approx \big (\sum _{i=1}^mA_i^{1/(r+1)}\big )^{r+1}=\Vert A\Vert _{\frac{1}{r+1}}.$$ Hence

\begin{aligned} \sqrt{{\mathbb {E}}(Sf-{\overline{M}}^{**}_{N,r}f)^2}\approx & {} \left( \,\alpha ^2m\Vert A\Vert _2^2-\beta ^2\Vert A\Vert _1^2\,\right) ^{1/2}n^{-1/2}m^{-r}\\\approx & {} K_{m,r}(A)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}

where

\begin{aligned} K_{m,r}(A)=\frac{\sqrt{\kappa _\alpha ^2\,m\,\Vert A\Vert _2^2 -\kappa _\beta ^2\,\Vert A\Vert _1^2}}{\Vert A\Vert _{\frac{1}{r+1}}}\,m^r, \qquad \kappa _\alpha =\frac{\alpha }{\sqrt{\alpha ^2-\beta ^2}}, \quad \kappa _\beta =\frac{\beta }{\sqrt{\alpha ^2-\beta ^2}}. \end{aligned}

Observe that halving an interval results in two subintervals whose priorities are asymptotically (as $$m\rightarrow +\infty$$) $$2^{r+1}$$ times smaller than the priority of the original interval. This means that $$K_{m,r}(A)$$ is asymptotically not larger than

\begin{aligned} K^*(r)\;=\;\limsup _{m\rightarrow \infty }\;\max \, \left\{ K_{m,r}(A):\;A=(A_1,\ldots ,A_m),\,\max _{1\le i,j\le m}\frac{A_i}{A_j}\le 2^{r+1}\,\right\} . \nonumber \\ \end{aligned}
(4.7)

Thus we obtained the following result.

### Theorem 4.1

If the function f satisfies (4.6) then we have as $$N\rightarrow +\infty$$ that

\begin{aligned} \sqrt{{\mathbb {E}}(Sf-{\overline{M}}^{\,**}_{N,r}f)^2}\,\lessapprox \,K^*(r)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}

where $$K^*(r)$$ is given by (4.7).

We numerically calculated $$K^*(r)$$ in some special cases. For instance, if the points $$z_i$$ in (2.1) are equispaced, $$z_i=(i-1)/(r-1),$$ $$1\le i\le r,$$ then for $$r=2,3,4,5,6$$ we correspondingly have

\begin{aligned} K^*(r)\,=\,4.250,\;3.587,\;7.077,\;11.463,\;23.130, \end{aligned}

while for any $$z_i$$s satisfying $$\beta =\int _0^1(z-z_1)\cdots (z-z_r)\,\mathrm dz=0$$ we have

\begin{aligned} K^*(r)\,=\,2.138,\;3.587,\;6.323,\;11.463,\;21.140. \end{aligned}

If f does not satisfy (4.6) then the algorithm $${\overline{M}}^{**}_{N,r}f$$ may fail. Indeed, it may happen that $$p_f(I_i)=0$$ while $$f^{(r)}\not =0$$ in $$I_i.$$ Then this subinterval may never be further subdivided. In this case, we can repeat the same construction, but with the modified priority

\begin{aligned} p_f(I_i)\,=\,h_i^{r+1}\max \big (\,|d_i|,\,\Delta /r!\big ), \end{aligned}

where $$\Delta >0.$$ Then the error is asymptotically upper bounded by

\begin{aligned} K^*(r)\,c_r\,\sqrt{\alpha ^2-\beta ^2}\, \bigg (\int _a^b\big |f_\Delta ^{(r)}(x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}N^{-(r+1/2)}, \end{aligned}

where $$\big |f_\Delta ^{(r)}(x)\big |$$ is given by (3.4).

## 5 Numerical experiments

In this section, we present results of two numerical experiments that illustrate the performance of the nonadaptive Monte Carlo algorithm $${\overline{M}}_{N,r}$$ and the adaptive algorithms $${\overline{M}}^{\,*}_{N,r}$$ and $${\overline{M}}^{**}_{N,r}.$$ Our test integral is

\begin{aligned} \int _0^1\frac{1}{x+10^{-4}}\,\mathrm dx. \end{aligned}

Since for $$r\in {\mathbb {N}}$$ we have $$(-1)^r f^{(r)}>0,$$ the parameter $$\Delta$$ is set to zero.

The three algorithms are verified for $$r=2$$ and $$r=4$$. In both cases, the interpolation nodes are equispaced, i.e., in (2.1) we take

\begin{aligned} z_i =\frac{i-1}{r-1},\qquad 1\le i \le r. \end{aligned}

In addition, for the first adaptive algorithm $${\overline{M}}_{N,r}^{\,*}$$ we take $$k_N = N^\kappa$$ with $$\kappa = 0.8.$$ This exponent was chosen to ensure a trade–off as per our discussion in Sect. 3, and some empirical results. Also, for a fixed N we plot a single output instead of the expected value estimator. Therefore the error fluctuations are visible. For completeness, we also show the asymptotes corresponding to the theoretical errors from Theorems 2.1 and 3.1, and the upper bound from Theorem 4.1. The scale is logarithmic, $$-\log _{10}(\textrm{error})$$ versus $$\log _{10}N.$$

The results for $$r=2$$ are presented in Fig. 1.

As it can be observed, both adaptive algorithms significantly outperform the nonadaptive MC; however, the right asymptotic behaviour of the first adaptive algorithm is visible only for large N.

Similar conclusions can be inferred from validation performed for $$r=4,$$ with all other parameters unchanged. We add that the results for N larger than $$10^{4.8}$$ are not illustrative any more, since the process is disturbed by a serious reduction of significant digits when calculating divided differences in the partition part (Fig. 2).

Notably, both adaptive algorithms attain their asymptotic errors, but this is not the case for nonadaptive MC for which the output is not stable. Initially, the first adaptive algorithm does not leverage additional sampling since for all intervals $$I_i$$ we have $$N_i/(2r+1)< 1.$$ The Monte Carlo adjustments are visible only for $$N \ge 10^3$$ and the error tends to the theoretical asymptote.

In conclusion, the numerical experiments confirm our theoretical findings and, in particular, superiority of the second adaptive algorithm $${\overline{M}}^{\,**}_{N,r}.$$

## 6 Automatic Monte Carlo integration

We now use the results of Sect. 4 for automatic Monte Carlo integration. The goal is to construct an algorithm that for given $$\varepsilon >0$$ and $$0<\delta <1$$ returns an $$\varepsilon$$-approximation to the integral Sf with probability at least $$1-\delta ,$$ asymptotically as $$\varepsilon \rightarrow 0^+.$$ To that end, we shall use the approximation $${\overline{M}}_{N,r}^{\,**}f$$ with N determined automatically depending on $$\varepsilon$$ and $$\delta .$$

Let $$X_i$$ for $$1\le i\le n$$ be independent copies of the random variable

\begin{aligned} X=S(f-L_{m,r}f)-\frac{(f-L_{m,r}f)(t)}{\rho (t)},\qquad t\sim \mu _{\rho }, \end{aligned}

where $$L_{m,r},$$ n and $$\rho$$ are as in $${\overline{M}}_{N,r}^{\,**}f.$$ Then $${\mathbb {E}}(X)=0$$ and

\begin{aligned} Sf-{\overline{M}}_{N,r}^{\,**}f=\frac{X_1+X_2+\cdots +X_n}{n}. \end{aligned}

By Hoeffding’s inequality [5] we have

\begin{aligned} \textrm{Prob}\left( \big |Sf-{\overline{M}}_{N,r}^{\,**}f\big |>\varepsilon \right) \,\le \, 2\,\exp \left( \frac{-\varepsilon ^2n}{2\,B_m^2}\right) , \end{aligned}

where $$B_m=\max _{a\le t\le b}|X(t)|.$$ Hence we fail with probability at most $$\delta$$ if

\begin{aligned} \frac{\varepsilon ^2n}{2B_m^2}\,\ge \,\ln \frac{2}{\delta }. \end{aligned}
(6.1)

Now we estimate $$B_m.$$ Let $$\lambda =\Vert P\Vert _{L^\infty (0,1)}=\max _{0\le t\le 1}|P(t)|,$$ and

\begin{aligned} {\mathscr {L}}_r(f)=\bigg (\int _a^b\big |f^{(r)}_\Delta (x)\big |^{1/(r+1)}\mathrm dx\bigg )^{r+1}, \end{aligned}

where $$\Delta =0$$ if $$f^{(r)}>0$$ or $$f^{(r)}<0,$$ and $$\Delta >0$$ otherwise. Let $$A=(A_1,A_2,\ldots ,A_m)$$ with

\begin{aligned} A_i=h_i^{r+1}\max _{x\in I_i}|f^{(r)}(x)|,\quad 1\le i\le m, \end{aligned}

where, as before, $$\{I_i\}_{i=1}^m,$$ is the partition used by $${\overline{M}}_{N,r}^{\,**}f$$ and $$h_i$$ is the length of $$I_i.$$ Since $$\Vert A\Vert _\frac{1}{r+1}=\big (\sum _{i=1}^mA_i^{1/(r+1)}\big )^{r+1}\lessapprox {\mathscr {L}}_r(f),\;$$ for $$x\in I_i$$ we have

\begin{aligned} \frac{\big |f(x)-L_{m,r}f(x)\big |}{\rho (x)}\,\le \,\frac{\lambda }{r!}\,m\,A_i\,\lessapprox \, \frac{\lambda }{r!}\,\bigg (\frac{m\,\Vert A\Vert _\infty }{\Vert A\Vert _\frac{1}{r+1}}\bigg ){\mathscr {L}}_r(f) \,\lessapprox \,2^{r+1}\frac{\lambda }{r!}\,{\mathscr {L}}_r(f)\,m^{-r}. \end{aligned}

We have the same upper bound for $$S(f-L_{m,r}f)$$ since by mean-value theorem

\begin{aligned} S(f-L_{m,r}f)=\int _a^b\frac{(f-L_{m,r}f)(x)}{\rho (x)}\,\rho (x)\,\mathrm dx =\frac{(f-L_{m,r}f)(\xi )}{\rho (\xi )},\qquad \xi \in [a,b]. \end{aligned}

Hence

\begin{aligned} B_m\,\lessapprox \,2^{r+2}\frac{\lambda }{r!}\,{\mathscr {L}}_r(f)\,m^{-r}. \end{aligned}

Using the above inequality and the fact that $$\sqrt{n}\,m^r\approx N^{r+1/2}/(c_rr!)$$ with $$c_r$$ given by (2.7), we get

\begin{aligned} \frac{\varepsilon ^2n}{2B_m^2}\gtrapprox \left( \frac{\varepsilon \,N^{r+1/2}}{{\hat{c}}_r\,{\mathscr {L}}_r(f)}\right) ^{\!\!2},\quad \text{ where }\quad {\hat{c}}_r=2^{r+5/2}\lambda c_r. \end{aligned}

The last inequality and (6.1) imply that we fail to have error $$\varepsilon$$ with probability at most $$\delta$$ for

\begin{aligned} N\,\gtrapprox \,\left( {\hat{c}}_r\,{\mathscr {L}}_r(f)\,\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\right) ^{\frac{1}{r+1/2}}, \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}
(6.2)

Now the question is how to obtain the random approximation $${\overline{M}}_{N,r}^{\,**}f$$ for N satisfying (6.2).

One possibility is as follows. We first execute the iteration $$\textbf{for}$$ in the algorithm $$\textrm{PARTITION}$$ of Sect. 4 for $$k=2:m,$$ where m satisfies $$\lim _{\varepsilon \rightarrow 0^+}m\,\varepsilon ^{\frac{1}{r+1/2}}=0,$$ e.g.,

\begin{aligned} m=\left\lfloor \bigg (\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\bigg )^\frac{1}{r+1}\right\rfloor . \end{aligned}

Let $$\{I_i\}_{i=1}^m$$ be the obtained partition. Then we replace $${\mathscr {L}}_r(f)$$ in (6.2) by its asymptotic equivalent

\begin{aligned} \widetilde{{\mathscr {L}}}_r(f)=\bigg (\sum _{i=1}^{m}p_f(I_i)^\frac{1}{r+1}\bigg )^{r+1}, \end{aligned}
(6.3)

set

\begin{aligned} N_\varepsilon =\left\lfloor \left( {\hat{c}}_r\,\widetilde{{\mathscr {L}}}_r(f)\,\frac{\sqrt{\ln (2/\delta )}}{\varepsilon }\right) ^{\frac{1}{r+1/2}}\right\rfloor , \end{aligned}
(6.4)

and continue the iteration for $$k=m+1:m_\varepsilon ,$$ where $$m_\varepsilon$$ is the number of subintervals corresponding to $$N_\varepsilon .$$ Finally, we complete the algorithm by $$n_\varepsilon$$ random samples.

Denote the final randomized approximation by $${\mathscr {A}}_{\varepsilon ,\delta }f.$$ Then we have $${\mathscr {A}}_{\varepsilon ,\delta }f={\overline{M}}_{N_\varepsilon ,r}^{\,**}f$$ and

\begin{aligned} \textrm{Prob}\big (\,\big |Sf-{\mathscr {A}}_{\varepsilon ,\delta }f|>\varepsilon \big )\,\lessapprox \,\delta , \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}

A disadvantage of the above algorithm is that it uses a priority queue and therefore its total running time is proportional to $$N\log N.$$ It turns out that by using recursion the running time can be reduced to N.

A crucial component of the algorithm with the running time proportional to N is the following recursive procedure, in which $${\mathscr {S}}$$ is a set of intervals.

$$\textbf{procedure}\;\textrm{AUTO}\,(f,a,b,e)$$

$$\textbf{if}\;p_f([a,b])\le e$$

$$\quad \textrm{insert}({\mathscr {S}},[a,b]);$$

$$\textbf{else}$$

$$\quad c:=(a+b)/2;$$

$$\quad \textrm{AUTO}(f,a,c,e);$$

$$\quad \textrm{AUTO}(f,c,b,e);$$

$$\textbf{endif}$$

Similarly to $${\mathscr {A}}_{\varepsilon ,\delta },$$ the algorithm consists of two steps. First $$\textrm{AUTO}$$ is run for $$e=\varepsilon '$$ satisfying $$\varepsilon '\rightarrow 0^+$$ and $$\varepsilon /\varepsilon '\rightarrow 0^+$$ as $$\varepsilon \rightarrow 0^+,$$ e.g.,

\begin{aligned} \varepsilon '=\varepsilon ^\kappa , \quad \text{ where }\quad 0<\kappa <1. \end{aligned}

Then $${\mathscr {L}}_r(f)$$ in (6.2) is replaced by $$\widetilde{{\mathscr {L}}}_r(f)$$ given by (6.3), and $$N_\varepsilon$$ found from (6.4). The recursion is resumed with the target value $$e=\varepsilon '',$$ where

\begin{aligned} \varepsilon ''=\widetilde{{\mathscr {L}}}_r(f)\,m_\varepsilon ^{-(r+1)}. \end{aligned}

The algorithm is complemented by the corresponding $$n_\varepsilon$$ random samples.

Observe that the number $$m''$$ of subintervals in the final partition is asymptotically at least $$m_\varepsilon .$$ Indeed, for any function $$g\in C^{r}([a,b])$$ with $$g^{(r)}(x)=\big |f_\Delta ^{(r)}(x)\big |$$ we have $${\mathscr {L}}_r(g)={\mathscr {L}}_r(f)$$ and

where the first inequality above follows from Proposition 2 of [11]. This implies

\begin{aligned} m''\,\gtrapprox \,\bigg (\frac{\widetilde{{\mathscr {L}}}_r(f)}{\varepsilon ''}\bigg )^\frac{1}{r+1}\approx \,m_\varepsilon , \end{aligned}

as claimed.

Denote the resulting approximation by $${\mathscr {A}}_{\varepsilon ,\delta }^*f.$$ Observe that its running time is proportional to $$N_\varepsilon$$ since recursion can be implemented in linear time.

### Theorem 6.1

We have

\begin{aligned} \textrm{Prob}\big (\,\big |Sf-{\mathscr {A}}_{\varepsilon ,\delta }^*f|>\varepsilon \big )\,\lessapprox \,\delta , \qquad \text{ as }\quad \varepsilon \rightarrow 0^+. \end{aligned}

Now we present outcomes of the second automatic procedure $${\mathscr {A}}_{\varepsilon ,\delta }^*$$ for the test integral

\begin{aligned} \int _0^1\cos \bigg (\dfrac{100\,x}{x+10^{-4}}\bigg )\,\mathrm dx. \end{aligned}
(6.5)

Although the derivatives fluctuate and nullify many times in this case, we take $$\Delta =~0.$$ We confront the outcomes for $$r=2$$ and $$r=4.$$ In each case, we compute the number of breaches (i.e. when the absolute error is greater than $$\varepsilon = 10^{-3}$$) based on $$K=10\,000$$ independent executions of the code (Table 1). We also take $$\varepsilon ' = \varepsilon ^{1/2}.$$ In our testing, we expect the empirical probability of the breach to be less than $$\delta =0.05.$$ For completeness, we also present the maximum error from all executions together with obtained $$N_\varepsilon .$$

Note that in both cases we did not identify any exceptions. The magnitude of the maximum errors indicate a serious overestimation of $$N_\varepsilon ,$$ but the results are satisfactory given the upper bound estimate of Theorem 6.1.