## Abstract

We present a novel theoretical approach to the analysis of adaptive quadratures, and of adaptive Simpson quadratures in particular, which leads to the construction of a new algorithm for automatic integration. For a given function \(f\in C^4\) with \(f^{(4)}\ge 0\) and possible endpoint singularities, the algorithm produces an approximation to \(\int _a^bf(x)\,{\mathrm d}x\) within a given \(\varepsilon \), asymptotically as \(\varepsilon \rightarrow 0\). Moreover, it is optimal among all adaptive Simpson quadratures, i.e., it needs the minimal number \(n(f,\varepsilon )\) of function evaluations to obtain an \(\varepsilon \)-approximation, and it runs in time proportional to \(n(f,\varepsilon )\).


## 1 Introduction

Consider a numerical approximation of the integral

\(I(f)=\int _a^b f(x)\,{\mathrm d}x \qquad (1)\)

for a function \(f:[a,b]\rightarrow {\mathbb R}\). Ideally we would like to have an automatic routine that for given \(f\) and error tolerance \(\varepsilon \) produces an approximation \(Q(f)\) to \(I(f)\) such that it uses as few function evaluations as possible and its error satisfies \(|Q(f)-I(f)|\le \varepsilon \).
This is usually realized with the help of *adaption*. Recall the general principle. For a given interval, two simple quadrature rules are applied, one more accurate than the other. If the difference between them is sufficiently small, the integral over this interval is approximated by the more accurate quadrature. Otherwise the interval is divided into smaller subintervals and the above rule is applied recursively to each of the subintervals. The oldest and probably best-known examples of automatic integration are adaptive Simpson quadratures [8–11]; see also [4] for an account of adaptive numerical integration.

An unquestionable advantage of adaptive quadratures is that they try to maintain the error at a prescribed level \(\varepsilon \) and simultaneously adjust the lengths of the successive subintervals to the underlying function. This often results in a much more efficient final subdivision of \([a,b]\) than the nonadaptive uniform subdivision. For these reasons adaptive quadratures are now frequently used in computational practice, and those using higher order *Gauss-Kronrod* rules [1, 5, 15] are standard components of numerical packages and libraries such as MATLAB, NAG or QUADPACK [13]. Nevertheless, to the author’s knowledge, there is no satisfactory and rigorous analysis that would explain the good behavior of adaptive quadratures in a quantitative way, or identify classes of functions for which they are superior to nonadaptive quadratures. This paper is an attempt to partially fill this gap.

At this point we have to admit that there are theoretical results showing that adaptive quadratures are not better than nonadaptive quadratures. This holds in the *worst case setting* over convex and symmetric classes of functions. There are also corresponding *adaption-does-not-help* results in other settings, see, e.g., [12, 14, 17, 18]. On the other hand, if the class is not convex, and/or an error criterion different from the worst case one is used to compare algorithms, then adaption can significantly help, see [2] or [16].

In this paper we present a novel theoretical approach to the analysis of adaptive Simpson quadratures. We stress that the restriction to the Simpson rule as the basic component of composite rules is only for simplicity; we could equally well use higher order quadratures. The Simpson rule is a relatively simple quadrature and therefore better enables a clear development of our ideas. To be more specific, we analyze the adaptive Simpson quadratures from the point of view of computational complexity. Allowing *all* possible subdivision strategies, our goal is to find an *optimal* strategy for which the corresponding algorithm returns an \(\varepsilon \)-approximation to the integral (1) using the minimal number of integrand evaluations or, equivalently, the minimal number of subintervals. The main analysis is *asymptotic* and carried out assuming that \(f\) is four times continuously differentiable and its \(4\)th derivative is positive.

To reach our goal we first derive formulas for the asymptotic error of adaptive Simpson quadratures. Following [7] we find that the optimal subdivision strategy produces the partition \(a=x_0^*<\cdots <x_m^*=b\) such that

\(\int _a^{x_i^*}\bigl(f^{(4)}(x)\bigr)^{1/5}{\mathrm d}x \;=\; \frac{i}{m}\int _a^b\bigl(f^{(4)}(x)\bigr)^{1/5}{\mathrm d}x, \qquad 0\le i\le m.\)
This partition is practically realized by maintaining the error on successive subintervals at the same level. The optimal error corresponding to the subdivision into \(m\) subintervals is then proportional to \(L^{\mathrm{opt}}(f)\,m^{-4}\) where

\(L^{\mathrm{opt}}(f)=\Bigl(\int _a^b\bigl(f^{(4)}(x)\bigr)^{1/5}{\mathrm d}x\Bigr)^{5}.\)
For comparison, the errors for the standard adaptive (local) and for the nonadaptive (uniform subdivision) quadratures are respectively proportional to \(L^{\mathrm{std}}(f)\,m^{-4}\) and \(L^{\mathrm{non}}(f)\,m^{-4}\) where

\(L^{\mathrm{std}}(f)=(b-a)\Bigl(\int _a^b\bigl(f^{(4)}(x)\bigr)^{1/4}{\mathrm d}x\Bigr)^{4} \quad\text{and}\quad L^{\mathrm{non}}(f)=(b-a)^4\int _a^b f^{(4)}(x)\,{\mathrm d}x.\)
Obviously, \(L^{\mathrm{opt}}(f)\le L^{\mathrm{std}}(f)\le L^{\mathrm{non}}(f)\). Hence the optimal Simpson quadrature is especially effective when \(L^{\mathrm{opt}}(f)\ll L^{\mathrm{std}}(f)\). An example is \(\int _\delta ^1 x^{-1/2}\,{\mathrm d}x\) with ‘small’ \(\delta \). If \(\delta =10^{-8}\) then \(L^{\mathrm{opt}}(f)\), \(L^{\mathrm{std}}(f)\), \(L^{\mathrm{non}}(f)\) are correspondingly about \(10^5\), \(10^8\), \(10^{28}\).
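Taking the closed forms \(L^{\mathrm{non}}=(b-a)^4\int f^{(4)}\), \(L^{\mathrm{std}}=(b-a)\bigl(\int (f^{(4)})^{1/4}\bigr)^4\) and \(L^{\mathrm{opt}}=\bigl(\int (f^{(4)})^{1/5}\bigr)^5\) (assumed here, up to absolute constants; they are consistent with the Hölder chain above and with the quoted magnitudes), these numbers can be checked directly for the integrand \(x^{-1/2}/2\) of Example 1, whose fourth derivative is \(\tfrac{105}{32}x^{-9/2}\); all three integrals have elementary antiderivatives.

```python
# Assumed closed forms for the difficulty multipliers (absolute constants dropped):
#   L_non = (b-a)^4 * Int f''''          L_std = (b-a) * (Int (f'''')^(1/4))^4
#   L_opt = (Int (f'''')^(1/5))^5
# For f(x) = x^(-1/2)/2 on [delta, 1] we have f''''(x) = (105/32) * x^(-9/2).

def multipliers(delta):
    c = 105.0 / 32.0                        # f''''(x) = c * x**(-9/2)
    width = 1.0 - delta
    # int_delta^1 c*x^(-9/2) dx = c * (2/7) * (delta^(-7/2) - 1)
    l_non = width**4 * c * (2.0 / 7.0) * (delta**-3.5 - 1.0)
    # int_delta^1 (c*x^(-9/2))^(1/4) dx = c^(1/4) * 8 * (delta^(-1/8) - 1)
    l_std = width * (c**0.25 * 8.0 * (delta**-0.125 - 1.0))**4
    # int_delta^1 (c*x^(-9/2))^(1/5) dx = c^(1/5) * 10 * (1 - delta^(1/10))
    l_opt = (c**0.2 * 10.0 * (1.0 - delta**0.1))**5
    return l_non, l_std, l_opt

l_non, l_std, l_opt = multipliers(1e-8)
print(f"L_opt ~ {l_opt:.1e}, L_std ~ {l_std:.1e}, L_non ~ {l_non:.1e}")
```

For \(\delta =10^{-8}\) this reproduces the orders of magnitude \(10^5\), \(10^8\), \(10^{28}\) quoted above.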

Even though the optimal strategy is global, it can be efficiently harnessed for automatic integration and implemented in time proportional to \(m\). The only serious problem, namely how to choose the acceptable error \(\varepsilon _1\) for subintervals so as to obtain the final error \(\varepsilon \), is resolved by splitting the recursive subdivision process into two phases. In the first phase the process is run with the acceptable error set to a ‘test’ level \(\varepsilon _2=\varepsilon \). Then the acceptable error is updated to

where \(m_2\) is the number of subintervals obtained from the first phase. In the second phase, the recursive subdivision is continued with the ‘target’ error \(\varepsilon _1\).

As noted earlier, the main analysis is carried out assuming that \(f\in C^4([a,b])\) and \(f^{(4)}>0\). It turns out that, using additional arguments, the obtained results can be extended to functions with \(f^{(4)}\ge 0\) and/or with possible endpoint singularities, i.e., when \(f^{(4)}(x)\) goes to \(+\infty \) as \(x\rightarrow a,b\). For such integrals the optimal strategy works perfectly well, while the other quadratures may even lose the convergence rate \(m^{-4}\).

The contents of the paper are as follows. In Sect. 2 we recall the standard (local) Simpson quadrature for automatic integration. In Sect. 3 we derive a formula for the asymptotic error of Simpson quadratures and find the optimal subdivision strategy. In Sect. 4 we show how the optimal strategy can be used to construct an optimal algorithm for automatic integration. The final Sect. 5 is devoted to extensions of the main results. The paper is enriched with numerical tests in which the optimal adaptive quadrature is compared with the standard adaptive and nonadaptive quadratures.

We use the following asymptotic notation. For two positive functions of \(m\), we write

A corresponding notation applies for functions of \(\varepsilon \) as \(\varepsilon \rightarrow 0\).

## 2 The standard adaptive Simpson quadrature

In its basic formulation, the local adaptive Simpson quadrature for automatic integration, which will be called *standard*, can be written recursively as follows. Let Simpson\((u,v,f)\) be the procedure returning the value of the simple three-point Simpson rule on \([u,v]\) for the function \(f\), and let \(\varepsilon >0\) be the error demand.
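To fix ideas, a minimal Python sketch of such a recursive routine might look as follows. The acceptance test \(|S_1-S_2|\le 15\varepsilon \) (with the classical Richardson-based factor \(15\)) and the halving of the tolerance on each half are assumptions of this sketch, not necessarily the exact constants used in the text.

```python
def simpson(f, u, v):
    """Three-point Simpson rule S1(u, v; f)."""
    return (v - u) / 6.0 * (f(u) + 4.0 * f((u + v) / 2.0) + f(v))

def std(f, u, v, eps):
    """Recursive standard adaptive Simpson quadrature (sketch).

    Accept the refined value S2 on [u, v] when |S1 - S2| <= 15 * eps;
    otherwise halve [u, v] and recurse with tolerance eps / 2 per half.
    """
    mid = (u + v) / 2.0
    s1 = simpson(f, u, v)
    s2 = simpson(f, u, mid) + simpson(f, mid, v)
    if abs(s1 - s2) <= 15.0 * eps:
        return s2
    return std(f, u, mid, eps / 2.0) + std(f, mid, v, eps / 2.0)
```

For smooth integrands this terminates after finitely many levels, since each halving reduces the local value \(|S_1-S_2|\) by roughly the factor \(2^4\) while the tolerance drops only by \(2\).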

A justification of STD that can be found in textbooks, e.g., [3, 6], is as follows. Denote by \(S_1(u,v;f)\) the three-point Simpson rule,

\(S_1(u,v;f)=\frac{v-u}{6}\Bigl(f(u)+4f\bigl(\tfrac{u+v}{2}\bigr)+f(v)\Bigr),\)

and by \(S_2(u,v;f)\) the composite Simpson rule that is based on subdivision of \([u,v]\) into two equal subintervals,

\(S_2(u,v;f)=S_1\bigl(u,\tfrac{u+v}{2};f\bigr)+S_1\bigl(\tfrac{u+v}{2},v;f\bigr).\)
We also denote \(I(u,v;f)=\int _u^v f(x)\,{\mathrm d}x\). Suppose that

If the interval \([u,v]\subseteq [a,b]\) is small enough so that \(f^{(4)}\) is ‘almost’ a constant, \(f^{(4)}\approx C\) and \(C\ne 0\), then

Now let \(a=x_0<\cdots <x_m=b\) be the final subdivision produced by STD and

\(S^{\mathrm{std}}(f;\varepsilon )=\sum _{i=1}^m S_2(x_{i-1},x_i;f)\)

be the result returned by STD.
be the result returned by STD. Then, provided the estimate (2) holds for any \([x_{i-1},x_i]\), we have

This reasoning has a serious defect: the approximate equality (2) can be applied only when the interval \([u,v]\) is sufficiently small. Hence STD can terminate too early and return a completely false result. In the extreme case of \([a,b]=[0,4]\) and \(f(x)=\prod _{i=0}^4 (x-i)^2\) we have \(I(f)>0\), but STD returns zero no matter how small \(\varepsilon \) is. Of course, concrete implementations of STD can be equipped with additional mechanisms to avoid, or at least reduce the probability of, such unwanted behavior. To radically cut the possibility of premature termination we assume, in addition to \(f\in C^4([a,b])\), that the fourth derivative is of constant sign, say,
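This failure can be reproduced directly: both \(S_1\) and \(S_2\) on \([0,4]\) sample \(f\) only at the integers \(0,\dots ,4\), where \(f\) vanishes. A short sketch (with `simpson` a plain three-point rule):

```python
def f(x):
    """f(x) = prod_{i=0}^{4} (x - i)^2: nonnegative, zero exactly at 0,1,2,3,4."""
    p = 1.0
    for i in range(5):
        p *= (x - i) ** 2
    return p

def simpson(g, u, v):
    return (v - u) / 6.0 * (g(u) + 4.0 * g((u + v) / 2.0) + g(v))

s1 = simpson(f, 0.0, 4.0)                         # samples f at 0, 2, 4 only
s2 = simpson(f, 0.0, 2.0) + simpson(f, 2.0, 4.0)  # samples f at 0, 1, 2, 3, 4
# s1 == s2 == 0.0, so |S1 - S2| = 0 passes the acceptance test for every
# eps > 0, and STD stops at the very first level although I(f) > 0.
```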

By compactness of \([a,b]\), this means that \(f^{(4)}(x)\ge c\) for some \(c>0\) that depends on \(f\). Assumption (3) assures that the maximum length of the subintervals produced by STD decreases to zero as \(\varepsilon \rightarrow 0\) and that the asymptotic equality (2) holds. Indeed, denote by \(D(u,v;f)\) the divided difference of \(f\) corresponding to the \(5\) equispaced points \(z_j=u+jh/4\), \(0\le j\le 4\), where \(h=v-u\), i.e.,

Since

the termination criterion

that is checked in line 3 of STD for the current subinterval \([u,v]\), is equivalent to

Our conclusion about the applicability of (2) follows from the inequality \(D(u,v;f)\ge c/4!\).

Observe that each splitting of a subinterval \([u,v]\) results in the (asymptotic) decrease of the controlled value in (4) by the factor of \(2^4\). Thus the algorithm asymptotically returns the approximation of the integral within \(\varepsilon \), as desired. Specifically, we have

*Remark 1*

The inequality (5) explains why numerical tests often show better performance of STD than expected. To account for this, it is suggested to run STD with a larger input parameter, say \(2\varepsilon \) instead of \(\varepsilon \).

## 3 Optimizing the process of interval subdivision

The error formula (5) for the standard adaptive Simpson quadrature says nothing about how the number \(m\) of subintervals depends on \(\varepsilon \), or what the actual error is after producing \(m\) subintervals. We now study this question for different subdivision strategies. In order to be consistent with STD, we assume that for a given subdivision \(a=x_0<x_1<\cdots <x_m=b\) we apply \(S_2(x_{i-1},x_i;f)\) on each of the subintervals \([x_{i-1},x_i]\), so that the final approximation

\(S_m(f)=\sum _{i=1}^m S_2(x_{i-1},x_i;f)\)

uses \(n=4m+1\) function evaluations.

The goal is to find an *optimal* strategy, i.e., one that for any function \(f\in C^4([a,b])\) satisfying (3) produces a subdivision for which the error of the corresponding Simpson quadrature \(S_m(f)\) is asymptotically minimal (as \(m\rightarrow \infty \)).

We first analyze two particular strategies, nonadaptive and standard adaptive, and then derive the optimal strategy. In what follows, the constant

In the nonadaptive strategy, the interval \([a,b]\) is divided into \(m\) equal subintervals \([x_{i-1},x_i]\) with \(x_i=a+ih\), \(h=(b-a)/m\). Let the corresponding Simpson quadrature be denoted by \(S_m^{\mathrm{non}}\). Then

as \(m\rightarrow \infty \).

Observe that for the asymptotic equality (6) to hold we do not need to assume (3).

We now analyze the standard adaptive strategy used by STD. To do this, we first need to rewrite STD in an equivalent way, where the input parameter is \(m\) instead of \(\varepsilon \). We have the following *greedy* algorithm.

The algorithm starts with the initial subdivision \(a=x_0^{(1)}<x_1^{(1)}=b\). In the \((k+1)\)st step, from the current subdivision \(a=x_0^{(k)}<\cdots <x_k^{(k)}=b\) a subinterval \([x_{i^*-1}^{(k)},x_{i^*}^{(k)}]\) is selected with the highest value

and the midpoint \((x_{i^*-1}^{(k)}+x_{i^*}^{(k)})/2\) is added to the subdivision.

Denote by \(S_m^{\mathrm{std}}(f)\) the result returned by the corresponding Simpson quadrature when applied to \(m\) subintervals. Then, in view of (4), the values \(S_m^{\mathrm{std}}(f)\) and \(S^{\mathrm{std}}(f;\varepsilon )\) are related as follows. Let \(m=m(\varepsilon )\) be the minimal number of steps after which (4) is satisfied by each of the subintervals \([x^{(m)}_{i-1},x^{(m)}_i]\). Then

We are ready to show the error formula for \(S_m^{\mathrm{std}}\) corresponding to (6).

**Theorem 1**

Let \(f\in C^4([a,b])\) and \(f^{(4)}(x)>0\) for all \(x\in [a,b]\). Then

*Proof*

We fix \(\ell \) and divide the interval \([a,b]\) into \(2^\ell \) equal subintervals \([z_{i-1},z_i]\) of length \((b-a)/2^\ell \). Call this partition a coarse grid, in contrast to the fine grid produced by \(S_m^\mathrm{std}\). Let

Let \(m\) be sufficiently large, so that the fine grid contains all the points of the coarse grid. Denote by \(z_{i-1}=x_{i,0}<x_{i,1}<\cdots <x_{i,m_i}=z_i\) the points of the fine grid contained in the \(i\)th interval of the coarse grid, and \(h_{i,j}=x_{i,j}-x_{i,j-1}\). Then the error can be bounded from below as

Suppose for a moment that for all \(i,j\) we have \(h_{i,j}^4c_i=A\) for some \(A\). Then \((b-a)/2^\ell =\sum _{j=1}^{m_i}h_{i,j}=m_i(A/c_i)^{1/4}\). Using \(\sum _{i=1}^{2^\ell }m_i=m\) we get

Observe now that any splitting of a subinterval decreases \(h_{i,j}^4c_i\) by the factor of \(16\). Hence

and consequently \(h_{i,j}^4c_i\ge A/16\) for all \(i,j\). Thus

To obtain the upper bound, we proceed similarly. Replacing \(c_i\) with \(C_i\) and using the inequality \(h_{i,j}^4C_i\le 16 A\) we get that

To complete the proof we notice that both

are Riemann sums that converge to the integral \(\int _a^b\left( f^{(4)}(x)\right) ^{1/4}\!\,{\mathrm d}x\) as \(\ell \rightarrow \infty \). \(\square \)

*Remark 2*

From the proof it follows that the constants in the ‘\(\asymp \)’ notation in Theorem 1 are asymptotically between \(1/16\) and \(16\). The gap between the upper and lower constants is certainly much overestimated, see also Remark 4.

The two strategies, nonadaptive and standard adaptive, will serve as reference points for comparison with the optimal strategy that we now derive. We first allow *all* possible subdivisions of \([a,b]\), regardless of whether they can be realized in practice.

**Proposition 1**

The subdivision determined by points

where \(x_i^*\) satisfy

\(\int _a^{x_i^*}\bigl(f^{(4)}(x)\bigr)^{1/5}{\mathrm d}x=\frac{i}{m}\int _a^b\bigl(f^{(4)}(x)\bigr)^{1/5}{\mathrm d}x,\qquad 0\le i\le m,\)
is optimal. For the corresponding quadrature \(S_m^*\) we have

*Proof*

We first show the lower bound. Let \(S_m\) be the Simpson quadrature that is based on an arbitrary subdivision. Proceeding as in the beginning of the proof of Theorem 1 we get that for sufficiently large \(m\) the error of \(S_m\) is lower bounded by

where \(m_i\) is the number of subintervals of the fine grid in the \(i\)th subinterval of the coarse grid. (We assume without loss of generality that the coarse grid is contained in the fine grid.) Minimizing this with respect to \(m_i\), subject to \(\sum _{i=1}^{2^\ell }m_i=m\), we obtain the optimal values

After substituting \(m_i\) with \(m_i^*\) in the error formula we finally get

Since for the optimal \(m_i^*\) we have

the lower bound (9) is attained by the subdivision determined by \(\{x_i^*\}\). \(\square \)

Now the question is whether the optimal subdivision into \(m\) subintervals of Proposition 1 can be practically realized, i.e., using \(4m+1\) function evaluations. The answer is positive, at least up to an absolute constant. The corresponding algorithm \(S_m^\mathrm{opt}\) runs as \(S_m^\mathrm{std}\) with the only difference that in each step it halves the subinterval with the highest value

[instead of (7)].
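If the quantity (11) is read as the absolute local error estimate \(|S_1(u,v;f)-S_2(u,v;f)|\) on a subinterval (an assumption of this sketch, since the displayed formula is not reproduced here), the greedy algorithm \(S_m^{\mathrm{opt}}\) admits a direct \(m\log m\) implementation with a max-heap:

```python
import heapq

def simpson(f, u, v):
    return (v - u) / 6.0 * (f(u) + 4.0 * f((u + v) / 2.0) + f(v))

def greedy_opt(f, a, b, m):
    """Subdivide [a, b] into m subintervals by repeatedly halving the
    subinterval with the largest |S1 - S2| (assumed reading of (11)),
    then sum the composite rule S2 over the final subdivision."""
    def item(u, v):
        mid = (u + v) / 2.0
        s2 = simpson(f, u, mid) + simpson(f, mid, v)
        return (-abs(simpson(f, u, v) - s2), u, v, s2)  # max-heap via negation
    heap = [item(a, b)]
    for _ in range(m - 1):
        _, u, v, _ = heapq.heappop(heap)
        mid = (u + v) / 2.0
        heapq.heappush(heap, item(u, mid))
        heapq.heappush(heap, item(mid, v))
    return sum(s2 for _, _, _, s2 in heap)
```

For the nearly singular integrand of Example 1, `greedy_opt(lambda x: 0.5 / x**0.5, 1e-8, 1.0, 2000)` already agrees with \(1-\sqrt{10^{-8}}\) to many digits, while a uniform subdivision of the same size does not.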

**Theorem 2**

Let \(f\in C^4([a,b])\) and \(f^{(4)}(x)>0\) for all \(x\in [a,b]\). Then

where \(K\le 32\).

*Proof*

The proof goes as the proof of the upper bound of Theorem 1 with obvious changes related to the facts that now the algorithm tries to balance (11) [instead of (7)], and that

\(\square \)

*Remark 3*

The best constant \(K\) of Theorem 2 is certainly much less than 32, see also Remark 4.

We summarize the results of this section. All three quadratures \(S_m^{\mathrm{non}}\), \(S_m^{\mathrm{std}}\), \(S_m^{\mathrm{opt}}\) converge at the rate \(m^{-4}\), but the asymptotic constants depend on the integrand \(f\) through the multipliers \(L^{\mathrm{non}}(f)\), \(L^{\mathrm{std}}(f)\), \(L^{\mathrm{opt}}(f)\). These multipliers indicate how difficult a function is to integrate using a given quadrature. Obviously, by Hölder’s inequality we have

\(L^{\mathrm{opt}}(f)\le L^{\mathrm{std}}(f)\le L^{\mathrm{non}}(f).\)
*Example 1*

Consider the integral

\(I_\delta =\int _\delta ^1 \frac{x^{-1/2}}{2}\,{\mathrm d}x=1-\sqrt{\delta },\qquad 0<\delta <1.\)

In this case \(L^{\mathrm{non}}\), \(L^{\mathrm{std}}\), \(L^{\mathrm{opt}}\) rapidly increase as \(\delta \) decreases, as shown in Table 1.

Numerical computations confirm the theory very well. We tested all three quadratures (the adaptive ones implemented in \(m\log m\) running time using a *heap* data structure) and ran them for different values of \(\delta \). Specific results are as follows.

For \(\delta =0.5\) the quadratures \(S_m^{\mathrm{non}}\), \(S_m^{\mathrm{std}}\), and \(S_m^{\mathrm{opt}}\) give almost identical results independently of \(m\). For instance, for \(m=10^2\) the errors are respectively \(1.31\times 10^{-13}\), \(1.46\times 10^{-13}\), \(1.46\times 10^{-13}\), and for \(m=10^3\) we have \(1.28\times 10^{-17}\), \(1.43\times 10^{-17}\), \(1.35\times 10^{-17}\). Note that the nonadaptive quadrature has the smallest error because \(S_m^{\mathrm{non}}\) enjoys a slightly better absolute constant in the error formula (6) than the adaptive quadratures.

However, the smaller \(\delta \), the larger the differences between the results. A characteristic behavior of the errors for \(\delta =10^{-2}\) and \(\delta =10^{-8}\) is illustrated in Figs. 1 and 2. Observe that in the case \(\delta =10^{-8}\) the nonadaptive quadrature needs more than \(10^4\) subintervals to reach the right convergence rate \(m^{-4}\).

*Remark 4*

It is interesting to see the behavior of

By (6) we have \(\lim _{m\rightarrow \infty }K_m^{\mathrm{non}}(f)=1\). The corresponding limits for the adaptive quadratures are unknown; however, in our numerical tests we never obtained a value larger than \(1.5\). This would mean, in particular, that \(S_m^{\mathrm{opt}}\) is at most \(50~\%\) worse than \(S_m^*\). Figure 3 shows the behavior of \(K_m^\mathrm{qad}(f)\) for the integral \(I_\delta \) of Example 1 with \(\delta =10^{-2}\).

## 4 Automatic integration using optimal subdivision strategy

We want to have an algorithm that automatically computes the integral within a given error tolerance \(\varepsilon \). An example of such an algorithm is the recursive STD. Recall that the recursive nature of STD allows one to implement it in time proportional to the number \(m\) of subintervals using a *stack* data structure. However, it does not use the optimal subdivision strategy. On the other hand, the algorithm \(S_m^{\mathrm{opt}}\) uses the optimal strategy, but one does not know in advance how large \(m\) should be for the error \(|S_m^{\mathrm{opt}}(f)-I(f)|\le \varepsilon \). In addition, the best implementation of \(S_m^{\mathrm{opt}}\) (using a heap data structure) runs in time proportional to \(m\log m\). Thus the question is whether there exists an algorithm that runs in time linear in \(m\) and produces an approximation to the integral within \(\varepsilon \) using the optimal subdivision strategy.

Since the optimal subdivision is such that the errors on subintervals are roughly equal, the suggestion is to run STD with the only difference that it is recursively called with parameter \(\varepsilon \) instead of \(\varepsilon /2\). Denote this modification by OPT.

Let \(S^{\mathrm{opt}}(f;\varepsilon )\) be the result returned by OPT. Analogously to (8) we have

if \(m\) is the minimal number of steps after which (11) is satisfied by all subintervals.

It is clear that OPT does not return an \(\varepsilon \)-approximation when \(\varepsilon \) is the input parameter. However, we are able to estimate the error *a posteriori*. Indeed, let \(m_1\) be the number of subintervals produced by OPT for a given \(\varepsilon _1\). Then, asymptotically, \(|S^{\mathrm{opt}}(f;\varepsilon _1)-I(f)|\le m_1\varepsilon _1\).

We need to find \(\varepsilon _1\) such that \(m_1\varepsilon _1\le \varepsilon \). Since \(m_1\) depends not only on \(\varepsilon _1\) but also on \(L^{\mathrm{opt}}(f)\), it seems hopeless to predict \(\varepsilon _1\) in advance. Surprisingly, this is not the case.

The idea of the algorithm is as follows. We first run OPT with some \(\varepsilon _2\le \varepsilon \), obtaining a subdivision consisting of \(m_2\) subintervals. Next, using (12) and Theorem 2 we estimate \(L^{\mathrm{opt}}(f)\), and using Theorem 2 again we find the ‘right’ \(\varepsilon _1\). Finally, OPT is resumed with the input \(\varepsilon _1\) and with the subdivision obtained in the preliminary run. As we shall see later, this idea can be implemented in time proportional to \(m_1\).

Concrete calculations are as follows. From the equality

where \(\alpha _2\) and \(K_2\) depend on \(\varepsilon _2\), we have

We need \(\varepsilon _1\) such that for the corresponding \(m_1\) the error of \(S^{\mathrm{opt}}_{m_1}(f)\) is at most \(\varepsilon \), i.e.,

where \(\alpha _1\) and \(K_1\) depend on \(\varepsilon _1\). Substituting \(L^{\mathrm{opt}}(f)\) with the right hand side of (13) we obtain

and solving the inequality \(\alpha _1m_1\varepsilon _1\le \varepsilon \) with \(m_1\) given by (14) we get

Recall that, asymptotically, \(\alpha _1\) and \(\alpha _2\) are in \([1/32,1]\) which means that \(\beta \) can be asymptotically bounded from below by \(1\). Hence, taking

we have

The choice of \(\varepsilon _1\) given by (15) is rather conservative. In practice, we observe that the error of \(S^{\mathrm{opt}}(f;\varepsilon _1)\) is ‘on average’ even \(6\) or more times smaller than \(\varepsilon \). Hence we encounter the same phenomenon as for the standard Simpson quadrature, see Remark 1; yet in the latter case the error is usually not so much smaller than \(\varepsilon \). As a consequence, for integrands \(f\) with \(L^{\mathrm{opt}}(f)\cong L^{\mathrm{std}}(f)\) the approximation \(S^{\mathrm{std}}(f;\varepsilon )\) may use fewer subintervals than \(S^{\mathrm{opt}}(f;\varepsilon _1)\).
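The exponent \(5/4\) implicit in this two-phase construction can be traced as follows, assuming only that OPT run with acceptable error \(\varepsilon '\) produces \(m\approx (K\,L^{\mathrm{opt}}(f)/\varepsilon ')^{1/5}\) subintervals and a total error of roughly \(m\varepsilon '\) (the constants \(\alpha _i\), \(K_i\) tracked in the text are suppressed here):

```latex
\varepsilon_2=\varepsilon
\;\Longrightarrow\;
L^{\mathrm{opt}}(f)\approx \frac{m_2^{5}\,\varepsilon_2}{K},
\qquad
m_1\varepsilon_1\le\varepsilon
\;\Longleftrightarrow\;
\bigl(K\,L^{\mathrm{opt}}(f)\bigr)^{1/5}\,\varepsilon_1^{4/5}\le\varepsilon
\;\Longleftrightarrow\;
\varepsilon_1\le\frac{\varepsilon^{5/4}}{\bigl(K\,L^{\mathrm{opt}}(f)\bigr)^{1/4}}
\approx\frac{\varepsilon^{5/4}}{m_2^{5/4}\,\varepsilon_2^{1/4}}
=\frac{\varepsilon}{m_2^{5/4}}.
```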

To avoid this excessive work, we propose to run the optimal algorithm with the input \(B\,\varepsilon _1\) instead of \(\varepsilon _1\) where, say, \(B=4\sqrt{2}\). (This corresponds to \(\alpha _1,\alpha _2=1/4\).) We stress that this choice of \(B\) is based on heuristics and is not justified by rigorous arguments.

*Example 2*

We present test results for the integral \(I_\delta =\int _\delta ^1 x^{-1/2}/2\,{\mathrm d}x\) of Example 1 with \(\delta =10^{-2}\) and \(\delta =10^{-8}\), for the standard and optimal Simpson quadratures. In Tables 2 and 3 the results are given correspondingly for \(S^{\mathrm{std}}(f;\varepsilon )\) and \(S^{\mathrm{opt}}(f;\varepsilon _1)\), while in Tables 4 and 5 for \(S^{\mathrm{std}}(f;2\varepsilon )\) and \(S^{\mathrm{opt}}(f;4\sqrt{2}\varepsilon _1)\).

We end this section by presenting a rather detailed description of the optimal algorithm for automatic integration that runs in time proportional to \(m_1\). It uses two stacks, \(\mathrm {Stack1}\) and \(\mathrm {Stack2}\), corresponding to the two phases of the algorithm. The elements of the stacks, \(\mathrm {elt}\), \(\mathrm {elt1}\), \(\mathrm {elt2}\), represent subintervals. Each such element consists of \(6\) fields containing information about: the endpoints of the subinterval, the function values at the endpoints and at the midpoint, and the value of the three-point Simpson rule on this subinterval. Such a structure ensures that \(f\) is evaluated only once at each sample point. \(\mathrm {Push}\) and \(\mathrm {Pop}\) are the usual stack commands for inserting and removing elements.
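A compact Python sketch of the two-phase scheme may help. The tolerance update \(\varepsilon _1=\varepsilon \,m_2^{-5/4}\) used below is an assumption with all constants dropped, and, unlike the real algorithm, the sketch re-evaluates \(f\) when passing the accepted subintervals to the second phase instead of keeping all cached values.

```python
def element(f, u, v):
    """A 6-field subinterval record: endpoints, three cached function
    values, and the three-point Simpson value on [u, v]."""
    fu, fm, fv = f(u), f((u + v) / 2.0), f(v)
    return (u, v, fu, fm, fv, (v - u) / 6.0 * (fu + 4.0 * fm + fv))

def refine(f, stack, accepted, eps):
    """Pop subintervals; accept one when its error estimate |S1 - S2| is
    at most eps, otherwise push both halves back onto the stack."""
    while stack:
        u, v, fu, fm, fv, s1 = stack.pop()
        mid = (u + v) / 2.0
        fl, fr = f((u + mid) / 2.0), f((mid + v) / 2.0)
        sl = (mid - u) / 6.0 * (fu + 4.0 * fl + fm)
        sr = (v - mid) / 6.0 * (fm + 4.0 * fr + fv)
        if abs(s1 - (sl + sr)) <= eps:
            accepted.append((u, v, sl + sr))
        else:
            stack.append((u, mid, fu, fl, fm, sl))
            stack.append((mid, v, fm, fr, fv, sr))

def optimal(f, a, b, eps):
    accepted = []
    refine(f, [element(f, a, b)], accepted, eps)   # phase 1: test level eps2 = eps
    m2 = len(accepted)
    eps1 = eps * m2 ** -1.25                        # assumed update: eps1 ~ eps / m2^(5/4)
    accepted2 = []
    refine(f, [element(f, u, v) for u, v, _ in accepted], accepted2, eps1)  # phase 2
    return sum(s for _, _, s in accepted2)
```

Both phases run in time proportional to the number of produced subintervals, since every subinterval is pushed and popped exactly once.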

## 5 Extensions: \(f^{(4)}\ge 0\) and endpoint singularities

We have analyzed adaptive Simpson quadratures assuming that \(f\in C^4([a,b])\) and \(f^{(4)}>0\). It turns out that the obtained results still hold, and automatic integration can be successfully applied, for functions with \(f^{(4)}\ge 0\) and for functions with endpoint singularities. The observed good behavior of adaptive quadratures for such functions cannot be explained directly with the previous tools. What we need is a non-asymptotic error bound for \(S_2(u,v;f)\). Such a bound, together with the corresponding result for \(S_1(u,v;f)\), is provided by the following lemma.

**Lemma 1**

Suppose that \(f\in C([u,v])\) and \(f\in C^4([u_1,v_1])\) for all \(u<u_1<v_1<v\). If, in addition, \(f^{(4)}(x)\ge 0\) for all \(x\in (u,v)\), then

and

(with the convention that \(0/0=1\)).

*Proof*

Given \(c\in (u,v)\), we have that for any \(x\in [u,v]\)

where \(T_c\) is the Taylor polynomial of \(f\) of degree \(3\) at \(c\). (The formula is obvious for \(x\in (u,v)\), and by continuity of \(f\) it extends to \(x=u,v\).) Furthermore, integrating (16) with respect to \(x\) we get that

Using (16) for \(z_j=u+jh/4\), \(0\le j\le 4\), \(h=v-u\), we then obtain

with the Peano kernel \(\psi _0(u,v;t)=h^4\Psi _0((t-u)/h)\) where

For the error of \(S_1\) we similarly find that

where \(\psi _1(u,v;t)=h^4\Psi _1((t-u)/h)\),

Since

(and both bounds are sharp), we get the desired bounds.

For the error of \(S_2(u,v;f)\) we analogously find that

where the kernel \(\psi _2(u,v;t)=h^4\Psi _2((t-u)/h)\),

The remaining bound follows from the inequality

The Peano kernels \(\Psi _0\), \(\Psi _1\), and \(\Psi _2\) are presented in Fig. 4. \(\square \)

In what follows we concentrate on generalizing Theorem 2 about \(S_m^\mathrm{opt}\) since the other results (Theorem 1 and Proposition 1) can be treated in a similar fashion.

First we prove that the assumption \(f^{(4)}>0\) in Theorem 2 can be replaced by

\(f^{(4)}(x)\ge 0\quad \text{for all }x\in [a,b].\)
*Proof*

Suppose without loss of generality that \(f^{(4)}\) is not identically zero in \([a,b]\). We first produce a coarse grid \(\{z_i\}_{i=1}^{2^\ell }\) with subintervals of length \((b-a)/2^\ell \) and remove from it all the points \(z_i\) (\(1\le i\le 2^\ell -1\)) such that

Denote the successive points of the modified grid by \(\{\hat{z}_i\}_{i=1}^k\), \(k\le 2^\ell \). Let

From (18) it follows that a subinterval is further subdivided if and only if \(f^{(4)}\not \equiv 0\) in this subinterval. Hence for sufficiently large \(m\) the coarse grid is contained in the fine grid produced by \(S_m^\mathrm{opt}\) and the subintervals \([\hat{z}_{i-1},\hat{z}_i]\) with \(C_i>0\) have been subdivided at least once.

Let \(\hat{z}_{i-1}=x_{i,0}<\cdots <x_{i,k_i}=\hat{z}_i\) be the points of the fine grid contained in \([\hat{z}_{i-1},\hat{z}_i]\), and \(h_{i,j}=x_{i,j}-x_{i,j-1}\). Define

We now make an important observation that for any \(i\in {\mathcal J_0}\) with \(C_i>0\) and any \(1\le j\le k_i\)

Indeed, if this were not satisfied by a subinterval \([x_{i^*,j^*-1},x_{i^*,j^*}]\), then its predecessor, which has length \(2h_{i^*,j^*}\) and belongs to the \(i^*\)th subinterval of the coarse grid, would not have been subdivided.

Hence, denoting by \(m_0\) the number of subintervals of the fine grid in \(P_0\), we have

This implies \(m_0\le 2\,(15\gamma )^{1/5}\,M_0\,\beta ^{-1/5}.\) Denoting by \(m_1\) the number of subintervals of the fine grid in \(P_1\), we have

which implies \(m_1\,\ge \,(15\gamma )^{1/5}\,M_1\,\beta ^{-1/5}\). Hence \(m_0/m_1\le 2M_0/M_1\), and this bound is independent of \(m\); however, it depends on \(\ell \). Taking \(\ell \) large enough we can make \(m_0/m_1\) arbitrarily small.

From Lemma 1 it follows that the integration error in \(P_0\) is upper bounded by \(m_0\beta \). Since \(f^{(4)}\) is positive in \(P_1\), the error in \(P_1\) is asymptotically (as \(m\rightarrow \infty \)) lower bounded by \(m_1\beta /(15\cdot 32)\). Hence for \(\ell \) large enough the error in \(P_0\) is arbitrarily small compared to that in \(P_1\). In addition, the error in \(P_1\) follows the upper bound of Theorem 2. The proof is complete. \(\square \)

We now pass to functions with endpoint singularities. To fix the setting, we assume that \(f\) is continuous in the closed interval \([a,b]\) and \(f\in C^4([a_1,b])\) for all \(a<a_1<b\). Moreover,

and this divergence is asymptotically monotonic, i.e., there is \(\delta >0\) such that

As before, we prove that for such functions the upper error bound for \(S_m^\mathrm{opt}\) in Theorem 2 is still valid.

*Proof*

First, we observe that the difference \(S_1(a,a+h;f)-S_2(a,a+h;f)\) converges to zero faster than \(h\). Indeed, in view of (18) we have

This assures that the partition is denser and denser in the whole \([a,b]\) and the integration error goes to zero.

Second, we have that \(L^\mathrm{opt}(f)<\infty \). Indeed, by Hölder’s inequality

which is finite due to (16).

Now, let \(\ell \) be such that \((b-a)2^{-\ell }\le \delta \), and let \(\{z_i\}_{i=-\infty }^k\) with \(k=2^\ell -1\) be the (infinite) coarse grid defined as

Denote, as before, \(C_i=\max _{z_{i-1}\le x\le z_i}f^{(4)}(x)\). We obviously have \(C_i=f^{(4)}(z_{i-1})>0\) for all \(i\le 0\). For simplicity, we also assume \(C_i>0\) for \(1\le i\le k\).

Let \(m\) be sufficiently large so that the fine grid produced by \(S_m^\mathrm{opt}\) contains all the points \(z_i\) for \(i\ge 0\). Moreover, we can assume that each subinterval \([z_{i-1},z_i]\) with \(i\ge 1\) has been subdivided at least once. Let \([a,z_{-s}]\) be the first subinterval of the fine grid.

Let us further denote \(P_0=[a,z_0]\) and \(P_1=[z_0,b]\). Then \(P_0=P_{0,0}\cup P_{0,1}\), where \(P_{0,0}\) consists of \([a,z_{-s}]\) and all subintervals of the coarse grid that have not been subdivided by \(S_m^\mathrm{opt}\). Let \(m_{0,0}\), \(m_{0,1}\), \(m_1\) be the numbers of subintervals of the fine grid in \(P_{0,0}\), \(P_{0,1}\), \(P_1\), respectively.

Define \(\beta \) as in (20). In view of (24), the distance \(z_{-s}-a\) decreases more slowly than \(\beta \) as \(m\rightarrow \infty \), and therefore \(m_{0,0}\) is at most proportional to \(\log _2(1/\beta )\). Since (21) holds for the subintervals in \(P_{0,1}\), the number \(m_{0,1}\) can be estimated as \(m_0\) in (22) with

where the last inequality follows from monotonicity of \(f^{(4)}\). Since \(m_1\) can be estimated as in (23) we obtain, analogously to the previous proof, that the number of subintervals in \(P_1\) and the error in \(P_1\) dominate the scene. The proof is complete. \(\square \)

We stress that for continuous functions with endpoint singularities we always have \(L^\mathrm{opt}(f)<\infty \), which is not true for \(L^\mathrm{std}(f)\). An example is provided by

with \(f^{(4)}(t)=(t\ln t)^{-4}\). Indeed, since \(f(0)=\int _0^1(3!\,t\ln ^4t)^{-1}\,{\mathrm d}t<\infty \), the function is well defined and \(L^\mathrm{opt}(f)<\infty \), but

For such functions, the subdivision process of \(S_m^\mathrm{std}\) will not collapse [which follows from (24)] and the error will converge to zero; however, the convergence rate \(m^{-4}\) will be lost. On the other hand, if \(L^\mathrm{std}(f)<\infty \) then the error bounds of Theorem 1 hold true.

*Example 3*

Consider the integral

\(\int _0^1 x^p\,{\mathrm d}x\qquad (p\in {\mathbb R}).\)
The integrand is continuous at \(0\) only if \(p\ge 0\). Then both \(L^\mathrm{opt}(f)\) and \(L^\mathrm{std}(f)\) are finite. However, \(L^\mathrm{non}(f)<\infty \) only if \(p\) is a non-negative integer or \(p>3\). Figures 5 and 6, where the results for \(p=1/2\) and \(p=1/20\) are presented, show that, indeed, the adaptive quadratures \(S_m^\mathrm{std}\) and \(S_m^\mathrm{opt}\) converge as \(m^{-4}\), while \(S_m^\mathrm{non}\) converges at a very poor rate.
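A small experiment in the spirit of Figs. 5 and 6 can be run with \(p=1/2\); the greedy subdivision rule used below (halve the subinterval with the largest \(|S_1-S_2|\)) is the assumed reading of the optimal strategy, and the uniform quadrature applies the plain three-point rule on each subinterval.

```python
import heapq

def simpson(f, u, v):
    return (v - u) / 6.0 * (f(u) + 4.0 * f((u + v) / 2.0) + f(v))

def uniform(f, a, b, m):
    h = (b - a) / m
    return sum(simpson(f, a + i * h, a + (i + 1) * h) for i in range(m))

def adaptive(f, a, b, m):
    """Greedy subdivision: halve the subinterval with the largest |S1 - S2|."""
    def item(u, v):
        mid = (u + v) / 2.0
        s2 = simpson(f, u, mid) + simpson(f, mid, v)
        return (-abs(simpson(f, u, v) - s2), u, v, s2)
    heap = [item(a, b)]
    for _ in range(m - 1):
        _, u, v, _ = heapq.heappop(heap)
        mid = (u + v) / 2.0
        heapq.heappush(heap, item(u, mid))
        heapq.heappush(heap, item(mid, v))
    return sum(s2 for _, _, _, s2 in heap)

f = lambda x: x ** 0.5                  # p = 1/2: f'''' blows up at x = 0
exact = 2.0 / 3.0
err_uniform = abs(uniform(f, 0.0, 1.0, 256) - exact)
err_adaptive = abs(adaptive(f, 0.0, 1.0, 256) - exact)
print(err_uniform, err_adaptive)        # the adaptive error is smaller by orders of magnitude
```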

We end this paper by showing the importance of continuity of \(f\).

*Example 4*

Consider the integral \(\int _a^b f(x)\,{\mathrm d}x\) with \(a=-1/2\), \(b=1\),

In this case \(L^\mathrm{opt}(f)<\infty \) but \(L^\mathrm{std}(f)=\infty \). Figure 7 shows that \(S_m^\mathrm{opt}\) enjoys the ‘right’ convergence \(m^{-4}\) but \(S_m^\mathrm{std}\) completely fails. This is because the critical value

does not converge faster than \(h\); the algorithm keeps dividing the subinterval containing \(0\). As a result, the standard adaptive algorithm is asymptotically even worse than the nonadaptive algorithm.
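This failure mechanism is easy to check numerically. The sketch below uses an assumed unit jump at \(0\) (an illustration, not the paper's \(f\)) and confirms that on an interval \([-h,h]\) straddling the jump the difference \(S_1-S_2\) decays only linearly in \(h\), so a proportional stopping test of the common form \(|S_1-S_2|\le 15\,\varepsilon\,h/(b-a)\) is never met once \(\varepsilon\) is small.

```python
def simpson(f, a, b):
    # One Simpson step S1 on [a, b]
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))

def step(x):
    # Assumed unit jump at 0 (illustrative only)
    return 0.0 if x < 0.0 else 1.0

# Halving h halves S1 - S2, i.e. the ratio (S1 - S2)/h stays constant,
# so the subinterval containing the jump is bisected forever.
ratios = []
for h in (1.0, 0.5, 0.25, 0.125):
    s1 = simpson(step, -h, h)
    s2 = simpson(step, -h, 0.0) + simpson(step, 0.0, h)
    ratios.append((s1 - s2) / h)  # equals 1/2 for this jump
```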

Equally striking is the difference between OPTIMAL and STD. While OPTIMAL works perfectly well (see Table 6), STD never reaches its stopping criterion for \(\varepsilon \le 10^{-3}\) and loops forever.

Unfortunately, this example is misleading. The very good behavior of \(S_m^\mathrm{opt}\) is a consequence of our “lack of bad luck” rather than a rule. Indeed, it is enough to change the value of \(f\) in \([-1/2,0]\) from \(0\) to \(7/3\) to see that then \(S_1(a,(a+b)/2;f)-S_2(a,(a+b)/2;f)=0\) although \(\int _a^{(a+b)/2}f(x)\,{\mathrm d}x=19/12>0\). As a consequence, \(\lim _{m\rightarrow \infty }S_m^\mathrm{opt}(f)=13/6\) while the integral equals \(25/12\).
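This false-convergence effect is easy to reproduce with made-up numbers. The sketch below does not use the paper's modified \(f\); it takes a step function with the illustrative values \(3\), \(0\), \(1\) on \([0,1/4)\), \([1/4,3/4)\), \([3/4,1]\), chosen so that the five Simpson nodes see the values \(3,0,0,1,1\) and hence \(S_1=(3+0+1)/6\) coincides with \(S_2=(3+0+0+4+1)/12\), even though neither equals the integral.

```python
def simpson(f, a, b):
    # One Simpson step S1 on [a, b]
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))

def g(x):
    # Illustrative step function: values 3, 0, 1 on the three pieces
    if x < 0.25:
        return 3.0
    if x < 0.75:
        return 0.0
    return 1.0

s1 = simpson(g, 0.0, 1.0)                          # = 2/3
s2 = simpson(g, 0.0, 0.5) + simpson(g, 0.5, 1.0)   # = 2/3 as well
integral = 3.0 * 0.25 + 0.0 * 0.5 + 1.0 * 0.25     # = 1
```

Since \(S_1-S_2=0\), any acceptance test based on this difference is passed immediately, yet the returned value \(2/3\) is far from the true integral \(1\).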

## References

1. Calvetti, D., Golub, G.H., Gragg, W.B., Reichel, L.: Computation of Gauss-Kronrod quadrature rules. Math. Comput. **69**, 1035–1052 (2000)
2. Clancy, N., Ding, Y., Hamilton, C., Hickernell, F.J., Zhang, Y.: The cost of deterministic, adaptive, automatic algorithms: cones, not balls. J. Complex. **30**, 21–45 (2014)
3. Conte, S.D., de Boor, C.: Elementary numerical analysis—an algorithmic approach, 3rd edn. McGraw-Hill, New York (1980)
4. Davis, P.J., Rabinowitz, P.: Methods of numerical integration, 2nd edn. Academic Press, Orlando (1984)
5. Gander, W., Gautschi, W.: Adaptive quadrature—revisited. BIT **40**, 84–101 (2000)
6. Kincaid, D., Cheney, W.: Numerical analysis: mathematics of scientific computing, 3rd edn. AMS, Providence, RI (2002)
7. Kruk, A.: Is the adaptive Simpson quadrature optimal? Master thesis (in Polish), Faculty of Mathematics, Informatics and Mechanics, University of Warsaw (2012)
8. Lyness, J.N.: Notes on the adaptive Simpson quadrature routine. J. Assoc. Comput. Mach. **16**, 483–495 (1969)
9. Lyness, J.N.: When not to use an automatic quadrature routine? SIAM Rev. **25**, 63–87 (1983)
10. McKeeman, W.M.: Algorithm 145: adaptive numerical integration by Simpson's rule. Commun. ACM **5**, 604 (1962)
11. Malcolm, M.A., Simpson, R.B.: Local versus global strategies for adaptive quadrature. ACM Trans. Math. Softw. **1**, 129–146 (1975)
12. Novak, E.: On the power of adaption. J. Complex. **12**, 199–238 (1996)
13. Piessens, R., de Doncker-Kapenga, E., Überhuber, C.W., Kahaner, D.K.: QUADPACK. A subroutine package for automatic integration. Springer, Berlin (1983)
14. Plaskota, L.: Noisy information and computational complexity. Cambridge University Press, Cambridge (1996)
15. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes: the art of scientific computing, 3rd edn. Cambridge University Press, New York (2007)
16. Plaskota, L., Wasilkowski, G.W.: Adaption allows efficient integration of functions with unknown singularities. Numer. Math. **102**, 123–144 (2005)
17. Traub, J.F., Wasilkowski, G.W., Woźniakowski, H.: Information-based complexity. Academic Press, New York (1988)
18. Wasilkowski, G.W.: Information of varying cardinality. J. Complex. **1**, 107–117 (1986)

## Acknowledgments

The author would like to thank Grzegorz Wasilkowski and Henryk Woźniakowski for discussions on the results of this paper, and an anonymous referee for constructive comments. This research was partially supported by the National Science Centre, Poland, based on the decision DEC-2013/09/B/ST1/04275.


## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

## About this article

### Cite this article

Plaskota, L. Automatic integration using asymptotically optimal adaptive Simpson quadrature.
*Numer. Math.* **131**, 173–198 (2015). https://doi.org/10.1007/s00211-014-0684-3
