Automatic integration using asymptotically optimal adaptive Simpson quadrature

We present a novel theoretical approach to the analysis of adaptive quadratures and adaptive Simpson quadratures in particular which leads to the construction of a new algorithm for automatic integration. For a given function f∈C4\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f\in C^4$$\end{document} with f(4)≥0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$f^{(4)}\ge 0$$\end{document} and possible endpoint singularities the algorithm produces an approximation to ∫abf(x)dx\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\int _a^bf(x)\,{\mathrm d}x$$\end{document} within a given ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document} asymptotically as ε→0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon \rightarrow 0$$\end{document}. Moreover, it is optimal among all adaptive Simpson quadratures, i.e., needs the minimal number n(f,ε)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n(f,\varepsilon )$$\end{document} of function evaluations to obtain an ε\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\varepsilon $$\end{document}-approximation and runs in time proportional to n(f,ε)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n(f,\varepsilon )$$\end{document}.


Introduction
Consider a numerical approximation of the integral for a function f : [a, b] → R. Ideally we would like to have an automatic routine that for given f and error tolerance ε produces an approximation Q( f ) to I ( f ) such that it uses as few function evaluations as possible and its error This is usually realized with the help of adaption. Recall a general principle. For a given interval two simple quadrature rules are applied, one more accurate than the other. If the difference between them is sufficiently small, the integral in this interval is approximated by the more accurate quadrature. Otherwise the interval is divided into smaller subintervals and the above rule is recursively applied for each of the subintervals. The oldest and probably most known examples of automatic integration are adaptive Simpson quadratures [8][9][10][11], see also [4] for an account on adaptive numerical integration. An unquestionable advantage of adaptive quadratures is that they try to maintain the error on a prescribed level ε and simultaneously adjust the length of the successive subintervals to the underlying function. This often results in a much more efficient final subdivision of [a, b] than the nonadaptive uniform subdivision. For those reasons adaptive quadratures are now frequently used in computational practice, and those using higher order Gauss-Kronrod rules [1,5,15] are standard components of numerical packages and libraries such as MATLAB, NAG or QUADPACK [13]. Nevertheless, to the author's knowledge, there is no satisfactory and rigorous analysis that would explain good behavior of adaptive quadratures in a quantitative way or identify classes of functions for which they are superior to nonadaptive quadratures. This paper is an attempt to partially fill in this gap.
At this point we have to admit that there are theoretical results showing that adaptive quadratures are not better than nonadaptive quadratures. This holds in the worst case setting over convex and symmetric classes of functions. There are also corresponding adaption-does-not-help results in other settings, see, e.g., [12,14,17,18]. On the other hand, if the class is not convex and/or a different from the worst case error criterion is used to compare algorithms then adaption can significantly help, see [2] or [16].
In this paper we present a novel theoretical approach to the analysis of adaptive Simpson quadratures. We want to stress that the restriction to the Simpson rule as a basic component of composite rules is only for simplicity and we could equally well use higher order quadratures. The Simpson rule is a relatively simple quadrature and therefore better enables clear development of our ideas. To be more specific, we analyze the adaptive Simpson quadratures from the point of view of computational complexity. Allowing all possible subdivision strategies our goal is to find an optimal strategy for which the corresponding algorithm returns an ε-approximation to the integral (1) using the minimal number of integrand evaluations or, equivalently, the minimal number of subintervals. The main analysis is asymptotic and done assuming that f is four times continuously differentiable and its 4th derivative is positive.
To reach our goal we first derive formulas for the asymptotic error of adaptive Simpson quadratures. Following [7] we find that the optimal subdivision strategy produces the partition a = x * 0 < · · · < x * m = b such that This partition is practically realized by maintaining the error on successive subintervals on the same level. The optimal error corresponding to the subdivision into m subintervals is then proportional to For comparison, the errors for the standard adaptive (local) and for nonadaptive (using uniform subdivision) quadratures are respectively proportional to L std ( f ) m −4 and Hence the optimal Simpson quadrature is especially effective when L opt ( f ) L std ( f ). An example is are correspondingly about 10 5 , 10 8 , 10 28 .
Even though the optimal strategy is global it can be efficiently harnessed to automatic integration and implemented in time proportional to m. The only serious problem of how to choose the acceptable error ε 1 for subintervals to obtain the final error ε is resolved by splitting the recursive subdivision process into two phases. In the first phase the process is run with the acceptable error set to a 'test' level ε 2 = ε. Then the acceptable error is updated to where m 2 is the number of subintervals obtained from the first phase. In the second phase, the recursive subdivision is continued with the 'target' error ε 1 .
As noted earlier, the main analysis is provided assuming that f ∈ C 4 ([a, b]) and f (4) > 0. It turns out that using additional arguments the obtained results can be extended to functions with f (4) ≥ 0 and/or possible endpoint singularities, i.e., when f (4) (x) goes to +∞ as x → a, b. For such integrals the optimal strategy works perfectly well while the other quadratures may even lose the convergence rate m −4 .
The contents of the paper is as follows. In Sect. 2 we recall the standard (local) Simpson quadrature for automatic integration. In Sect. 3 we derive a formula for the asymptotic error of Simpson quadratures and find the optimal subdivision strategy. In Sect. 4 we show how the optimal strategy can be used to construct an optimal algorithm for automatic integration. The final Sect. 5 is devoted to the extensions of the main results. The paper is enriched with numerical tests where the optimal adaptive quadrature is compared with the standard adaptive and nonadaptive quadratures.
We use the following asymptotic notation. For two positive functions of m, we write A corresponding notation applies for functions of ε as ε → 0.

The standard adaptive Simpson quadrature
In its basic formulation the local adaptive Simpson quadrature for automatic integration, which will be called standard, can be written recursively as follows. Let Simpson(u, v, f ) be the procedure returning the value of the simple three-point Simpson rule on [u, v] for the function f , and let ε > 0 be the error demand.
A justification of STD that can be found in textbooks, e.g., [3,6], is as follows. Denote by S 1 (u, v; f ) the three-point Simpson rule, and by S 2 (u, v; f ) the composite Simpson rule that is based on subdivision of [u, v] into two equal subintervals, We also denote is small enough so that f (4) is 'almost' a constant, f (4) ≈ C and C = 0, then Now let a = x 0 < · · · < x m = b be the final subdivision produced by STD and be the result returned by STD. Then, provided the estimate (2) holds for any This reasoning has a serious defect; namely, the approximate equality (2) 2 we have I ( f ) > 0 but STD returns zero independently of how small ε is. Of course, concrete implementations of STD can be equipped with additional mechanisms to avoid or at least to reduce the probability of such unwanted occurrences. To radically cut the possibility of premature terminations we assume, in addition to f ∈ C 4 ([a, b]), that the fourth derivative is of constant sign, say, Equivalently, this obviously means that f (4) (x) ≥ c for some c > 0 that depends on f . Assumption (3) assures that the maximum length of the subintervals produced by STD decreases to zero as ε → 0 and the asymptotic equality (2) holds. Indeed, denote by D(u, v; f ) the divided difference of f corresponding to 5 equispaced points Since the termination criterion that is checked in line 3 of STD for the current subinterval [u, v], is equivalent to Our conclusion about applicability of (2) follows from the inequality D(u, v; f ) ≥ c/4! Observe that each splitting of a subinterval [u, v] results in the (asymptotic) decrease of the controlled value in (4) by the factor of 2 4 . Thus the algorithm asymptotically returns the approximation of the integral within ε, as desired. Specifically, we have Remark 1 The inequality (5) explains why numerical tests often show better performance of STD than expected. To avoid this it is suggested to run STD with larger input parameter, say 2ε instead of ε.

Optimizing the process of interval subdivision
The error formula (5) for the standard adaptive Simpson quadrature does not say anything about how the number m of subintervals depends on ε, or what is the actual error after producing m subintervals. We now study this question for different subdivision strategies. In order to be consistent with STD we assume that for a given subdivision The goal is to find optimal strategy, i.e., the one that for any function f ∈ C 4 ([a, b]) satisfying (3) produces a subdivision for which the error of the corresponding Simpson quadrature S m ( f ) is asymptotically minimal (as m → ∞).
We first analyze two particular strategies, nonadaptive and standard adaptive, and then derive the optimal strategy. In what follows, the constant In the nonadaptive strategy, the interval Let the corresponding Simpson quadrature be denoted by S non m . Then as m → ∞.
Observe that for the asymptotic equality (6) to hold we do not need to assume (3). We now analyze the standard adaptive strategy used by STD. To do this, we first need to rewrite STD in an equivalent way, where the input parameter is m instead of ε. We have the following greedy algorithm.
The algorithm starts with the initial subdivision a = x (1) and the midpoint (x (k) We are ready to show the error formula for S std m corresponding to (6).
Proof We fix and divide the interval Call this partition a coarse grid, in contrast to the fine grid produced by S std m . Let Let m be sufficiently large, so that the fine grid contains all the points of the coarse grid. Denote by z i−1 = x i,0 < x i,1 < · · · < x i,m i = z i the points of the fine grid contained in the ith interval of the coarse grid, and h i, j = x i, j − x i, j−1 . Then the error can be bounded from below as Suppose for a moment that for all i, j we have Observe now that any splitting of a subinterval decreases h 4 i, j c i by the factor of 16. Hence To obtain the upper bound, we proceed similarly. Replacing c i with C i and using the equation To complete the proof we notice that both are Riemann sums that converge to the integral b a f (4) Remark 2 From the proof it follows that the constants in the ' ' notation in Theorem 1 are asymptotically between 1/16 and 16. The gap between the upper and lower constants is certainly much overestimated, see also Remark 4.
The two strategies, nonadaptive and standard adaptive, will be used as reference points for comparison with the optimal strategy that we now derive. We first allow all possible subdivisions of [a, b] regardless of the possibility of their practical realization.

Proposition 1 The subdivision determined by points
Proof We first show the lower bound. Let S m be the Simpson quadrature that is based on an arbitrary subdivision. Proceeding as in the beginning of the proof of Theorem 1 we get that for sufficiently large m the error of S m is lower bounded by where m i is the number of subintervals of the fine grid in the ith subinterval of the coarse grid. (We assume without loss of generality that the coarse grid is contained in the fine grid.) Minimizing this with respect to m i such that 2 i=1 m i = m we obtain the optimal values After substituting m i with m * i in the error formula we finally get Since for the optimal m * i we have the lower bound (9) is attained by the subdivision determined by {x * i }.
Now the question is whether the optimal subdivision into m subintervals of Proposition 1 can be practically realized, i.e., using 4m + 1 function evaluations. The answer is positive, at least up to an absolute constant. The corresponding algorithm S opt m runs as S std m with the only difference that in each step it halves the subinterval with the highest value [instead of (7)].
Theorem 2 Let f ∈ C 4 ([a, b]) and f (4) Proof The proof goes as the proof of the upper bound of Theorem 1 with obvious changes related to the facts that now the algorithm tries to balance (11) [instead of (7)], and that

Remark 3
The best constant K of Theorem 2 is certainly much less than 32, see also Remark 4.
We summarize the results of this section. All the three quadratures S non m , S std m , S opt m converge at rate m −4 but the asymptotic constants depend on the integrand f through the multipliers These multipliers indicate how difficult a function is to integrate using a given quadrature. Obviously, by Hölder's inequality we have

Example 1 Consider the integral
In this case L non , L std , L opt rapidly increase as δ decreases, as shown in Table 1.
Numerical computations confirm the theory very well. We tested all the three quadratures (the adaptive quadratures being implemented in m log m running time using heap data structure) and ran them for different values of δ. Specific results are as follows. However, the smaller δ, the more differences between the results. A characteristic behavior of the errors for δ = 10 −2 and δ = 10 −8 is illustrated by Figs. 1  and 2. Observe that in case δ = 10 −8 the nonadaptive quadrature needs more than 10 4 subintervals to reach the right convergence rate m −4 .

Remark 4 It is interesting to see the behavior of
qad ∈ {non, std, opt}.
By (6) we have that lim m→∞ K non m ( f ) = 1. The corresponding limits for the adaptive quadratures are unknown; however, we ran some numerical tests and we never obtained more than 1.5. This would mean, in particular, that S opt m is at most 50 % worse than S * m . Figure 3 shows the behavior of K qad m ( f ) for the integral I δ of Example 1 with δ = 10 −2 .

Automatic integration using optimal subdivision strategy
We want to have an algorithm that automatically computes an integral within a given error tolerance ε. An example of such algorithm is the recursive STD. Recall that the recursive nature of STD allows to implement it in time proportional to the number m of subintervals using stack data structure. However, it does not use the optimal subdivision strategy. On the other hand, the algorithm S opt m uses the optimal strategy, but one does not know in advance how large m should be to have the error |S In addition, the best implementation of S opt m (that uses heap data structure) runs in time proportional to m log m. Thus the question now is whether there exists an algorithm that runs in time linear in m and produces an approximation to the integral within ε using the optimal subdivision strategy.
Since the optimal subdivision is such that the errors on subintervals are roughly equal, the suggestion is that one should run STD with the only difference that it is recursively called with parameter ε instead of ε/2. Denote such modification by OPT. 0 function OPT(a, b, f, ε); 1 begin S1 := Simpson(a, b, f ); 2 S2 := Simpson a, a+b be the result returned by OPT. Analogously to (8) if m is the minimal number of steps after which (11) is satisfied by all subintervals.
It is clear that OPT does not return an ε-approximation when ε is the input parameter. However we are able to estimate a posteriori error. Indeed, let m 1 be the number of subintervals produced by OPT for an ε 1 . Then We need to find ε 1 such that m 1 ε 1 ≤ ε. Since m 1 depends not only on ε 1 but also on L opt ( f ), it seems hopeless to predict ε 1 in advance. Surprisingly this is not true. The idea of the algorithm is as follows. We first run OPT with some ε 2 ≤ ε obtaining a subdivision consisting of m 2 subintervals. Next, using (12) and Theorem 2 we estimate L opt ( f ), and using again Theorem 2 we find the 'right' ε 1 . Finally OPT is resumed with the input ε 1 and with subdivision obtained in the preliminary run of OPT. As we shall see later, this idea can be implemented in time proportional to m 1 .
Concrete calculations are as follows. From the equality where α 2 and K 2 depend on ε 2 , we have We need ε 1 such that for the corresponding m 1 the error of S opt m 1 ( f ) is at most ε, i.e., where α 1 and K 1 depend on ε 1 . Substituting L opt ( f ) with the right hand side of (13) we obtain and solving the inequality α 1 m 1 ε 1 ≤ ε with m 1 given by (14) we get Recall that, asymptotically, α 1 and α 2 are in [1/32, 1] which means that β can be asymptotically bounded from below by 1. Hence, taking we have The choice of ε 1 given by (15) is rather conservative. In practice, we observe that the error of S opt ( f ; ε 1 ) is 'on average' even 6 or more times smaller than ε. Hence we encounter the same phenomenon as for the standard Simpson quadrature, see Remark 1. Yet, in the latter case, the error is usually not so much smaller than ε. As a consequence, for integrands f with L opt ( f ) ∼ = L std ( f ) the approximation S std ( f ; ε) may use less subintervals than S opt ( f ; ε 1 ).
To avoid an excessive work, we propose to run the optimal algorithm with the input B ε 1 instead of ε 1 where, say, (This corresponds to α 1 , α 2 = 1/4.) We stress that such choice of B is based on some heuristics and is not justified by any rigorous arguments.
We end this section by presenting a rather detailed description of the optimal algorithm for automatic integration that runs in time proportional to m 1 . It uses two stacks, Stack1 and Stack2, corresponding to the two phases of the algorithm. The elements of the stacks, elt, elt1, elt2, represent subintervals. Each such element consists of 6 fields containing information about: the endpoints of the subinterval, function values at the endpoints and at the midpoint, and the value of the three-point Simpson quadrature for this subinterval. Such structure enables evaluation of f only once at each sample point. Push and Pop are usual stack commands for inserting and removing elements.   Table 4 Standard and optimal quadratures for 1 Error  Table 5 Standard and optimal quadratures for 1 Error

Extensions: f (4) ≥ 0 and endpoint singularities
We have analyzed adaptive Simpson quadratures assuming that f ∈ C 4 ([a, b]) and f (4) > 0. It turns out that the obtained results hold and automatic integration can be successfully applied also for functions with f (4) ≥ 0 and functions with endpoint singularities. An observed good behavior of adaptive quadratures for such functions cannot be explained using directly previous tools. What we need is a non-asymptotic error bound for S 2 (u, v; f ). Such a bound, together with the corresponding result for S 1 (u, v; f ), is provided by the following lemma.
Proof Given c ∈ (u, v), we have that for any x ∈ [u, v] where T c is a Taylor polynomial for f of degree 3 at c. (The formula is obvious for x ∈ (a, b) and by continuity of f it extends to x = u, v.) Furthermore, integrating (16) with respect to x we get that For the error of S 1 we similarly find that 15 16 (and both bounds are sharp), we get the desired bounds.
For the error of S 2 (u, v; f ) we analogously find that The remaining bound follows from the inequality The Peano kernels 0 , 1 , and 2 are presented in Fig. 4.
In what follows we concentrate on generalizing Theorem 2 about S opt m since the other results (Theorem 1 and Proposition 1) can be treated in a similar fashion.
First we prove that the assumption f (4) > 0 in Theorem 2 can be replaced by Proof Suppose without loss of generality that f (4) is not everywhere zero in [a, b].
We first produce a course grid {z i } 2 i=1 of length (b − a)/2 and remove from it all the points z i (1 ≤ i ≤ 2 − 1) such that f (4)  Denote the successive points of the modified grid by Letẑ i−1 = x i,0 < · · · < x i,k i =ẑ i be the points of the fine grid contained in We now make an important observation that for any i ∈ J 0 with C i > 0 and any Indeed, if this were not satisfied by a subinterval [x i * , j * −1 , x i * , j * ] then its predecessor, whose length is 2h i * , j * and belongs to the i * th subinterval of the coarse grid, would not be subdivided.
Hence, denoting by m 0 the number of subintervals of the fine grid in P 0 , we have This implies m 0 ≤ 2 (15γ ) 1/5 M 0 β −1/5 . Denoting by m 1 the number of subintervals of the fine grid in P 1 , we have which implies m 1 ≥ (15γ ) 1/5 M 1 β 1/5 . Hence m 0 /m 1 ≤ 2M 0 /M 1 and this bound is independent of m. However it depends on . Taking large enough we can make m 0 /m 1 arbitrarily small. From Lemma 1 it follows that the integration error in P 0 is upper bounded by m 0 β. Since f (4) is positive in P 1 , the error in P 1 is asymptotically (as m → ∞) lower bounded by m 1 β/(15 · 32). Hence for large enough the error in P 0 is arbitrarily small compared to that in P 1 . In addition, the error in P 1 follows the upper bound of Theorem 2. The proof is complete.
We now pass to functions with endpoint singularities. To fix the setting, we assume that f is continuous in the closed interval [a, b] and f ∈ C 4 ([a 1 , b]) for all a < a 1 < b. Moreover, lim x→a f (4) (x) = +∞, and this divergence is asymptotically monotonic, i.e., there is δ > 0 such that As before, we prove that for such functions the upper error bound for S opt m in Theorem 2 is still valid.
Proof First, we observe that the difference S 1 (a, a +h; f )− S 2 (a, a +h; f ) converges to zero faster than h. Indeed, in view of (18) we have This assures that the partition is denser and denser in the whole [a, b] and the integration error goes to zero.
Second, we have that L opt ( f ) < ∞. Indeed, by Hölder's inequality b a ( f (4) , which is finite due to (16). Now, let be such that (b − a)2 − ≤ δ, and let {z i } k i=−∞ with k = 2 − 1 be the (infinite) coarse grid defined as Denote, as before, Let m be sufficiently large so that the fine grid produced by S opt m contains all the points z i for i ≥ 0. Moreover, we can assume that each subinterval [z i−1 , z i ] with i ≥ 1 has been subdivided at least once. Let [a, z −s ] be the first subinterval of the fine grid.
Let us further denote P 0 = [a, z 0 ] and P 1 = [z 0 , b]. Then P 0 = P 0,0 ∪ P 0,1 where P 0,0 consists of [0, z −s ] and all subintervals of the course grid that have not been subdivided by S opt m . Let m 0,0 , m 0,1 , m 1 be the numbers of subintervals of the fine grid in P 0,0 , P 0,1 , P 1 , respectively.
Define β as in (20). In view of (24), the distance (z −s − a) decreases slower than β as m → ∞, and therefore m 0,0 is at most proportional to log 2 (1/β). Since (21) holds for the subintervals in P 0,1 , the number m 0,1 can be estimated as m 0 in (22) with where the last inequality follows from monotonicity of f (4) . Since m 1 can be estimated as in (23) we obtain, analogously to the previous proof, that the number of subintervals in P 1 and the error in P 1 dominate the scene. The proof is complete.
We stress that for continuous functions with endpoint singularities we always have L opt ( f ) < ∞, which is not true for L std ( f ). An example is provided by 3 3! f (4) (t) dt, 0 ≤ t ≤ 1, with f (4) (x) = (t ln t) −4 . Indeed, since f (0) = 1 0 (3! t ln 4 t) −1 dt < ∞, the function is well defined and L opt ( f ) < ∞, but For such functions, the subdivision process of S std m will not collapse [which follows from (24)] and the error will converge to zero; however, the convergence rate m −4 will be lost. On the other hand, if L std ( f ) < ∞ then the error bounds of Theorem 1 hold true.  f (x) = 0, −1/2 ≤ x ≤ 0, x −1/2 /2, 0 < x ≤ 1.
In this case L opt ( f ) < ∞ but L std ( f ) = ∞. Figure 7 shows that S opt m enjoys the 'right' convergence m −4 but S std m completely fails. This is because the critical value does not converge faster than h; the algorithm keeps dividing the subinterval containing 0. As a result, the standard adaptive algorithm is asymptotically even worse than the nonadaptive algorithm. Equally striking is the difference between OPTIMAL and STD. While OPTIMAL works perfectly well, see Table 6, STD will never reach the stopping criterion for ε ≤ 10 −3 , and will loop forever.
Unfortunately, this example is misleading. The very good behavior of S opt m is a consequence of our "lack of bad luck" rather than a rule. Indeed, it is enough to change the value of f in [−1/2, 0] from 0 to 7/3 to see that then S 1 (a, (a +b)/2; f )−