Extrapolation quadrature from equispaced samples of functions with jumps

Based on the Euler–Maclaurin formula, the Romberg quadrature method extrapolates trapezoidal values to improve their accuracy when computing the integral of smooth functions from equispaced samples. It has been known at least since an article of Lyness in 1971 that the Euler–Maclaurin formula may be extended to accommodate functions with jumps. In the present work, we develop an extrapolation method, based on this extended formula, for the quadrature of such discontinuous functions. We illustrate the method with numerical examples, using one as well as several sample vectors.


To Claude Brezinski, the founder and editor-in-chief of this journal, on his 80th birthday

Jean-Paul Berrut (jean-paul.berrut@unifr.ch) and Manfred R. Trummer (trummer@sfu.ca)

Introduction
Consider the numerical approximation of the definite integral

I := ∫_0^L f(x) dx (1)

of a smooth function f by sampling it at equispaced points (or nodes) x_k := kh, k = 0, …, n, h = L/n, f_k := f(x_k), and evaluating the composite trapezoidal rule

T_f(h) := h ( f_0/2 + f_1 + ⋯ + f_{n−1} + f_n/2 ).

In this work, we abide by the signal processing (Wikipedia) definition of a sample as a value at a point in time or space. A sample vector is a vector of equispaced samples. It is well known that the accuracy of T_f(h) depends on the values of the derivatives of odd orders of f at the extremities 0 and L. In the favorable case where all of those coincide at these endpoints, the convergence is faster than every power of h, i.e., spectral. If f, L-periodically extended on both sides of the interval [0, L], is analytic in a horizontal strip containing the real line, the convergence even becomes exponential. A survey [19] of T_f(h) was published in 2014.
In many numerical analysis courses, students learn that the difference between the (composite) trapezoidal rule and the exact integral is given by the Euler-Maclaurin formula [2,13].
Theorem 1 (Euler–Maclaurin formula for the trapezoidal rule) Let f ∈ C^{2m+2}[0, L] for some m ≥ 0. Then, for every n ∈ ℕ and h := L/n, the error of the trapezoidal rule may be written as

T_f(h) − I = Σ_{ℓ=1}^{m} (B_{2ℓ}/(2ℓ)!) [f^{(2ℓ−1)}(L) − f^{(2ℓ−1)}(0)] h^{2ℓ} + (B_{2m+2}/(2m+2)!) L f^{(2m+2)}(ξ) h^{2m+2} (2)

for some ξ ∈ [0, L]. Here B_ℓ denotes the ℓth Bernoulli number.
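As a minimal illustration of Theorem 1 (our sketch, not part of the original computations), the following Python snippet evaluates the composite trapezoidal rule and checks the O(h^2) error predicted by (2) for a smooth, non-periodic integrand; the choice of exp on [0, 1] is arbitrary.

```python
import math

def trapezoid(f, a, b, n):
    # Composite trapezoidal rule T_f(h) with n subintervals of length h = (b - a)/n.
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

# For f = exp on [0, 1], the odd-order derivatives differ at the endpoints, so
# Theorem 1 predicts an error of order h^2: halving h should divide it by about 4.
exact = math.e - 1.0
e1 = abs(trapezoid(math.exp, 0.0, 1.0, 64) - exact)
e2 = abs(trapezoid(math.exp, 0.0, 1.0, 128) - exact)
print(e1 / e2)  # close to 4
```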
There are at least two ways of deriving T_f(h). The customary one is to consider I as the sum of the integrals over the subintervals [x_k, x_{k+1}] and to replace each of these integrals with the area of the trapezoid with bases f(x_k) and f(x_{k+1}) and height h. But one may also see T_f(h) (up to a constant depending on the length of the interval) as the exact integral of the trigonometric polynomial of minimal degree which interpolates the function values (f(x_0) + f(x_n))/2, f(x_1), …, f(x_{n−1}) at the nodes x_0, …, x_{n−1} [1,2]. That way, formula (2) is a quantitative description of the influence of the jump at 0 ≡ L (mod L) of the L-periodically extended function f (on both sides of [0, L]) on the accuracy of the exact integral T_f(h) of that trigonometric polynomial as an approximation of I.
In the present work, we shall address the following problem: we assume that a vector (in Section 7 a few vectors) of equispaced samples of a piecewise smooth function is (are) given, together with the location of its jumps. As usual in quadrature by extrapolation, we shall first assume that the number of subintervals is some power of two (or an odd number times such a power). It turns out that a large number of nodes are required in many cases. We therefore also consider the case where several vectors of samples can be taken.
In Section 2, we recall the Euler-Maclaurin formula for functions with jumps, the basis of the extrapolation method presented in this work. The method is described for the case of a single interior jump in Section 3, and an example is given in Section 4. The next section introduces the general method, of which examples are given in Section 6. Up to that point, we use a single vector of samples; Section 7 extends the method to the case where several vectors of samples can be obtained, which can greatly reduce the total number of function evaluations required to achieve a prescribed level of accuracy, and gives corresponding examples. A few remarks conclude the paper.

The Euler-Maclaurin formula for functions with jumps
According to the circular interpretation mentioned above, the L-periodized f usually has a jump at c_0 := 0 ≡ L (mod L). Nevertheless, when we talk about jumps in the present work, we mean the usual concept, i.e., that f has interior jumps at some points c_j, 0 < c_j < L, j = 1, …, J. We assume that these c_j are distinct and ordered according to their index, that f is continuous on every interval (c_j, c_{j+1}) between two consecutive jumps, and that it possesses one-sided limits f(c_j−) and f(c_j+) at every c_j. Moreover, the value of f at every jump is changed, if necessary, to the average (f(c_j−) + f(c_j+))/2 of the two corresponding limits there.
As in [2], we consider the classic Riemann sum generalization of T_f(h),

R_f(h) := h Σ_{k=0}^{n−1} f(x_k + th), t ∈ [0, 1). (3)

For t = 0, we have R_f(h) = T_f(h) (in the circular interpretation, with f averaged at the jumps); for t = 1/2, this is the midpoint rule. Let us recall the following theorem of [2], which will be the basis of our extrapolation procedure. The result was already given by Lyness in [14] (see the discussion in [2, p. 383]).

Theorem 2 (Euler–Maclaurin formula for functions with jumps) Let f be as described above and let t_j ∈ [0, 1) denote the relative distance of c_j to the node following it. If f^{(m)} is absolutely integrable between every two consecutive c_j, then the integration error may be written as

R_f(h) − I = Σ_{ℓ=1}^{m−1} a_ℓ h^ℓ + O(h^m), (4)

with

a_ℓ := (1/ℓ!) Σ_{j=0}^{J} P̄_ℓ(t_j) [f^{(ℓ−1)}(c_j−) − f^{(ℓ−1)}(c_j+)],

and where P_ℓ is the Bernoulli polynomial of degree ℓ and P̄_ℓ its continuous 1-periodic extension.
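To make the generalized Riemann sum concrete, here is a small Python sketch (ours, with arbitrary test data); t = 1/2 reproduces the composite midpoint rule.

```python
import math

def riemann(f, a, b, n, t):
    # Generalized Riemann sum R_f(h) = h * sum_{k=0}^{n-1} f(x_k + t*h), t in [0, 1).
    h = (b - a) / n
    return h * sum(f(a + (k + t) * h) for k in range(n))

# t = 1/2 gives the composite midpoint rule; for smooth f its error is O(h^2).
mid = riemann(math.cos, 0.0, math.pi / 2, 200, 0.5)
print(abs(mid - 1.0))  # small: the midpoint-rule error
```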
In the sequel, we shall refer to formula (4) by the acronym EMF. The change of variable y = L(x − a)/(b − a) shows that the polynomial part of R f (h) − I is the same when f is defined on an arbitrary interval [a, b] instead of [0, L]; the latter only makes the formulas and the proofs simpler to read.
In the exceptional case where the values of the derivatives are known on both sides of the jumps, substituting them into (4) easily permits the improvement of the Riemann values (3) (see [2]). One can also correct the endpoint weights in T f (h) to increase its order [8].
The number t_j is the relative distance of c_j to the node following it. In the rather exceptional case where the values of h are such that t_j = 1/2 (or constant) for every h, one can directly apply Richardson–Romberg extrapolation (see Example 1 of [2], where the only interior jump is at the midpoint of the interval of integration). But this is atypical: usually, the t_j depend on h, so that the factors a_ℓ of the powers of h in the EMF are not constants, as they depend on the values of the Bernoulli polynomials at the t_j.
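The h-dependence of the t_j is easy to visualize. The following sketch assumes the concrete reading that t_j is the distance from c_j to the first node at or after it, measured in units of h (so t_j = 0 when c_j is a node):

```python
import math

def t_rel(c, h):
    # Relative distance from the jump c to the node k*h following it, in units of h.
    return math.ceil(c / h) - c / h

c = 1 / math.sqrt(3)       # the jump location of the first example below
for k in range(5):
    h = 2.0 ** (-k)
    print(h, t_rel(c, h))  # t_1 changes with every halving of h
```

Because these values change with every halving of h, the coefficients a_ℓ in the EMF are not constants, which is what rules out plain Richardson extrapolation.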

Extrapolation via a system of equations
Extrapolation eliminates the first powers of h on the right-hand side of the EMF without knowledge of the a_ℓ. Assume that a quantity I is approximated by a formula B(h), and that the latter may be written as

B(h) = I + Σ_{ℓ=1}^{p} a_ℓ h^ℓ + O(h^{p+1}).

Evaluate B at p + 1 abscissae h_0 > h_1 > ⋯ > h_p > 0 and let Q_p denote the polynomial of degree at most p interpolating the values B(h_k) there,

Q_p(h) = Σ_{k=0}^{p} B(h_k) ℓ_k(h),

in which the ℓ_k are the Lagrange fundamental polynomials

ℓ_k(h) = Π_{i=0, i≠k}^{p} (h − h_i)/(h_k − h_i).

The Lebesgue function

λ_p(h) := Σ_{k=0}^{p} |ℓ_k(h)|

is the norm of the interpolation operator and indicates the conditioning of the interpolant (see for example [20]). With Ĩ := Q_p(0), the extrapolation error satisfies

|I − Ĩ| = O(λ_p(0) h_0^{p+1}). (9)
Proof Obviously, |I − Ĩ| = |Q_p(0) − C_p(0)|, where C_p denotes the interpolant of the polynomial part of B: in view of the linearity and uniqueness of polynomial interpolation, that part, being of degree at most p, is reproduced exactly, so that C_p(0) = I. But, according to a well-known result about the perturbation of the polynomial interpolant [3,7],

|Q_p(h) − C_p(h)| ≤ λ_p(h) max_{0≤k≤p} |B(h_k) − C_p(h_k)| = λ_p(h) O(h_0^{p+1}).

Letting h → 0 in this expression yields (9).
With a good set of interpolation nodes {h_k}, such as Chebyshev or Legendre points, one has λ_p(h) = O(log p) [6], so that the error becomes O(h^{p+1} log p). Apart from that of the Chebyshev points of the first kind, we do not know bounds on the Lebesgue function for sets of nodes which accumulate toward a point outside of the interpolation interval, such as 0 here, but our computations with the geometrically decreasing sequence of the first example below demonstrate that its growth at 0 is indeed very slow in comparison with that of 1/h. It remains to compute Ĩ = Q_p(0). Since this is the value at a single point of the interpolating polynomial Q_p, one usually uses Neville's algorithm [17], which is stable in the context of numerical quadrature, where the h-values h_k cluster toward the interpolation point 0 and the coefficients a_ℓ do not depend on the h_k. In our case, however, they do (through the t_j). General Neville–Aitken type interpolation algorithms going in that direction have been given in the literature. According to the MathSciNet review of [11], Schneider [16] introduced the first one in a special case, before Mühlbach [15] solved the general problem. Brezinski [4] and Håvie [11] later gave different proofs. These same authors [5,10] also independently derived a general extrapolation algorithm, which Brezinski called the E-algorithm. But none of these articles contains many numerical experiments, if any, demonstrating the stability of the methods; moreover, none seems to have been used to treat functions with interior singularities.
Here we use the (simpler) matrix version (corresponding to the Vandermonde solution to the interpolation problem) and solve the resulting system of linear equations by means of Gaussian elimination.
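For constant coefficients a_ℓ, the matrix (Vandermonde) route can be sketched as follows; the function name and the synthetic data are ours.

```python
import numpy as np

def extrapolate_at_zero(hs, values):
    # Write B(h_k) = I + a_1 h_k + ... + a_p h_k^p for k = 0..p and solve the
    # (p+1) x (p+1) Vandermonde system by Gaussian elimination; the first
    # unknown is the extrapolated value I.
    A = np.vander(np.asarray(hs, dtype=float), len(hs), increasing=True)
    x = np.linalg.solve(A, np.asarray(values, dtype=float))
    return x[0]

# Synthetic data that are exactly I + 2h + 3h^2 with I = 5:
hs = [1.0, 0.5, 0.25]
vals = [5.0 + 2.0 * h + 3.0 * h * h for h in hs]
print(extrapolate_at_zero(hs, vals))  # recovers 5 up to round-off
```

In the method of this paper, the monomial columns h^ℓ are replaced by the h-dependent basis functions coming from the EMF, but the solve step is the same.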

A first example with one jump
As a test and an example illustrating the method, we have started with a function which numerically has no jump at the extremities of the interval and only one in the interior; f is plotted for c_1 = 1/√3, β = 2, and η = 35 in Fig. 1. Here and in the subsequent figures, a small circle at a jump c_j marks the value (f(c_j−) + f(c_j+))/2 used in the trapezoidal sum when c_j is one of the nodes.
We limit our computations to the case t = 0, i.e., R f (h) = T f (h), the composite trapezoidal rule. As the sequence of h k -values, we have taken the geometric sequence h k = 2 −k , as in classic Richardson extrapolation [12, p. 100].
As a means of comparison, we have first applied Richardson extrapolation to the problem of computing (1) with L = 1, c_1 = 1/√3 (this choice guarantees that the jump at c_1 does not coincide with an integration node for any h) and β = 2. The parameter η is introduced so that all derivatives of f numerically vanish at the extremities 0 and 1; hence, from a practical point of view, only one jump is present, namely at c_1: we chose η = 35. We have used two calls of the MATLAB quad routine, one on each subinterval [0, c_1] and [c_1, 1], to obtain an accurate approximate value of (1) with which to compare our results: I ≈ 0.180560634293184. One could also use Chebfun with "splitting on." (We note, however, that these algorithms are no competitors to our method, as we assume that f is given by one vector of equispaced samples only.) Our results are summarized in Table 1. The first column contains the sequence of h_k-values, the second the error of the trapezoidal value for h_k without extrapolation, and the third the error in the Richardson-extrapolated value (the value at zero of the polynomial of lowest degree interpolating the trapezoidal values up to h_k, computed here with the polyfit routine of MATLAB). The last column shows the error I − Ĩ with the extrapolation method presented below. Every row contains the results when the given samples are those corresponding to the value h_k in the first column, and the extrapolation makes use of all the trapezoidal values up to this h_k.
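The Richardson column of Table 1 evaluates at h = 0 the polynomial fitted to the trapezoidal values. A Python analogue (polyfit/polyval instead of MATLAB's, applied to the smooth integrand cos(2x) from the example, without the jump and cutoff factors) looks as follows:

```python
import numpy as np

def trapezoid(f, a, b, n):
    # Composite trapezoidal rule with n subintervals.
    x = np.linspace(a, b, n + 1)
    y = f(x)
    return (b - a) / n * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

f = lambda x: np.cos(2.0 * x)  # the smooth part of the example
exact = np.sin(2.0) / 2.0
hs = np.array([2.0 ** (-k) for k in range(6)])
T = np.array([trapezoid(f, 0.0, 1.0, 2 ** k) for k in range(6)])
# Interpolate the trapezoidal values by a polynomial in h and evaluate it at 0.
richardson = np.polyval(np.polyfit(hs, T, len(hs) - 1), 0.0)
# For a smooth integrand the extrapolated value beats T(h_5) by many digits;
# with a jump, as Table 1 shows, this fails, since the EMF is not a polynomial in h.
print(abs(T[-1] - exact), abs(richardson - exact))
```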
The trapezoidal values decrease, but not monotonically: this is to be expected, as the relative location of the jump in the interval between the two nodes enclosing it varies with h (see also the values of t 1 (h) in the next tables).
The Richardson values do not improve on the direct trapezoidal ones. This is hardly surprising, as the main part of the Euler–Maclaurin formula is not a polynomial in h, because of the t_j.
Let us now write down the extrapolation equations of the suggested method. Since there is no jump at the extremities, the corresponding jump terms vanish for all ℓ, so that the sums in the a_ℓ start with j = 1 (the term with t_0 = 0 is present only when there is a jump at the extremities). We write t_1(h) for the relative distance of c_1 to the node following it, so that the dependence of t_1 on h is now explicit. We approximate the right-hand side by the pseudo-polynomial with p + 1 ≤ q terms

, with the matrices A and H defined by the interpolation conditions (compare with [9, p. 220]). We merely need the first component of the solution x = (Ĩ, d_1^{(1)}, d_1^{(2)}, …)ᵀ. Since the first element of the diagonal of H is 1, it suffices to find the first component of the solution of Ay = b (y := Hx), where b is the right-hand side of (12).
If the jump is a quadrature node for every h_k, then γ_1 = 0 for all of these and the second column of A vanishes. The unknown d_1^{(1)} must then be removed, together with the second column and the last row of A; H becomes diag(1, h_0^2/2!, h_0^3/3!, …, h_0^p/p!) and the last component of the right-hand side of (12) must be discarded as well.
These equations could probably be solved analytically to obtain Neville-like formulas. As mentioned at the end of Section 3, we instead solve the system with Gaussian elimination, in which pivoting almost always guarantees a stable method. This is more expensive, but each extrapolated value (the analogue of an entry of the Neville tableau) is computed independently, so that a numerical error arising in one of the entries does not affect the following ones. Our results are in the last column of Table 1.
The values display a pattern that is common to several of our examples. When starting with h_0 = b − a, as we do, quite a large number of subdivisions of the interval are needed to reach the point (here h_k = 1/64) where the extrapolated values become better than those obtained with the plain trapezoidal method. But once this stage is attained, our extrapolation scheme rapidly outperforms the latter, and we always obtain an accuracy below our tolerance (usually N · 10^{−16}, where N + 1 is the size of the sample vector).
The huge value for h = 1/8 is consistent with the large condition number of A: see Section 6.2 for a discussion.
Our function cos(2x) may seem too nice and simple. We also computed with a more challenging function, namely cos(100x), as well as Runge's function centered at c 1 /2, both of which are much more difficult to interpolate with equispaced points: the results are very similar, and the method continues to perform well.

Generalization to several jumps
We now address the general case, where f may have several interior jumps at c_1, c_2, …, c_J, J ≥ 1. Since with the trapezoidal rule the extremities 0 and L are quadrature points, the jump at the endpoints has γ_0 = 0 and formula (4) becomes an expansion in the unknowns d_j^{(ℓ)}; here, the t_j are as defined in Theorem 2. The interpolating pseudo-polynomial Q_p(h) with p + 1 ≤ q terms again is the same expression, in which I is replaced with its approximation Ĩ and the terms of the ℓ-sum stop with h^p. The interpolation conditions at h_k, k = 0, …, (J + 1)p − 1, lead to the system of (J + 1)p equations, where b is the right-hand side of (15) and the matrix A, given by (17), has r = (J + 1)p − 1 rows. However, there is a caveat here. Recall that all periodized Bernoulli polynomials of odd degree vanish at 0, and that t_0(z) = 0 (t_0 as defined in Theorem 2) for all z = h_0/2^k, k ∈ ℕ, in the present case. Therefore, the first column of the mth block A_m of A vanishes when m is odd and a "jump" is present at the extremities. (The corresponding unknown d_0^{(m)} is absent as well.) We thus implement the following modification of the process: when the increase of p by one unit leads to an odd number, that first column is erased, and simultaneously one row of the matrix is removed, which means that one less trapezoidal approximation is taken into account in the process (making it, in fact, cheaper). We express this by adding a star to the first column of A_m, to indicate that it is absent if p is odd; then r < (J + 1)p − 1 if p exceeds two. H changes as well, as some of its elements are eliminated, but that does not affect the method, as only the first component of x is needed.
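The vanishing just used can be checked directly. The sketch below (ours) computes the Bernoulli numbers B_m = P_m(0) from the standard recurrence (with the convention B_1 = −1/2); they vanish for odd m ≥ 3, and for m = 1 the averaged periodic extension P̄_1 also vanishes at the integers.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    # B_0, ..., B_n from the recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0,
    # which gives B_1 = -1/2, B_2 = 1/6, B_3 = 0, ...
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

B = bernoulli_numbers(9)
print([str(B[m]) for m in (1, 3, 5, 7, 9)])  # ['-1/2', '0', '0', '0', '0']
```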
A general algorithm, which includes the case where several sample vectors are available, is given in Section 7.
One of the reviewers expressed doubts about our method's efficiency and suggested replacing f over every interval between two jumps by an approximation constructed from the samples there and summing the exact integrals of these approximations (which amounts to using an open quadrature rule over every subinterval). We have not compared our method to such a scheme. Another comment suggested using adaptivity and implementing a special procedure near the discontinuities. This is a natural idea, but it would ignore our assumption that only equispaced samples are available in our setting.

A first example with the jump at the boundary
Here we want to integrate the function g in (11) between 0 and 1. With c_1 = 1/√3 and β = 2, the exact value is I ≈ 0.9833366718258914; the corresponding g is given in Fig. 2. The jump at the extremities, or "boundary jump," which appears in the classic Euler–Maclaurin formula of Theorem 1 is now present, and the corresponding terms must be taken into account in formula (4).
The extrapolation method described in the previous section yields Table 2, in the caption of which it is called the "first version," to distinguish it from the second version introduced after the table. Its first column again displays the values h_k; the second gives 100·t_1(h_k), the relative location of the jump with respect to the next node, in percent. The third column shows the errors of the trapezoidal value, the fourth the degree p of the extrapolation using h_k, and the last column lists the errors of the extrapolated value computed as the first component of the solution of Ay = b. A missing value in that last column indicates that no extrapolated value is computed after the first new trapezoidal value when two are needed to interpolate an h^p-term, i.e., when p is even.
The abovementioned elimination of one column and simultaneous noncomputation of a trapezoidal value have the consequence that the values used in the next extrapolation step do not improve in accuracy as much as for even p. We have therefore repeated the same computations, but dividing the stepsize h by four when extrapolation is not performed (with the corresponding modification of the matrix A, i.e., a multiplication by 1/4 instead of 1/2 of the fraction in front of γ_1 in the second column, and correspondingly in the next columns). The results with this "second version" of the method are given in Table 3. Fewer trapezoidal values are now used for extrapolation for about the same accuracy (11 instead of 14 in this example), which decreases the size of the system of equations.

Another function with one jump
To confirm the above results, we have tried another function with one jump. The exact value is ∫_{−1}^{1} f(x) dx = (sin(4c_1) + sin(4))/4 + (cos(2.5c_1) − cos(2.5))/2.5. We set c_1 = 1/30; the corresponding function is plotted in Fig. 3, and the results with the first version, i.e., the division of h by two at every computation of a new T_f(h), are displayed in Table 4. (From here on, we refrain from giving the value of p, which is just the number of the extrapolation step, i.e., of the row in the last column.) As in Table 1, we encounter a phenomenon that arose in several of our examples: the extrapolated value can be extremely large before h_k becomes small enough (or enough trapezoidal values are computed). But this is not terribly important, as we need small enough h_k to get improvement over the trapezoidal values anyway.
The results also confirm that bad intermediate extrapolated values do not affect the quality of the subsequent approximations I : the only numbers used in performing the extrapolation are the trapezoidal values. This illustrates our comments about the Neville tableau in Section 4.
We stopped with the last h_k given in the table, since the difference between two consecutive extrapolated values was smaller than our tolerance of N · 10^{−16}. We then went up to p = 15: the accuracy remained below 5 · 10^{−13}, despite condition numbers of A larger than 10^{20}. These huge condition numbers impact the results only for relatively large h_k. The condition number is, of course, only an upper bound on the amplification of round-off errors in solving linear equations. Whether actual amplification occurs depends on how the right-hand side b and perturbations to it are aligned with the singular vectors of the matrix. In many practical applications, numerical solutions behave much better than predicted by condition numbers.
In our case, because of the jumps, the first components of b behave erratically, and it is not surprising that the solution of the system sometimes suffers gravely from the ill-conditioning. When the abscissae h_k of the last T_f(h_k) cluster close to the evaluation point 0, these trapezoidal values become much more regular, which diminishes the likelihood of a bad b. For a generalized Richardson method close to ours, Gaussian elimination has been shown in [18, pp. 73–76] to be stable. All of the above likely explains the excellent behavior of our scheme for p large enough.
We then solved the same problem, but, as with the former example, by dividing the stepsize h by four when extrapolation is not performed. The results with that second version appear in Table 5.
When h k is such that the jump is close to a node, the trapezoidal rule usually performs worse than with h k−1 . Nevertheless, the extrapolated values are much more regular than in the first version. We have therefore considered only this second version from here on.
What if, after some steps, the jump coincides with a node? In fact, this does not cause a problem: as the first elements of the corresponding column do not vanish, the matrix A remains (mathematically) nonsingular (except in the case where the jump coincides with a node from the onset, but then Romberg may be applied from the start). To demonstrate this, we have repeated the calculations for the last example, but this time with c_1 = 1/32. The results are listed in Table 6: from n = 128 on, when c_1 is a node, the trapezoidal values become much more regular (in fact O(h^2), according to the Euler–Maclaurin formula) and the extrapolated values rapidly converge within our tolerance N · 10^{−16}.

Table 6: Second example with one interior jump at c_1, with c_1 becoming a node
The function with c_1 = 1/30 and c_2 = √3 is shown in Fig. 4; the exact value becomes I = 2.945411417434258 and our results are given in Table 7. The number of trapezoidal values necessary for an extrapolation is not always three, the number of jumps, since for odd p only two such values are required: as in Section 5, P̄_m(t_0(h/2^k)) = 0 for all k, so that d_0^{(ℓ)} does not appear in (13) and therefore need not be determined.
The first extrapolated values are extremely large, in fact about the reciprocal of the unit round-off. This results from the extremely bad conditioning of the matrix A and, at first, even led us to believe that there was a programming error in our code. To test this, we changed c_2 from √3 to 1 + √3. The results are given in Table 8.

Table 8: Same as Table 7, but with c_2 = 1 + √3

Several vectors of samples
Up to now, we have started with h_0 = b − a and then recursively divided all intervals in two, or in four in the second version when p is odd, thereby doubling (resp. quadrupling) the number of function values at each step. Equivalently, given the vector of samples corresponding to the last subdivision, we compute all trapezoidal values and solve the system of equations Ay = b for the first component of y. The fact that this uses only one vector of samples may be an important advantage of the method. However, such doubling of the number of function values rapidly results in a very large number of these. For that reason, it has been suggested long ago to alternate multiples of 2 and of 3 times powers of two in the extrapolation process: the so-called Bulirsch sequence [9] of interval numbers, with n_k = 2n_{k−2} for k ≥ 4, is 2, 3, 4, 6, 8, 12, 16, 24, 32, 48, … (the sequence is monotonically increasing). This requires two sample vectors, one with 2 · 2^ℓ + 1 and the other with 3 · 2^ℓ + 1 samples. One may generalize this to several vectors of samples, say q of them, the sth one with (2s − 1)2^ℓ + 1 components, s = 1, 2, …, q, i.e., the first q odd numbers times the first powers of two. (One could consider taking primes times 2^ℓ to have only different values of f in the interior of the interval, but with q ≤ 4 this is the same.) We do not know whether it matters, but we have ordered the numbers of intervals.
For q = 3, and ignoring the possible first element 1, this yields the following sequences of interval numbers: 1, 2, 4, 8, 16, 32, 64, …; 3, 6, 12, 24, 48, …; and 5, 10, 20, 40, …. Ordering these numbers turns out to be very simple: it suffices to read them one anti-diagonal after the other, to obtain

(1,) 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32, 40, 48, 64, … (21)

or n_k = 2n_{k−3}, k ≥ 6.
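The merged sequence (21) is easy to generate programmatically; the following sketch (function name ours) merges the q families (2s − 1)·2^ℓ and sorts them.

```python
def interval_counts(q, lmax):
    # Numbers of subintervals (2s - 1) * 2^l, s = 1..q, l = 0..lmax, merged and sorted.
    return sorted({(2 * s - 1) * 2 ** l for s in range(1, q + 1) for l in range(lmax + 1)})

seq = interval_counts(3, 6)
print(seq[1:16])  # [2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32, 40, 48, 64]
```

On this output one checks the recursion seq[k] = 2·seq[k−3] (0-based indices, counting the leading 1).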
It does not seem as simple for q = 4. These numbers indeed grow slowly and may give trapezoidal approximations too inaccurate to yield a precise extrapolate. Compared with the harmonic sequence, which is the best in the context of extrapolation methods for ODEs [9, p. 221], the sequence grows somewhat faster, but it requires only three sample vectors, whereas the harmonic sequence requires an unboundedly increasing number of them. The system of equations to be solved for the d_j^{(ℓ)} is like (15), but now with modified matrices A_m given by (23). The general algorithm for determining an extrapolated value Ĩ may now be described as follows:
• obtain one (or several) sample vector(s) with N_1 + 1, resp. N_1 + 1, N_2 + 1, …, N_q + 1 samples; the numbers of intervals N_r should be composite numbers, so that several trapezoidal values can be computed from any one of the vectors;
• determine the sequence {n_ℓ + 1} of numbers of samples from which the trapezoidal values will be computed, for instance (20) or (21); when p is odd, a trapezoidal value may be skipped to improve accuracy (this is the more accurate "second method" suggested in Section 5);
• compute the matrix A from (15) with the submatrices A_m from (23);
• solve the system of equations Ay = b for the first component y_1; the latter is the sought extrapolated value Ĩ.
Before experimenting with several sample vectors, we have computed with only the second vector of the above table, i.e., n k = 3 · 2 k , k = 0, 1, 2, 3, 4, . . ., which can be accomplished by a minor change in our program. For the same example as in Section 6.3, Table 9 then replaces Table 7.
This change does not look like a good idea: to obtain, say, an accuracy of 10^{−11}, about 25'000 points are necessary, compared to about 16'000 with the sequence 2^k.

An example with two sample vectors
To test the method with several sample vectors, we started with two of them, i.e., with the Bulirsch sequence (20). Our results, still with the same example as in Section 6.3, are given in Table 10.

An example with three sample vectors
In example (19), J = 2 and thus three trapezoidal values are needed for the first extrapolation. In order to avoid computing the same function values twice and to use the formula giving T (h/2) from T (h) [17, p. 377], and in view of (22), we started the method with h = (b − a)/3, so that the next two interval lengths are h = (b − a)/4 and h = (b − a)/5. The results are displayed in Table 11.
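The update formula giving T(h/2) from T(h) reuses all previously computed samples; a Python sketch (names ours) of this standard refinement:

```python
import math

def refine_trapezoid(f, a, b, T_h, n):
    # From T(h) on n subintervals to T(h/2): halve the old value and add the
    # integrand at the n new midpoints (the trapezoid-refinement idea of [17]).
    h = (b - a) / n
    return 0.5 * T_h + 0.5 * h * sum(f(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 1.0
T = 0.5 * (b - a) * (math.exp(a) + math.exp(b))  # T(h_0) with a single interval
n = 1
while n < 32:
    T = refine_trapezoid(math.exp, a, b, T, n)   # only new samples are evaluated
    n *= 2
print(abs(T - (math.e - 1.0)))  # trapezoidal error at h = 1/32
```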
With the data of Tables 7, 10, and 11, we finally plot in Fig. 5 the number of function evaluations needed to reach a certain accuracy, starting from 10 −2 , in green and dotted after extrapolation with one vector of samples, in blue and dashed with two, and in red and solid with three (the curves are intended to be read from right to left); the plain trapezoidal values are added in black and dash-dotted for comparison. Note the logarithmic scale!

Conclusions
The direct application of the composite trapezoidal rule to a piecewise smooth function f converges slowly and in a non-monotone fashion to the integral of f . The extrapolation method presented in this work accelerates the convergence, and makes it regular as soon as the interval length h is small enough. It converged in all examples we tried, despite obtaining sometimes very bad values at the initial stage. The number of correct digits increased by a factor of two to three compared with the trapezoidal values. One may decrease h without any stability concern, and thus use a stopping criterion based on the agreement of consecutive values of I .
One question is whether better results could be obtained by starting (and extrapolating) from smaller values h 0 , to avoid the often erratic first extrapolates. We tried this for the very first example (Section 4): the results were not too encouraging, as large values due to the ill-conditioning of the matrices again arose for small divisors of h 0 and longer sample vectors were needed to reach our tolerance.
Note, finally, that the method requires knowledge of the locations of the jumps. Methods exist for determining them, but most work in a transformation space; we envision one using just the vector(s) of equispaced samples.
Acknowledgements The authors thank Jean-Pierre Gabriel and the reviewers for their comments, which have significantly improved the presentation.
Funding Open access funding provided by the University of Fribourg. The second author was supported by a National Science and Engineering Research Council (NSERC) Canada Discovery grant (RGPIN-2020-04663).
Data availability Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Conflict of interest The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.