Automatic approximation using asymptotically optimal adaptive interpolation

We present an asymptotic analysis of adaptive methods for L_p approximation of functions f ∈ C^r([a, b]), where 1 ≤ p ≤ +∞. The methods rely on piecewise polynomial interpolation of degree r − 1 with an adaptive strategy for selecting m subintervals. The optimal speed of convergence is in this case of order m^{−r}, and it is already achieved by the uniform (nonadaptive) subdivision of the initial interval; however, the asymptotic constant crucially depends on the chosen strategy. We derive asymptotically best adaptive strategies and show their applicability to automatic L_p approximation with a given accuracy ε.


Introduction
Numerical algorithms for solving continuous problems generally fall into two categories: nonadaptive algorithms and adaptive algorithms. By "adaptive" we mean here that in its successive steps the algorithm uses information about the problem instance (usually a real-valued function) obtained in the previous steps. Adaptive algorithms often outperform nonadaptive ones in that they enjoy an essentially better convergence rate; examples include bisection and Newton's method for solving nonlinear equations. A good deal of the numerical literature is devoted to automatic integration using adaptive quadratures (see, e.g., [1,6]), one of the first and probably best known being the adaptive Simpson quadrature [5]. The question of when and how much adaption helps is one of the main issues in information-based complexity [7,12].
For the problems of function approximation or integration, adaptive algorithms are especially efficient when the underlying function is only piecewise smooth, since then adaption can be successfully used to localize the unknown singular points [9-11]. On the other hand, if the function is smooth in the whole domain, then adaptive algorithms can improve the error only by a constant factor compared to nonadaptive ones. The exact asymptotic constants for quadratures of degree of exactness r − 1 and for functions f ∈ C^r([a, b]) with f^{(r)} > 0 were obtained in [8] for r = 4 and in [2,3] for arbitrary r. Procedures corresponding to the optimal strategies for automatic integration were also proposed.
While adaptive numerical algorithms for the approximation of smooth functions are sometimes constructed (see, e.g., [4, Sect. 6.14]), a similar quantitative analysis of such algorithms seems not to exist. The purpose of the current paper is to fill this gap. We consider approximation of functions f ∈ C^r([a, b]) and algorithms that rely on piecewise polynomial interpolation of degree r − 1 with adaptive strategies of selecting m subintervals. The error is measured in the integral norm ‖·‖_{L_p} with 1 ≤ p ≤ +∞. It is well known that the optimal convergence rate is in this case of order m^{−r}, and it is already achieved by the uniform, i.e., nonadaptive, partition of the initial interval. (Actually, the rate m^{−r} cannot be beaten even in the much larger class of algorithms that use m function evaluations, which follows in particular from [13].) We first prove that, for any function in the class, a theoretically best adaptive strategy of interval subdivision relies on keeping the L_p errors equal in all subintervals. The global L_p error of approximation then asymptotically, as m → +∞, equals

α_{r,p} / (r! m^r) · ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)},

while for the uniform partition it equals

α_{r,p} (b − a)^r / (r! m^r) · ‖f^{(r)}‖_{L_p(a,b)},

where α_{r,p} is given by (4) (see Propositions 1 and 2). The gain from using adaption can be significant. For instance, consider the L_∞ approximation of f(x) = 1/(x + 10^{−d}) in the interval [0, 1]. If d = 2, then the adaptive algorithm outperforms the nonadaptive one roughly by the factor 10^6, and for d = 8 this factor becomes 10^{29} (see Table 2). Then, we show how the optimal strategy can be realized in practice. That is, for a given function f ∈ C^r([a, b]) we construct a relatively simple procedure that uses a priority queue and produces an almost optimal mth partition using a number of evaluations of f proportional to m. Different versions of the procedure and of the error analysis are presented, depending on additional properties of f. Next, we deal with automatic approximation.
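The size of this gain can be checked numerically. The following is a minimal Python sketch (not part of the paper) comparing the two asymptotic constants for L_∞ approximation, under the assumption that the uniform constant is proportional to (b − a)^r max |f^{(r)}| and the adaptive one to (∫_a^b |f^{(r)}(x)|^{1/r} dx)^r, for f(x) = 1/(x + 10^{−d}) on [0, 1] with r = 4:

```python
import math

def gain_linf(d, r=4):
    """Ratio of the uniform to the adaptive asymptotic error constant for
    L_inf approximation of f(x) = 1/(x + c), c = 10**-d, on [0, 1].
    Assumes the constants (b-a)^r * max|f^(r)| (uniform partition) and
    (integral of |f^(r)|**(1/r))**r (equal-error partition), with
    |f^(r)(x)| = r! / (x + c)**(r + 1)."""
    c = 10.0 ** (-d)
    # max of |f^(r)| over [0, 1] is attained at x = 0; here b - a = 1
    uniform = math.factorial(r) / c ** (r + 1)
    # closed form of the integral of (r! / (x + c)**(r+1))**(1/r) over [0, 1]
    integral = math.factorial(r) ** (1.0 / r) * r * (c ** (-1.0 / r) - (1.0 + c) ** (-1.0 / r))
    return uniform / integral ** r
```

The ratio comes out roughly 1.8 × 10^6 for d = 2 and 4 × 10^29 for d = 8, consistent with the factors quoted above.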
We consider a local subdivision strategy that is a departure point for obtaining a recursive procedure using the (almost) optimal strategy. For any ε > 0 and f ∈ C r ([a, b]), the proposed procedures return an approximation with the L p error at most ε, asymptotically as ε → 0 + .
Finally, we notice that our results imply the previously known results for numerical integration mentioned earlier in this introduction.
The content of the paper is as follows. In Section 2 we formally define our problem and show some preliminary estimates. A theoretically optimal partition is constructed in Section 3, while Section 4 is devoted to its practical realization. The recursive procedures for automatic approximation are constructed in Section 5. In Section 6 we comment on relations to numerical integration. The theoretical findings are complemented by numerical examples.

Preliminaries
For an integer r ≥ 1 and −∞ < a < b < +∞, we denote by C^r([a, b]) the space of r times continuously differentiable functions f : [a, b] → ℝ. We assume that such functions are approximated using piecewise interpolation of degree r − 1 with a possibly nonuniform partition of the interval [a, b] into subintervals. Specifically, we first fix points

0 ≤ t_1 < t_2 < · · · < t_r ≤ 1.  (1)

For a given f ∈ C^r([a, b]), the interval [a, b] is subdivided into m subintervals determined by a choice of points

a = x_0 < x_1 < · · · < x_m = b.  (2)

In each subinterval [x_{j−1}, x_j], the function is approximated by its Lagrange polynomial of degree r − 1 interpolating f at the points x_{j−1} + t_i h_j, 1 ≤ i ≤ r, where h_j = x_j − x_{j−1}. We denote such an approximation by L_{m,r} f. The error of approximation is measured in the L_p norm, i.e., by ‖f − L_{m,r} f‖_{L_p(a,b)}. We are interested in partitions (2) such that the errors for the corresponding approximations L_{m,r} f are asymptotically (as m → +∞) as small as possible. Note that the problem can be formally treated as a special way of approximating the embedding C^r([a, b]) → L_p(a, b).

Remark 1 Obviously, the uniform approximation C^r([a, b]) → C([a, b]) is also of interest. We do not analyze it separately, since it is equivalent to L_∞ approximation provided t_1 = 0, t_r = 1, and r ≥ 2. Indeed, then for any partition the approximation L_{m,r} f is continuous in [a, b].

In the rest of the paper, we assume without loss of generality that f is not a polynomial of degree smaller than r, since otherwise we clearly have L_{m,r} f = f. Then, in particular, the derivative f^{(r)} is a nontrivial function.
We now provide preliminary formulas for the approximation error that will be used later. Let

ω(t) = (t − t_1)(t − t_2) · · · (t − t_r),  t ∈ [0, 1],  (3)

where the t_i are given by (1), and let

α_{r,p} = ‖ω‖_{L_p(0,1)}.  (4)

Then the local errors, by which we mean the errors in the successive subintervals [x_{j−1}, x_j], can be written as follows. For 1 ≤ p < +∞,

‖f − L_{m,r} f‖_{L_p(x_{j−1},x_j)} = α_{r,p} h_j^{r+1/p} |f^{(r)}(η_j)| / r!,  (5)

and analogously for p = +∞ with the convention 1/p = 0. Hence,

‖f − L_{m,r} f‖_{L_p(a,b)} = ( Σ_{j=1}^m ( α_{r,p} h_j^{r+1/p} |f^{(r)}(η_j)| / r! )^p )^{1/p}.  (6)

In the sequel, η_j always denotes a point in the jth subinterval for which (5) holds. For convenience, we also use the following asymptotic notation: for two nonnegative functions a and b of the variable m we write a(m) ≈ b(m) iff lim_{m→+∞} a(m)/b(m) = 1. Consider first the uniform partition of the interval [a, b], in which case

x_j = a + j(b − a)/m,  0 ≤ j ≤ m.  (7)

Proposition 1 For the uniform partition (7) we have

‖f − L_{m,r} f‖_{L_p(a,b)} ≈ α_{r,p} (b − a)^r ‖f^{(r)}‖_{L_p(a,b)} / (r! m^r),

which follows from (6).
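The m^{−r} rate for the uniform partition is easy to observe experimentally. Below is a minimal, self-contained Python sketch (an illustration, not the paper's code) that builds the piecewise Lagrange interpolant of degree r − 1 on a uniform partition and estimates the sup-norm error on a fine grid; the function e^x and the equispaced local nodes are illustrative choices, not the optimal t*_i:

```python
import numpy as np

def piecewise_interp_error(f, a, b, m, t, n_eval=200):
    """Sup-norm error of piecewise Lagrange interpolation of degree r - 1
    on the uniform partition of [a, b] into m subintervals; the r local
    nodes t (in [0, 1]) are mapped affinely into each subinterval."""
    r = len(t)
    t = np.asarray(t, dtype=float)
    err = 0.0
    edges = np.linspace(a, b, m + 1)
    for x0, x1 in zip(edges[:-1], edges[1:]):
        nodes = x0 + t * (x1 - x0)
        coef = np.polyfit(nodes, f(nodes), r - 1)   # exact fit through r points
        xs = np.linspace(x0, x1, n_eval)
        err = max(err, np.max(np.abs(f(xs) - np.polyval(coef, xs))))
    return err

f = np.exp                        # illustrative smooth function
t4 = [0.0, 1/3, 2/3, 1.0]         # r = 4 equispaced local nodes (illustrative)
e10 = piecewise_interp_error(f, 0.0, 1.0, 10, t4)
e20 = piecewise_interp_error(f, 0.0, 1.0, 20, t4)
# doubling m should shrink the L_inf error roughly by 2**r = 16
```

The observed ratio e10/e20 is close to 16, matching the m^{−4} rate for r = 4.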

Optimal partition
We now show that an asymptotically optimal partition makes all local errors equal; that is, it asymptotically enjoys the smallest error as m → +∞, for all functions f ∈ C^r([a, b]). Specifically, for a given m, let

a = x*_0 < x*_1 < · · · < x*_m = b  (8)

be such that all the quantities

‖f − L*_{m,r} f‖_{L_p(x*_{j−1}, x*_j)},  1 ≤ j ≤ m,

where L*_{m,r} f denotes the approximation corresponding to (8), are equal. Observe that such a partition exists since the local errors continuously depend on the points x_i.
In the sequel, ‖g‖_{L_q(a,b)} = (∫_a^b |g(x)|^q dx)^{1/q} for all 0 < q ≤ +∞ (with the usual modification for q = +∞). This is obviously not a norm in case 0 < q < 1, since then the triangle inequality is not satisfied. We also adopt the notation that 1/p = 0 for p = +∞.

Proposition 2
The equal-local-error partitions (8) and the corresponding approximations L*_{m,r} are asymptotically optimal. That is,

‖f − L*_{m,r} f‖_{L_p(a,b)} ≈ α_{r,p} ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)} / (r! m^r),

while for the approximations L_{m,r} using any other partitions we have

liminf_{m→+∞} m^r ‖f − L_{m,r} f‖_{L_p(a,b)} ≥ α_{r,p} ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)} / r!.

Proof We first show the error formula for L*_{m,r}.
Then, for finite p, we have the claimed formula, and for infinite p we have it in turn. It remains to show the lower bound on (a, b). To that end, we fix ℓ ≥ 1 and define u_i = a + iH, where H = (b − a)/ℓ. We can assume without loss of generality that for m ≥ ℓ the points u_i belong to the partition: indeed, since we keep ℓ fixed, we can always add the points u_i to a given partition without asymptotically increasing the error, as m → +∞. Let l_i be defined accordingly. Consider first finite p. The minimization of the last sum with respect to Σ_{i=1}^ℓ m_i = m gives the optimal m_i. The last sum in the parentheses is a Riemann sum for the integral ∫_a^b |f^{(r)}(x)|^{1/(r+1/p)} dx. Hence, taking ℓ sufficiently large and m ≥ ℓ, we can make the bound arbitrarily close to the claimed one. The right-hand side is minimized analogously for p = +∞, and the proof is completed by the observation that the last sum in the parentheses is a Riemann sum for the integral ∫_a^b |f^{(r)}(x)|^{1/r} dx.

Remark 2
The error of approximation depends on the points t_i via α_{r,p}. Recall that for p ∈ {1, 2, +∞} this factor is minimized by the points t*_i being zeros of appropriate orthogonal polynomials, adjusted to the interval [0, 1]: for p = 1 these are Chebyshev polynomials of the second kind, for p = 2 Legendre polynomials, and for p = +∞ Chebyshev polynomials of the first kind. Consider, for example, r = 4, for which the optimal points can be written explicitly. Table 1 shows the corresponding values of α_{4,p} for the optimal points t*_i and, for comparison, for the equispaced points t_i = (i − 1)/(r − 1), 1 ≤ i ≤ r.

Remark 3 We have shown that the optimal partition is asymptotically better than the uniform one by the factor

R_{r,p}(f) = (b − a)^r ‖f^{(r)}‖_{L_p(a,b)} / ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)}.

We obviously have R_{r,p}(f) ≥ 1, where equality holds for f being a polynomial of degree r, and the more f^{(r)} varies the bigger R_{r,p}(f). An example is provided in Table 2.

Table 1 The values of α_{4,p} for the optimal and equispaced choices of the t_i

Table 2 The values of R_{4,p}(f)

Now we want to see how much we potentially lose by not using the optimal partition. For the error to go to zero as m → +∞ we have to assume that the maximal subinterval length goes to zero; let K_m denote the ratio of the error of a given partition to the optimal one, as in (9). Obviously K_m ≥ 1, and for the optimal partition K_m = 1. Let us check how big K_m can be assuming that for all m sufficiently large we have

max_{1≤i,j≤m} A_i/A_j ≤ C,  (10)

where C > 1 and 0/0 = 1. Since K_m is a homogeneous function of A, we can assume without loss of generality that 1 ≤ A_i ≤ C for all i. It is then clear that the maximum is attained at A = (C, …, C, 1, …, 1), where C is repeated k times, for some k. If p = +∞, then the maximum is attained for k = 1. Let 1 ≤ p < +∞. Then, setting q = 1/(r + 1/p), we treat K_m as a function of k ∈ [0, m] and find its maximum.

Remark 4 Especially important will be the case C = 2^{r+1/p}. Then K_m in (9) is bounded from above by a constant κ_{r,p}, (11), with κ_{r,∞} = 2^r for p = +∞. The values of κ_{r,p} for p = 1, 2, ∞ and 1 ≤ r ≤ 6 are given in Table 3.

An algorithm for (almost) optimal partitions
In this section, we show how asymptotically (almost) optimal partitions can be practically realized for a given m and f ∈ C r ([a, b]). We allow algorithms that can evaluate f at any x ∈ [a, b].
Let us fix another point t_0 ∈ [0, 1] with t_0 ∉ {t_1, …, t_r}. For an interval I = [c, d] ⊆ [a, b] of length h = d − c, put u_i = c + t_i h for 1 ≤ i ≤ r and u_0 = c + t_0 h, and let L_I(f) denote the error at u_0 of the polynomial of degree r − 1 interpolating f at u_1, …, u_r.

Remark 5 Observe that for each interval I the functional L_I is uniquely (up to a multiplicative factor) defined by the conditions that it linearly combines the values of f at u_i for 0 ≤ i ≤ r, and its kernel consists of all polynomials of degree at most r − 1. In our definition, L_I(f) is just the error of interpolating f in I at u_0, but it could equally well be the divided difference f[u_0, u_1, …, u_r].

The algorithm that we present and analyze in this section uses a priority queue S whose elements are subintervals. For each I ∈ S of length h, its priority p_f(I) is an estimate of the local error built from |L_I(f)| and h. In the following pseudocode, insert(S, I) and I := extract_max(S) denote the actions of inserting an interval into S and extracting from S an interval with the highest priority.
After execution, the elements of S form a partition of [a, b] into m subintervals. Since a priority queue can be implemented using a heap, an mth partition can be obtained at a cost proportional to m log m.
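Since the pseudocode of ALGORITHM is not reproduced above, here is a hedged Python sketch of the priority-queue strategy using the standard heapq module. The concrete priority |L_I(f)|, the interpolation residual at the interior point u_0, is an assumption standing in for the paper's exact priority formula:

```python
import heapq
import numpy as np

def adaptive_partition(f, a, b, m, t, t0=0.5):
    """Split [a, b] into m subintervals by repeatedly halving the interval
    with the highest priority. Here the priority of I = [c, d] is taken to
    be |L_I(f)|: the residual at u0 = c + t0*h of the degree r - 1
    interpolant of f at the r local nodes c + t_i*h (an assumed stand-in
    for the paper's exact priority formula)."""
    t = np.asarray(t, dtype=float)

    def priority(c, d):
        h = d - c
        nodes = c + t * h
        coef = np.polyfit(nodes, f(nodes), len(t) - 1)
        u0 = c + t0 * h
        return abs(f(u0) - np.polyval(coef, u0))

    heap = [(-priority(a, b), a, b)]          # heapq is a min-heap: negate
    while len(heap) < m:
        _, c, d = heapq.heappop(heap)         # extract_max(S)
        mid = 0.5 * (c + d)
        heapq.heappush(heap, (-priority(c, mid), c, mid))   # insert(S, I1)
        heapq.heappush(heap, (-priority(mid, d), mid, d))   # insert(S, I2)
    return sorted((c, d) for _, c, d in heap)
```

For a function like 1/(x + 0.01) on [0, 1], the produced subintervals are much shorter near 0, where |f^{(4)}| is huge, than near 1.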
Denote by L**_{m,r} f the approximation corresponding to the mth partition obtained by our algorithm. Recall that α_{r,p} and κ_{r,p} are given by (4) and (11), respectively.

Theorem 1 If the function f ∈ C^r([a, b]) is such that its derivative f^{(r)} does not nullify in [a, b], then

limsup_{m→+∞} m^r ‖f − L**_{m,r} f‖_{L_p(a,b)} ≤ κ_{r,p} α_{r,p} ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)} / r!.
Proof In view of the definition of κ_{r,p} in (11) of Remark 4, it suffices to show that the ratio bound in (10) can be chosen arbitrarily close to 2^{r+1/p}, provided m is large enough. Suppose that f^{(r)} > 0 (the case f^{(r)} < 0 is symmetric and need not be considered separately). Then there are 0 < d ≤ D < +∞, depending on f, such that d ≤ f^{(r)}(x) ≤ D for all x ∈ [a, b]. For an interval I of length h, its priority can be written in terms of γ, defined in (12). This means that the priority is always positive, and that it decreases to zero when an interval is successively subdivided. This leads to the important observation that the maximum length of a subinterval in an mth partition goes to zero as m → +∞. Furthermore, if the interval I is subdivided further into I_1 and I_2, then for s = 1, 2 the priorities of I_s are controlled, where ω is the modulus of continuity of the function f^{(r)}. This in turn means that for any δ > 0 there is m_δ such that for all m > m_δ the ratio of the highest to the lowest priority in the mth partition is bounded accordingly. (Indeed, m_δ is such that the lengths of all subintervals in the corresponding partition are at most δ, and such that after dividing the subinterval with the highest priority, one of its successors has the lowest priority.) Consider now the partition for a particular m ≥ m_δ. Since the local error in an interval I can be written via (5), by (13) and (14) the ratio of any two local errors is upper bounded by an explicit quantity. Since δ can be arbitrarily small, this bound can be made arbitrarily close to 2^{r+1/p}, as claimed.
Example 1 Figure 1 shows results of a numerical experiment for regularity r = 4.
The tested function has a positive 4th derivative. The approximations are based on the adaptive partitions obtained from ALGORITHM and, for comparison, on the uniform (nonadaptive) partitions. In this and all the numerical examples that follow we take t_0 = 0.5 and the optimal points t*_1, t*_2, t*_3, t*_4; cf. Remark 2. The errors are measured in the L_p norms with p ∈ {1, 2, +∞}. The results perfectly confirm the theoretical findings.

Remark 6
In this paper, we consider the algorithm error versus the number m of subintervals. One may want to consider the error versus the number n of function values used. Then the choice of the equispaced points t_i = (i − 1)/(r − 1) may lead to a better asymptotic constant than the choice of the t*_i, 1 ≤ i ≤ r, despite the fact that the factor α_{r,p} is in this case slightly larger, cf. Table 1. Consider, for instance, our algorithm for r = 4. If the points t*_i are applied, then the algorithm produces an mth partition using n ≈ 10m function values. On the other hand, for the equispaced t_i we have n ≈ 4m, since all 5 function values computed for a given subinterval can be re-used when halving this interval in one of the following steps.
A key point in this example is that the set of points t_i, 0 ≤ i ≤ r, does not contain both endpoints of the interval [0, 1]. If this obstacle is removed, then Theorem 1 holds true for all functions f such that f^{(r)} does not change its sign. To show this, we need the following auxiliary result.
In particular, if g(u_0) = 0, then g nullifies on the whole interval [c, d].
Proof Assume without loss of generality that g^{(r)} ≥ 0. We estimate g(u) for u different from any of the points u_i. There are two cases: either u < u_0 or u > u_0. If u < u_0 then, by the explicit formula for divided differences, we obtain the corresponding bound. On the other hand, if ∏_{i=1}^r (u − u_i) < 0, then l_1(u) g(u_0) ≤ g(u) ≤ 0. In the case u > u_0, we similarly combine (18) with the analogous inequality. Applying the substitution u = c + th, we finally obtain ‖l((· − c)/h)‖_{L_p(c,d)} = h^{1/p} ‖l‖_{L_p(0,1)}; hence, the lemma holds with β_{r,p} = ‖l‖_{L_p(0,1)}.
Remark 7 An important consequence of Lemma 1 is that for any subinterval I of a given partition we have

‖f − L_{m,r} f‖_{L_p(I)} ≤ β_{r,p} p_f(I).  (20)

Indeed, it suffices to take g = f − L_{m,r} f in Lemma 1 and recall the definition of p_f(I). If so, then summing over the subintervals of any partition bounds the global error, which means that the inequality (20) allows us to control the exact error of approximation. For instance, if r = 2 and (t_0, t_1, t_2) = (1/2, 0, 1), then β_{r,p} = ‖l(t)‖_{L_p(0,1)}, where l(t) = 1 + 2|t − 1/2|. We have β_{2,1} = 1.5, β_{2,2} = √(7/3) ≈ 1.528, and β_{2,∞} = 2. We stress that these exact inequalities hold true only under the assumptions of Lemma 1. The values of β_{r,p} given by (19) are by no means best possible; optimization of β_{r,p} is a separate problem and is not addressed in the present paper.
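The constants β_{2,p} for l(t) = 1 + 2|t − 1/2| can be double-checked numerically. A small, purely illustrative Python sketch using the midpoint rule on [0, 1]:

```python
import numpy as np

# Numerical check of beta_{2,p} = ||l||_{L_p(0,1)} for l(t) = 1 + 2|t - 1/2|,
# the r = 2, (t0, t1, t2) = (1/2, 0, 1) case of Remark 7 (midpoint rule).
n = 1_000_000
t = (np.arange(n) + 0.5) / n          # midpoints of n equal subintervals
l = 1.0 + 2.0 * np.abs(t - 0.5)
beta_1 = np.mean(l)                   # ||l||_{L_1(0,1)}
beta_2 = np.sqrt(np.mean(l ** 2))     # ||l||_{L_2(0,1)}
beta_inf = l.max()                    # sup over the grid -> ||l||_{L_inf(0,1)}
```

The computed values are approximately 1.5, 1.53, and 2 for p = 1, 2, ∞, respectively.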

Theorem 2 Let the assumption (16) of Lemma 1 be fulfilled. Then, the error estimate of Theorem 1 holds true if the derivative f (r) does not change its sign in [a, b].
Proof Choose 0 < ϵ < ‖f^{(r)}‖_{C([a,b])}. For a given m, define the index sets I_0, I_1, and I_2, where I_i denotes the ith subinterval. We assume that m is large enough, m ≥ m_ϵ, so that I_2 = ∅ and the modulus of continuity of f^{(r)}, denoted by ω, is smaller than ϵ. Such an m exists since by Lemma 1 we have p_f(I_i) > 0 for i ∈ I_1 ∪ I_2, which implies that the maximum length of such subintervals decreases to zero as m → +∞. Then, for i ∈ I_0, we obtain the desired estimate.

Example 2 The tested function g is such that its 4th derivative g^{(4)} changes its sign 32 times (see Fig. 2). In Fig. 3, we present the quality of the L_∞ approximation of g using ALGORITHM, for two extreme choices of the function δ, namely δ(h) = 0 and δ(h) = 10^4. For comparison, we also include the corresponding error for the uniform subdivision.
For δ(h) = 0, i.e., for p_f(I) = |L_I(f)|, the error seems to decrease at the rate m^{−4}, despite the fact that neither the assumptions of Theorem 3 nor those of Theorem 4 are fulfilled. However, the error fluctuates because of difficulties in properly estimating the local errors in the intervals where g^{(4)} changes its sign. Much better results are obtained for the "safe" choice δ(h) = 10^4, i.e., for p_f(I) = max{|L_I(f)|, (10h)^4}, for which Theorem 4 applies.

Automatic approximation
In this section, we deal with automatic approximation. Ideally, we should have a procedure that for a given function f and an error threshold ε > 0 returns a partition for which the corresponding approximation, say A(f, ε), satisfies

‖f − A(f, ε)‖_{L_p(a,b)} ≤ ε.  (28)

Obviously, such a procedure does not exist if it is supposed to work for all f ∈ C^r([a, b]) and ε > 0 and to use only finitely many function evaluations. We shall show, however, that the inequality (28) can be achieved asymptotically, as ε → 0^+. Since the accuracy ε (instead of the number m of function evaluations) is now an input parameter, in this section we use the asymptotic notation with respect to ε → 0^+. The first procedure, AUTO1, keeps subdividing until the local error test is satisfied, which implies that for the corresponding approximation the required accuracy is reached asymptotically. Admittedly, we achieved our goal; however, the obtained partition is (almost) optimal only for p = +∞. Indeed, it is easy to see that AUTO1 tries to keep all the local errors proportional to h_i^{1/p}, with the result that the factor depending on f in the overall error equals ‖f^{(r)}‖_{L_{1/r}(a,b)} instead of ‖f^{(r)}‖_{L_{1/(r+1/p)}(a,b)}. To construct a procedure for 1 ≤ p < +∞ that uses an (almost) optimal partition, consider the following modification of AUTO1.
When run for f ∈ C^r([a, b]) with e = ε, this procedure keeps all the local errors at most ε, and the approximation corresponding to the resulting partition equals L***_{m_ε,r} f, where m_ε is, as before, the number of subintervals in the resulting partition. Then the L_p error of L***_{m_ε,r} f is at most ε m_ε^{1/p}; the first inequality in this estimate follows from Theorem 3. Thus, to reach an ε-approximation it suffices to run AUTO2 with e = ε* such that ε* m_{ε*}^{1/p} ≤ ε. The value of ε* can be found as follows. We first use the lower bound of Proposition 2, together with (30), to estimate m_ε in terms of ‖f^{(r)}‖ and ε, which leads to the formula (31) for ε*. (Observe that ε* = ε if p = +∞, which is consistent with the previous considerations.)
To summarize, our algorithm consists of two steps. First, we run the recursive procedure AUTO2 with the error threshold e = ε and find m_ε. Second, we resume the recursion with the updated threshold e = ε* given by (31) to get the final partition. If the recursion is implemented using a stack, then the cost of the algorithm is proportional to m_{ε*}, which in turn is proportional to ‖f^{(r)}‖^{1/r}_{L_{1/(r+1/p)}(a,b)} ε^{−1/r}. Denote the resulting approximation by A_2(f, ε).
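The subdivide-until-tolerance idea behind AUTO1/AUTO2 can be sketched as follows. This is a simplified stand-in in Python (the paper's exact pseudocode, including the two-threshold step with ε*, is not reproduced), using the interpolation residual at an interior point u_0 as an assumed local error estimate:

```python
import numpy as np

def auto_approx(f, a, b, e, t, t0=0.5):
    """Stack-based sketch in the spirit of AUTO1/AUTO2 (the paper's exact
    pseudocode is not reproduced): an interval is accepted once its
    estimated local error |L_I(f)| -- the residual at u0 = c + t0*h of the
    degree r - 1 interpolant of f -- is at most the threshold e; otherwise
    it is halved. Returns the accepted subintervals from left to right."""
    t = np.asarray(t, dtype=float)

    def local_err(c, d):
        nodes = c + t * (d - c)
        coef = np.polyfit(nodes, f(nodes), len(t) - 1)
        u0 = c + t0 * (d - c)
        return abs(f(u0) - np.polyval(coef, u0))

    partition, stack = [], [(a, b)]
    while stack:
        c, d = stack.pop()
        if local_err(c, d) <= e:
            partition.append((c, d))
        else:
            mid = 0.5 * (c + d)
            stack.append((mid, d))   # push right first, so the left half
            stack.append((c, mid))   # is processed next (left-to-right order)
    return partition
```

Using a stack instead of recursion keeps the cost proportional to the number of produced subintervals, as noted above.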
Example 3 Results of numerical tests for the automatic approximation of the functions f and g of Examples 1 and 2 using AUTO2 are presented in Tables 4 and 5, respectively. We observe a perfect behavior of the algorithm for f and δ = 0, for p = 1, 2, +∞. Things are quite different for g. If δ = 0, then the algorithm wrongly estimates the L_∞ error and terminates too early. Much better is the "safe" choice δ = 10^4.

Remark 8
It is easy to verify using Theorems 1 and 2 that if δ = 0 in (29), i.e., when the modified priority coincides with p_f, then Theorem 5 holds true provided f^{(r)} does not nullify, or f^{(r)} does not change its sign and the condition (16) is fulfilled. Moreover, in the latter case it is possible to obtain an ε-approximation non-asymptotically. Indeed, it is enough to change the "if" condition in AUTO1 to β_{r,p} p_f([a, b]) ≤ e, where β_{r,p} is as in Lemma 1, and run the procedure with e = ε. It immediately follows from (20) that we then get, for sure, an approximation with error at most ε.
The existence of a recursive procedure corresponding to AUTO2 that uses an (almost) optimal partition is problematic. Instead, one can apply the following iterative procedure, which is based on our initial algorithm discussed in Section 4.
It is worth mentioning that AUTO3 produces an approximation with an unnecessarily small error, far below the required ε, and consequently its running time is much higher than that of AUTO2. This is due to the fact that β_{r,p} p_f(I) in (20) usually considerably overestimates the error in an interval I. For instance, consider again the L_∞ approximation of the function f from the previous examples, and let β_{4,∞} be defined as in (19). Then, for ε = 10^{−3}, 10^{−6}, 10^{−9}, the procedure AUTO3 produces approximations with errors 3.0211e−06, 3.3098e−10, and 5.6843e−14 using 88, 878, and 9749 subintervals, respectively (compare with the corresponding results for AUTO2 in Table 4).

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.