Small errors imply large evaluation instabilities

Numerical analysts and scientists working in applications often observe that once they improve their techniques to get better accuracy, some instability of the evaluation creeps in through the back door. This paper shows for a large class of numerical methods that such a Trade-off Principle between error and evaluation stability is unavoidable. It is an instance of a no free lunch theorem. Here, evaluation is the mathematical map that takes input data to output data. This is independent of the numerical routine that calculates the output. Therefore, evaluation stability is different from computational stability. The setting is confined to recovery of functions from data, but it includes solving differential equations by writing such methods as a recovery of functions under constraints imposed by differential operators and boundary values. The trade-off principle bounds the product of two terms from below. The first is related to errors, and the second turns out to be related to evaluation instability. Under certain conditions satisfied for splines and kernel-based interpolation, both can be minimized. Then the lower bound is attained, and the error term is the inverse of the instability term. As a byproduct, it is shown that Kansa's Unsymmetric Collocation Method sacrifices accuracy for improved evaluation stability, when compared to symmetric collocation.


Introduction
After quite some efforts to find kernels that allow small recovery errors and well-conditioned kernel matrices at the same time, the paper [1] proved that this does not work. The result was called "Uncertainty Relation" or "Trade-off Principle" (see, e.g. Holger Wendland's book [2] of 2005) and received quite some attention in the literature. It is a special case of the "No free lunch" principle. As correctly mentioned by Greg Fasshauer and Michael McCourt in their 2015 book [3], it had a rather negative influence on the development of the field, because it kept users from looking at better bases than those spanned by kernel translates. However, it will turn out here that changes of bases are helpful in a different way, but will not allow an escape from the Trade-off Principle.
Sparked by a question of C.S. Chen of the University of Southern Mississippi in an e-mail dated Dec. 28th, 2021, this paper extends the result of [1] to much more general situations. To avoid the misconceptions implied by [1], the effect of basis changes will be discussed at various places. But most of the results here are independent of choices of bases.
Since the scope of the paper will be quite wide, a good deal of abstraction will be necessary later, and therefore a classical case should be served as a starter. Consider interpolation of function values at points, with the interpolation error bounded via the Power Function in (1) and with the data entering the interpolant through the Lagrange basis. In a somewhat sloppy formulation, this yields a first Trade-off Principle:

Small Power Functions lead to large norms of Lagrangians.
To avoid misunderstandings, it should be pointed out that (2) concerns two successive interpolation steps and lets the evaluation point occur in two different meanings. In the first factor, the error for an interpolation on the current point set is evaluated at that point using a specific norm, but for the second factor the interpolation takes place on the enlarged point set with that point added, and then the corresponding norm of the Lagrangian at that point is taken. The error at the new step is not estimated. At some new point, it would be bounded indirectly via (1) and connected to the enlarged point set by (2) in the next step. Users will often see interpolation as a sequence of many steps, and if they try to make the first factor go to zero for large numbers of points, they will have to face that the second goes to infinity. This moves away from the stepwise viewpoint.
By (2), the new point for the next step should be placed at the maximum of the Power Function. This greedy technique leads to Leja points and will be treated in more detail in Section 12.
The value of the interpolant at some point $x$ is given by $\sum_j f(x_j)\,u_j(x)$ with the Lagrangians $u_j$, see (3), no matter how numerical calculations work. Absolute errors in the input data will always be multiplied by the Lagrangians to produce absolute errors in the values. For any norm chosen, (3) lets absolute errors in the input data be multiplied, in the worst case, by the corresponding norm of the Lagrangians to give deviations in norms of values. This is why large norms of Lagrangians lead to large evaluation instabilities measured in these norms. Therefore, the Trade-off Principle may be reformulated as: Small errors are connected to evaluation instabilities taken in the same norm.
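To make the amplification mechanism of (3) concrete, here is a minimal sketch for polynomial interpolation; the degree, the evaluation grid, and the point families are illustrative choices, and the Lagrangians are the Lagrange basis polynomials in this special case. It computes the worst-case amplification factor of absolute data errors, i.e. the maximum over the evaluation grid of the sum of absolute Lagrangian values.

```
import numpy as np

def max_amplification(X, t):
    # worst-case factor of (3): max over t of sum_j |u_j(t)| with Lagrange basis u_j
    L = np.ones((len(X), len(t)))
    for j, xj in enumerate(X):
        for m, xm in enumerate(X):
            if m != j:
                L[j] *= (t - xm) / (xj - xm)
    return np.abs(L).sum(axis=0).max()

n = 20
t = np.linspace(-1.0, 1.0, 2001)
equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
print('equidistant points:', max_amplification(equi, t))   # huge for n = 20
print('Chebyshev points  :', max_amplification(cheb, t))   # only a few units
```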
The paper gives a rigorous underpinning of this Trade-off Principle. Of course, users may prefer other formulas for computation of values, in particular barycentric ones by J.P. Berrut and L.N. Trefethen [4]. This is mandatory for univariate polynomial interpolation with large degrees. But no computation can escape the validity of (3), which shows how absolute input errors will enter into absolute output errors in the worst case, even if a numerical algorithm avoids Lagrangians. However, the numerical computation of the Lagrangians, if attempted, induces additional instabilities that are ignored here. For polynomials, we already mentioned barycentric formulas [4]. For kernel-based recoveries, various methods were invented to cope with instabilities, see the references given in Section 13. Section 3 sets the stage for general recovery methods including solving differential equations. Recovery processes reconstruct functions from data given as prescribed values of linear functionals, and the evaluation of the result will again be an application of a functional. Data can include values of arbitrary linear operators acting on functions, thus rewriting methods for PDE solving as function recoveries.
Section 4 introduces the form of the recoveries considered. Nearest-neighbour methods, optimal recoveries in Hilbert spaces, and regression in machine learning are special cases described in Section 5.
The basic technique used for trade-off principles is outlined in Section 6, still in rather abstract form. Then Section 7 contains the central results, namely trade-off principles that bound the product of norms of errors and norms of certain "bump" functions from below. The lower bounds turn into equalities in case of optimal recoveries later in Section 9. The central notion of evaluation instability is explained at length in Section 8, defining it as the worst possible propagation factor of absolute errors in the input data to absolute errors in the function values of the recovery. It is shown that the norms of "bump" functions from Section 7 are closely connected to evaluation instability.
Examples are given in Section 10, including splines and recoveries via expansions like Fourier or Taylor series. The connection to the older trade-off principle from [1] is provided in Section 10.8, followed by extensions to unsymmetric methods like Kansa's collocation technique. The trade-off principle holds for these as well, but they sacrifice accuracy for evaluation stability. Finally, the implications for greedy adaptive methods are sketched.

Data as functionals
A fairly general and useful viewpoint on Numerical Analysis or Computational Mathematics when working on functions is to see data of a function as values of linear functionals. In particular, differential equations, ordinary or partial, just impose infinitely many restrictions on a function $f$ from some function space by applying linear functionals. This can be conveniently written as prescribing the values $\lambda(f)$ as in (4) for all functionals $\lambda$ from a subset of the dual space, the space of continuous linear functionals on the function space. The problem is to recover $f$ from the given data. The specifics of certain differential equation problems involving differential or boundary evaluation operators disappear. And if users have only limited information in the sense of just finitely many data for a finite subset of functionals, one has to use computational techniques that get along with the available data. This viewpoint is behind the scenes for this paper. Readers should always be aware that differential operators may lurk behind the functionals appearing here.
For illustration, consider a standard Poisson problem (5) on a bounded two-dimensional domain for simplicity. The data functionals come in two variations: caring for the PDE in the domain and for the boundary values. They are finite selections (6) from the obvious infinite sets of functionals that define the true solution analytically. If the analytic problem is well-posed and if the function recovery from the above data is carried out with enough oversampling, this technique produces accurate and convergent approximations to the true solution of the PDE problem [5]. This reference also fits algorithms that solve problems in weak form into this framework, including the Meshless Local Petrov Galerkin approach by S.N. Atluri and T.-L. Zhu [6,7] and Generalized Finite Element Methods, see the survey by Babuška et al. [8]. The general practical observation is that going for more accuracy causes more instabilities, in a way that will be clarified here. Also, evaluation of functions is the application of a functional to some function. In particular, evaluation of a multivariate derivative at a point is the application of a functional, provided that derivative is continuous. If point evaluation is not defined, as in $L_2$ spaces, but if local integration is feasible, one can evaluate local integral means as substitutes for point evaluation. This is the standard way to handle problems in weak form in the references cited above.
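As a toy illustration of the "data as functionals" viewpoint, the following sketch applies the two kinds of Poisson data functionals to a known function on the unit square; the test function, the point sets, and all names are illustrative choices, not taken from the paper. The resulting numbers are all a recovery method gets to see.

```
import sympy as sp

x, y = sp.symbols('x y')
u = sp.sin(sp.pi * x) * sp.sin(sp.pi * y)          # a test function on the unit square
lap = sp.diff(u, x, 2) + sp.diff(u, y, 2)

def lam_interior(p):                               # PDE functional: -Laplacian at a point
    return float((-lap).subs({x: p[0], y: p[1]}))

def lam_boundary(p):                               # boundary functional: point evaluation
    return float(u.subs({x: p[0], y: p[1]}))

interior = [(0.25, 0.5), (0.5, 0.5), (0.75, 0.5)]
boundary = [(0.0, 0.5), (1.0, 0.5)]
data = [lam_interior(p) for p in interior] + [lam_boundary(p) for p in boundary]
print(data)                                        # the only information passed to a recovery
```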
Summarizing, everything boils down to a matter of functionals. What can we say about a function if we know the values of all functionals from a given set on it? In particular, what can we say about it when we only know plenty of data, i.e. finitely many such values? Note that this problem is regression in a probabilistic context, and it arises in Machine Learning on a large scale, with Big Data given in high-dimensional spaces.

Recovery of functions
We now postulate that the recovery of functions from their data $\lambda_1(f),\dots,\lambda_n(f)$ is a linear map $R$ from the data into a space of functions. No matter how it is actually implemented numerically, we assume that it can be mathematically written as $Rf=\sum_{j=1}^{n}\lambda_j(f)\,u_j$ for all $f$ (7) via certain functions $u_j$ that we call quasi-Lagrangians because they produce the recovery like Lagrangians, but possibly without exact reproduction of the data. This is connected to the notion of quasi-interpolation.
To avoid certain complications, the map is assumed to be surjective, but we allow nonuniqueness of the quasi-Lagrangians $u_j$. Note that the recovery map is an abstract notion that does not imply that the $u_j$ are actually calculated. They just describe the way the given input data enter mathematically into the values of the recovery, and they are assumed to exist independently of how the recovery is numerically implemented.
We call the recovery process interpolatory if the recovery preserves the data, i.e.
$\lambda_k(Rf)=\lambda_k(f)$ for all $f$ and all $k$, and then $\lambda_k(u_j)=\delta_{jk}$ holds, i.e. the $u_j$ have the Lagrange interpolation property and are called Lagrangians in what follows. Evaluation of the recovery via a functional $\lambda$ now is $\lambda(Rf)=\sum_{j=1}^{n}\lambda_j(f)\,\lambda(u_j)$ (8), defining a vector of the values $\lambda(u_j)$; the expression is bilinear in $\lambda$ and $f$. One may restrict these maps to sums over neighbours of the evaluation point, to get more locality, and this is what generalized Moving Least Squares [9][10][11][12] or Finite Elements do. But if theoretically done for all evaluation points, this still fits into the above framework. As a prominent example, Barycentric formulas by J.P. Berrut and L.N. Trefethen [4] change the way the above formula is calculated, with a significant gain in numerical stability, focusing on neighbours.
For recovery of a single value $\lambda(f)$ from single values $\lambda_j(f)$, one can construct a vector of single values $a_j$ such that $\lambda(Rf)=\sum_{j=1}^{n}a_j\,\lambda_j(f)$ (9) without necessarily calculating the $u_j$ as functions and taking values afterwards. In meshless methods (see the early survey by T. Belytschko et al. [13]), the functions $u_j$ are called shape functions. In the standard approach, they are calculated in many points, and if derivatives are needed for dealing with PDEs, these are taken afterwards or obtained by taking derivatives of the local construction process. In contrast to this, Direct Moving Least Squares by D. Mirzaei et al. [14,15] use an error minimization of (9) for derivative functionals without the detour via shape functions.
This presentation looks unduly abstract, but it is not. It considers recovery without any fixed assumptions about how functions are represented, how norms of errors and functions are defined, and how bases are chosen. Therefore, it allows comparing actual numerical strategies on a higher level. It goes back to the input data and considers the output data, as functionals, the actual determination of the recovery map being in a black box. The final goal in this paper is to see whether going for a small error implies some sort of instability, and this may be independent of what happens in the black box.

Special recovery strategies
When avoiding full functions, the recovery of a single value $\lambda(f)$ from given values $\lambda_j(f)$ is trivial if $\lambda=\lambda_j$ for some $j$. In more generality, one would pick the data functional that is "closest" to $\lambda$, and then take its data value as an approximation of $\lambda(f)$. This is the nearest neighbour strategy, but it needs distances between functionals, and it requires finding the closest neighbour. Cases involving point geometry like nearest neighbours or triangulations will be covered by the theory developed here, but we do not include examples.
If a norm on the dual space is available, one can consider the approximation problem of minimizing $\|\lambda-\sum_{j=1}^{n}a_j\,\lambda_j\|$ over all coefficients $a_j$, denote a solution by $a_j^*$, and approximate $\lambda(f)$ by $\sum_j a_j^*\,\lambda_j(f)$. This avoids functions as well, but it requires norms in the dual space that users can work with. Special cases are Reproducing Kernel Hilbert spaces. They have a kernel $K$ on an abstract set and define an inner product, the application of functionals to the kernel arguments being written as superscripts. Furthermore, each functional $\lambda$ defines a function $\lambda^y K(\cdot,y)$, and these functions carry the inner product $\lambda^x\mu^y K(x,y)$, making $\lambda^y K(\cdot,y)$ a Riesz representer of $\lambda$. It is then easy to prove that an optimal recovery consists of the coefficient vector that solves the linear system with the kernel matrix of the data functionals, a matrix that usually is positive definite. This looks theoretical again, but it applies to Sobolev spaces, having Whittle-Matérn kernels, and therefore it is useful for solving PDE problems by recovery of functions from PDE data. This recovery strategy has various optimality properties [16] that we skip over here. See details on kernel-based methods in books by M.D. Buhmann [17], H. Wendland [2], and G. Fasshauer/M. McCourt [3].
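A minimal sketch of this optimal recovery in the plainest case of function-value data, assuming a Whittle-Matérn kernel of smoothness 3/2; the scale, the number of data points, and the test function are illustrative choices. The linear system below is the kernel-matrix system described above.

```
import numpy as np

def matern32(a, b, s=0.4):
    # Whittle-Matern kernel of smoothness 3/2; the scale s is an illustrative choice
    r = np.abs(a[:, None] - b[None, :]) / s
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

X = np.linspace(0.0, 1.0, 12)                         # data locations
f = lambda t: np.sin(2 * np.pi * t)
c = np.linalg.solve(matern32(X, X), f(X))             # kernel matrix system for the coefficients
t = np.linspace(0.0, 1.0, 200)
s_t = matern32(t, X) @ c                              # optimal recovery evaluated on a grid
print('max recovery error:', np.abs(s_t - f(t)).max())
```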
This kernel-based strategy also applies to Machine Learning [18]. On a general set, one has feature maps that map the abstract objects to real values like cost or weight or area. The kernel then is a weighted sum of products of feature maps, with positive weights, and the induced inner product lets the above machinery work for regression, but details are omitted. Combining the cases above, Machine Learning can "learn" the solution of a PDE using this framework.

Dual trade-off principles
Throughout we shall assume that norms on the space of functions and on its dual are defined and connected via the suprema $\|\lambda\|=\sup_{f\neq 0}|\lambda(f)|/\|f\|$ and $\|f\|=\sup_{\lambda\neq 0}|\lambda(f)|/\|\lambda\|$ (10). Take a functional $\epsilon$ and imagine that it is evaluating an error of a recovery process. Then $1=\epsilon(u)\le\|\epsilon\|\,\|u\|$ holds for all functions $u$ with $\epsilon(u)=1$. If the error norm of the recovery process is small, the norms of such functions $u$ must be large. In Section 8, we shall connect these norms to the instability of the evaluation of function values of the recovery.
Of course, there also is a dual version $1=\lambda(u)\le\|u\|\,\|\lambda\|$ for all functions $u$ and all functionals $\lambda$ with $\lambda(u)=1$. In Hilbert spaces, one can minimize the second factors under the given constraint, and the minimum is realized by Riesz representers.
Note that the above inequalities turn into equalities if the suprema in (10) are attained for the functions or functionals in the second factors. Details and applications will follow below.
It is essential that the two norms in the above inequalities are dual to each other. If the norm on the function space allows and penalizes high derivatives, the functionals in the dual will allow high derivatives as well. Users might want the functional factor and the function factor to use non-dual norms, but this is a quite different story.
These trade-off principles differ from certain standard algebraic ones involving inversions, like $1\le\|A\|\,\|A^{-1}\|$ for square nonsingular matrices $A$. But the inequality $|x^Ty|\le\|x\|\,\|y\|$ for vectors in dual norms is a special case, because one of the vectors can be seen as a functional acting on the other. If generalized to variances and a covariance or commutator, the latter case is behind the Heisenberg Uncertainty Principle after a few steps of generalization.

Power functions and bump functions
We assume to have a finite set of data functionals $\lambda_1,\dots,\lambda_n$ to recover functions from their data via a recovery (7), and we evaluate the result by applying a functional $\lambda$ to it.

Definition 1
The dual-space norm $P(\lambda):=\|\lambda-\sum_{j=1}^{n}\lambda(u_j)\,\lambda_j\|$ (12) of the error functional is called the Generalized Power Function.
It leads to an error bound $|\lambda(f)-\lambda(Rf)|\le P(\lambda)\,\|f\|$ for all $f$ (13), and this is why we use it to deal with the recovery error. A bump function for $\lambda$ with respect to the data functionals is a function $u_\lambda$ with $\lambda(u_\lambda)=1$ and $\lambda_j(u_\lambda)=0$ for all $j$. Theorem 1, the central Trade-off Principle, bounds the product of the Generalized Power Function and the norm of any bump function from below by one, i.e. $P(\lambda)\,\|u_\lambda\|\ge 1$, or $\|u_\lambda\|\ge 1/P(\lambda)$ (14). Proof Just insert a bump function into (13).
Remark Power Functions, bump functions, and Lagrangians are independent of bases. There is no escape from the Trade-off Principle by any change of basis, as long as the recovery map or the space are kept fixed.
Remark Furthermore, recoveries, bump functions, and Lagrangians can be defined without using norms or spaces. These come up only when going over to a Power Function and a norm of a bump function. Therefore, Theorem 1 does not only cover all possible recoveries, but also all ways to handle errors and evaluation instability for these by defining norms afterwards. Furthermore, (14) is local in the sense that it holds for each specific $\lambda$. The right-hand side will vary considerably with $\lambda$, up to the limit 1/0 in the excluded case of a vanishing Power Function. While (14) is an add-one-in version, a special leave-one-out version (16) arises if a bump function for one of the data functionals with respect to the remaining ones is available. And if the recovery is interpolatory, the Lagrangians themselves are such bump functions, and we get the corresponding inequality (17). When using the leave-one-out version, it is a pitfall to assume that the recovery arises from deleting the corresponding component from the data. The other quasi-Lagrangians will still depend on all functionals in the set.
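For the special case of kernel interpolation of function values, the Generalized Power Function of a point evaluation has the explicit form $P(x)^2=K(x,x)-k(x)^TA^{-1}k(x)$ with the kernel matrix $A$ and the vector $k(x)$ of kernel values. The following sketch evaluates it; kernel, scale, and point numbers are illustrative assumptions.

```
import numpy as np

def gauss(a, b, s=0.35):
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * s**2))

X = np.linspace(-1.0, 1.0, 10)                    # data locations
A = gauss(X, X)                                   # kernel matrix of the data functionals
t = np.linspace(-1.0, 1.0, 401)                   # evaluation grid
kt = gauss(t, X)
P2 = 1.0 - np.sum(kt * np.linalg.solve(A, kt.T).T, axis=1)   # P(t)^2 = K(t,t) - k(t)^T A^{-1} k(t)
P = np.sqrt(np.clip(P2, 0.0, None))               # clip guards against rounding below zero
print('largest Power Function value on the grid:', P.max())
# (13) then bounds |f(t) - s(t)| by P(t) times the native-space norm of f, for every f
```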

Evaluation instability
Consider the evaluation of a linear formula like (8) when users have to take the data as they are. Even if the numerical calculations are performed differently and without roundoff, any absolute error in the input data will be multiplied, in the worst case, by the factor $\sum_{j}|\lambda(u_j)|$ (18). If the result is as large as the recovery error, the recovery may be too polluted to be useful.

Definition 3
Evaluation instability is the worst-case propagation factor for absolute errors in the input data to absolute errors in the output values of the recovery.
To avoid misunderstandings, it should be pointed out again that evaluation stability is independent of the recovery error. And it is independent of how the recovery is numerically implemented. Evaluation instability may cause absolute worst-case errors in recovery values that are much larger than machine precision, even in the order of the attainable recovery error. Then this looks like preventing further convergence. This effect is well-known from Runge phenomenon examples that may be exponentially convergent in theory, but useless numerically.
If the above functional $\lambda$ is a point evaluation, one can call (18) a pointwise evaluation instability factor. If the functional varies in the dual of a function space, the supremum of (18) over all functionals of norm at most one, the left-hand side of (19), is the evaluation norm instability for the recovery measured in that space. If the space contains functions with high derivatives, this also accounts for the instability of derivatives.
Assume that the numerical recovery is not using the formula (8) that is directly based on the data. If some basis $b_1,\dots,b_n$ is chosen and the linear system (20) is solved for coefficients $c_1,\dots,c_n$ from the data, the evaluation formula now is $\lambda(Rf)=\sum_{j=1}^{n}c_j\,\lambda(b_j)$ (21).
The worst-case amplification of absolute errors in this evaluation formula is governed by the sum of the absolute terms $|c_j\,\lambda(b_j)|$, in the norm chosen on the values. This does not account for the possible stability problems when solving the system (20) for the coefficients $c_1,\dots,c_n$. No matter how exact the coefficients are, the above sum will be intolerably large if the coefficients are intolerably large. This frequently happens for badly chosen bases, e.g. for monomials and translates of kernels. But the evaluation stability of (8) often comes out much smaller, e.g. for Chebyshev polynomials on Chebyshev points, or for kernel-based interpolation [19]. At this point, the chosen norm comes into play. If (19) is evaluated for kernel-based interpolation using the native space, the results of [1] and Section 10.8 usually give very large results, while [19] works with a weaker norm and proves good evaluation stability via boundedness of Lagrangians on asymptotically uniformly placed data locations.
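The following sketch contrasts the two evaluation routes for kernel interpolation; the Matérn-type kernel, its scale, the point numbers, and the test function are illustrative assumptions. Route (8) uses the data directly through the Lagrangians; route (20)-(21) goes through the kernel-translate coefficients, whose size typically grows with the condition of the kernel matrix, so the term sum in (21) can be much larger than the factor in (18).

```
import numpy as np

def matern52(a, b, s=0.5):
    r = np.abs(a[:, None] - b[None, :]) * np.sqrt(5.0) / s
    return (1.0 + r + r**2 / 3.0) * np.exp(-r)

X = np.linspace(0.0, 1.0, 25)                        # data locations
A = matern52(X, X)                                   # kernel matrix, also the system (20)
t = np.linspace(0.0, 1.0, 400)
B = matern52(t, X)                                   # kernel-translate basis values on a grid

# route (8): Lagrangians u_j(t) solve A u(t) = k(t); the factor (18) is sum_j |u_j(t)|
U = np.linalg.solve(A, B.T).T
print('route (8) : max_t sum_j |u_j(t)|       =', np.abs(U).sum(axis=1).max())

# route (20)-(21): coefficients of a smooth interpolant in the kernel-translate basis
c = np.linalg.solve(A, np.sin(2.0 * np.pi * X))
print('route (21): max_t sum_j |c_j K(t,x_j)| =', (np.abs(B) * np.abs(c)).sum(axis=1).max())
print('condition of the kernel matrix         =', np.linalg.cond(A))
```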
We add two other ways to see how norms of Lagrangians enter into the evaluation stability. The norm of the interpolation operator as a map from data vectors to functions is blown up for large norms of Lagrangians, because it is bounded below by the largest norm of a single Lagrangian and above by the sum of all Lagrangian norms.
Summing up (17) over the data functionals in the interpolatory case, we get a trade-off principle that lets the final factor grow when the error is small. Here, discrete norms run over the data functionals, paired as dual norms in the Hölder sense. If we repeat the above argument in more generality using bump functions, it shows that control over norms of bump functions implies control over the norms of the quasi-Lagrangians, and we saw above that these blow up absolute errors in the input data. Going on like above, the evaluation error is bounded above by (22), again with paired discrete norms running over the values. The bound factors into the linear influence of the input data errors and of the functional, and keeps the final factor as something like a Lebesgue constant.
In a somewhat sloppy formulation, we summarize the above arguments as a trade-off principle

Small errors imply large evaluation instabilities
that tacitly assumes that errors are measured via norms of functionals while evaluation instabilities are measured via norms of functions, the two being dual in the sense of Section 6.
Dealing with a non-dual situation, in particular with a weaker notion of evaluation instability, requires much more machinery. A typical case is evaluation stability governed by norms of Lagrangians for point evaluation data. This is the standard path along Lebesgue functions and Lebesgue constants. For univariate polynomial interpolation, this approach reveals that equidistant points have an exponential instability, while Chebyshev-distributed points only have a logarithmic instability. For kernel-based interpolation of function values, [19] proved uniform boundedness of Lagrangians if point sets are asymptotically uniformly distributed and if kernels have finite smoothness.
Here is a caveat. As long as there is no reasonable error bound yielding a Power Function, Theorem 1 will be useless. There is no strict Trade-off Principle under these circumstances. However, looking at evaluation stability without a bound on the error still gives very useful information.
Remark Certain numerical techniques factorize the recovery map from (7) into a composition of maps. This occurs for changes of bases, e.g. the transition to Newton bases for polynomials or kernel-based recovery [30]. The norm-based evaluation instability will only change by the norms of the factors and will not produce any fundamental improvement. The advantages lie in different numerical algorithms that may have a better numerical stability.

Optimal recovery of functions
Like in the introductory example, there are interpolatory cases where the Lagrangians satisfy (14) and (17) with equality. Then, by (15), the norms of the Lagrangians are minimal among the norms of all bump functions, and the recovery process has minimal error. Though simple to prove, this seems to be a novel insight. Under certain conditions satisfied for splines and kernel-based interpolation, this holds systematically and in both directions:

Theorem 2 Assume that an interpolatory recovery process satisfies minimum-norm and bounded-error properties, i.e.
the recovery attains the minimal norm among all functions reproducing the given data, and its error is bounded in terms of the norm of the recovered function, for all functions. Then the inequalities (14), (16), and (17) hold with equality, and the error term is the reciprocal of the instability term.

Examples of interpolatory recoveries
This section illustrates how the Trade-off Principle works under various circumstances. For the univariate cases in this section, we treat interpolation on $[-1,1]$ of values of functions at points $x_0,\dots,x_n$ and evaluation at some point $x$. The data functionals are point evaluation functionals, and everything can be expressed via the points. Other cases stay with the original formulation via general functionals.

Connect-the-dots
The simplest univariate case is connect-the-dots piecewise linear interpolation, but it has no choice of a space yet. The simplest choice is $C[-1,1]$ under the sup norm, the Lagrangians being hat functions, with constant extensions to the boundary if boundary points are not given. Then there is no proper error bound, and (17) consists of all ones.
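A quick numerical check of this statement, with arbitrary nodes chosen at random for illustration: the hat-function Lagrangians are nonnegative and their absolute values sum to one between the nodes, so the factors in (17) are indeed all ones.

```
import numpy as np

X = np.sort(np.random.default_rng(1).random(8))       # arbitrary ordered nodes in (0, 1)
t = np.linspace(X[0], X[-1], 500)
hats = np.array([np.interp(t, X, np.eye(len(X))[j]) for j in range(len(X))])
print(np.allclose(np.abs(hats).sum(axis=0), 1.0))      # True: sum of |hat functions| is 1
```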
If we keep the interpolation method as is, we can go over to zero boundary values and the space of continuously differentiable functions on $[-1,1]$ vanishing at the boundary, under the sup norm of the first derivative. An add-one-in Lagrangian based on three adjacent, increasingly ordered points is a hat function whose norm is the reciprocal of the smaller of the two neighbouring gaps. If the norm instead takes the $L_2$ norm of first derivatives, we are in a standard spline situation [20] and Theorem 2 applies. This illustrates that the Trade-off Principle works locally and for all possible norms when the recovery problem is fixed to be connect-the-dots.

Taylor data
Here is a rather academic but mathematically interesting case. Take a space of univariate real-valued functions on $[-1,1]$ that have complex extensions analytic in the unit disc, and consider Taylor data, i.e. the coefficients of the Taylor expansion at zero as data functionals, with weights on the coefficients defining the inner product. This generates a Hilbert space of functions whose reproduction formula is the Taylor series, see [21] for plenty of examples, including Hardy and Bergman spaces. The interpolation here is just a partial sum of the Taylor series, while it works by kernel translates in [21]. Now the monomials are the Lagrangians for the Taylor coefficient functionals, with norms given by the chosen weights. And Theorem 2 holds because we just chop the Taylor series. Consequently, inequalities (14) and (17) are satisfied with equality, and we also know the Power Function in the leave-last-out form as the reciprocal of the norm of the corresponding monomial.
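A tiny sketch of this Taylor-data case; the test function, the truncation order, and the evaluation point are illustrative choices. The data are Taylor coefficients at zero, the recovery is the chopped Taylor series, and the monomials act as Lagrangians.

```
import numpy as np
from math import factorial

n, x = 8, 0.7
data = [1.0 / factorial(j) for j in range(n)]          # Taylor data of exp at zero
value = sum(c * x**j for j, c in enumerate(data))      # sum_j data_j * u_j(x) with u_j(x) = x^j
print(value, np.exp(x), abs(value - np.exp(x)))        # chopped Taylor series vs. true value
```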
We treat the add-one-in case in the next section.
Note that all cases will behave like that if they take expansions into series of Lagrangians as their underlying space, with weights for the expansion coefficients.

Orthogonal series
Now assume that a space carries an inner product and allows an orthonormal basis. Recovery by truncated expansions into this basis then behaves like the Taylor case above, and Theorem 2 holds.

Recovery in Hilbert spaces
Assume that a set of linearly independent data functionals $\lambda_1,\dots,\lambda_n$ from the dual of some Hilbert space is given. This approach contains generalized finite elements using weak functionals defined via integration against test functions. Symmetric kernel-based collocation methods for solving PDEs are also covered [16]. It satisfies Theorems 1 and 2 after minimization in (15). The optimal recovery arises by optimizing the left-hand side of (15), and takes a particularly simple form if the data functionals are assumed to be orthonormalized. Details are left to the reader.

Splines
These are cases where Theorem 2 applies, if they are written in their Hilbert space context. Power Functions can be calculated via reciprocals of norms of Lagrangians. But the theory of this paper allows plenty of nonstandard approaches to splines as well, using different norms.

Polynomial interpolation
Here, the choice of the space needs special treatment, but we keep the data being values at points $x_0,\dots,x_n$ forming a set that enables exact interpolation by polynomials of degree $n$ or order $n+1$. The classical way to deal with this is to take functions with $n+1$ derivatives on $[-1,1]$ and to concentrate on the $(n+1)$-st derivative only, i.e. taking a seminorm of the $(n+1)$-st derivative. This brings us back to the introductory example in Section 1. See how (2) works locally, up to the limit 1/0 in case the evaluation point coincides with a data point. Choosing other norms will lead to different results. When restricted to polynomials, the next section treats a special case.

Norms via expansions
Our univariate model case here deals with functions in chebfun style (T. Driscoll et al. [22]): the space consists of functions on $[-1,1]$ having expansions into Chebyshev polynomials, and the norm takes nonnegative weights of the expansion coefficients. The add-one-in Lagrangian then arises from the Chebyshev expansion itself.
Remark As long as the recovery map, the evaluation functionals and the chosen space are fixed, there is no escape from the Trade-off Principle in the above form by changes of bases, because both ingredients are basis-independent. Of course, basis changes help, but in a different way. For instance, choosing a special basis and proceeding via (21) may give a much better evaluation stability than choosing simpler bases. The Trade-off Principle for strong norms is unaffected. In general, one can sacrifice small errors for better evaluation stability by changing the recovery strategy or the chosen norms.
Remark Once the functionals are fixed, one can vary the kernel, with respect to smoothness and scale. The Trade-off Principle will hold as an equality in all cases if the norm is defined via the native space for the kernel and the scale.

Unsymmetric case
The previous two sections still used interpolation and Lagrangians. But there are much more general cases, e.g. for PDE solving by unsymmetric meshless methods. In the latter case, users have no freedom to choose the data functionals, because they are prescribed by the PDE to be solved. The functionals will generate boundary values or values of the differential operator in the interior. We further assume that the user prefers a certain sort of trial functions that should finally approximate the true PDE solution very well. In cases with well-posedness in the sense of Real Analysis, it suffices to come up with such a solution even if there is no uniqueness of the recovery procedure [5].
Before we look at the trial space, recall that optimal Power Functions are purely dual objects, namely $\min_{a}\|\lambda-\sum_{j}a_j\,\lambda_j\|$, not depending on trial spaces, and they will always outperform other solutions, error-wise. Norm-minimal bump functions are also not dependent on trial spaces, and if bump functions are restricted to some trial space, their norm will not be minimal. In view of a Trade-off Principle, this means that non-optimal recovery methods sacrifice accuracy for improved evaluation stability. Anyway, we now consider a set of data functionals $\lambda_1,\dots,\lambda_n$ and a set of trial functions $v_1,\dots,v_m$ from some normed space of functions, spanning a subspace. These two ingredients determine a generalized Vandermonde matrix of size $n\times m$ with entries $\lambda_i(v_j)$ that is in the theoretical background, though certain algorithms will never generate it as a whole. We also assume that there may be an unknown numerical rank that limits the practical use of the matrix as is. This occurs in plenty of kernel-based methods, and even in square cases there may be a rank loss that occurs while the matrix condition in the sense of MATLAB's condest is still tolerable.
There are many ways to deal with this situation, and here we assume that the practically applied technique uses an $m\times n$ matrix as in (24) that calculates coefficients for the trial space basis from a given data vector. Pairing these coefficients with the vector of basis functions yields a trial function, and evaluation of a functional on it has an error that can be written via the quasi-Lagrangians induced by this construction as in (25).
Bump functions are not necessarily connected to the trial space chosen. If there exists a bump function at all, the Trade-off Principle (14) applies for the above Power Function. The next section will treat a special case in more detail, because it has a huge background literature in applications.

Unsymmetric collocation
An important example for solving PDEs via a recovery of functions is unsymmetric collocation, named after Edward Kansa [24]. Here, we confine ourselves to a standard Poisson problem (5) discretized as (6) for simplicity. One chooses a reproducing kernel Hilbert space of functions on the domain that matches the expectable smoothness of the solution, and implements the PDE via test functionals, as sketched in Section 3. The functionals in (6) may be renamed as $\lambda_1,\dots,\lambda_n$ to match the notations used above. But note that $n$ will usually exceed the number $m$ of trial functions.
Symmetric collocation takes the space of trial functions obtained by letting these test functionals act on one argument of the kernel, and this is an optimal recovery strategy [16] in the native space, with good convergence properties [25,26]. The trade-off principle for this was treated in Section 10.8.
The unsymmetric approach takes a set of trial functionals, namely point evaluations at trial points, to generate trial functions $v_1,\dots,v_m$ as kernel translates in these points.
The notation is now like in Sections 10.7 and 11, but we have not yet specified how we choose the matrix of (24). For calculation of the Power Function, we use (24), define the quasi-Lagrangians from (25), and obtain the squared Power Function as a quadratic form in the quasi-Lagrangian values, in self-evident kernel matrix notation. The Power Function for symmetric collocation replaces these values by the solution of the symmetric collocation system and therefore realizes the minimum of the quadratic form over all possible vectors. Figure 2 shows squares of Power Functions for unsymmetric collocation of a Poisson problem with Dirichlet data on the unit square. The setting has 121 regular interior points, 16 regular boundary points, 121 regular trial points and uses a Matérn-Sobolev kernel of order 5 at scale 1. The matrix in (24) was the pseudoinverse of the generalized Vandermonde matrix. The corresponding squares of optimal Power Functions from symmetric collocation are in Fig. 3. They are not substantially smaller, just by a factor of about 1/2.
Norm-minimal bump functions exist and have norms that are related to the optimal Power Function via Theorem 2 by equality in (16) and (17). They are Lagrangians for the symmetric setting. Therefore, the right-hand sides of these inequalities get larger when the Power Function of the unsymmetric method is inserted. The reciprocals of squared norms of the optimal bump functions are visualized in Fig. 3, because they coincide with the square of the optimal Power Function. Special bump functions in the trial space of the unsymmetric case will usually not exist if the number of data functionals is not smaller than the dimension of the trial space. But it may be an advantage of the unsymmetric technique that its evaluation is based on quasi-Lagrangians instead of Lagrangians. Their squared norms can be written as a quadratic form involving the standard kernel matrix for evaluations in trial points [23]. Figure 4 shows the reciprocals of these, in order to compare with the squared optimal Power Functions in Fig. 3. These results come out only on the data locations, without any plotting refinement. The values are larger by about a factor of 18 than those of the square of the optimal Power Function, indicating that the squared norms of the quasi-Lagrangians are smaller than those of the Lagrangians for the symmetric case by a factor of about 1/18.
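The following sketch is not the 2D experiment of Figs. 2-4; it is a hypothetical 1D analogue for $-u''=f$ on $(0,1)$ with Dirichlet boundary data, using a Gaussian kernel (so that symbolic derivatives are easy) and small illustrative point numbers. It builds the symmetric collocation matrix and the generalized Vandermonde matrix, takes the pseudoinverse as the matrix of (24), and compares the resulting squared Power Functions of point evaluation.

```
import numpy as np
import sympy as sp

xs, ys = sp.symbols('x y')
scale = 0.3
Kexpr = sp.exp(-(xs - ys)**2 / (2 * scale**2))          # Gaussian kernel, illustrative scale

def op(expr, var, kind):                                # 'L' applies -d^2/dvar^2, 'ev' does nothing
    return -sp.diff(expr, var, 2) if kind == 'L' else expr

KK = {(a, b): sp.lambdify((xs, ys), op(op(Kexpr, xs, a), ys, b), 'numpy')
      for a in ('ev', 'L') for b in ('ev', 'L')}        # kernel with functionals on both arguments

Xi, Xb = np.linspace(0.15, 0.85, 5), np.array([0.0, 1.0])       # interior / boundary points
pts = np.concatenate([Xi, Xb])
kinds = ['L'] * len(Xi) + ['ev'] * len(Xb)                      # PDE resp. Dirichlet functionals
Xc = np.linspace(0.0, 1.0, 6)                                   # trial centers for Kansa

A = np.array([[KK[ki, kj](pi, pj) for pj, kj in zip(pts, kinds)]
              for pi, ki in zip(pts, kinds)])                   # symmetric collocation matrix
V = np.array([[KK[ki, 'ev'](pi, z) for z in Xc]
              for pi, ki in zip(pts, kinds)])                   # generalized Vandermonde matrix
Vp = np.linalg.pinv(V)                                          # pseudoinverse, the matrix of (24)

def P2(t, kansa):
    # squared Power Function of the point evaluation at t for either recovery
    b = np.array([KK['ev', kj](t, pj) for pj, kj in zip(pts, kinds)])
    if kansa:
        kt = np.array([KK['ev', 'ev'](t, z) for z in Xc])
        c = Vp.T @ kt                                   # quasi-Lagrangian values of Kansa's method
    else:
        c = np.linalg.solve(A, b)                       # optimal values (symmetric collocation)
    return float(KK['ev', 'ev'](t, t) - 2.0 * c @ b + c @ A @ c)

T = np.linspace(0.0, 1.0, 101)
print('max P^2, symmetric collocation :', max(P2(t, False) for t in T))
print('max P^2, unsymmetric (Kansa)   :', max(P2(t, True) for t in T))
```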
Summarizing, unsymmetric collocation works at a larger error level than symmetric collocation, but gets better evaluation stability by using quasi-Lagrangians. The norms are kept fixed.

Regularization of linear systems
Assume an overdetermined linear system with a rectangular matrix. The spaces of functions and of functionals then are finite-dimensional coordinate spaces. After a Singular Value Decomposition, the system can be restricted to its numerically significant singular values, and the framework above applies to the regularized recovery. In the square nonsingular case, the Power Function is zero and there are no bump functions, leading back to the excluded 1/0 situation.
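A minimal sketch of one standard way to regularize such an overdetermined system, consistent with the SVD mentioned above: singular values below an illustrative threshold are dropped and the system is solved with the truncated pseudoinverse. The matrix sizes, the noise level, and the threshold are assumptions for illustration only.

```
import numpy as np

rng = np.random.default_rng(0)
# an overdetermined system with rapidly decaying singular values and slightly noisy data
A = rng.standard_normal((60, 20)) * (10.0 ** -np.arange(20))
b = A @ np.ones(20) + 1e-8 * rng.standard_normal(60)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
keep = s > 1e-6 * s[0]                            # illustrative truncation threshold
x = Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])   # truncated-SVD (regularized) solution
print('numerical rank:', keep.sum(), ' residual:', np.linalg.norm(A @ x - b))
```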

Greedy methods
Assume the add-one-in situation, and consider an optimal new data functional to be added to the current set for an extended problem. In view of the Trade-off Principle, one should either take it to maximize the Power Function or to minimize the norm of the corresponding bump function. In cases satisfying Theorem 2, these strategies coincide. This aims at good stability and uses new functionals that cope with the current maximal error to make it zero in the next step. During a greedy method like this, the product in (14) stays above 1 and the first factor goes to zero, while the second increases. Again, this shows that large evaluation instabilities do not contradict convergence.
For interpolation of function values by polynomials, this leads to Leja points ([27], see also the survey by St. De Marchi [28]), while for kernel-based interpolation this is the P-greedy method of [23]. Under certain additional assumptions, these strategies are approximately optimal in the sense that they realize Kolmogorov n-widths (G. Santin and B. Haasdonk [29]), i.e. they generate trial spaces that are asymptotically optimal among all trial spaces of the same dimension. They can be combined with the construction of Newton bases on-the-fly [30], but we omit further details.
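A minimal P-greedy sketch for plain kernel interpolation; the Gaussian kernel, its scale, the candidate grid, the starting point, and the number of steps are illustrative choices. Each step adds the candidate point where the Power Function is largest, i.e. where the current error bound peaks.

```
import numpy as np

def k(a, b, s=0.25):                                  # Gaussian kernel, illustrative scale
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * s**2))

cand = np.linspace(-1.0, 1.0, 401)                    # candidate points
X = [0.0]                                             # arbitrary starting point
for step in range(12):
    Xa = np.array(X)
    kc = k(cand, Xa)
    sol = np.linalg.solve(k(Xa, Xa), kc.T)
    P2 = 1.0 - np.sum(kc * sol.T, axis=1)             # squared Power Function on the candidates
    X.append(cand[int(np.argmax(P2))])                # greedy step: maximize the error bound
print(np.round(np.sort(X), 3))                        # selected points spread out quasi-uniformly
```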

Outlook and open problems
The technique used in this paper is very elementary, and it is possible that there are earlier results on Trade-off Principles. One instance is [31] by Platte et al., proving that exponential convergence of polynomial interpolants implies exponential instability in the case of equidistant points. Though the instability notion is different there, this can be seen as a case where a convergence rate is connected to a comparable rate of instability. This paper proves in general that all convergence rates have at least their exact counterpart in rates of evaluation instability when using dual norms. Such rates will change, e.g. when point locations are changed.
There are many more cases that fit into this paper, e.g. Finite Elements or spaces of multivariate splines. If errors are decreased by extended smoothness properties, there always will be an increasing evaluation instability. The connection between smoothness properties and convergence rates is a well-known Trade-off Principle in Approximation Theory, holding for several important cases, but a general theory seems to be lacking.
The same holds for a hypothetical Trade-off Principle suggesting that strongly localized methods cannot have small errors and/or large smoothness.
Handling the non-dual case is an open problem as well, in particular for evaluation stability.
If users have strong reasons to insist on very good accuracy and on evaluations of high derivatives, they have to face serious evaluation instabilities. Then it is a challenge to cope with these, including regularizations and other changes to the recovery map. The literature on kernel-based methods provides several such techniques, e.g. Contour-Padé [32], RBF-QR [33], and RBF-GA [34] by the group around Bengt Fornberg, and Hilbert-Schmidt-SVD by Fasshauer/McCourt [3, Chapter 13]. Greedy methods from Section 12 fight evaluation instability by choosing functionals or evaluation points adaptively.