On the stability of the representation of finite rank operators

The stability of the representation of finite rank operators in terms of a basis is analyzed. A conditioning is introduced as a measure of the stability properties. This conditioning improves on previously used conditionings because it is closer to the Lebesgue function. Improved bounds for the conditioning of the Fourier sums with respect to an orthogonal basis are obtained, in particular, for Legendre, Chebyshev, and disk polynomials. The Lagrange and Newton formulae for the interpolating polynomial are also considered.


Introduction
The representation of a continuous function in a finite dimensional space depends on the choice of a basis. A first possibility consists of expanding the function in terms of a Lagrange basis with respect to some set of nodes, so that the coefficients with respect to the basis are values of the function. The representation in terms of the Lagrange basis with respect to the Chebyshev sites turns out to be a very stable representation of polynomials (see pp. 12-15 of [2]).
A related problem is the representation of operators of finite rank in terms of a given basis. The Lebesgue function provides a pointwise bound for the error propagation of an operator independently of the formula used for evaluation. However, the choice of a basis for expressing the operator might provide worse stability results for the evaluation than those predicted by the Lebesgue function. Some operators, such as orthogonal projections, have good stability properties if an orthogonal basis is used [5]. In order to measure the stability properties, we introduce a condition number associated with the representation.
The bound provided by the Lebesgue function is lower than the conditioning of any representation of the operator with respect to a basis. It would be an ideal situation to have an evaluation formula for the operator whose conditioning is exactly the Lebesgue function. However, many common formulae for the operator in terms of an orthogonal basis give rise to a condition number higher than the Lebesgue function. If the conditioning is closer to the Lebesgue function, then the corresponding representation is more stable. In the case that the conditioning coincides with the Lebesgue function, the representation is optimal. The Lagrange formula for the interpolation operator is optimally stable, as we will show in Section 5 (see also [3]). In contrast, the Newton formula may present numerical instability [1].
In previous research, a different conditioning for comparing different representations of an operator has been considered [3][4][5][6][8]. However, this conditioning tends to overestimate the instability, especially when dealing with Fourier representations with respect to orthogonal polynomials, as we shall see later. The conditioning proposed in this paper is sharper in the sense that it resembles more closely the Lebesgue function. This conditioning might be harder to compute in some cases, but it can be easily bounded in the case of orthogonal projections, giving rise to sharper and practical bounds.
In Sect. 2, we introduce a conditioning κ(x, B, B^{-1}) of a basis B, and we show that it is smaller than the condition number cond(x, B, B^{-1}) used in [3]. In fact, Example 1 illustrates that it can be considerably smaller. In Sect. 3, we extend the proposed conditioning κ(x, B, Φ) to the case of representations of a continuous linear operator of finite rank on the space of continuous functions on a compact domain. We compare it with the condition number cond(x, B, Φ) discussed in [5], and prove that κ(x, B, Φ) ≤ cond(x, B, Φ). We also show that both conditionings are invariant under reordering or rescaling of the basis. Moreover, both conditionings coincide in the case where the functionals associated with the representation are nonnegative. In Sect. 4, we consider the conditioning of least squares problems. The Christoffel function relates the values of the Fourier sum operator S_n[f](x) with ‖f‖_2, the norm associated with the scalar product. This relation allows us to provide practical bounds for the conditioning κ(x, P, Φ) of the representation of S_n in terms of an orthogonal basis P.
In Theorem 1, we provide bounds for both conditionings, and we can say that the bound for κ(x, P, Φ) is lower than the corresponding bound for cond(x, P, Φ). We describe some relevant examples, considering Legendre polynomials, Chebyshev polynomials, and disk polynomials. In the three cases, the bounds for κ(x, P, Φ) considerably improve the bounds for cond(x, P, Φ). Section 5 focuses on the conditioning of Lagrange interpolation. The representation of the interpolation operator with respect to the Lagrange basis L and the evaluation functionals X is optimal, because κ(x, L, X) coincides with the Lebesgue function. Moreover, both condition numbers coincide, κ(x, L, X) = cond(x, L, X). In Sect. 6, we consider the conditioning of the Newton representation of the interpolating polynomial. In this case, the conditioning depends on the ordering of the nodes. We characterize the orderings such that both conditionings coincide. As a consequence, if the nodes are increasingly ordered or, more generally, if they follow a central ordering with respect to a center (see [4]), both conditionings coincide. Finally, Sect. 7 considers the discrete case, which can be analyzed as a particular case of a Lagrange interpolation operator.

Conditioning of a basis
Let K be a compact domain in R^d. Let U = ⟨b_0, …, b_n⟩ be the vector space generated by linearly independent functions b_0, …, b_n ∈ C(K). Then, the linear mapping B : R^{n+1} → U, Bc := Σ_{i=0}^n c_i b_i, can be regarded as a basis of U, whose inverse B^{-1} is the corresponding coordinate mapping. Let us denote by π_i(u) := (B^{-1}u)_i, i = 0, …, n, the coordinate projections. Each function u ∈ U can be written in terms of the basis as u = Σ_{i=0}^n π_i(u) b_i. In order to compute u(x) expressed in terms of a given basis, we evaluate each of the terms π_i(u) b_i(x) and sum up all of them. Since the computation of each coefficient π_i(u), i = 0, …, n, can be affected by an error ε_i, we shall obtain instead Σ_{i=0}^n (π_i(u) + ε_i) b_i(x). So, we can assume that the computed value is the exact evaluation of a perturbed function u(x) + e(x), where the perturbation e(x) = Σ_{i=0}^n ε_i b_i(x) belongs to the space U. The sign of the errors in the coefficients is difficult to predict. In the worst case, when we evaluate the function u at a given point x, all summands may have the same nonstrict sign, for instance, ε_i b_i(x) ≥ 0 for all i = 0, …, n. So, the size of the perturbation can reach the upper bound |e(x)| ≤ Σ_{i=0}^n |ε_i| |b_i(x)| ≤ ‖ε‖_∞ Σ_{i=0}^n |b_i(x)|, which bounds the size of the perturbation in terms of the norm of the error vector ε = (ε_0, …, ε_n). So, Σ_{i=0}^n |b_i(x)| gives a bound for the relative error |e(x)|/‖ε‖_∞.
The size of the error of a coefficient depends on how the coefficient has been computed. However, the previous bound does not reveal the influence of the error ε_i of each coefficient on the error propagation e(x). Moreover, the starting point in some problems is a perturbed function, and we want to measure how the perturbation might affect the evaluation of the function in terms of a given basis. For this purpose, we note that ε_i = π_i(e), i = 0, …, n, and then e(x) = Σ_{i=0}^n π_i(e) b_i(x). In order to measure the size of the error, we introduce the conditioning κ(x, B, B^{-1}) := sup_{e∈U, ‖e‖_∞=1} Σ_{i=0}^n |π_i(e) b_i(x)| of Definition 1. Since any e ∈ U with ‖e‖_∞ = 1 can be expressed in the form e = Bc/‖Bc‖_∞ for some c = (c_0, …, c_n) ≠ 0, we can write κ(x, B, B^{-1}) = sup_{c≠0} Σ_{i=0}^n |c_i b_i(x)| / ‖Bc‖_∞.

Proposition 1 Let B be a basis of an (n + 1)-dimensional space of continuous functions U defined on a compact set K and let π_0, …, π_n be the corresponding coordinate projections. Let us define cond(x, B, B^{-1}) := Σ_{i=0}^n ‖π_i‖_∞ |b_i(x)|, where ‖π_i‖_∞ := sup_{e∈U, ‖e‖_∞=1} |π_i e|. Then, we have κ(x, B, B^{-1}) ≤ cond(x, B, B^{-1}).

Proof Since for each e ∈ U we have |π_i(e)| ≤ ‖π_i‖_∞ ‖e‖_∞, i = 0, …, n, we can write Σ_{i=0}^n |π_i(e) b_i(x)| ≤ ‖e‖_∞ Σ_{i=0}^n ‖π_i‖_∞ |b_i(x)|, and the result follows by taking the supremum over ‖e‖_∞ = 1.

The conditioning cond(x, B, B^{-1}) has been used in previous papers [3][4][5]. We want to show that, in some cases, cond(x, B, B^{-1}) is much bigger than κ(x, B, B^{-1}) and so, cond(x, B, B^{-1}) overestimates the error propagation of the representation of a function in terms of a basis.
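The worst-case bound derived above can be illustrated numerically. The following sketch uses the monomial basis b_i(x) = x^i on [−1, 1] as an illustrative choice (not a basis discussed in the paper) and checks that perturbing the coefficients by ±ε never propagates an error larger than ‖ε‖_∞ Σ_i |b_i(x)|, with the bound attained when each ε_i has the sign of b_i(x).

```python
import numpy as np

# Worst-case propagation of coefficient errors for the monomial basis
# (illustrative choice): |e(x)| <= ||eps||_inf * sum_i |b_i(x)|.
rng = np.random.default_rng(0)
n = 5
x = 0.9
b = np.array([x ** i for i in range(n + 1)])   # b_i(x) = x^i
bound = np.sum(np.abs(b))                      # sum_i |b_i(x)|

eps_size = 1e-8
worst = 0.0
for _ in range(1000):
    eps = eps_size * rng.choice([-1.0, 1.0], size=n + 1)
    worst = max(worst, abs(eps @ b))           # |e(x)| = |sum_i eps_i b_i(x)|
assert worst <= eps_size * bound * (1 + 1e-12)

# the bound is attained when eps_i has the same sign as b_i(x)
eps_star = eps_size * np.sign(b)
assert np.isclose(abs(eps_star @ b), eps_size * bound)
```

The random sign patterns stay below the bound, while the matched-sign perturbation attains it, which is exactly the worst case described in the text.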

Conditioning of an operator
The norm of a continuous linear operator T on the space of continuous functions on a compact domain is also called the Lebesgue constant. We can introduce a Lebesgue function as λ(x; T) := sup_{e∈C(K), ‖e‖_∞=1} |T[e](x)|.
The following result proves that the supremum value of the Lebesgue function coincides with the Lebesgue constant.
Proposition 2 Let T : C(K) → C(K) be a continuous linear operator on the space of continuous functions defined on the compact domain K. Then, we have sup_{x∈K} λ(x; T) = ‖T‖_∞.

Proof Since |T[e](x)| ≤ ‖T[e]‖_∞ ≤ ‖T‖_∞ for each e ∈ C(K) with ‖e‖_∞ = 1, we deduce that sup_{x∈K} λ(x; T) ≤ ‖T‖_∞. On the other hand, for each e ∈ C(K) with ‖e‖_∞ = 1, we have that ‖T[e]‖_∞ = sup_{x∈K} |T[e](x)| ≤ sup_{x∈K} λ(x; T). Then, we deduce that ‖T‖_∞ ≤ sup_{x∈K} λ(x; T).

If T has finite rank and U := T(C(K)) has dimension n + 1, we can obtain the representation of T with respect to a basis B : R^{n+1} → U of the form T = B ∘ Φ. Defining Φ := B^{-1} ∘ T, we have that Φ f = (φ_0 f, …, φ_n f), where φ_i : C(K) → R is the linear functional obtained by applying the i-th coordinate map to T[f], φ_i f := π_i(T[f]), i = 0, …, n. In this way, the relation T = B ∘ Φ can be understood as a way of representing the operator by choosing a basis (b_0, …, b_n) and a system of functionals (φ_0, …, φ_n).
Definition 2 Let U be a finite dimensional subspace of C(K) with dim U = n + 1 and let B : R^{n+1} → U be a basis mapping for U. Let Φ : C(K) → R^{n+1}, Φ f = (φ_0 f, …, φ_n f), where φ_0, …, φ_n are continuous linear functionals on C(K). The conditioning of the representation (B, Φ) at a point x ∈ K is κ(x, B, Φ) := sup_{e∈C(K), ‖e‖_∞=1} Σ_{i=0}^n |φ_i(e) b_i(x)|.

The conditioning κ(x, B, Φ) can be regarded as a pointwise bound for the error in the computation of the operator T = B ∘ Φ expressed in terms of the basis B, relative to the size of any perturbation e ∈ C(K).

Proposition 3 Let T : C(K) → C(K) be a continuous linear operator of finite rank represented as T = B ∘ Φ with respect to a basis B and a system of functionals Φ = (φ_0, …, φ_n). Then, we have the following inequality relating κ(x, B, Φ) and the Lebesgue function: λ(x; T) ≤ κ(x, B, Φ), x ∈ K.

Proof Let φ_i : C(K) → R be the components of Φ, i = 0, …, n, and let e = f̃ − f be the perturbation function. Since T[e] = Σ_{i=0}^n φ_i(e) b_i, we can write |T[e](x)| ≤ Σ_{i=0}^n |φ_i(e) b_i(x)|. By the above inequality, we have that λ(x; T) = sup_{‖e‖_∞=1} |T[e](x)| ≤ sup_{‖e‖_∞=1} Σ_{i=0}^n |φ_i(e) b_i(x)| = κ(x, B, Φ).

Note that, in contrast to κ(x, B, Φ), the Lebesgue function λ(x; T) depends only on the finite rank operator and not on the choice of the basis. Different bases B may lead to different conditionings κ(x, B, Φ). If κ(x, B, Φ) is close to λ(x; T), the basis B provides a quasi-optimally conditioned representation of the operator at the point x.
In [5], the following measure for the conditioning was introduced: cond(x, B, Φ) := Σ_{i=0}^n ‖φ_i‖_∞ |b_i(x)|, where ‖φ_i‖_∞ := sup_{e∈C(K), ‖e‖_∞=1} |φ_i e|. Let us show that κ(x, B, Φ) provides a measure of the conditioning sharper than cond(x, B, Φ).
Proposition 4 Let U be a finite dimensional subspace of C(K) with dim U = n + 1 and let B : R^{n+1} → U be a basis mapping for U. Let Φ : C(K) → R^{n+1}, Φ f = (φ_0 f, …, φ_n f), where φ_i, i = 0, …, n, are continuous linear functionals defined on C(K). Then, we have the following inequality: κ(x, B, Φ) ≤ cond(x, B, Φ), x ∈ K.

Proof Since |φ_i(e)| ≤ ‖φ_i‖_∞ for each e ∈ C(K) with ‖e‖_∞ = 1, we have Σ_{i=0}^n |φ_i(e) b_i(x)| ≤ Σ_{i=0}^n ‖φ_i‖_∞ |b_i(x)| = cond(x, B, Φ), and the result follows by taking the supremum.

For a given operator T : C(K) → C(K) of finite rank, we can compare different representations with respect to different bases. If B and B̃ are two basis mappings of U, we can write T = B ∘ Φ = B̃ ∘ Φ̃, with Φ̃ := B̃^{-1} ∘ T. We have already mentioned that the conditioning depends on the choice of the basis. A reordering of the elements of a basis corresponds to a reordering of the associated functionals. In the same way, a rescaling of the basis, b̃_i(x) = k_i b_i(x) with k_i ≠ 0, implies a rescaling of the functionals, φ̃_i = (1/k_i) φ_i. Let us show now that the conditionings cond(x, B, Φ) and κ(x, B, Φ) are invariant under reordering or rescaling of the basis.

Proposition 6 Let φ_i : C(K) → R, i = 0, …, n, be a sequence of nonnegative linear functionals and B be a basis mapping of a finite dimensional space U. Then, κ(x, B, Φ) = cond(x, B, Φ), x ∈ K.

Proof Since each φ_i is nonnegative, its norm is attained at the constant function 1, ‖φ_i‖_∞ = φ_i(1). Taking e = 1, we obtain cond(x, B, Φ) = Σ_{i=0}^n |φ_i(e) b_i(x)| ≤ κ(x, B, Φ) and, using Proposition 4, the result follows.
In some cases, the basis B of the space U can be chosen such that all functions b_i attain their maximum absolute value at the same point. Legendre and Chebyshev polynomials form relevant bases of the space of polynomials of degree not greater than n and attain their maximum absolute value on [−1, 1] at x = 1.

Proposition 7 Let U be a finite dimensional subspace of C(K) with dim U = n + 1 and let B : R^{n+1} → U be a basis mapping for U such that there exists x_0 ∈ K at which all basis functions attain their maximum absolute value, |b_i(x)| ≤ |b_i(x_0)|, x ∈ K, i = 0, …, n. Then, max_{x∈K} κ(x, B, Φ) = κ(x_0, B, Φ) and max_{x∈K} cond(x, B, Φ) = cond(x_0, B, Φ).

Conditioning of least squares problems
In this section, orthogonal projections arising in least squares problems will be considered.We first deduce some general properties of the conditioning of projectors.
If T = B ∘ Φ is a projector onto U, then Φ ∘ B = I, where I is the identity map from R^{n+1} to R^{n+1}. In other words, the system of functionals (φ_0, …, φ_n) is dual to the basis (b_0, …, b_n) in the sense that φ_i(b_j) = δ_{ij}, where δ_{ij} is the Kronecker symbol. This implies that the restriction of Φ to the space U coincides with the coordinate mapping B^{-1}, that is, Φ|_U = B^{-1}. The following proposition shows that the conditioning of the representation of a projector is always greater than or equal to the corresponding conditioning of the basis.
Proposition 8 Let T : C(K) → C(K) be a projector on a finite dimensional space U, represented as T = B ∘ Φ, where B is a basis mapping of U. Then, κ(x, B, B^{-1}) ≤ κ(x, B, Φ) and cond(x, B, B^{-1}) ≤ cond(x, B, Φ).
Proof Let π_0, …, π_n be the coordinate mappings with respect to B. Since T is a projector, φ_i u = π_i u for all u ∈ U, i = 0, …, n. Then κ(x, B, B^{-1}) = sup_{e∈U, ‖e‖_∞=1} Σ_{i=0}^n |π_i(e) b_i(x)| = sup_{e∈U, ‖e‖_∞=1} Σ_{i=0}^n |φ_i(e) b_i(x)| ≤ κ(x, B, Φ), because the supremum defining κ(x, B, Φ) is taken over the larger set of perturbations e ∈ C(K). In the same way, we deduce that ‖π_i‖_∞ ≤ ‖φ_i‖_∞ and cond(x, B, B^{-1}) ≤ cond(x, B, Φ).

Let K be a compact set of R^d and μ be a nonnegative regular Borel measure with 0 < μ(K) < ∞. Let us define the semidefinite symmetric bilinear form ⟨f, g⟩ := ∫_K f(x) g(x) dμ(x) and ‖f‖_2 := ⟨f, f⟩^{1/2}. If the bilinear form is positive definite on the finite dimensional subspace U, then the best least squares approximation from U exists, it is unique, and it is characterized by the property that the error is orthogonal to the space U. If P = (p_0, …, p_n) is an orthogonal basis of U, then the solution of the least squares problem can be given as the n-th Fourier sum S_n[f] = Σ_{i=0}^n (⟨f, p_i⟩/‖p_i‖_2^2) p_i. Introducing the Christoffel-Darboux kernel K_n(x, y) := Σ_{i=0}^n p_i(x) p_i(y)/‖p_i‖_2^2, we can express the n-th Fourier sum in the form S_n[f](x) = ∫_K K_n(x, y) f(y) dμ(y). Using the Cauchy-Schwarz inequality, we deduce that the values of the Fourier sum at x and the norm ‖f‖_2 for f ∈ C(K) can be related by the Christoffel function 1/K_n(x, x) (see Theorem 3.6.6 of [7]), |S_n[f](x)| ≤ K_n(x, x)^{1/2} ‖f‖_2. Since the measure of K is finite, we have that ‖f‖_2 ≤ μ(K)^{1/2} ‖f‖_∞, and the following upper bound for the Lebesgue function in terms of the Christoffel function follows: λ(x; S_n) ≤ (μ(K) K_n(x, x))^{1/2}. The next result shows that the same bound can be deduced for the conditioning κ(x, P, Φ) of the Fourier sum expressed in terms of an orthogonal basis P.
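The Christoffel bound above can be checked numerically. The following sketch (an illustration of mine, not computations from the paper) takes the Legendre polynomials on [−1, 1], where μ(K) = 2, computes the Lebesgue function λ(x; S_n) = ∫ |K_n(x, y)| dy by a Riemann sum, and verifies that it stays below (μ(K) K_n(x, x))^{1/2}.

```python
import numpy as np
from numpy.polynomial.legendre import legval

# Check lambda(x; S_n) <= sqrt(mu(K) * K_n(x, x)) for Legendre polynomials
# on [-1, 1]; here mu(K) = 2 and ||P_i||_2^2 = 2 / (2i + 1).
n = 4
norms_sq = np.array([2.0 / (2 * i + 1) for i in range(n + 1)])

def P(i, t):
    # evaluate the i-th Legendre polynomial at t
    c = np.zeros(i + 1)
    c[i] = 1.0
    return legval(t, c)

x = 0.7
y = np.linspace(-1.0, 1.0, 100001)
Kxy = sum(P(i, x) * P(i, y) / norms_sq[i] for i in range(n + 1))
leb = np.sum(np.abs(Kxy)) * (y[1] - y[0])   # lambda(x; S_n), Riemann sum
Kxx = sum(P(i, x) ** 2 / norms_sq[i] for i in range(n + 1))
assert leb <= np.sqrt(2.0 * Kxx)            # the Christoffel bound holds
```

The inequality is a direct consequence of Cauchy-Schwarz applied to the reproducing kernel, so the check should hold for any degree n and any evaluation point.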
Theorem 1 Let K be a compact set of R^d and μ be a nonnegative finite Borel measure. Let P = (p_0, …, p_n) be an orthogonal basis of a space U with respect to the bilinear form ⟨f, g⟩ := ∫_K f(x) g(x) dμ(x) and let φ_i f := ⟨f, p_i⟩/‖p_i‖_2^2, i = 0, …, n. Let K_n(x, y) := Σ_{i=0}^n p_i(x) p_i(y)/‖p_i‖_2^2 be the Christoffel-Darboux kernel associated to the basis P. Then, the Lebesgue function of the Fourier sum operator S_n is λ(x; S_n) = ∫_K |K_n(x, y)| dμ(y), and we have the following bounds for the conditionings:

cond(x, P, Φ) = Σ_{i=0}^n (‖p_i‖_1/‖p_i‖_2^2) |p_i(x)| ≤ μ(K)^{1/2} Σ_{i=0}^n |p_i(x)|/‖p_i‖_2, (3)

κ(x, P, Φ) ≤ (μ(K) K_n(x, x))^{1/2}. (4)

Proof The Lebesgue function of the projector S_n can be written in the form λ(x; S_n) = sup_{e∈C(K), ‖e‖_∞=1} |∫_K K_n(x, y) e(y) dμ(y)|. Choosing, for each x ∈ K, a sequence e_m(y) of functions in C(K) with ‖e_m‖_∞ = 1 converging μ-almost everywhere to sign(K_n(x, y)), we deduce that λ(x; S_n) = ∫_K |K_n(x, y)| dμ(y). By the Riesz representation Theorem (see Theorem 6.19 of Chapter 6 of [9]), ‖φ_i‖_∞ = ‖p_i‖_1/‖p_i‖_2^2, which gives the equality in (3). Using the Cauchy-Schwarz inequality, ‖p_i‖_1 ≤ μ(K)^{1/2} ‖p_i‖_2, and the bound in (3) follows.

From the Cauchy-Schwarz inequality, we obtain Σ_{i=0}^n |φ_i(e) p_i(x)| ≤ (Σ_{i=0}^n φ_i(e)^2 ‖p_i‖_2^2)^{1/2} (Σ_{i=0}^n p_i(x)^2/‖p_i‖_2^2)^{1/2}. Using Bessel's inequality, Σ_{i=0}^n φ_i(e)^2 ‖p_i‖_2^2 = Σ_{i=0}^n ⟨e, p_i⟩^2/‖p_i‖_2^2 ≤ ‖e‖_2^2 ≤ μ(K) ‖e‖_∞^2, and the fact that ‖e‖_∞ = 1, we deduce that Σ_{i=0}^n |φ_i(e) p_i(x)| ≤ μ(K)^{1/2} K_n(x, x)^{1/2}. Then, the bound (4) follows.

We remark that the bound in (4) is lower than the bound in (3) because (μ(K) K_n(x, x))^{1/2} = μ(K)^{1/2} (Σ_{i=0}^n (|p_i(x)|/‖p_i‖_2)^2)^{1/2} ≤ μ(K)^{1/2} Σ_{i=0}^n |p_i(x)|/‖p_i‖_2.
Now, let us analyze some common Fourier approximations with respect to different scalar products, giving rise to classical orthogonal polynomials.
In the case that μ is the Lebesgue measure on [−1, 1], we can take the basis of Legendre polynomials P = (P_0, …, P_n). Taking into account that ‖P_i‖_2^2 = 2/(2i + 1), we have that cond(x, P, Φ) = Σ_{i=0}^n ((2i + 1)/2) ‖P_i‖_1 |P_i(x)|. From the fact that the Legendre polynomials attain their maximum absolute value P_i(1) = 1 at x = 1 (see Section 7.2 of [10]), we deduce from Proposition 7 that max_{x∈[−1,1]} cond(x, P, Φ) = cond(1, P, Φ) ≤ √2 Σ_{i=0}^n ((2i + 1)/2)^{1/2} = Σ_{i=0}^n (2i + 1)^{1/2} (see Proposition 2 of [5]). Using Proposition 7, we deduce that κ(x, P, Φ) attains its maximum value at x = 1, and from (4), we get max_{x∈[−1,1]} κ(x, P, Φ) ≤ (2 K_n(1, 1))^{1/2} = (Σ_{i=0}^n (2i + 1))^{1/2} = n + 1. We remark that n + 1 ≤ Σ_{i=0}^n (2i + 1)^{1/2}, which implies that the bound for cond(x, P, Φ) is higher than the bound for κ(x, P, Φ).
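The two Legendre bounds can be compared numerically; the following sketch uses the values as reconstructed here (‖P_i‖_2^2 = 2/(2i + 1), κ bound n + 1, cond bound Σ √(2i + 1)) and should be read as an illustration rather than the paper's own computation.

```python
import numpy as np
from numpy.polynomial.legendre import legval

# Compare the bounds at x = 1 for the Legendre Fourier sum:
# cond(1, P, Phi) <= sum_i sqrt(2i + 1), kappa bound = sqrt(2 K_n(1,1)) = n + 1.
n = 8
t = np.linspace(-1.0, 1.0, 100001)
dt = t[1] - t[0]

def P(i, s):
    c = np.zeros(i + 1)
    c[i] = 1.0
    return legval(s, c)

# cond(1, P, Phi) = sum_i ((2i+1)/2) ||P_i||_1, since P_i(1) = 1
cond_at_1 = sum((2 * i + 1) / 2 * np.sum(np.abs(P(i, t))) * dt
                for i in range(n + 1))
cond_bound = sum(np.sqrt(2 * i + 1) for i in range(n + 1))
kappa_bound = np.sqrt(2 * sum((2 * i + 1) / 2 for i in range(n + 1)))

assert abs(kappa_bound - (n + 1)) < 1e-9   # sqrt(2 K_n(1,1)) = n + 1
assert cond_at_1 <= cond_bound + 1e-6      # Cauchy-Schwarz: ||P_i||_1 <= sqrt(2)||P_i||_2
assert kappa_bound < cond_bound            # the kappa bound is lower
```

The gap grows with n: the κ bound is n + 1 while the cond bound behaves like a constant times n^{3/2}.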
Example 2 Let us consider the Fourier sum S_1 that associates to each function f ∈ C([−1, 1]) its best approximation in the least squares sense in the space of polynomials of degree not greater than 1, S_1[f](x) = (1/2) ∫_{−1}^1 f(t) dt + (3/2) x ∫_{−1}^1 t f(t) dt. Let us first compute the Lebesgue function of S_1. By Theorem 1, λ(x; S_1) = ∫_{−1}^1 |K_1(x, t)| dt = ∫_{−1}^1 |1/2 + (3/2) x t| dt. Using the change of variables τ = −t, we deduce λ(−x; S_1) = ∫_{−1}^1 |1/2 − (3/2) x t| dt = ∫_{−1}^1 |1/2 + (3/2) x τ| dτ = λ(x; S_1). So, the Lebesgue function is an even function.
The continuous linear functionals for the representation of S_1 with respect to the basis of Legendre polynomials P_0(x) = 1, P_1(x) = x, are φ_0 f = (1/2) ∫_{−1}^1 f(t) dt and φ_1 f = (3/2) ∫_{−1}^1 t f(t) dt. The corresponding norms are ‖φ_0‖_∞ = 1 and ‖φ_1‖_∞ = 3/2, so that cond(x, P, Φ) = 1 + (3/2)|x|. From Proposition 3, we deduce that λ(x; S_1) ≤ κ(x, P, Φ). So, we have that λ(x; S_1) = κ(x, P, Φ) < cond(x, P, Φ) for x ≠ 0.
Thus, we can say that the basis of Legendre polynomials has the optimal conditioning κ(x, B, Φ) = λ(x; T). However, the conditioning cond(x, P, Φ) does not show the good behavior of the basis. By Proposition 7, the maximum value of both conditionings is attained at x = 1.

Chebyshev polynomials T = (T_0, …, T_n) are orthogonal with respect to dμ(x) = dx/√(1 − x^2), with ‖T_0‖_2^2 = π and ‖T_i‖_2^2 = π/2 for i ≥ 1. The norm of the functionals τ_i f = ⟨f, T_i⟩/‖T_i‖_2^2 can be computed from the fact that ‖T_0‖_1 = π and ‖T_i‖_1 = 2, for i ≥ 1, giving ‖τ_0‖_∞ = 1 and ‖τ_i‖_∞ = 4/π, i ≥ 1. So, we find again that the bound for κ(x, T, τ) is lower than the bound for cond(x, T, τ). Since Chebyshev polynomials attain their maximum absolute value at x = 1, we deduce from Proposition 7 that the maximum conditionings κ(x, T, τ) and cond(x, T, τ) are attained at x = 1.

Disk polynomials (cf. Section 2.6 of [7]) can be used for approximation of functions on the disk D = {(r cos θ, r sin θ) | 0 ≤ r ≤ 1, θ ∈ [−π, π]}. An orthogonal basis Z^α of Gegenbauer-like orthogonal polynomials can be defined with respect to the measure dμ_α(x, y) = (1 − x^2 − y^2)^α dx dy, where P_j^{(α,m)} denotes the usual Jacobi polynomial of degree j on [−1, 1]. In Theorem 3 of [6], the conditioning of the basis has been computed in terms of the quantities h^α_{j+m,j}. The integral defining h^α_{j+m,j} can be expressed in terms of the square of the norm of the usual Jacobi polynomials (see formula (8) of [6]), giving rise to an explicit expression in which (t)_j := t(t + 1) ⋯ (t + j − 1) denotes the usual Pochhammer symbol. For j = m = 0, we have h^α_{0,0} = 1. First, we recall the values of the norms of the basis functions (see the proof of Theorem 3 of [6]). We now compute a bound for κ(r, θ, Z^α, Φ^α) using formula (4).
Since the polynomials Z^α_{j,m} attain their maximum absolute value at the boundary r = 1, we get that R^α_{j,m}(r) ≤ R^α_{j,m}(1) = 1 and deduce a corresponding bound for max κ(r, θ, Z^α, Φ^α). Since h^α_{j+m,j} is a decreasing function of α, α ∈ [0, ∞), we deduce that the least bound is attained for the Zernike polynomials corresponding to α = 0, and we obtain the resulting bound for max κ(r, θ, Z^0, Φ^0). In Proposition 2 of [6], a corresponding bound for max cond(r, θ, Z^0, Φ^0) was shown.

With an analogous reasoning, we get that the bound for cond(r, θ, Z^α, Φ^α) is a decreasing function of α ∈ [0, ∞) and the least bound is obtained for α = 0. Clearly, the bound for κ(r, θ, Z^0, Φ^0) is lower than the bound for cond(r, θ, Z^0, Φ^0).
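Returning to the Chebyshev comparison earlier in this section, the norms involved can be verified numerically. The sketch below uses the values as reconstructed here (‖T_i‖_1 = 2 for i ≥ 1, cond(1, T, τ) = 1 + (4/π)n, κ bound √(2n + 1)); it is an illustration, not the paper's computation.

```python
import numpy as np

# ||T_i||_1 = int_{-1}^{1} |T_i(x)| / sqrt(1 - x^2) dx = int_0^pi |cos(i s)| ds = 2
# for i >= 1 (substitution x = cos s), so cond(1, T, tau) = 1 + (4/pi) n,
# while the bound (4) for kappa at x = 1 is sqrt(mu(K) K_n(1,1)) = sqrt(2n + 1).
s = np.linspace(0.0, np.pi, 200001)
ds = s[1] - s[0]
for i in range(1, 6):
    l1 = np.sum(np.abs(np.cos(i * s))) * ds   # Riemann sum for ||T_i||_1
    assert abs(l1 - 2.0) < 1e-3

for n in (1, 5, 20):
    cond_at_1 = 1 + 4 * n / np.pi
    kappa_bound = np.sqrt(2 * n + 1)
    assert kappa_bound < cond_at_1            # the kappa bound is lower
```

As in the Legendre case, the κ bound grows like √n while the cond value at x = 1 grows linearly in n.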

Conditioning of Lagrange interpolation
An interesting case of a projector is the Lagrange interpolation operator. Given a sequence of distinct nodes X = (x_0, …, x_n) with x_0, …, x_n ∈ K such that the Lagrange interpolation problem u(x_i) = f(x_i), i = 0, …, n, has a unique solution in U, we can define the operator which associates to each f ∈ C(K) its unique interpolant in U at the sequence of nodes. Let l_0, …, l_n ∈ U be the fundamental functions associated to the sequence of nodes, that is, l_i(x_j) = δ_{ij}, where δ_{ij} is the Kronecker symbol. Then, we can express the operator in terms of the Lagrange basis, T[f] = Σ_{i=0}^n f(x_i) l_i. The functionals X := L^{-1} ∘ T associated to the Lagrange representation of the interpolant L[f] are the evaluation functionals at the nodes, X f = (f(x_0), …, f(x_n)).

Theorem 2 Let U be a subspace of C(K) with dim U = n + 1 and let X = (x_0, …, x_n) be a sequence of nodes such that the Lagrange interpolation problem has a unique solution in U. Let T be the Lagrange interpolation operator, L be the Lagrange basis and let X = (x_0, …, x_n) be the evaluation functionals at the nodes. Then, λ(x; T) = κ(x, L, X) = cond(x, L, X) = Σ_{i=0}^n |l_i(x)|. (7) If, in addition, the constant functions belong to U, we also have that cond(x, L, L^{-1}) = cond(x, L, X) and Σ_{i=0}^n |l_i(x)| ≥ 1.

Proof By Propositions 3 and 4, we have that λ(x; T) ≤ κ(x, L, X) ≤ cond(x, L, X). The evaluation functionals have unit norm, and we can write cond(x, L, X) = Σ_{i=0}^n |l_i(x)|. Given ξ ∈ K, let u ∈ U be the solution of the interpolation problem u(x_i) = sign(l_i(ξ)), i = 0, …, n, and let e := max(−1, min(1, u)) be its truncation. Then, we have that ‖e‖_∞ ≤ 1, e ∈ C(K) and e(x_i) = u(x_i), i = 0, …, n. For this function, we can write T[e](ξ) = Σ_{i=0}^n e(x_i) l_i(ξ) = Σ_{i=0}^n |l_i(ξ)|. In particular, the supremum defining λ(ξ; T) is attained and we deduce that λ(ξ; T) = Σ_{i=0}^n |l_i(ξ)| for each ξ ∈ K. So, we have shown (7).
Since the interpolation operator is a projection, we deduce from Proposition 8 that cond(x, L, L^{-1}) ≤ cond(x, L, X) and κ(x, L, L^{-1}) ≤ κ(x, L, X). Let π_0, …, π_n be the coordinate projections corresponding to the Lagrange basis. If 1 ∈ U, then ‖π_i‖_∞ = 1, i = 0, …, n, the supremum being attained at the constant function 1, and, by Proposition 1, cond(x, L, L^{-1}) = Σ_{i=0}^n |l_i(x)| = cond(x, L, X). Finally, if 1 ∈ U, we also have that Σ_{i=0}^n l_i(x) = 1 and, by the triangle inequality, 1 ≤ Σ_{i=0}^n |l_i(x)|. So, we have shown that the Lagrange representation of the interpolant L[f] = Σ_{i=0}^n f(x_i) l_i has optimal conditioning and that both κ(x, L, X) and cond(x, L, X) coincide for this representation.
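The optimal conditioning of the Lagrange representation is exactly the Lebesgue function λ(x; T) = Σ_i |l_i(x)|, which can be computed directly. The following sketch evaluates it for equispaced and Chebyshev nodes on [−1, 1] (illustrative node choices of mine) and confirms the well-known advantage of the Chebyshev sites mentioned in the introduction.

```python
import numpy as np

def lebesgue_function(nodes, xs):
    # lambda(x; T) = sum_i |l_i(x)| for the Lagrange basis at the given nodes
    lam = np.zeros_like(xs)
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        # l_i(x) = prod_{k != i} (x - x_k) / (x_i - x_k)
        li = np.prod((xs[:, None] - others) / (xi - others), axis=1)
        lam += np.abs(li)
    return lam

xs = np.linspace(-1.0, 1.0, 2001)
n = 10
equi = np.linspace(-1.0, 1.0, n + 1)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))

# lambda(x_i) = 1 at every node, since l_j(x_i) = delta_ij
assert np.allclose(lebesgue_function(equi, equi), 1.0)
# Chebyshev nodes give a much smaller Lebesgue constant than equispaced ones
assert lebesgue_function(cheb, xs).max() < lebesgue_function(equi, xs).max()
```

For n = 10 the maximum over [−1, 1] is roughly 2.5 for the Chebyshev nodes against about 30 for the equispaced ones, illustrating why the Lagrange representation at Chebyshev sites is so stable.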
We observe that the Lagrange representation is a particular case of a representation with respect to a set of nonnegative functionals and that the equality of κ(x, L, X ) and cond(x, L, X ) could also be obtained by direct application of Proposition 6.
Let us compute the conditionings of the representation of the interpolant L[f] with respect to any other basis. Let us denote by skeel(A) := ‖ |A^{-1}| |A| ‖_∞ the Skeel condition number, where |A| stands for the matrix whose entries are the absolute values of the entries of A.
Theorem 3 Let U be a subspace of C(K) with dim U = n + 1 and let X = (x_0, …, x_n) be a sequence of nodes such that the Lagrange interpolation problem has a unique solution in U. Let T be the Lagrange interpolation operator and B : R^{n+1} → U be a basis mapping. Let Φ = B^{-1} ∘ T be the corresponding set of functionals of the representation of T with respect to the basis B. Let b_0(x), …, b_n(x) be the basis functions associated to B and let M(B, X) := (b_j(x_i))_{i,j=0,…,n} be the collocation matrix of the basis B at the set of nodes X. Then, writing M(B, X)^{-1} = (m_{ij})_{i,j=0,…,n}, we have cond(x, B, Φ) = Σ_{j=0}^n (Σ_{i=0}^n |m_{ji}|) |b_j(x)| and κ(x, B, Φ) = max_{ε∈{−1,1}^{n+1}} Σ_{j=0}^n |Σ_{i=0}^n m_{ji} ε_i| |b_j(x)|. (10)

Proof The matrix M(B, X) is the matrix of change of basis between (b_0, …, b_n) and the Lagrange basis (l_0, …, l_n), b_j = Σ_{i=0}^n b_j(x_i) l_i. Let us observe that the basis mapping corresponding to (b_0, …, b_n) is given by B = L ∘ M(B, X). The corresponding functionals Φ = B^{-1} ∘ T can be expressed in terms of the inverse of the matrix M(B, X), φ_j f = Σ_{i=0}^n m_{ji} f(x_i), j = 0, …, n. Using the notation M(B, X)^{-1} = (m_{ij})_{i,j=0,…,n}, we can write ‖φ_j‖_∞ = Σ_{i=0}^n |m_{ji}|. Proposition 2 of [3] can be immediately generalized to a multivariate setting to derive this equality. Then, we deduce (10), since the supremum over e ∈ C(K) with ‖e‖_∞ = 1 reduces to a maximum over the node values ε_i = e(x_i) ∈ [−1, 1], which is attained at a vertex ε ∈ {−1, 1}^{n+1}. And we obtain the formula for cond(x, B, Φ) from the expression of ‖φ_j‖_∞. In particular, at a node x_k we have cond(x_k, B, Φ) = Σ_{i,j=0}^n |b_j(x_k)| |m_{ji}|, the k-th row sum of |M(B, X)| |M(B, X)^{-1}|, so that the maximum of cond over the nodes is a Skeel-type condition number of the collocation matrix.
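The collocation-matrix formula can be sketched numerically. The example below (an illustration of mine, using the formulas as reconstructed above) takes the monomial basis, whose collocation matrix at the nodes is the Vandermonde matrix, and checks that cond at a node equals the corresponding row sum of |M| |M^{-1}|.

```python
import numpy as np

# cond(x, B, Phi) = sum_j (sum_i |m_ji|) |b_j(x)| with M = (b_j(x_i)),
# illustrated for the monomial basis b_j(x) = x^j.
nodes = np.array([-1.0, -0.5, 0.5, 1.0])
n = len(nodes) - 1
M = np.vander(nodes, increasing=True)      # M[i, j] = nodes[i]**j = b_j(x_i)
Minv = np.linalg.inv(M)                    # Minv[j, i] = m_ji
phi_norms = np.sum(np.abs(Minv), axis=1)   # ||phi_j||_inf = sum_i |m_ji|

def cond(x):
    b = x ** np.arange(n + 1)              # (b_0(x), ..., b_n(x))
    return float(phi_norms @ np.abs(b))

# at a node x_k, cond(x_k, B, Phi) is the k-th row sum of |M| |M^{-1}|,
# so the maximum over the nodes is a Skeel-type condition number
row_sums = (np.abs(M) @ np.abs(Minv)).sum(axis=1)
for k, xk in enumerate(nodes):
    assert np.isclose(cond(xk), row_sums[k])
```

For the Lagrange basis itself, M is the identity and cond(x_k) = n + 1 row sums collapse to Σ_i |l_i(x_k)| = 1, consistent with Theorem 2.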

Conditioning of the Newton interpolation formula
Let us now compute the conditioning of the Newton representation of the interpolating polynomial, N[f](x) = Σ_{i=0}^n f[x_0, …, x_i] ω_i(x), where f[x_0, …, x_i] is the i-th order divided difference of f at the nodes x_0, …, x_i and ω_i(x) := (x − x_0) ⋯ (x − x_{i−1}), with the convention that ω_0 is the constant polynomial ω_0(x) = 1. Since f[x_0, …, x_i] = Σ_{j=0}^i f(x_j)/Π_{k≤i, k≠j}(x_j − x_k), we deduce (see Proposition 2 of [3]) that the divided difference functionals d_i f := f[x_0, …, x_i] have norms ‖d_i‖_∞ = Σ_{j=0}^i 1/Π_{k≤i, k≠j} |x_j − x_k|. Using formula (10), the conditioning κ(x, ω, d) can be computed in the following way: κ(x, ω, d) = max_{ε∈{−1,1}^{n+1}} Σ_{i=0}^n |[ε]_{x_0,…,x_i}| |ω_i(x)|, where [ε]_{x_0,…,x_i} denotes the divided difference of the data ε_0, …, ε_i.

Definition 4 Let x_0, …, x_n be a sequence of distinct nodes. We say that x_n leaves the other nodes at one side if either x_n < x_i for all i = 0, …, n − 1, or x_n > x_i for all i = 0, …, n − 1.

Theorem 4 Let x_0, …, x_n be a sequence of distinct nodes. Then, we have that κ(x, ω, d) = cond(x, ω, d) if and only if each node x_i leaves the previous nodes x_0, …, x_{i−1} at one side, that is, there exist σ_1, …, σ_n ∈ {−1, 1} such that σ_i(x_i − x_j) > 0, j = 0, …, i − 1, i = 1, …, n.

Proof Let us define s_0 := 1 and s_i := σ_1 ⋯ σ_i, i = 1, …, n. Then, one checks that for the data ε_j := s_j, j = 0, …, n, all the divided differences attain their norms simultaneously, |[ε]_{x_0,…,x_i}| = ‖d_i‖_∞, i = 0, …, n, so that cond(x, ω, d) ≤ κ(x, ω, d). From Proposition 4, we deduce that κ(x, ω, d) = cond(x, ω, d).

Definition 5
We say that the sequence (x_0, …, x_n) follows a central order with respect to a center c if the sequence of distances of the nodes to the center is monotonically increasing, that is, |x_0 − c| ≤ |x_1 − c| ≤ ⋯ ≤ |x_n − c|. If the nodes form a monotonic sequence, we can consider that they follow a central order with respect to the first node x_0. In fact, if x_0 < x_1 < ⋯ < x_n, then the distances |x_i − x_0|, i = 0, …, n, also form an increasing sequence.
Corollary 1 Let x_0, …, x_n be a sequence of distinct nodes following a central order with respect to a center c. Then, κ(x, ω, d) = cond(x, ω, d).
Proof In order to apply the characterization of Theorem 4, let us show that each x_i, i ∈ {1, …, n}, leaves the previous nodes at one side. Let σ_i := sign(x_i − c) ∈ {−1, 1} (note that x_i ≠ c for i ≥ 1, since the distances to c are increasing and the nodes are distinct). Let j ∈ {0, …, i − 1}. If σ_i(x_j − c) ≤ 0, then c leaves x_i and x_j at different sides and clearly σ_i(x_i − x_j) > 0. If σ_i(x_j − c) > 0, then c leaves x_i and x_j at the same side and, since both nodes are distinct, we have that |x_i − c| > |x_j − c| and σ_i(x_i − x_j) = |x_i − c| − |x_j − c| > 0. That is, x_i leaves all the previous nodes at one side.
In [4], some nice properties of the central ordering were described.In particular, for equidistant nodes, the central ordering with respect to the center of the interval provides lower bounds for the conditioning of the Newton formula than the corresponding bounds for increasing nodes.
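The effect of the ordering can be sketched numerically. The example below (an illustration of mine, using the divided difference norms as reconstructed above) compares cond(x, ω, d) = Σ_i ‖d_i‖ |ω_i(x)| for equidistant nodes in increasing order against a central ordering with respect to the center of the interval; the node choices are illustrative.

```python
import numpy as np

def newton_cond(nodes, x):
    # cond(x, omega, d) = sum_i ||d_i|| |omega_i(x)|, with
    # ||d_i|| = sum_{j<=i} 1 / prod_{k<=i, k!=j} |x_j - x_k|
    nodes = np.asarray(nodes, dtype=float)
    total = 0.0
    for i in range(len(nodes)):
        head = nodes[: i + 1]
        d_norm = sum(
            1.0 / np.prod([abs(head[j] - head[k])
                           for k in range(i + 1) if k != j])
            for j in range(i + 1)
        )
        omega = np.prod(x - nodes[:i])     # omega_i(x) = prod_{k<i} (x - x_k)
        total += d_norm * abs(omega)
    return total

base = np.linspace(-1.0, 1.0, 9)           # equidistant nodes on [-1, 1]
increasing = np.sort(base)
central = base[np.argsort(np.abs(base))]   # central order w.r.t. c = 0
x = 0.05
assert newton_cond(central, x) < newton_cond(increasing, x)
```

The central ordering uses the nodes closest to the center first, which keeps the factors |ω_i(x)| small near the center and produces a much lower conditioning there.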
An ordering of the nodes giving rise to conditionings that are relatively close to the Lebesgue function is the central ordering with respect to the evaluation point (see Section 4 of [4]). Using Corollary 1, we also deduce that κ(x, ω, d) = cond(x, ω, d) for the evaluation of the Newton formula at x using nodes following a central ordering with respect to x, that is, |x_0 − x| ≤ |x_1 − x| ≤ ⋯ ≤ |x_n − x|. Let us illustrate with an example that κ(x, ω, d) can be lower than cond(x, ω, d).
Evaluating at x = 1, we have κ(1, ω, d) < cond(1, ω, d). The maximum value of κ(x, ω, d) is attained at x = 3/2.

Conditioning in the discrete case
The discrete case corresponds to K = {x_0, …, x_n}, where x_0, …, x_n are distinct points in R^d. Each real function f defined on K can be completely described by the vector (f(x_0), …, f(x_n)) ∈ R^{n+1}. The functions l_j ∈ R^K defined by l_j(x_i) = δ_{ij}, j = 0, …, n, form a basis of C(K) = R^K corresponding to the basis mapping L. Since f(x) = Σ_{j=0}^n f(x_j) l_j(x), the coordinate projections π_i are the evaluation functionals π_i f = f(x_i), i = 0, …, n.
Considering the whole set K as a set of nodes, the functions l_0, …, l_n can be regarded as the Lagrange basis with respect to the Lagrange interpolation problem: find u ∈ R^K such that u(x_i) = f(x_i), i = 0, …, n.
This problem has a unique solution and the interpolation operator T : R^K → R^K is the identity mapping, because the interpolation space coincides with the whole space C(K).

‖e‖_∞ := max_{x∈K} |e(x)|. The quantity sup_{e∈U, ‖e‖_∞=1} Σ_{i=0}^n |π_i(e) b_i(x)| can be regarded as a pointwise bound for the error in the computation of a function in U at x with respect to a given basis B, relative to the size of the perturbation ‖e‖_∞. This suggests the following definition.

Definition 1 Let B be a basis of an (n + 1)-dimensional space of functions U. The conditioning of B at a point x of the domain K is κ(x, B, B^{-1}) := sup_{e∈U, ‖e‖_∞=1} Σ_{i=0}^n |π_i(e) b_i(x)|.