Some uses of the field of values in numerical analysis

In this expository paper we illustrate the role that the field of values (or numerical range) of a matrix plays in connection with certain problems of numerical analysis. These include the approximation of matrix functions and the convergence of preconditioned Krylov subspace methods for solving large systems of equations arising from the discretization of partial differential equations.


Introduction
The field of values, or numerical range, of a matrix (or operator in Hilbert space) is a well-studied object in linear algebra and functional analysis [28]. Some of its fundamental properties were identified by Hausdorff and Toeplitz over a century ago [30,45]. In recent decades, the field of values has become increasingly important in numerical analysis, in particular in certain problems of numerical linear algebra involving functions of matrices and iterative methods for solving large systems of linear equations. In such problems one has to deal with sequences of matrices of increasing (potentially unbounded) dimension. For instance, the matrices may arise from the discretization of differential or integral operators, and their dimension tends to infinity as the discretization is refined; in other cases the discretization is fixed, but the size of the computational domain may increase without bounds. Analyzing the behavior of algorithms for the approximation of functions of such matrices (or, more typically, for the approximation of the action of matrix functions on vectors) as their size increases is of central importance in numerical linear algebra.
For sequences of normal matrices, the eigenvalues provide all the necessary information to establish the convergence rates of approximation algorithms; indeed, the spectral theorem for normal matrices (and bounded operators) allows one to translate the approximation problem for functions of matrices into one for functions of a real or complex variable, and to make use of classical results from approximation theory. For matrices that "stay close to normal", in a sense that can be made precise, the eigenvalues are still useful indicators of what's going B Michele Benzi michele.benzi@sns.it 1 Scuola Normale Superiore, Piazza dei Cavalieri, 7, 56126 Pisa, Italy 160 M. Benzi on as the dimension grows. If, however, the matrices are far from normal, and particularly if the departure from normality grows as the dimension increases, then it has long been known that the eigenvalues alone are not sufficient to capture many phenomena of interest and may even paint a misleading picture [46].
Consider for example the following two (for simplicity, finite-dimensional) linear dynamical systems, the first one discrete, the second one continuous in time: 1. x k+1 = Ax k + b with k = 0, 1, . . .
Here A ∈ C n×n and b ∈ C n are fixed, and x 0 ∈ C n is prescribed and arbitrary. As is well known, the long-term behavior of both evolution processes is governed by the spectral properties of A. Specifically: In practice, we are interested in the rate of convergence. In the first case, the asymptotic rate of convergence is dictated by the spectral radius of A: In the second case, by the spectral abscissa of A: If A is normal (i.e., unitarily diagonalizable), then ρ(A) and α(A) completely describe the evolution of x k and x(t), not just asymptotically, but for all k = 0, 1, . . . and t ≥ 0, respectively. Indeed, if we denote by · the operator norm induced by the Euclidean norm on C n , we have, by unitary invariance of · , A = ρ(A), and therefore if A is normal and ρ(A) < 1 we have that Likewise, if A is normal and α(A) < 0 we have that e t A = e tα(A) → 0 monotonically as t → ∞.
Hence, in both cases the dynamics is governed at all times by the (extreme) eigenvalues, when A is normal. What happens, however, when A is non-normal? In particular, highly non-normal? Suppose for the time being that A is diagonalizable: A = X DX −1 with D diagonal, for some nonsingular X ∈ C n×n . Then where κ(X ) = inf X X −1 , where the infimum is taken over all nonsingular matrices X that diagonalize A; note that κ(X ) ≥ 1, and that κ(X ) = 1 when A is normal. This quantity is known as the spectral condition number of the eigenbasis of A. Similarly, in the continuous time case we have In numerical analysis one often deals not with a single problem of fixed size, but with sequences of problems of increasing size, usually due to some discretization parameter going to zero. It is easy to find examples of sequences of matrices of increasing size, of interest in applications, for which the condition number of the eigenbasis grows without bounds, even though their spectral radius or the spectral abscissa remain bounded away from their critical values, 1 and 0 (such an example is described in Sect. 5, see (11)). It is clear that in such cases the bounds (1)-(2) are virtually useless when trying to establish the actual rate of convergence: although the right-hand sides of both (1) and (2) eventually approach zero, if κ(X ) is very large then we cannot infer anything on the transient behavior of the quantities on the left-hand side. If A is not diagonalizable, the situation is even worse. Some asymptotic estimates involving the size of the largest Jordan block in the Jordan canonical form of A are known [47, Theorem 3.1], but they are of limited practical use; see also the discussion in [46,Chapter 16].
Informally, A is said to be highly non-normal if it is not diagonalizable or if the corresponding condition number of the eigenbasis, κ(X ), is very large. For matrices like these, the onset of the asymptotic convergence regime may manifest itself only after very large times; in other cases, the bounds (1)-(2) may be so loose as to be uninformative. Thus, the eigenvalues give at best a partial picture of the underlying behavior.
Another limitation of spectral analysis is that the eigenvalues of a non-normal matrix can be highly sensitive to perturbations. For instance, due to unavoidable rounding errors, finite precision approximations of the above linear dynamical systems 1-2 are governed not by the exact spectral radius or spectral abscissa of A but by those of a slightly perturbed matrix A ≈ A. If A has highly sensitive eigenvalues, which is often the case when A is far from normal, it may happen that ρ(Ã) > 1 or α(Ã) > 0, even though the unperturbed matrix A amply satisfies ρ(A) < 1 or α(A) < 0, thus causing divergence or blow up of the computed quantities.
The limitations of eigenvalue analysis become even more apparent when we consider processes that are more complex than the convergence of simple linear (discrete or continuous) dynamical systems. In the rest of the paper we will focus on two such problems from the field of numerical linear algebra.

Two problems in numerical linear algebra
In this section, we briefly introduce two important problems in numerical linear algebra, one concerning functions of matrices, the other one the solution of large systems of linear equations; as we will see, while apparently rather different, the two problems are closely related.

Decay estimates for functions of large matrices
In several applications, given a complex-valued function f defined on the spectrum of A, we are interested in obtaining estimates, or bounds, for the entries of the matrix f (A).
Typically, f is analytic and A is banded or sparse. We say that a matrix is k-banded if a i j = 0 for all i, j with |i − j| > k. For instance, a tridiagonal matrix is 1banded. In the following discussion, one should think of k being fixed, while the dimension n of A grows unbounded (n → ∞). Among the various equivalent definitions of a matrix function, we can use for instance the following one based on contour integration (and due to E. Cartan): where Γ is a contour in C, counterclockwise oriented, containing the eigenvalues of A in its interior, and such that f is analytic inside and on Γ . We refer to [31] for details and other, equivalent, definitions of matrix function. It is frequently observed that while functions of banded or sparse matrices are fully populated, the magnitude of the entries in f (A) that are in some sense far from the nonzero entries in A are often small, and in fact they tend to decay with the distance; see Figs. 1 and 2 for two such examples. Based on this observation, several bounds or estimates for the entries of functions of banded or sparse matrices have been obtained. Typically, when f is analytic and A is banded these take the form of exponential off-diagonal decay bounds: Note that for any fixed matrix A this inequality can always be trivially satisfied by taking K large enough, but here we are interested in non-trivial bounds where the constants K and α > 0 are given explicitly in terms of properties of f and A, such as the location of the singularities of f , the spectral properties of A, and the bandwidth k. Special interest is placed in those cases where K and α are independent of the dimension n. In this case we speak of localization of the entries of f (A). We refer to [6] for an extensive survey of matrix localization, and to the recent thesis [42] for further results and applications. When A is sparse, but not necessarily banded, the bounds take the form where d(i, j) is now the geodesic distance between nodes i and j, i.e., the length of the shortest path joining nodes i and j in the graph G(A) associated with A, where there is an edge between node i and node j if and only if a i j = 0. Note that this is a genuine distance only if A is structurally symmetric (i.e., G(A) is undirected).
An example of such a bound is the following one from [11]: if A = X DX −1 is diagonalizable and sparse, then Here the positive constants K 0 and α depend only on the distance between the singularities of f (if any) and the spectrum of A and on the maximum of | f | on the boundary of a region F ⊂ C containing the eigenvalues of A and such that f is analytic on F . Hence, (5) is not a single bound but a family of bounds, parameterized by the choice of F . There is a trade-off involved: taking a larger set F may lead to faster exponential decay (larger α), but K 0 will also become larger. If f is entire, the set can be chosen arbitrarily, leading to superexponential decay estimates; that is, α > 0 can be taken arbitrarily large, but of course K 0 will also grow without bounds (except for the trivial case of constant f ), in view of Liouville's Theorem. Clearly, the bound (5) suffers from the same limitations as the ones for A k or e t A we discussed earlier: the presence of κ(X ) makes the bound virtually useless, unless A is normal (κ(X ) = 1), or nearly normal (κ(X ) small). In particular, if κ(X ) depends on n, we don't obtain uniform bounds in n. Of course, if A is not diagonalizable then the bounds simply do not apply.
We shall come back to this problem in Sect. 5.

Convergence bounds for Krylov subspace methods
The second problem concerns the characterization of the convergence of minimal residualtype Krylov subspace methods to the solution of large-scale linear systems arising from the discretization of certain PDEs or systems of PDEs. These methods construct polynomial approximations of the form x k = p k (A)b to the solution x * = A −1 b of the system Ax = b. The polynomial p k is chosen so as to satisfy an optimality condition [40]. When A is Hermitian there are two main approaches, both based on the minimization of two different norms of the residual r k = b − Ax k over a suitable subspace of dimension k at each step k = 1, 2, . . .. Without loss of generality, here we assume that x 0 = 0. These two approaches lead to the Minimal Residual (MINRES) method and to the Conjugate Gradient (CG) method, respectively.
The Minimal Residual method determines the vector x k which minimizes the 2 -norm of the residual r k = b − Ax k over the kth Krylov subspace Note that the vectors in this subspace are of the form p k−1 (A)b, where p k is a polynomial of degree k, and that the Krylov subspaces form a nested sequence, Therefore, the sequence of residual norms r k is non-increasing.
On the other hand, if A is positive definite the Conjugate Gradient method minimizes over the same subspace. Again, the convergence is monotonic in this norm. For both of these methods, the eigenvalues of A are descriptive of the convergence behavior. Indeed, for MINRES we have the following bound: where Π k denotes the set of all polynomials of degree ≤ k that satisfy p(0) = 1. For CG we have the analogous bound in the appropriate norm: We note that both bounds (6) and (7) [40]) minimizes the 2 -norm of the residual over the Krylov subspace method K k (A, b) at each step. If A is diagonalizable, A = X DX −1 , then the residual norm at step k satisfies leading again to a crude bound of the form If A is normal, κ(X ) = 1 and we recover the bound for MINRES. If κ(X ) is large, however, the right-hand side of (8) may provide no information; in particular, if the right-hand side is > 1 the bound doesn't even capture the non-increasing behavior of the residual norms r k . Furthermore, it has been shown by Greenbaum et al. [27] that, given any set of n not necessarily distinct complex numbers λ 1 , . . . , λ n (for instance, all equal to 1) and any nonincreasing sequence of n nonnegative values ρ 0 , . . . , ρ n−1 , it is possible to construct a matrix A ∈ C n×n and a right-hand side b ∈ C n such that A has the λ i as its eigenvalues and GMRES with initial guess x 0 = 0 produces a sequence of residuals {r k } with r k = ρ k for k = 0, 1, . . . , n − 1.
In other words: any non-increasing convergence curve is possible for GMRES, and the eigenvalues of A, in general, do not contain enough information to describe the convergence behavior. It follows that when A is far from normal, other tools must be sought. While it is unlikely that we will ever find a fully satisfactory answer to the problem of characterizing the convergence of GMRES in general (see [23]), in Sect. 6.1 we will see that in certain special cases it is possible to give reasonably satisfactory convergence bounds.
3 What else is there besides the spectrum?
As we have seen, when A is non-normal, eigenvalue information alone is not enough to analyze various fundamental problems in numerical linear algebra, and in some cases it can even be misleading. Moreover, when A is non-normal (for example, A is defective or close to a defective matrix) the spectrum lacks robustness in the presence of perturbations in the data, which are unavoidable in finite precision computations. It is also desirable to find approaches that do not assume the diagonalizability of A.
Among the sets associated to an operator A that have been proposed as substitutes for the spectrum Λ(A), we mention the following: 1. The pseudospectrum Λ (A); 2. The field of values W( A);

Various spectral sets, intermediate between Λ(A) and W( A).
These sets allow us to do away with the diagonalizability assumption and lead to bounds that do not depend on κ(X ). After a brief discussion of the pseudospectrum, we will focus our attention on the field of values; other spectral sets are mentioned in passing in the conclusion section.

The pseudospectrum
Let A ∈ C n×n and let > 0. The -pseudospectrum of A is the set It can be equivalently defined as the set of all z ∈ C such that there exists a matrix ΔA ∈ C n×n with ΔA < and z ∈ Λ(A + ΔA). In other words, the -pseudospectrum of A is the set of all complex numbers that are eigenvalues of -perturbations of A [46].
When A is normal, the -pseudospectrum of A is just where, as usual, the sum of sets is defined elementwise (Minkowski addition). However, when A is far from normal, Λ (A) can be much larger than Λ(A) even for very small values of .
Consider now the problem of bounding the approximation error where q(z) is a polynomial approximation of f (z) on some region of C containing the eigenvalues of A. Note that both of our problems, obtaining bounds for the entries of f (A) and bounding the error in the approximate solution of Ax = b by Krylov subspace methods can be reduced to this one; in the latter case, we take f (z) = z −1 .
Recalling that where σ n (z I − A) denotes the smallest singular value of z I − A, we obtain the bound In particular, if Γ is the boundary of the pseudospectrum Λ (A), then σ min = and we get When A is normal, one can shrink the contours so that L/ is arbitrarily close to 1, and thus the approximation error is given, in the limit as → 0, by and we recover the fact that the eigenvalues suffice to fully describe the quality of the approximation.
If A is non-normal, however, we have to choose the contours (and thus ) so as to balance the size of L with that of R = σ −1 min , which can be difficult. Nevertheless, there are cases where (9) can be used to obtain uniform error bounds, not containing the factor κ(X ), and thus applicable even if A is not diagonalizable.
Unfortunately, the need to choose a suitable value of and the fact that the geometry of the pseudospectra can be rather complicated make the use of this tool quite difficult in practice. Examples of successful uses of the pseudospectrum in a variety of problems in pure and applied mathematics, together with a discussion of its advantages and disadvantages, can be found in the (now classic) book [46]. We do not consider the pseudospectrum further, and move instead to the second alternative.

The field of values and some of its properties
If A is a bounded linear operator on a complex Hilbert space H, the field of values (or numerical range) of A is the subset of C defined by In other terms, W( A) is the range of the quadratic form q(x) = Ax, x as x varies over the unit sphere in H. Depending on the problem, one may consider the field of values with respect to different inner products. When not explicitly indicated otherwise, we assune H = C n and the inner product will be the standard one.
Here are some properties of the field of values of a matrix A ∈ C n×n : Several of this properties, but not all, retain meaning and remain true in infinite dimension. In particular, while the field of values of a bounded operator on an infinite-dimensional Hilbert space is bounded and convex, it may not be closed. We refer to [32] for detailed expositions of the properties of the field of values of n × n matrices, and to [28] for the operator case.

Spectral containment: Λ(A) ⊂ W( A).
We also note that properties 3 and 4 together show that the field of values is stable under perturbations, in the sense that the field of values of a slightly perturbed matrix is a slight perturbation of the field of values of the original matrix.
In Fig. 3 we show the boundary of the field of values and the eigenvalues of a 10 × 10 matrix with randomly distributed entries in (0, 1).

Functions of large, sparse matrices
Bounds on the entries of f (A) for A banded, or sparse, can be obtained from bounds on the polynomial approximation error N = 0, 1, , . . . , on a suitable compact set K ⊂ C. Here we assume that Λ(A) ⊂ K and that f is analytic on an open set containing K.
Indeed, suppose A is k-banded and let p N be the best approximation polynomial of degree N . Using the fact that p N (A) is k N -banded, it is possible to write for all i, j such that |i − j| > k N . Assume for a moment that there exist constants C 0 > 0, α > 0 such that For i = j we can write |i − j| = k N + , = 1, 2, . . . , k. Observing that |i − j| > k N implies N + 1 < |i− j| k + 1, we can write for all i = j where C = C 0 e −α , α = α/k, i.e., an exponential off-diagonal decay bound. Hence, we need to find a suitable set K such that (10) holds, and obtain explicit expressions for the constants C 0 and α. In particular, we seek bounds not containing the condition number of the eigenvector matrix; indeed, we do not want to assume that A is diagonalizable. Moreover, as already mentioned, we are especially interested in bounds that are independent of the dimension n, when possible.
For A Hermitian (more generally, normal), such bounds have been given in [8,11,12]. In these papers the solution is obtained through Bernstein's Theorem combined with the Spectral Theorem. Bernstein's Theorem states that if K ⊂ C is a continuum (a nonempty, compact, connected set not reduced to a point) and f is analytic on an open subset Ω with K ⊂ Ω, then f can be approximated uniformly on K by a sequence of polynomials p N such that the approximation error f − p N ∞,K decays at least exponentially in the degree, N (and viceversa). As it turns out, the p N can be taken to be Faber polynomials.
The Spectral Theorem for normal matrices allows one to translate this result into the corresponding exponential decay bound for [ f (A)] i j via the inequalities with C and α as described above. Both C 0 and α (and thus C and α ) depend on the choice of K; taking a larger K makes both C and α larger, as already mentioned; hence, there is a trade-off.
These results have been extended to the non-normal case by the author and Boito in [7] and, more recently, by Pozza and Simoncini in [39]. Specifically, if A is a banded normal matrix and f is analytic in the interior of W( A) and bounded on the boundary ∂W(A), then an exponential off-diagonal decay bound can be established for the entries of f (A). A similar bound holds for sparse matrices with the geodesic distance on the graph of A replacing the distance from the main diagonal. Moreover, these results hold not just for functions of matrices over the complex field, but more generally for functions of matrices with entries in any complex C * -algebra.
The proof given in [7] was obtained combining Bernstein's Theorem and the following deep theorem of Crouzeix's: Theorem 1 [15] Let A ∈ C n×n and let f be analytic in the interior of W( A) and bounded on its boundary. There exists a universal constant Q such that The constant Q satisfies 2 ≤ Q ≤ 11.08 and it is conjectured that Q = 2.

Moreover, the same result applies to analytic functions of bounded linear operators on a complex Hilbert space H.
Recently, the upper bound on Q has been lowered to 1 + √ 2 in [17]. Whether Q = 2 remains an open question. The bounds in [7] contain the constant Q, which can be taken to be equal to 1 + √ 2. The results of Pozza and Simoncini do not make use of Crouzeix's Theorem but instead rely on a result of Beckermann [4]. In both approaches, a key role in the analysis is played by Faber polynomials, which are briefly introduced next. For more details we refer to [21,36,44].
Recall that a continuum is any compact, connected set not reduced to a point. If K is a continuum with connected complement, the Riemann Mapping Theorem guarantees the existence of a function φ that maps the exterior of K conformally onto the set {z ∈ C ; |z| > 1} and such that Such φ has the Laurent expansion The polynomial parts, are called the Faber polynomials generated by the continuum K. The constant ρ is called the logarithmic capacity of K.
Let K ⊂ C be a continuum. As shown by Faber [24], every analytic function f defined on K can be expanded in the series (uniformly convergent on K), where the coefficients are given by Here τ > 1 is chosen such that f is analytic on the complement of the set {φ −1 (z) ; |z| > τ} and φ maps the exterior of K conformally onto the set {z ∈ C ; |z| > 1}.
If A ∈ C n×n has spectrum contained in K, then Moreover, we have the following important result by Beckermann [4], the proof of which employs ideas from potential theory.

Theorem 2 [4]
Let K ⊂ C be convex and compact. If A ∈ C n×n is such that then the Faber polynomials generated by K satisfy F N (A) ≤ 2, for all N . The constant 2 is optimal.
Using this theorem, Pozza and Simoncini [39] obtained the following off-diagonal decay bound. We include the short and elegant proof for completeness.
Theorem 3 [39] Let A ∈ C n×n be k-banded and such that W( A) ⊆ K, with K compact and convex. With φ and τ > 1 defined as before, we have we easily obtain A more precise statement is possible to account for matrices with lower bandwidth β and upper bandwidth γ with β = γ , see [39]. Moreover, the result can be extended to more general sparse matrices. Note, again, the trade-off involved in the choice of τ . If f is entire, τ can be arbitrarily large and the decay is superexponential. When explicitly computing the bound, one can take K = W( A), if the latter is known. For certain classes of matrices, W( A) itself is not known, but it is known to be bounded by some simple compact convex set, like an ellipse or a disk, which can be easily estimated. In some cases the corresponding bounds can be dramatically better than those containing the condition number of the eigenvector matrix. This is the case of families of n × n matrices such that κ(X ) grows unboundedly with the dimension n, while the field of values remains uniformly bounded. Consider for example the infinite tridiagonal Toeplitz matrix generated by the symbol ϕ(z) = 2z −1 + 1 + 3z, |z| = 1: The matrix represents a bounded linear operator on 2 (N). Let A n denote the finite section of A of dimension n, i.e., the n × n matrix formed by the first n rows and columns of A. Then all the fields of values W( A n ) are regions whose boundaries are ellipses, see [19,Corollary 4]. As n → ∞, these ellipses converge to an ellipse which contains all the W( A n ) and is the boundary of W( A), therefore the fields of values of A n are all uniformly bounded in n. In contrast, the condition number of the eigenbasis κ(X n ) grows exponentially with n. Note that the infinite matrix (11) has no point spectrum, hence no eigenvectors in 2 (N).
In Fig. 4, we illustrate the decay behavior of the order of magnitude of the entries in the first row of f (A n ) = e A n for n = 100 (black plot), together with the bounds obtained using the field of values (red) and the one containing κ(X ) (blue). Note the logarithmic scale on the vertical axis. This example shows that the eigenbasis-dependent bounds can overestimate the magnitude of the entries by many orders of magnitude, while the bounds based on the field of values can result in much more accurate estimates, especially at short distance from the main diagonal. For this matrix, the eigenbasis condition number is κ(X ) ≈ 5.26 · 10 8 . Taking larger values of n will make the eigenbasis-dependent bound much worse, while the field of values-dependent bound remains unchanged. This example can be easily generalized and extended.
Finally, we mention that while we have focused here on the derivation of bounds for the entries of f (A), nearly identical considerations apply to the problem of polynomial (and also rational) approximations for computing the action of a function of a matrix on a vector, v = f (A)b; see, for instance, [5,48].

Convergence of Krylov methods for saddle point problems
In this section we review some convergence bounds for GMRES based on the field of values, and show how they lead to mesh-independent estimates of the rate of convergence of preconditioned GMRES applied to saddle point problems.

Field of values bounds for GMRES
The Generalized Minimal Residual (GMRES) method [40,41] is the most widely used algorithm for the solution of large, sparse, nonsymmetric systems of linear equations Ax = b. Starting from an initial guess x 0 , GMRES constructs approximations x k to the solution where deg( p) is the degree of the polynomial p. Using p(A)r 0 ≤ p(A) r 0 , we easily obtain the bound which no longer depends on b or r 0 . Over the years, there have been many attempts to derive descriptive error bounds for GMRES analogous to those available for MINRES or CG. This is a difficult task, see for example [23]. Results are known for matrices A such that A + A * is positive definite, see [20] (see also [40]). More generally, if 0 / ∈ W( A), there are field of values-based bounds due to Eiermann [19] and to Beckermann [4], among others. The latter one is given next.

Theorem 4 [4]
Let A ∈ C n×n and let K ⊂ C be convex, compact, and such that W( A) ⊆ K and 0 / ∈ K. Let φ be the map in the statement of Theorem 3. Then the GMRES residuals satisfy Suppose now that we have a family of linear systems, A ν x ν = b ν , depending on a parameter ν. Here ν could be a physical parameter, such as the viscosity in a discretized convection-diffusion equation, or the dimension of the linear system, corresponding to finer and finer discretizations of some differential or integral operator. Of particular interest is the case where ν = O(h), where h is a discretization parameter. We have the following simple consequence of Beckermann's result: Then GMRES converges to the solution of each of the linear systems A ν x ν = b ν in a number of steps that is bounded uniformly in ν.
In practice, this result will be applied not to the original linear system A ν x ν = b ν but to a preconditioned version. Indeed, apart from very special situations, preconditioning is usually necessary to achieve ν-independent convergence. We turn to preconditioning next.

Field of values equivalence
We begin by reviewing the notion of spectral equivalence for families of Hermitian positive definite (HPD) matrices [3]. Recall that two families of HPD matrices {A h } and {B h } are said to be spectrally equivalent if there exist h-independent constants α and β with Note that this is an equivalence relation between families of matrices. If the discretization of (say) an elliptic PDE leads to a sequence of linear systems A h u h = b h , a family of spectrally equivalent preconditioners {B h } guarantees that the Preconditioned Conjugate Gradient (PCG) method will converge in a number of steps that is uniformly bounded with respect to the parameter h . If h denotes some measure of the mesh size (discretization parameter), the resulting PCG iteration exhibits mesh-independent convergence. If, in addition, the cost of applying the preconditioner B h is linear in the number of degrees of freedom, we say that the preconditioner is optimal with respect to the mesh size h. In general, of course, the actual performance of the preconditioner can be affected by other factors, such as physical parameters. Good general references for the PCG method for the solution of discretized PDEs include [3,22,38]. When the preconditioned system is not symmetrizable with positive eigenvalues, for example because the preconditioner is indefinite or non-symmetric, then spectral equivalence is no longer the appropriate tool to analyze the convergence of preconditioned Krylov methods, and PCG cannot be applied. In this case, the more general concept of field of values equivalence, first proposed by G. Starke [43], can in some cases provide the theoretical framework needed to establish mesh-independent convergence for certain preconditioners for Krylov methods like GMRES. Examples include preconditioners for convection-diffusion equations [43], block preconditioners for the Stokes system and other problems of saddle point type [14,22,33,35], preconditioners for the incompressible linearized Navier-Stokes equations [10] and for Rayleigh-Bénard convection [2]. Field of values equivalence has also been applied to the analysis of preconditioned iterative solvers applied to discretizations of the Helmholtz equation; see, e.g., [25,29]. Finally, we refer to [37] for recent work on the use of the field of values to study the convergence of a class of two-grid iterative methods.
For reasons of space we can only give a very succinct overview of how field of values equivalence may be used to obtain h-independent convergence bounds for preconditioned GMRES applied to large linear systems in saddle point form, i.e., 1. Block triangular preconditioners based on approximate Schur complements for the Stokes and Oseen problems [33]; 2. Block diagonal preconditioning of Darcy's equations [35]; 3. Augmented Lagrangian preconditioning of the Oseen problem [10]; 4. Constraint preconditioning of the coupled Stokes-Darcy system [14]; 5. Block triangular preconditioning of the Rayleigh-Bénard system [2].
We refer interested readers to the cited literature for details.

Conclusions
In this expository paper we have illustrated how the field of values has been used in the study of some important problems in numerical analysis, from the approximation of matrix functions to the convergence analysis of preconditioned GMRES for solving large-scale linear systems. While we have not discussed the actual numerical computation of the field of values of a matrix, which is a challenging task in the case of matrices of very large size, we have shown how a priori knowledge of certain properties of the field of values may be sufficient to prove certain useful bounds and even to obtain optimality results for a class of preconditioners for a given problem. Briefly stated, the fields of values must remain bounded and bounded away from any singularities of the underlying function, uniformly in the parameter of interest (which is often, but not always, the matrix dimension n).
Of course, the field of values is no panacea, and approaches based on it will fail if it contains any singularities of the underlying scalar function; for the convergence analysis of GMRES the function is f (z) = z −1 , and the field of values is useless if it contains the origin. Nevertheless, in this case it may still be possible to identify a C-spectral set, i.e., a subset S of the complex plane satisfying Λ(A) ⊂ S ⊂ W( A), not containing 0 (or, more generally, any singularities of the function f ), and such that g(A) ≤ C sup z∈S |g(z)| for all rational functions g bounded on S, where C is a universal constant. We refer to [16] for some examples illustrating this technique. It is, however, too early to say if this approach can be successfully applied to prove convergence bounds for the preconditioned GMRES method in realistic applications. regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.