Backward error analysis of the shift-and-invert Arnoldi algorithm

We perform a backward error analysis of the inexact shift-and-invert Arnoldi algorithm. We consider inexactness in the solution of the arising linear systems, as well as in the orthonormalization steps, and take the non-orthonormality of the computed Krylov basis into account. We show that the computed basis and Hessenberg matrix satisfy an exact shift-and-invert Krylov relation for a perturbed matrix, and we give bounds for the perturbation. We show that the shift-and-invert Arnoldi algorithm is backward stable if the condition number of the small Hessenberg matrix is not too large. This condition is then relaxed using implicit restarts. Moreover, we give notes on the Hermitian case, considering Hermitian backward errors, and finally, we use our analysis to derive a sensible breakdown condition.

Luckily, good implementations, where in particular the orthogonalization is done with care, can be shown to be backward stable [3,8,10,21] in the sense that the computed quantities V_{k+1} and H_k satisfy an exact recurrence with a slightly perturbed matrix: this means that we can compute a basis of an exact Krylov subspace corresponding to a nearby matrix. Since the basis will in general not be perfectly orthonormal, so V_{k+1}^H V_{k+1} ≠ I, we use the term "Krylov recurrence" instead of "Arnoldi recurrence" when referring to recurrences like (1), as suggested in [24]. If A is Hermitian, then it can be shown that the computed basis spans a Krylov subspace associated with a perturbed Hermitian matrix A + ΔA [15]. There is a catch in this case, though: the small (k + 1) × k matrix associated with this Krylov subspace is in general not the computed Hessenberg matrix.
In this paper we perform a similar backward error analysis of the shift-and-invert Arnoldi algorithm. For example, we show that an implementation of the Arnoldi algorithm applied to (A − σI)^{-1} yields computed matrices V_{k+1} and H_k such that (A + ΔA − σI)^{-1} V_k = V_{k+1} H_k, and we give an upper bound for ‖ΔA‖_2. Perturbed versions of the shift-and-invert Arnoldi algorithm have been considered in the literature as part of the theory of inexact methods, see [16,19]. However, these results neglect that the orthonormalization is not performed exactly, and furthermore assume bounds on linear system residuals that may be unattainable (more on this in Sect. 2). We consider more general linear system residuals and take the error from the orthonormalization into account. Our analysis of how the orthonormalization errors propagate into the shift-and-invert Krylov recurrence highlights the importance of columnwise backward error bounds for QR factorizations, and is thus of a different flavor than the corresponding analysis for standard Arnoldi, done in, for example, [8].
We also use our error analysis to motivate when "breakdown" should be declared, that is, when h_{j+1,j} may be considered to be "numerically zero".
The algorithm we study can be divided into two main subproblems: solving linear systems and orthonormalizing vectors. We state our backward error results in such a way that they are independent of how these subproblems are being solved, but we also discuss relevant and commonly used approaches for solving these two tasks.

Technical outline
We study floating point implementations of Algorithm 1, where A is assumed to be of size n × n, σ is the shift, b is the starting vector, and k is the maximum number of steps we perform. Throughout the paper, ‖·‖ refers to the 2-norm.

Algorithm 1 The Shift-and-invert Arnoldi algorithm
In exact arithmetic, we have h_{ij} = v_i^H w_j for i = 1 : j and h_{j+1,j} v_{j+1} = w_j − Σ_{i=1}^{j} h_{ij} v_i, which corresponds to classical Gram–Schmidt if implemented as it stands. In floating point arithmetic, orthogonalization routines with better numerical properties, such as modified Gram–Schmidt (MGS), are usually employed.
In the jth iteration of Algorithm 1, a new vector w_j is computed and decomposed into a linear combination of v_1, ..., v_j and a new component that becomes the definition of v_{j+1}. In exact arithmetic, this can be described by the Arnoldi recurrence (A − σI)^{-1} V_j = V_{j+1} H_j. When the corresponding computation is done in practice, however, errors are present in all steps. First, we need to solve a linear system. If we use a direct solver, the matrix A − σI needs to be formed. We consider the rounding error in this step as part of the residual from the linear system. This does not affect the norm of the residual significantly, because the rounding error is very small: componentwise, |float(A − σI) − (A − σI)| ≤ u|A − σI|. Here float(A − σI) refers to the computed shifted matrix and u is the unit roundoff. Let r_j be the said residual from the linear system, so that (A − σI)w_j = v_j + r_j is the actual linear system that has been solved. Then we have the following equality for the computed quantities: w_j = Σ_{i=1}^{j+1} h_{ij} v_i + g_j, where g_j is an error coming from the orthonormalization process. Defining f_j to collect the contributions of r_j and g_j, and F_k = [f_1 · · · f_k], we arrive at the perturbed shift-and-invert Krylov recurrence (A − σI)^{-1}(V_k + F_k) = V_{k+1} H_k.

We discuss the residual r_j and the error g_j in Sects. 2 and 3, respectively, and provide bounds for both quantities. In Sect. 4, we use these bounds to bound F_k, and subsequently the backward error for the shift-and-invert Krylov recurrence. In Sect. 5, we explain how the idea of implicit restarting can be used to gain further insight into the backward error. We also discuss in what sense we have Hermitian backward errors if the method is applied to a Hermitian matrix A. Finally, we discuss breakdown conditions: in floating point arithmetic, the test whether h_{j+1,j} = 0 in Algorithm 1 is rarely performed. Instead one usually checks whether h_{j+1,j} is "small enough"; this case is referred to as breakdown. A sensible definition of "small enough" is that h_{j+1,j} is dominated by the errors present in the computation. We discuss this in more detail and derive backward error bounds for this case.
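To make the preceding description concrete, the following Python/NumPy sketch implements the structure of Algorithm 1 under simplifying assumptions: a dense direct solve plays the role of the linear system solver, and plain MGS (without reorthogonalization) is used for the orthonormalization. The function name and structure are ours, not the paper's reference implementation.

```python
import numpy as np

def shift_invert_arnoldi(A, sigma, b, k):
    """Illustrative sketch of Algorithm 1.

    Each step solves (A - sigma*I) w = v_j (here with a dense direct
    solver) and orthonormalizes w against the current basis with plain
    modified Gram-Schmidt.  Returns V (n x (k+1)) and H ((k+1) x k).
    """
    n = A.shape[0]
    M = A - sigma * np.eye(n)            # float(A - sigma*I): the shifted matrix
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = np.linalg.solve(M, V[:, j])  # the linear system step
        for i in range(j + 1):           # modified Gram-Schmidt sweep
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:           # exact breakdown (rare in floating point)
            return V[:, :j + 1], H[:j + 1, :j]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```

In exact arithmetic the returned quantities satisfy (A − σI)^{-1}V_k = V_{k+1}H_k; in floating point they satisfy the perturbed recurrence analyzed in the following sections.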

Notation
The scalar σ refers to a shift, while σ_min(X) refers to the smallest singular value of X. The dagger notation X^† refers to the Moore–Penrose pseudoinverse of X. The lower case letter u is reserved for the unit roundoff if real arithmetic is used, and √5 times the unit roundoff if complex arithmetic is used [7]. When the matrix size is understood from the context, we denote zero matrices and identity matrices by 0 and I, respectively. Similarly, the vector e_i denotes the ith column of an identity matrix whose size is understood from the context. For a matrix X, the lower case x_i refers to the ith column of X, and X_k to [x_1 x_2 · · · x_k], that is, the first k columns of X.

Errors from linear systems
In this section we discuss bounds on the residual r_j from (2).

Backward error bounds
The normwise backward error associated with a computed solution y of a linear system Ax = b is defined as

η(y) = min{ε : (A + ΔA)y = b + Δb, ‖ΔA‖ ≤ ε‖A‖, ‖Δb‖ ≤ ε‖b‖},

and given by the formula η(y) = ‖r‖/(‖A‖‖y‖ + ‖b‖), where r = Ay − b [20]. See also [12, p. 120]. This result holds for any vector norm ‖·‖ and its subordinate matrix norm. Thus, if we solve the linear systems in Algorithm 1 up to a backward error ε_bw, then it holds that

‖r_j‖ ≤ ε_bw (‖A − σI‖‖w_j‖ + ‖v_j‖),   (4)

where r_j is defined in (2). If the linear systems are solved by a backward stable direct method, we have ε_bw ≤ φ(n)u, where φ(n) is an algorithm dependent constant. If we are interested in the smallest possible ε_bw such that (4) holds, then we need to compute ‖r_j‖/(‖A − σI‖‖w_j‖ + ‖v_j‖). However, this may not be feasible for the 2-norm, due to the term ‖A − σI‖. In these cases we can replace ‖A − σI‖ by a lower bound (the tighter the better), and thus obtain an upper bound for ε_bw. We can, for instance, do a few iterations of the power method applied to (A − σI)^H(A − σI); MATLAB's normest function does exactly this. This yields a lower bound of ‖A − σI‖, since the convergence is from below. Another possibility is to use the (lower) bound in [13]. We can also bound the matrix 2-norm in terms of the corresponding infinity-norm or 1-norm. The following proposition shows that such bounds can be satisfactory for many sparse matrices, in particular those which can be permuted to banded form.
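The backward error formula and a power-method lower bound for the 2-norm (in the spirit of MATLAB's normest) can be sketched as follows; the helper names and the iteration count are our illustrative choices.

```python
import numpy as np

def backward_error(A, b, y):
    """Normwise backward error eta(y) = ||A y - b|| / (||A|| ||y|| + ||b||)
    of a computed solution y of A x = b (2-norm version)."""
    r = A @ y - b
    return np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(y)
                                + np.linalg.norm(b))

def normest_lower(A, iters=10):
    """Lower bound on ||A||_2 from a few power iterations on A^H A.
    Since ||A^H A x|| <= sigma_max(A)^2 for any unit vector x, the
    returned value never exceeds ||A||_2: convergence is from below."""
    x = np.ones(A.shape[1])
    for _ in range(iters):
        x = x / np.linalg.norm(x)
        x = A.conj().T @ (A @ x)
    return np.sqrt(np.linalg.norm(x))
```

Replacing ‖A − σI‖ in (4) by `normest_lower(A - sigma*np.eye(n))` therefore yields an upper bound for ε_bw.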

Proposition 1
Let k_row and k_col denote the maximum number of nonzero entries in a row and column of A, respectively. Then the following two upper and lower bounds hold:

‖A‖_∞ ≤ √(k_row) ‖A‖_2 and ‖A‖_1 ≤ √(k_col) ‖A‖_2.

Proof We have ‖A‖_∞ = ‖Ax‖_∞ for some x with ‖x‖_∞ = 1 and at most k_row nonzeros. We get

‖A‖_∞ = ‖Ax‖_∞ ≤ ‖Ax‖_2 ≤ ‖A‖_2 ‖x‖_2 ≤ √(k_row) ‖A‖_2,

which is the desired upper bound for ‖A‖_∞. Further, we have ‖A‖_1 = ‖a_j‖_1 for some column a_j with at most k_col nonzeros, so that

‖A‖_1 = ‖a_j‖_1 ≤ √(k_col) ‖a_j‖_2 ≤ √(k_col) ‖A‖_2,

which is the desired upper bound for ‖A‖_1.
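The bounds of Proposition 1 are easy to check numerically; the helper below (its name is ours) verifies both inequalities for a given dense matrix.

```python
import numpy as np

def norm_bounds_hold(A):
    """Check ||A||_inf <= sqrt(k_row) ||A||_2 and
    ||A||_1 <= sqrt(k_col) ||A||_2 (Proposition 1 style bounds)."""
    k_row = int((A != 0).sum(axis=1).max())   # max nonzeros in a row
    k_col = int((A != 0).sum(axis=0).max())   # max nonzeros in a column
    n2 = np.linalg.norm(A, 2)
    ok_inf = np.linalg.norm(A, np.inf) <= np.sqrt(k_row) * n2 + 1e-12
    ok_one = np.linalg.norm(A, 1) <= np.sqrt(k_col) * n2 + 1e-12
    return ok_inf and ok_one
```

For a tridiagonal matrix, k_row = k_col = 3, so ‖A‖_∞/√3 is already a good lower bound for ‖A‖_2.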
The inequality (4) can also be used as a stopping criterion for iterative linear system solvers [2]. In this case, ε_bw denotes the desired backward error, which is given prior to execution. If we replace ‖A − σI‖ with a lower bound, then we get a more stringent stopping criterion.

Residual reduction bounds
An alternative to (4) is to use the bound

‖r_j‖ ≤ ε_tol ‖v_j‖.   (5)

This bound is commonly used as a stopping condition when the linear systems are solved by iterative methods. Unfortunately, as a stopping condition, (5) "may be very stringent, and possibly unsatisfiable" [12, p. 336]. See also [9, pp. 72–73] for a 2 × 2 example that illustrates the pitfall of comparing the norm of the residual with the norm of the right hand side. However, since (5) is de facto commonly used in computer codes, it is still worth studying under the assumption that the stopping criterion is met.

Auxiliary residual bounds
In order to treat both (4) and (5) in a unified way, we consider the following auxiliary bound

‖r_j‖ ≤ ε_1 ‖v_j‖ + ε_2 ‖A − σI‖‖w_j‖.   (6)

Clearly, the substitutions (ε_1, ε_2) ← (ε_bw, ε_bw) and (ε_1, ε_2) ← (ε_tol, 0) give back (4) and (5), respectively. We can simplify the bound in (6) in cases where A − σI is not too ill-conditioned with respect to ε_2. To see this we need the following lemma.
Proof We have Reordering gives the result.
The following result yields a family of new residual bounds independent of v_j.

Errors from orthonormalization
In this section we are concerned with the orthonormalization error g_j. Up to signs, this error can be viewed as the backward error in the (j + 1)st column of a perturbed QR factorization, shown in (9). Thus, we are interested in columnwise backward error bounds for QR factorizations. The next theorem shows how such bounds can be obtained from normwise backward error bounds given in the 2-norm or the Frobenius norm. It applies to floating point algorithms qr(·) that are unaffected by power-of-two column scalings, in the sense that if qr(A) computes the factors Q and R, then qr(AD) computes Q and RD, where D = diag(d_1, ..., d_k) and the d_i are powers of 2. Barring underflow and overflow, this covers commonly used QR algorithms such as classical and modified Gram–Schmidt with and without (possibly partial) reorthogonalization, Householder QR, and Givens QR.

Theorem 4 Let qr(A) denote an algorithm that computes an approximate QR factorization of an n × k matrix A in floating point arithmetic. Suppose further that
Since AD is the backward error from qr(AD) we have for i = 1 : k, from which the theorem follows.
The constant γ in Theorem 4 is obviously algorithm dependent and many bounds exist in the literature. Some of them contain both n and k [23], and others only k [1,5], [12, Theorem 19.13]. In [12, p. 361] a columnwise bound depending on n and k is given. For Krylov methods we usually have n ≫ k, so bounds independent of n should certainly be favored. We shall assume that the columnwise bound

‖Δa_j‖ ≤ η(n, j) u ‖a_j‖,  j = 1 : k,   (10)

holds for some function η(n, k).

Columnwise backward errors for modified Gram-Schmidt
Our next theorem shows that for MGS, with and without one round of reorthogonalization, η in (10) does not depend on n and is given by η(n, k) = ζk, where ζ is a modest constant. We need the following forward error result for axpy operations.

Lemma 5 Let α be a scalar and x and y vectors, and let s = float(αx + y) − (αx + y) denote the forward error of the axpy operation. Then the componentwise forward error is bounded by |s| ≤ 2u(|αx| + |y|) [14].

The next theorem gives columnwise backward error bounds for MGS with and without one round of reorthogonalization.

Theorem 6 Let Q and R denote the computed factors in a QR decomposition of an n × k matrix A, which was obtained by a floating point implementation of modified Gram–Schmidt with or without one round of reorthogonalization. Assume that (i) the computed basis vectors satisfy ‖q_j‖ = 1 for j = 1 : k, and (ii) (1 + (n + 3)u)^k ≤ 1 + δ for some δ > 0.

Then there exists a ΔA such that A + ΔA = QR with ‖Δa_j‖ ≤ c j ‖a_j‖ u, where c = 4(1 + δ) if no reorthogonalization was done and c = 10(1 + δ)² if one round of reorthogonalization was done.
Let us pause for a while and discuss the assumptions before we proceed with the proof. Assumption (i) is imposed to keep our analysis cleaner; it does not affect our final bounds in any significant way. Assumption (ii) is needed for the following reason: if we compute y = float(x − q_j(q_j^H x)) for some 1 ≤ j ≤ k, then, assuming (i), the quantity 1 + (n + 3)u = 1 + ‖q_j‖²(n + 3)u is an upper bound for ‖y‖/‖x‖ [12, Lemma 3.9]. Thus, (ii) guarantees that we can apply a sequence of k elementary "floating point" projections of the form I − q_i q_i^H to any vector x, and the resulting vector will be bounded in norm by (1 + δ)‖x‖.
Proof of Theorem 6 Let R^(1) and R^(2) denote the strictly upper triangular matrices containing the orthogonalization coefficients of the first and second round of orthogonalization, respectively; we define R^(2) ≡ 0 if no reorthogonalization is done. The computation of the jth column can be viewed as 2(j − 1) axpy operations, whose errors s_i are bounded by Lemma 5. Each intermediate vector is also the result of applying i − 1 elementary floating point projections to a_j, so the discussion prior to the proof bounds its norm by (1 + δ)‖a_j‖. The forward error of a computed inner product float(x^H y), where x and y are of length n, is bounded by nu‖x‖‖y‖ [14]; this bounds the errors r_ij in the computed orthogonalization coefficients. Combining the bounds for f_j, the s_i, and the r_ij gives ‖Δa_j‖ < 10(1 + δ)² j ‖a_j‖u. If no reorthogonalization was done, then s_i = 0 for i = j : 2(j − 1), r_ij = 0, and ‖f_j‖ ≤ (1 + δ)‖a_j‖u for all j. Taking this into account yields ‖Δa_j‖ < 4(1 + δ) j ‖a_j‖u.
Remark 1 Suppose the perturbed QR factorization (9) was computed using MGS. Then, by taking δ = 1/10 and assuming that the conditions of Theorem 6 hold, we get that η(n, k) in (10) is bounded by η(n, k) ≤ 5k if standard MGS is used, and η(n, k) ≤ 13k if MGS with one round of reorthogonalization is used. We point out that these bounds should not be interpreted as saying that standard MGS should be favored over MGS with reorthogonalization. On the contrary, as we will see in the next section, retaining a well-conditioned basis (which is the effect of reorthogonalization) is of great importance to the shift-and-invert Arnoldi algorithm.
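The MGS scheme analyzed in Theorem 6 can be sketched as follows; the coefficients of both orthogonalization rounds are accumulated into a single R (so that R = R^(1) + R^(2) and A + ΔA = QR columnwise). The function name and structure are illustrative, not the paper's code.

```python
import numpy as np

def mgs(A, reorth=True):
    """Modified Gram-Schmidt QR with an optional second round of
    reorthogonalization.  Illustrative sketch of the scheme analyzed
    in Theorem 6; coefficients of both rounds accumulate into R."""
    n, k = A.shape
    Q = np.zeros((n, k))
    R = np.zeros((k, k))
    for j in range(k):
        w = A[:, j].copy()
        for _ in range(2 if reorth else 1):
            for i in range(j):
                s = Q[:, i] @ w        # elementary projection I - q_i q_i^H
                R[i, j] += s           # R = R^(1) + R^(2)
                w = w - s * Q[:, i]
        R[j, j] = np.linalg.norm(w)
        Q[:, j] = w / R[j, j]
    return Q, R
```

For well-conditioned input the columnwise backward errors ‖a_j − (QR)e_j‖ observed in practice are a tiny multiple of ju‖a_j‖, in line with Remark 1.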

Backward error bounds for the shift-and-invert Arnoldi recurrence
Recall the perturbed Krylov recurrence (A − σI)^{-1}(V_k + F_k) = V_{k+1}H_k, where F_k = [f_1 · · · f_k]. We discussed in Sects. 2 and 3 how to bound r_j and g_j, respectively. By using these bounds, we can now easily bound F_k. Assuming (6) and (10) yields a columnwise bound on the f_j. Further, from the perturbed QR factorization (9) we obtain a bound on ‖w_j‖, assuming that η(n, j)u < 1. Collecting the columnwise bounds, and assuming that η(n, k) is monotonically increasing in k, we arrive at the normwise bound (13) on ‖F_k‖, in which the factor

c_kn(ε_2) = (ε_2 + η(n, k)u)/(1 − η(n, k)u)   (14)

should be thought of as a tiny factor. Similarly, if we assume the bound (8) instead of (6), we get the bound (15); this is the same bound we get from (13) if we replace (ε_1, ε_2) by (0, 2ε_1 + ε_2).
Having established (13) and (15), we are now ready to reshuffle Eq. (11) in order to derive backward error bounds for the shift-and-invert Krylov recurrence. We will derive perturbed recurrences of the form

V_k = (A + ΔA − σI)V_{k+1}H_k.   (16)

If we look at this from a backward error perspective, (16) means that we have taken k steps, without errors, of a shift-and-invert Krylov algorithm applied to a perturbed pencil, and all linear systems that occurred in the process must have been consistent. However, in order to rewrite (16) as (A + ΔA − σI)^{-1}V_k = V_{k+1}H_k, we need to ensure that A + ΔA − σI is invertible. We need the following lemma to resolve this technicality.

Lemma 7
Let A and V be matrices of size n × n and n × k, respectively, such that rank(AV) = k. Then for any ε > 0, there exists a matrix X with ‖X‖ < ε such that A + X is nonsingular and XV = 0. Furthermore, if A is Hermitian, then we may take X to be Hermitian.
Proof Find a unitary matrix Q such that it follows that B 2 is also of full rank. We have which is easily seen to be nonsingular for large enough values of ω.
If we use the bound on F_k shown in (13), then we can deduce the following theorem.

Theorem 8

Let (A − σI)^{-1}(V_k + F_k) = V_{k+1}H_k be of full rank, and assume that F_k is bounded as in (13) and that √k κ(V_k) ε_1 < 1. Then there is a ΔA of rank at most k such that V_k = (A + ΔA − σI)V_{k+1}H_k, with ‖ΔA‖ bounded in terms of ‖F_k‖, κ(V_{k+1}), and κ(H_k), where c_kn(ε_2) is given by (14).
Proof From V_k + F_k = (A − σI)V_{k+1}H_k and V_k = (A + ΔA − σI)V_{k+1}H_k we see that any eligible ΔA has to satisfy ΔA V_{k+1}H_k = −F_k. We choose ΔA = −F_k(V_{k+1}H_k)^† (which is of rank at most k), which implies ‖ΔA‖ ≤ ‖F_k‖/σ_min(V_{k+1}H_k). Substituting ‖F_k‖ by the upper bound given in (13) yields the numerator. For the denominator we get σ_min(V_{k+1}H_k) ≥ σ_min(V_{k+1})σ_min(H_k), where we used σ_min(XY) ≤ ‖X‖σ_min(Y), which holds for any matrices X, Y. Thus ‖ΔA‖ is bounded by the quotient of these two quantities, which can be reordered to the claimed bound.
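The construction in the proof, ΔA = −F_k(V_{k+1}H_k)^†, together with the norm bound ‖ΔA‖ ≤ ‖F_k‖/σ_min(V_{k+1}H_k), can be evaluated directly with a pseudoinverse. The helper below is a sketch (its name is ours); it assumes V_{k+1}H_k has full column rank.

```python
import numpy as np

def backward_perturbation(Fk, V, H):
    """Evaluate dA = -F_k (V_{k+1} H_k)^+ from the proof of Theorem 8,
    together with the bound ||dA||_2 <= ||F_k||_2 / sigma_min(V_{k+1} H_k).
    Assumes W = V_{k+1} H_k has full column rank."""
    W = V @ H                                        # V_{k+1} H_k
    dA = -Fk @ np.linalg.pinv(W)                     # rank(dA) <= k
    smin = np.linalg.svd(W, compute_uv=False)[-1]    # sigma_min(W)
    bound = np.linalg.norm(Fk, 2) / smin
    return dA, bound
```

By construction, ΔA(V_{k+1}H_k) = −F_k whenever the full-rank assumption holds, so V_k = (A + ΔA − σI)V_{k+1}H_k follows from the perturbed recurrence.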
If the linear systems are solved up to a normwise backward error ε_bw, and (8) and (15) hold with 2ε_1 + ε_2 = 3ε_bw, then we get the following corollary.

Corollary 9

Let (A − σI)^{-1}(V_k + F_k) = V_{k+1}H_k be of full rank, and assume that F_k is bounded as in (15) with 2ε_1 + ε_2 = 3ε_bw. Then there is a ΔA of rank at most k such that V_k = (A + ΔA − σI)V_{k+1}H_k, with ‖ΔA‖ bounded as in Theorem 8, where c_kn(·) is given by (14).
A few remarks are in order.

Remark 2

If A + ΔA − σI in Theorem 8 and Corollary 9 is singular, then we can invoke Lemma 7 with V = V_{k+1}H_k to obtain a backward error ΔÃ, arbitrarily close to ΔA, such that A + ΔÃ − σI is invertible and V_k = (A + ΔÃ − σI)V_{k+1}H_k. The new backward error ΔÃ will in general have rank greater than k, but its numerical rank is still bounded by k. Here the definition of numerical rank can be arbitrarily strict, in the sense that we may define the numerical rank as the number of singular values greater than ε > 0, for an arbitrarily small ε.

Remark 3
If the orthonormalization is done properly, using, for instance, MGS with reorthogonalization, then κ(V_{k+1}) ≈ 1. In this case we can ignore the factors κ(V_{k+1}) and κ(V_k) when evaluating the bounds in Theorem 8 and Corollary 9. In particular, this means that the bounds can be estimated cheaply as long as ‖A − σI‖ (or a good estimate of it) is known.

Remark 4
For the standard eigenvalue problem, shifts are used to find interior eigenvalues, so any sensible shift satisfies |σ| ≤ ‖A‖. Thus, we have ‖A − σI‖ ≤ 2‖A‖ in practice.

Remark 5
In view of [6], we note that our bounds do not contain the loss-of-orthonormality term ‖V_{k+1}^H V_{k+1} − I‖. Instead we saw that the condition number of the computed basis V_{k+1} plays a role in the bounds of the backward error. We note, however, that a small value of ε := ‖V_{k+1}^H V_{k+1} − I‖ implies that V_{k+1} is well-conditioned: every singular value σ of V_{k+1} satisfies |σ² − 1| ≤ ε, so κ(V_{k+1}) ≤ √((1 + ε)/(1 − ε)) whenever ε < 1. The next example shows how Theorem 8 can be used to derive a simple a posteriori backward error bound.
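The implication in Remark 5 can be checked numerically: with ε = ‖V^H V − I‖_2 < 1, every singular value σ of V satisfies |σ² − 1| ≤ ε, hence κ(V) ≤ √((1 + ε)/(1 − ε)). The helper name below is ours.

```python
import numpy as np

def cond_bound(V):
    """Bound kappa(V) <= sqrt((1+eps)/(1-eps)), where
    eps = ||V^H V - I||_2 < 1, as observed in Remark 5."""
    eps = np.linalg.norm(V.conj().T @ V - np.eye(V.shape[1]), 2)
    assert eps < 1.0                     # bound only valid for eps < 1
    return np.sqrt((1 + eps) / (1 - eps))
```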
Example 1 Suppose a matrix A and a shift σ with |σ| < ‖A‖ are given, and suppose we perform k steps of the shift-and-invert Arnoldi algorithm. To solve the linear systems we use an iterative method that employs (5) as stopping condition, that is, the linear systems are considered "solved" when the residuals are less than some tolerance ε_tol (we ignore the norm of the right hand side since it is approximately one). We use a rather crude tolerance, so ε_tol ≫ u. For the orthogonalization we use MGS with one round of reorthogonalization, so c_kn(0) ≈ 13ku (cf. Remark 1). If condition (18) holds, then Theorem 8, with ε_1 = ε_tol and ε_2 = 0, and the remarks following it, yield that the computed quantities satisfy the bound (19). Here we have used the fact that κ(V_{k+1}) ≥ κ(V_k). Since MGS with reorthogonalization was employed, we expect κ(V_{k+1}) to be close to one. Thus, (19) tells us that the relative backward error ‖ΔA‖/‖A‖ is a modest multiple of the tolerance used to solve the linear systems. So, in this setting, the shift-and-invert Arnoldi algorithm is backward stable.
We end this section with a numerical experiment. We consider two matrices of order n = 1000 and associated shifts. The first matrix is the symmetric tridiagonal matrix A_1 = tridiag(1, −2, 1), and the associated shift is σ_1 = −2. It is well known that the eigenvalues of A_1 are given by −2 + 2cos(πk/(n + 1)), for k = 1 : n, so A_1 − σ_1 I is indeed invertible. The second matrix is the nonnormal matrix A_2 with −1 on the subdiagonal and 1 on the diagonal and the first three superdiagonals, also known as the Grcar matrix [11]. The associated shift was chosen to be σ_2 = 1. It is an easy exercise to show that A_2 − σ_2 I is invertible. We implemented the shift-and-invert Arnoldi algorithm in MATLAB R2013a. For orthonormalization we used MGS with one round of reorthogonalization. The matrices were stored in sparse format, and the linear systems were solved using MATLAB's "backslash" and lu routines. We took k = 30 steps with the starting vector [1, 1, ..., 1]^T, and in each iteration we computed the backward error shown in (3), where the residual was evaluated in extended precision (32 digits) and then rounded to double precision. We did this using the vpa function from the Symbolic Math Toolbox. We also computed the errors in extended precision and rounded the result to double precision. For each j = 1 : k and i = 1, 2, we computed the estimated upper bound B(ΔA_i^(j)), where ε_bw was set to be the largest backward error of the linear systems encountered in the algorithm, and c_jn(3ε_bw) := (3ε_bw + 13ju)/(1 − 13ju) (cf. Remark 1). As mentioned in Remark 3, this quantity is a good estimate of the bound in Corollary 9. We also evaluated the expression for the backward errors ΔA_i^(j) given in the proof of Theorem 8, and computed their norms. We did this using the MATLAB routines pinv (for the Moore–Penrose pseudoinverse) and norm. The quantities B(ΔA_i^(j)) and ‖ΔA_i^(j)‖ are shown in Fig. 1 for j = 1 : 30, i = 1, 2. Our experiment seems to be unaffected by the nonnormality of A_2.
Moreover, even though the (estimated) upper bounds B(ΔA_i^(j)), i = 1, 2, can be seen to be rather pessimistic, they do show that the backward errors are less than √u. In other words, for both matrices, the upper bounds show that the computation is backward stable up to single precision.
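For readers who want to repeat a small-scale version of the experiment, the two test matrices can be generated as follows (in Python/NumPy rather than MATLAB, and with dense storage for simplicity; the function names are ours).

```python
import numpy as np

def second_difference(n):
    """A_1: symmetric tridiagonal matrix tridiag(1, -2, 1); its
    eigenvalues are -2 + 2*cos(pi*k/(n+1)), k = 1 : n."""
    return (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))

def grcar(n):
    """A_2: the Grcar matrix, with -1 on the subdiagonal and 1 on the
    diagonal and the first three superdiagonals."""
    A = np.diag(-np.ones(n - 1), -1)
    for d in range(4):
        A += np.diag(np.ones(n - d), d)
    return A
```

These can be fed into any shift-and-invert Arnoldi implementation with the shifts σ_1 = −2 and σ_2 = 1 used in the experiment.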

Implicit restarting
The bounds in Theorem 8 and Corollary 9 contain the factor κ(H_k), so if κ(H_k) ≫ 1 we cannot guarantee a small backward error. If we recall how Arnoldi locates eigenvalues [25, pp. 257–265], we have, unfortunately, reason to suspect that this is the case. Since Arnoldi does not target the largest eigenvalues, but any isolated eigenvalue cluster, the square matrix Ĥ_k := [I_k 0]H_k is likely to have both large and small eigenvalues, which suggests that H_k may be ill-conditioned. We will now show that the situation can be much better than expected if we restrict our attention to the largest eigenvalues of Ĥ_k, that is, the ones corresponding to eigenvalues of A closest to the shift σ. The idea is to do an implicit (thick) restart [24], and purge the small eigenvalues of Ĥ_k. Since small eigenvalues of Ĥ_k correspond to eigenvalues of A far from the shift σ, it is reasonable to assume they are of less interest. Suppose ℓ < k, and consider a Schur form Ĥ_k = QTQ^H such that t_ii, i = ℓ + 1 : k, are the small eigenvalues to be purged. Transforming the recurrence with Q and setting U_k = V_k Q, throwing away the last k − ℓ columns, and defining E_ℓ = F_k Q_ℓ, results in a compact recurrence (20) with ‖E_ℓ‖ ≤ ‖F_k‖. Note that our bound on ‖E_ℓ‖ depends on k and not ℓ. We can now repeat the proof of Theorem 8, using the bounds ‖E_ℓ‖ ≤ ‖F_k‖ and σ_min(U_{ℓ+1}) ≥ σ_min(V_{k+1}), and the recurrence (20) instead of the one assumed in the theorem. This yields the bound (21). Comparing it to the bound in Theorem 8, we see that κ(H_k) has been replaced by ‖H_k‖/σ_min(T_ℓ), where T_ℓ denotes the leading ℓ × ℓ part of T. It follows that if H_k is ill-conditioned due to the small eigenvalues we purged, then ‖H_k‖/σ_min(T_ℓ) ≪ κ(H_k), and (21) shows that the upper bound for the backward error corresponding to the part of the spectrum we care about is much smaller than the upper bound for the general backward error.
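The purging step can be sketched in the Hermitian special case, where the Schur form reduces to the spectral decomposition (so numpy.linalg.eigh suffices); in the general case an ordered Schur form would be used instead. The function below is illustrative only.

```python
import numpy as np

def purge_small(H_square, V, ell):
    """Purging sketch, specialized to Hermitian H_square, where the
    Schur form H = Q T Q^H is the spectral decomposition.  Orders the
    eigenvalues by decreasing magnitude (those nearest the shift in the
    original problem) and keeps the leading ell of them."""
    lam, Q = np.linalg.eigh(H_square)
    order = np.argsort(-np.abs(lam))       # wanted (large) eigenvalues first
    lam, Q = lam[order], Q[:, order]
    U = V @ Q                              # transformed basis U_k = V_k Q
    return U[:, :ell], np.diag(lam[:ell])  # compact basis and T_ell
```

The returned pair plays the role of (U_ℓ, T_ℓ) in the compact recurrence (20); since Q is unitary, the columns of U inherit the conditioning of V.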

Hermitian backward errors
We now restrict the scope to the Hermitian matrix eigenvalue problem, that is, when A = A H and σ is real. Let us mention that we still consider the shift-and-invert Arnoldi algorithm, as it is shown in Algorithm 1, and not the shift-and-invert Lanczos algorithm with a three-term recurrence. In the Hermitian case, Algorithm 1 is also known as the shift-and-invert Lanczos algorithm with full orthogonalization, and it is used in, e.g., ARPACK [17, routine ssaitr.f] and MATLAB's eigs command.
Is it possible, for a Hermitian A, to find a Hermitian backward error ΔA? We have seen in the proof of Theorem 8 that ΔA has to satisfy ΔA V_{k+1}H_k = −F_k. Unfortunately, the following lemma rules out the existence of such a Hermitian ΔA in general.

Lemma 10
Let X ∈ C^{n×k} and F ∈ C^{n×k}. Then there exists a Hermitian E with EX = F if and only if X^H F is Hermitian and FX^†X = F. In that case, there is such an E with rank(E) ≤ 2k and ‖E‖_* ≤ 2‖F‖_*/σ_min(X), where ‖·‖_* denotes the 2-norm or the Frobenius norm.
Proof The proof is simple and, for k = 1, is contained in [18]. We give it for completeness. Let E be any Hermitian matrix such that EX = F. This implies EXX^†X = FX^†X and (using XX^†X = X) F = EX = FX^†X; moreover, X^H F = X^H EX is Hermitian. On the other hand, if X^H F is Hermitian and F = FX^†X, then

E = FX^† + (X^†)^H F^H (I − XX^†)

is also Hermitian. Furthermore, rank(E) ≤ 2k, EX = F, and (using that I − XX^† is an orthogonal projector) ‖E‖_* ≤ 2‖F‖_*/σ_min(X).

The next result shows that one still gets a Hermitian backward error if one replaces the Hessenberg matrix H_k by some other (k + 1) × k matrix G_k. Before we state the theorem, we should clarify what we mean by "backward error" in this case. If we replace H_k by something else, we cannot say that the computed quantities (V_{k+1} and H_k) satisfy an exact Krylov recurrence of a perturbed input matrix. We can, however, still say that the computed subspace is a Krylov subspace of a perturbed Hermitian input matrix. We refer to this Hermitian perturbation as the backward error.
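One explicit choice of E for Lemma 10 is E = FX^† + (X^†)^H F^H (I − XX^†), a standard construction that satisfies all of the lemma's conclusions under its hypotheses (the paper's own formula may be written differently). A sketch:

```python
import numpy as np

def hermitian_solution(X, F):
    """Explicit Hermitian E with E X = F (cf. Lemma 10), valid when
    X^H F is Hermitian and F X^+ X = F:
        E = F X^+ + (X^+)^H F^H (I - X X^+)."""
    n = X.shape[0]
    Xp = np.linalg.pinv(X)
    P = np.eye(n) - X @ Xp     # orthogonal projector onto range(X)^perp
    return F @ Xp + Xp.conj().T @ F.conj().T @ P
```

Since range(FX^†) and range((X^†)^H · ) each have dimension at most k, the construction also exhibits the rank bound rank(E) ≤ 2k.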

Theorem 11

Let A be Hermitian and let G_k be a (k + 1) × k matrix such that V_k^H V_{k+1}G_k is Hermitian and V_{k+1}G_k is of full rank. Then there is a Hermitian ΔA of rank at most 2k such that V_k = (A + ΔA − σI)V_{k+1}G_k.

Proof From V_k = (A + ΔA − σI)V_{k+1}G_k we see that any eligible ΔA has to satisfy ΔA V_{k+1}G_k = V_k − (A − σI)V_{k+1}G_k. Since it is assumed that V_{k+1}G_k is of full rank, Lemma 10 implies that such a Hermitian ΔA exists if (V_{k+1}G_k)^H(V_k − (A − σI)V_{k+1}G_k) is Hermitian. Since the first term, (V_{k+1}G_k)^H V_k, is Hermitian by assumption, and the second term is Hermitian because A is Hermitian and σ is real, this is easily seen to be the case. Also by Lemma 10, ΔA is bounded by ‖ΔA‖ ≤ 2‖V_k − (A − σI)V_{k+1}G_k‖/σ_min(V_{k+1}G_k) and is of rank at most 2k.

Remark 6
If A + ΔA − σI is singular, then we can use the second part of Lemma 7 to find a Hermitian backward error ΔÃ, arbitrarily close to ΔA, such that A + ΔÃ − σI is invertible.
In order to obtain a small Hermitian backward error, we need to find a matrix G_k close to H_k such that V_k^H V_{k+1}G_k is Hermitian. One possibility is

G_k = R_{k+1}^{-1} T̄_k R_k,   (22)

where R_k, R_{k+1} are the upper triangular QR factors of V_k, V_{k+1}, respectively, T_k is the real symmetric tridiagonal matrix with t_{j+1,j} = t_{j,j+1} = h_{j+1,j} and t_{jj} = Re(h_{jj}), and T̄_k denotes its (k + 1) × k extension.
Then G_k is Hessenberg, and computing Ritz pairs is particularly easy: we need to find vectors z and scalars μ satisfying the corresponding projected eigenvalue problem. Here we have used Remark 6 in order to ensure that A + ΔA − σI is invertible. By using the Krylov relation, inserting the QR factorizations V_j = Q_jR_j, j = k, k + 1, and the formula for G_k shown in (22), the problem simplifies to T_k z̃ = μz̃, where z̃ = R_k z. So the Ritz values are just the eigenvalues of T_k (which are real, since T_k is Hermitian). To obtain the Ritz vectors, we would have to multiply z̃ by R_k^{-1}. However, since R_k is close to the identity matrix if the orthogonalization has been done properly (for instance, by using MGS with reorthogonalization), we can approximate z̃ by z. Thus, (approximations of) Ritz pairs for the choice (22) of G_k can be obtained without computing R_k or R_{k+1}. We also note that choosing the eigenpairs of T_k to construct Ritz pairs is what is done in practice.

Conditions for breakdown
We now discuss how to derive a sensible breakdown criterion based on our error analysis. We saw in Sect. 1.1 that the computed quantities V_{j+1} and H_j satisfy the perturbed recurrence (A − σI)^{-1}(V_j + F_j) = V_{j+1}H_j. Neglecting the term h_{j+1,j}v_{j+1}, this recurrence can be rewritten as (A − σI)^{-1}(V_j + F̃_j) = V_j H̃_j, where H̃_j = [I_j 0]H_j. Note that the first j − 1 columns of F_j and F̃_j are identical. For the last column there are contributions from r_j, the residual from the linear system, from g_j, the associated column error from the orthonormalization, and from the neglected term. It is natural to declare breakdown when the error introduced by neglecting h_{j+1,j} is of the same order as the errors that are already present in the computation. This leads us to the breakdown condition (23), which we can simplify by replacing ‖g_j‖ with its bound in (10).

We now discuss how to evaluate (23) in practice. If h_{j+1,j} < η(n, j)‖w_j‖u, then we can declare breakdown without further work. Otherwise we have to take the second term in (23), involving ‖r_j‖, into consideration. If an iterative linear system solver that guarantees a residual less than some tolerance is used, then we can substitute ‖r_j‖ in (23) by the given tolerance. If, for example, (6) is used as a stopping condition for the linear system solver, then ‖r_j‖ is replaced by the right hand side of (6). If the residual, or any good bound for it, is not given, then we need to compute it. This is generally the case when the linear systems are solved by a direct method. Let m be a constant such that the forward error bound ‖float((A − σI)x) − (A − σI)x‖ ≤ mu‖A − σI‖‖x‖ holds for an arbitrary vector x. If A − σI is given as a dense matrix, we have m = n^{3/2} [12, p. 70]. For sparse matrices, m can be much smaller. The computed residual then matches r_j up to a term of order ‖A − σI‖‖w_j‖mu. By comparing with (3), we recognize ‖A − σI‖‖w_j‖mu as part of the norm of a residual associated with a computed solution with corresponding backward error mu. Thus, we can compute a satisfactory ‖r_j‖ if we use an extended precision ū such that mū < u, and the norm of the computed residual is accurate enough as long as muκ(A − σI) ≪ 1.
If A − σI is so ill-conditioned that this is not satisfied, then we can use an extended precision ū such that mūκ(A − σI) ≪ 1. If (6) and (23) hold, then ‖f̃_j‖ ≤ 2ε_1‖v_j‖ + ‖A − σI‖‖w_j‖(ε_2 + η(n, j)u). By derivations similar to those leading to (13), we obtain the bound (24) on ‖F̃_j‖. From this we obtain the following "breakdown analogue" of Theorem 8.

Theorem 12
Let (A − σI)^{-1}(V_j + F̃_j) = V_j H̃_j be of full rank, and assume that F̃_j is bounded as in (24) and that √j κ(V_j) ε_1 < 1. Then there is a ΔA of rank at most j such that V_j = (A + ΔA − σI)V_j H̃_j, with ‖ΔA‖ bounded analogously to Theorem 8, where c_jn(·) is given by (14).
The proof is omitted since it is essentially the same as the proof of Theorem 8. In a similar manner, we can get corresponding breakdown analogues to Corollary 9 and Theorem 11.

Conclusion
We have shown that a floating point implementation of the shift-and-invert Arnoldi algorithm, where errors from all steps of the computation are taken into account, yields computed quantities that satisfy an exact shift-and-invert Krylov recurrence for a perturbed matrix. Here, the word "Krylov" is used instead of "Arnoldi" since the computed basis cannot be guaranteed to be perfectly orthogonal. We saw that the condition number of the computed basis V_{k+1} plays a role in the bounds for the backward error. Further, we have seen that the norm of the backward error ΔA depends on κ(H_k), and that a large κ(H_k) is acceptable if the linear systems are only solved to a loose tolerance (18). Otherwise, we argued that even if this condition number is large, the restriction to the most important part of the recurrence (that is, what is left after purging the small eigenvalues of H_k) can have a small backward error. For Hermitian matrices A, we have shown that there is a Hermitian backward error ΔA such that the computed basis, that is, the columns of V_{k+1}, spans a Krylov subspace associated with A + ΔA. However, as in the case of standard Arnoldi [15], the small (k + 1) × k matrix associated with this subspace is generally not the computed Hessenberg matrix.
Finally, we noted that our error analysis yields a sensible condition for when to declare breakdown. If this condition is met, we could derive a new set of backward error bounds, which show that an invariant subspace of a perturbed matrix has been found.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.