Disguised and new Quasi-Newton methods for nonlinear eigenvalue problems

In this paper we take a quasi-Newton approach to nonlinear eigenvalue problems (NEPs) of the type $M(\lambda)v=0$, where $M:\mathbb{C}\rightarrow\mathbb{C}^{n\times n}$ is a holomorphic function. We investigate which types of approximations of the Jacobian matrix lead to competitive algorithms, and provide convergence theory. The convergence analysis is based on theory for quasi-Newton methods and Keldysh's theorem for NEPs. We derive new algorithms and also show that several well-established methods for NEPs can be interpreted as quasi-Newton methods, and thereby provide insight to their convergence behavior. In particular, we establish quasi-Newton interpretations of Neumaier's residual inverse iteration and Ruhe's method of successive linear problems.

1. Introduction. One of the most common techniques to improve the convergence or efficiency of Newton's method for nonlinear systems of equations is to replace the Jacobian matrix with a different matrix. Among these quasi-Newton method constructions, sometimes called inexact Newton methods, the most common variation is to keep the Jacobian matrix constant. The factorization of this matrix can be precomputed before carrying out the iterations. This is beneficial, e.g., in situations where the problem stems from a discretization of a PDE, as the resulting system is often large and the Jacobian matrix is sparse with a structure allowing a sparse LU-factorization to be pre-computed.
In this paper we consider nonlinear eigenvalue problems (NEPs) of the type where M : Ω → C n×n . There are various flavors of Newton's method available in the literature (further discussed below) for this class of NEPs. Some of these methods do have the property that the matrix in the linear system to be solved in every iteration remains constant. However, these methods are in general not seen as Jacobian matrix modifications of Newton's method, but are often derived from quite different viewpoints. In this paper we investigate methods resulting from modifying the Jacobian matrix in various ways, and illustrate differences, similarities and efficiency of the resulting methods. It turns out that several well-established approaches for NEPs can be viewed as quasi-Newton methods.
In the NEP-class that we consider in this paper Ω ⊂ C is a closed set, M is analytic in Ω and we suppose λ ∈ Ω. We call the vector v a (right) eigenvector if it satisfies (1.1) and u the left eigenvector if it satisfies u H M (λ) = 0, u = 0. (1.2) We call (λ, v, u) an eigentriplet of (1.1). Without loss of generality we phrase the NEP as a system of equations which is equivalent to (1.1) if c ∈ C n is not orthogonal to the eigenvector v. This condition is not problematic in practice since c can be chosen freely and thus will generically not be orthogonal to any eigenvector. The quasi-Newton approach to (1.3) consists of generating sequences of approximations (µ 1 , x 1 ), (µ 2 , x 2 ), . . . from the relationJ whereJ k is an approximation of the Jacobian matrix The eigenvector and eigenvalue updates will be denoted ∆x k = x k+1 − x k and ∆µ k = µ k+1 − µ k . We consider four specific modifications of the Jacobian matrix, briefly justified as follows. From a quasi-Newton perspective, the most common approach consists of keeping the Jacobian matrix constant, i.e., setting where σ = λ 0 is the starting value for the eigenvalue and x 0 the starting vector for the eigenvector approximation. In this manuscript we refer to this as Quasi-Newton 1 (QN1). In Section 2.1 we show how (1.4) with Jacobian matrix approximation (1.6) can be reformulated such that in every iteration we need to solve one linear system associated with the matrix M (σ). We shall later show (in Section 2.1) that a more accurate approximation of the Jacobian matrix leads to an algorithm which has the same computational cost per iteration as QN1, i.e., it involves the solution of one linear system with the matrix M (σ) per iteration. More precisely, we keep only the (1,1)-block constant by setting (1.7) The quasi-Newton method (1.4) with Jacobian matrix approximation (1.7) will be refered to as Quasi-Newton 2 (QN2). We also investigate a method (which we call Quasi-Newton 3) corresponding to keeping the (1, 1)-block constant as in QN2, but also replace the derivative in the (1,2)-block. We replace the derivative with a finite difference involving the future eigenvalue approximation λ k+1 , i.e., we set where we use the standard notation for divided differences Note that the modification of the Jacobian matrix in (1.8) makes the iteration (1.4) implicit in the sense that the formula (1.4) for the new approximation (λ k+1 , v k+1 ) involves λ k+1 in a nonlinear way. Many implicit variations of Newton's method have been considered in the literature, see e.g., [12,18] and references therein.
It turns out that certain implicit variations of Newton's method improve the convergence and sometimes even increase the convergence order. In contrast to many other implicit Newton methods, our choice of the Jacobian matrix is done with the goal of having a method whose iterates can be computed without solving a (computationally demanding) nonlinear system of equations. This is possible for the specific choice (1.8), as we shall illustrate in Section 2.
We consider one more modified Jacobian matrix which also leads to an implicit method, but now implicit in the eigenvector. The vector x k in the (1,2) block is replaced by the future vector x k+1 , such that we obtain Our study has the following conclusions and contributions: • QN1 and QN2 can be phrased as algorithms only involving one linear solve with M (σ) per iteration (Section 2.1-2.2) • QN3 is equivalent to Neumaier's residual inverse iteration [25] (Section 2.3) • QN4 is equivalent to Ruhe's method of successive linear problems [29] (Section 2.4) • We provide exact characterizations of the convergence of factor for QN1 and QN2, and establish that the convergence factor of QN2 and QN3 are identical (Section 3.1-3.2) • We show how to adapt fundamental theory for inexact Newton method [5] to study QN4 (Section 3.3) • We provide generalizations of convergence rate dependence on eigenvalue clustering analogous to methods for linear eigenvalue methods (Section 4) The studied properties of the methods are illustrated in numerical simulations in Section 5.
Newton's method for linear and nonlinear eigenvalue problems has been studied for decades, and the field is still under active development, as can be observed in the summaries in [22,40]. The technique of deriving methods using an augmented system as in (1.3) was investigated already in 1950's by Unger [38], and was key to characterizing the relationship of inverse iteration (for linear eigenvalue problems) as described by Peters and Wilkinson in [27]. Newton's method based on solving the nonlinear equation β(λ) = 0 where M (λ)v = β(λ)e p was presented in [26] and independently by an essentially equivalent procedure by Lancaster [20]. In 1970's Ruhe [29] also pointed out the relevance of Newton-type methods and how they relate to the inverse iteration. He used the augmented systems of equations to derive variations of the inverse iteration for NEPs as well as the method of successive linear problems.
Newton-type methods have also more recently received considerable attention, e.g., in the PhD thesis of Schreiber [30], where two-sided generalizations of inverse iteration methods as well as Jacobi-Davidson methods are developed. The recent results by Effenberger and Kressner [19,7] contain a generalization of Newton-type methods that allows the computation of several eigenvalues simultaneously. This block Newton approach has been successful in the setting of continuation methods [4]. Variants of Newton methods where the linear system associated with M (σ) is only solved to some accuracy have been studied in [34]. A recent variant of the rational Krylov method can also be interpreted in a Newton-setting [1]. There are several convergence results for the residual inverse iteration and other Newton type methods [34,35,17,37] which are mostly presented in a separated fashion without using quasi-Newton interpretations and results for quasi-Newton methods.
2. Explicit reformulations of the quasi-Newton methods. The formulations of the QN-iterations above are not practical in general. In fact, it is even questionable to call the formulations of QN3 and QN4 iterative, since the Jacobian matrix depends on quantities in an implicit way. Nevertheless, it turns out that certain reformulations of QN3 and QN4 allow us to explicitly compute sequences of approximations, which satisfy (1.4). Also QN1 and QN2 have to be reformulated in order to become practical. We show how to carry out this reformulation to obtain algorithms which do not require solving many linear systems for different matrices, but only for M (σ).
We first make an observation in common for all the considered methods. The last row of the correction equation (1.4) is the same for all methods and for all the choices ofJ k as in (1.6)-(1.10) we see that c H (x k+1 − x k ) = −c H x k + 1. By induction, this implies that i.e., all iterates (except possibly the first iterate) are normalized.
where we define Moreover, by multiplication of (2.2) from the left with c H and using the fact that c H ∆x k = 0 due to (2.1) we have The above equations can be combined into an algorithm. As a precomputation we compute α 0 in (2.5b) and q 0 in (2.5a), and in the iteration we compute y k from (2.3), ∆µ k from (2.4) and subsequently update The algorithm is summarized in Algorithm 1. Remark 2.1 (Properties of Algorithm 1). The advantage of Algorithm 1 over the original formulation (1.4) consists in the fact that the matrix in the linear system to be solved in every step is M (σ). Therefore, a pre-factorization can be carried out in the original problem size. This is of advantage when the problem stems from a partial differential equation and the augmented matrixJ 1,k may be more difficult to factorize. Although the Jacobian matrix approximationJ 1,k ≈ J k is the most common in the context of quasi-Newton approximations, it does not appear very competitive in this setting. The other Jacobian matrix approximations, which are more accurate, lead to better methods in terms of convergence and do not in general require more computation.
Algorithm 1: Quasi-Newton 1 (n-dimensional formulation of (1.6)) input : Initial guess of the eigenpair (µ 0 , x 0 ) ∈ C × C n output: An approximation (µ k , x k ) ∈ C × C n of (λ, v) ∈ C × C n 1 Set σ = µ 0 and factorize the matrix M (σ) 2 Compute q 0 and α 0 from (2.5a) with the use of the factorization from Step 1 3 for k = 0, 1, 2, . . . do 4 Compute y k from (2.3) with the use of the factorization from Step 1 5 Compute ∆µ k and µ k+1 from (2.4) 6 Compute new eigenvector approximation x k+1 from (2.6) end 2.2. The n-dimensional formulation of QN2. A reformulation of (1.4) with the Jacobian matrix approximation (1.7) follows similar steps as the derivation in the previous section. To simplify the notation we set The derivation leads up to the formulas Remark 2.2 (Properties of Algorithm 2). Note that similar to Algorithm 1, Algorithm 2 requires only one linear solve with the matrix M (σ) per iteration. Since Algorithm 2 corresponds to a more accurate approximation of the Jacobian matrix, it is expected to converge faster than Algorithm 1. This difference characterized theoretically and computationally in Section 3.1 and Section 5.
Algorithm 2 involves the vector w σ , which can be computed as in (2.7), i.e., it would require one additional linear solve with M (σ) H . This extra linear solve can however be avoided by treating w σ as a fixed vector (chosen by the user) and then using that c is arbitrary such that we can chose it as c H = w T σ M (σ). This works rather well in practice, but fixing w σ instead of c may make the convergence factor larger if σ is close to the eigenvalue, as we shall further illustrate in Section 4.

The explicit formulation of QN3 is residual inverse iteration.
Although the Jacobi approximation J k ≈J 3,k in (1.8) involves eigenvalue approximations not yet computed, we can proceed with an elimination procedure similar to Section 2.1 and Section 2.2.
We multiply the first block in equation Algorithm 2: Quasi-Newton 2 (n-dimensional formulation of (1.7)) input : Initial guess of the eigenpair (µ 0 , Compute ∆µ k according to (2.8a) using u and w Compute new eigenpair approximation (µ k , x k ) from (2.8a) and (2.8d) by using the factorization computed in Step 1. end This gives an equation which we can simplify as follows: Note that (2.9c) is a scalar-valued equation, in one unknown variable µ k+1 , since x k can be viewed as a known vector. In fact, if we treat λ as a function of x, this is the inverse function of w H M (λ)x = 0 is, which is commonly known as the Rayleigh functional or generalized Rayleigh quotient [39,42,41]. This function generally exists, at least in a neighborhood of a simple eigenvalue [17, Proposition 2.1], and it is a computable quantity for many problems. By multiplying the first block row in (1.4) from the left by M (σ) −1 , we obtain Under the assumption that the Rayleigh functional in (2.9c) is computable, the relations (2.9c) and (2.10b) form an explicit algorithm. In fact, this algorithm is already extensively used in current research, where it is commonly known as residual inverse iteration and it was first introduced by Neumaier in [25]. Residual inverse iteration also forms the basis of some recent state-of-the-art algorithms for NEPs, most importantly the nonlinear Arnoldi method [39].
Theorem 2.3. The Quasi-Newton method (1.4) with the modified Jacobian matrix (1.8) is equivalent to residual inverse iteration as described in [25].
Remark 2.4 (Relation between quasi Newton variant 2 and residual inverse iteration). Due to the analyticity of M (λ), the residual inverse iteration (2.9c) in (2.10b) can also be expressed as From these formulas we see directly that for linear eigenvalue problems where M (λ) = A − λI, QN2 and residual inverse iteration are equivalent. Hence, they are both generalizations of the standard inverse iteration method. This is consistent with the convergence analysis in Section 3 which shows that QN2 and QN3 have the same convergence factor.
2.4. The explicit formulation of QN4 is the method of successive linear problems. In the previous subsection we saw that iterates satisfying (1.4) with the modified Jacobian matrix (1.8) can be computed in practice and the resulting algorithm is in fact equivalent to a well-known method. The Jacobi approximation J k ≈J 4,k in (1.10) also involves a quantity which we do not have access to at iteration k, the vector x k+1 . We now show that similar to QN3, we can carry out an elimination such that the update can be computed in an explicit way. This algorithm also turns out to be equivalent to a well-established method.
The first block row in (1.4) with approximation (1.10) simplifies to Since we know that the iterates x 1 , x 2 , . . . are normalized, we directly identify (2.11b) as a (linear) generalized eigenvalue problem where ∆µ k is the eigenvalue. Hence, we can construct an iteration satisfying (1.4) with Jacobi approximation (1.10) by repeatedly solving the generalized eigenvalue problem (2.11b) and updating the eigenvalue µ k+1 = µ k + ∆µ k . This method is known as the method of successive linear problems and was studied and used by Ruhe in [29], where it was justified directly from a Taylor expansion of M (λ). Theorem 2.5. The Quasi-Newton method (1.4) with the modified Jacobian matrix (1.10) is equivalent to the method of successive linear problems [29].
3.1. Convergence factor analysis of QN1 and QN2. In order to characterize the convergence of QN1 and QN2 we will derive a first-order result. More precisely, we will describe the local behavior, if (x k , µ k ) is close to (λ, v), we describe the behavior with a matrix A ∈ C (n+1)×(n+1) such that In general, we have linear convergence, with a local convergence factor given by the spectral radius of A. The explicit form of A for our first two quasi-Newton methods are given in following theorems. Theorem 3.1 (Local convergence Algorithm 1). Suppose the sequence (µ 1 , x 1 ), (µ 2 , x 2 ), . . . is generated by Algorithm 1 started with (µ 0 , x 0 ) and suppose the sequence converges to the eigenpair (λ, v). Then, the sequence satisfies Proof. For notational convenience letJ 1 : C n+1 → C n+1 denote the function corresponding toJ 1,k in (1.6). In this fixed-point setting, our quasi-Newton method can be expressed as The A-matrix in (3.1), corresponding to a fixed point map, is given by the Jacobian of ϕ. In our case this can be explicitly expressed, by using the structure of the Jacobian matrix evaluated in the eigenpair More precisely, we have The A-matrix in (3.2) follows from (3.5b) after the application of the Schur complement formula forJ −1 1, * . Theorem 3.2 (Local convergence Algorithm 2). Suppose the sequence (µ 1 , x 1 ), (µ 2 , x 2 ), . . . is generated by Algorithm 1 started with (µ 0 , x 0 ) and suppose the sequence converges to the eigenpair (λ, v). Then, the sequence satisfies The proof follows the same reasoning as in the proof of Theorem 3.1, except that the Jacobian matrix of the fixed point map ϕ in (3.5a) in this case becomes The application of the Schur complement formula directly leads to (3.6).

3.2.
Local convergence of QN3. Note that QN3 is not a fixed point iteration in the formulation (1.8). However, since residual inverse iteration and QN3 are equivalent in the sense of Theorem 2.3, we already have a convergence factor available in [17]. More surprisingly, the convergence factor for residual inverse iteration (given in [17]) is identical to the convergence factor of QN2. Corollary 3.3 (Convergence factor equivalence QN2 and QN3). The non-zero eigenvalues of A 2 given in (3.6) are the same as the non-zero eigenvalues of and the convergence factors of QN2 and QN3 are the same. Proof. Since QN3 is equivalent to residual inverse iteration according to Theorem 2.3 we can directly use the convergence characterization in [17]. More precisely, [17,Theorem 3.1] states that the convergence factor of residual inverse iteration is the largest eigenvalue of the matrix B in (3.7).
It remains to show that the non-zero eigenvalues of B are the same as the nonzero eigenvalues of A 2 given in (3.6). Clearly, the non-zero eigenvalues of A 2 are the non-zero eigenvalues of the (1,1)-block of A 2 . The equivalence is based on the general property that if a matrix C and vectors c and v satisfy, c H C = 0 and c H v = 1, then C and C(I − vc H ) have same non-zero eigenvalues. This can be seen from the fact that if By using this general property where C is the (1,1)-block of (3.6), we obtain B = C(I − vc H ) where B is given in (3.7).
3.3. Local convergence of QN4. The quasi-Newton method corresponding to (1.10) is different in character in comparison to the other quasi-Newton methods we have considered. This methos does not involve the computation of a linear system for a constant matrix. Although some convergence results for the method of successive linear problems (and therefore also QN4) are available in the literature [15], it is natural in our quasi-Newton approach to characterize the convergence using general results for quasi-Newton methods. It turns out that (1.10) is a very accurate approximation of the Jacobian matrix and we can apply results for quasi-Newton methods given in [5].
The characterization in [5] is mainly based on a quantity which describes the inexactness the quasi-Newton method by comparing it with a step involving the exact Jacobian matrix. More precisely, we consider the vector r k in [5], which in our context becomes Various results are given in [5] in terms of the norm of r k . In particular, the result [5, Theorem 3.3] demonstrates that we have a local quadratic convergence if r k ≤ O( F k 2 ). Due to the fact that all iterates are normalized as in (2.1), this condition can be simplified in our case and stated as In order to use this result, we now suppose that the Jacobian matrix in the eigenpair denoted J * and defined by (3.4) is invertible. The invertability condition is satisfied if the eigenpair (λ, v) is such that c H v = 0 and the eigenvalue λ is simple. This implies that the function F : C n+1 → C n+1 is invertible in a neighborhood of (λ, v). If (µ k , x k ) is in this neighborhood, we have from the implicit function theorem that Theorem 3.4 (Convergence QN4). Suppose (µ 1 , x 1 ), (µ 2 , x 2 ), . . . are iterates satisfyng (1.10) and suppose they converge to the eigenpair (λ, v). If (λ, v) is a simple or semi-simple eigenpair, then (µ k , x k ) converges at least quadratically.
Proof. From (3.8) and properties of the two-norm we have In the last step we used that, due to the assumption that (µ 1 , x 1 ), (µ 2 , x 2 ), . . . converges to an eigenpair, we have for sufficiently large k. By using the expansion (3.9) we find directly that r k ≤ O( M (µ k )x k 2 ), which by [5, Theorem 3.3] implies quadratic convergence.

Interpretation of convergence factors.
In order to provide further insight to the linearly convergent quasi-Newton methods, we now make characterizations of the matrix A. To this end, we use results associated with (what is commonly referred to as) Keldysh's theorem; see the general formulation in [23], and the more recent descriptions in the context of NEPs, e.g. [3,34]. For simple eigenvalues, Keldysh's theorem implies that there exists a function R 1 (σ), analytic in a neighborhood of the eigenvalue, such that where (λ 1 , v 1 , u 1 ) is the eigentriplet of a simple eigenvalue. We first observe that the convergence factor of QN2 and QN3 can be directly analyzed with Keldysh's theorem, since In the last inequality we used that The relationship (4.2) indicates that the convergence factor depends linearly on the shift eigenvalue distance, and linearly in M ′ (λ 1 ). Remark 4.1 (Double non-semisimple eigenvalues). In case the eigenvalue λ 1 is a double non-semisimple eigenvalue, there exist so called generalized eigenvectors v 1 and u 1 such that where u and v are the left and right eigenvectors corresponding to λ 1 . According to [3, Theorem 2.6] with L = 1 and m 1 = 2, there is a neighborhood U of λ 1 where we have the expansion where R(σ) is analytic in U . Then, instead of (3.6) the iteration matrix A 2 is of the form where R(σ) contains the contribution from all the eigenvalues other than λ 1 . Thus the iteration matrix A − vc T contains the factor 1 σ−λ1 and unlike in the case of a simple eigenvalue, the convergence factor is not asymptotically proportional to |λ 1 − σ|.

Eigenvalue clustering and condition number.
The convergence factor bound (4.2) provides insight on how the convergence depends on the shift if the shift-eigenvalue distance is small (consistent with what was pointed out in [17]). We now show that different insight can be provided by using a more general form of Keldysh's theorem. This applies to the situation when the shift-eigenvalue distance is not necessarily small.
Inverse iteration for linear eigenvalue problems (with normalization c H v = 1) has the following property for diagonalizable matrices. The convergence factor for the eigentriplet (λ 1 , v 1 , u 1 ) can be bounded in terms of reciprocal eigenvalue-shift distances weighted with the condition number where κ i = u i v i /|u H i v i | is the eigenvalue condition number (following the standard definition [21]) and P 1 is the projector P 1 = I − v 1 c H .
In order to generalize this property, we use a more general form of Keldysh's theorem. We let Γ ⊂ Ω be a simple, closed, piecewise-smooth curve and denote the eigenvalues in its interior by λ 1 , · · · , λ k ∈ int(Γ). Then, Keldysh's theorem states that where R Γ is analytic in int(Γ). The following result provides an analogue of the eigenvalue clustering property (4.4), under the assumption that R Γ (σ) is small. Corollary 4.2 (Eigenvalue clustering). Suppose that Ω is a closed simply connected domain with boundary Γ. Suppose that M is analytic in this domain and that all the eigenvalues are simple. Denote the corresponding eigentriplets by (λ 1 , v 1 , u 1 ),. . . , (λ k , v k , u k ), with normalization c H v 1 = · · · = c H v k = 1. Then, the convergence factor for QN2 and QN3 are bounded by where κ i is the eigenvalue condition number for NEPs, is the remainder-term in Keldysh's theorem in (4.5).
Proof. The result follows the steps in (4.2) but instead using the form (4.5) and matrix norm inequalities.

4.2.
Characterization of Keldysh's remainder term R Γ (σ). The analysis in the previous subsection indicates that a dependence on the eigenvalue clustering similar to the linear case can be expected under the condition that R Γ is small. Keldysh's theorem is, in a certain sense, a matrix version of the partial fraction expansion of an analytic function (also known as Mittag-Leffler's theorem) and has been characterized in [10]. See [45] for a partial fraction expansion characterization for delay eigenvalue problems. A precise characterization of R Γ and of its norm for the general case using [10] is somewhat involved. We take a less ambitious approach and point out certain situations where it is small or vanishes. Although R Γ does not always vanish in the limit when Γ encloses C, it does vanish under certain assumptions. We characterize several of such situations next, and note that these results are general for nonlinear eigenvalue problems. Therefore they may be of interest also outside the scope of quasi-Newton methods.
This leads directly to a sufficient condition for vanishing R Γ , involving M (λ) −1 in the limit λ → ∞. Then, the set of eigenvalues is finite and Proof. Since (4.7) implies that for every ε > 0 there exists an R such that sup |z|>R M (z) −1 < ε, we have in particular that there exists r such that M (z) −1 has no poles outside a disk of radius r. The eigenvalues of (1.1) are roots of the analytic function det (M (λ)). An analytic function only has a finite number of roots in a compact subset of the complex plane, and we therefore only have a finite number of eigenvalues in the disk of radius r. Take Γ to be a circle of radius r. Then, using the representation given by Lemma 4.3, we get the bound Letting r → ∞, the claim follows. Remark 4.5 (Generalizations to higher order multiplicities). Lemma 4.3 and Lemma 4.4 have the assumption that the eigenvalues are simple. A generalization to higher algebraic and/or geometric multiplicities seems feasible but more involved. Then, the expression (4.6) follows from [3, Corollary 2.8], and the fact that when the contour Γ encircles λ i and z, and ℓ ≥ 1, it holds that by the residue theorem. Then the condition ||M (λ) −1 || → 0 as |λ| → ∞, and the inequality (4.9) imply that R Γ (z) = 0. From the above lemmas we conclude the following result which states that the R Γ vanishes if the NEP is the sum of a polynomial with leading non-singular coefficient and a term which decays sufficiently fast.
Theorem 4.6. Suppose M is analytic in C and suppose all eigenvalues are simple. Moreover, suppose M (λ) is of the form A i λ i for some matrices A 0 , . . . , A N ∈ C n×n such that that A N is non-singular, and 2. λ −N F (λ) → 0 as |λ| → ∞. Then, R Γ (λ) = 0 and the representation (4.8) holds for M (λ) −1 .
In addition to the polynomial eigenvalue problem with invertible leading coefficient matrix (eventually perturbed in the sense of Theorem 4.6) the conditions of Lemma 4.4 (and the representation (4.8), subsequently) hold for several problems in the literature. For instance the following class of rational eigenvalue problems are often encountered in practice [2]. Let A, B and C i , 1 ≤ i ≤ k, be square matrices such that B is invertible, and let where σ i are given poles. These results hold also in the context of certain modifications of the symmetric eigenvalue problem [14] and [33]. Let A, B ∈ C n×n be such that B is invertible, and let s : C → C be a function such that s(λ) → C, C constant, as |λ| → ∞. Let The condition on the limit behaviour of s(λ) holds in particular for [14,Example 2].
Remark 4.7 (A counterexample). Although the above theory shows that R Γ does vanish in many situations similar to the linear case, it is not always zero, as can be seen from the example where f is a scalar analytic function. Then, and when we select Γ such that encloses the two eigenvalues λ 1 = 1 and λ 2 = 2, we have 5. Numerical simulations.

Rational eigenvalue problem.
We illustrate several properties of the results with numerical simulations 1 . We consider the problem 'loaded string' from the NLEVP collection [2]. The problem is of the form where A, B, C ∈ R n×n , and B is invertible. We set n = 20. All the eigenvalues are real positive, and to make the spectrum more clustered in the left end, we multiply the original coeffient matrix C by n. The spectrum of the problem is shown in Figure 5.1.  We first place the initial value (µ 0 , x 0 ) close to the right-most eigenvalue λ ≈ 5170, such that µ 0 = λ + 5.0 and where v is the exact eigenvector corresponding to λ and a is a scalar. We set c = x 0 and σ = µ 0 . Since we are here mainly concerned with convergence properties, we use computation with high precision arithmetic with a sufficiently high precision, such that round-off errors are not influencing the figures. As shown in Figure 5.2, the performance of the QN1 is found to be sensitive to the distance of x 0 from v. As expected from Corollary 3.3, the convergence curves of the Quasi-Newton 2 and the residual inverse iteration are very close to each other. Figure 5.3 shows the numerically estimated convergence factors and also the a priori convergence factor estimates. For the QN1 the a priori converge factor equals ρ(A 1 ), i.e., the spectral radius of the matrix A 1 given in Theorem 3.1. For the QN2 and QN3, i.e., the residual inverse iteration, it equals ρ(B), where B is given in Corollary 3.3. The estimated convergence factor ρ k at iteration k is computed by where w k = [ v k µ k ] and w * denotes the exact solution w * = [ v λ ]. Then, we consider the initial value (µ 0 , x 0 ) close to a left-end eigenvalue λ ≈ 9.07, such that again µ 0 = λ + 5.0, x 0 = v + a · [1 . . . 1] T (a > 0), and c = x 0 and σ = µ 0 . As can be expected from the bound given in 4.2, the convergence is now slower since λ is in a cluster of eigenvalues. This is depicted by Figure 5.4. Again, the performance of the QN1 is found to be sensitive to the distance of x 0 from v.
As shown in Figure 5.4, the convergence is now slower than for the right-most eigenvalue, as is expected from the clustering of the spectrum depicted in Figure 5.1,

Quadratic eigenvalue problem.
In this section we illustrate the influence of the eigenvalue clustering on the convergence factor. The bounds for the convergence factor illustrating this effect were derived in Section 3 and can be clearly identified from the following example. Consider the quadratic eigenvalue problem  where A 1 , A 2 ∈ C 10×10 are diagonal matrices. The set eigenvalues of this problem is the uninion of eigenvalues A 1 and A 2 . We choose the diagonal elements of A 1 and A 2 such that eigenvalues of M (λ) are 0.1 and 19 equally distributed points on the circle of radius r, as illustrated in Figure 5.5 for r = 0.5.
We first place the initial eigenpair approximation (µ 0 , x 0 ) to the origin, such that µ 0 = 0 and where v is the exact eigenvector corresponding to the eigevalue 0.1. We set c = x 0 and σ = µ 0 . Figure 5.6 shows the convergence of the methods and also the numerically estimated convergence factors for the three first quasi-Newton methods. QN1 is found again to have the slowest convergence, and the a priori convergence factor estimates computed using Theorem 3.1 and Corollary 3.3 are again found to be sharp.
Let ρ r be the spectral radius of the iteration matrix B given in Corollary 3.3, i.e., the convergence factor for QN2 and QN3 (the residual inverse iteration). In Figure 5.7 we illustrate how ρ r behaves as a function of r, the radius of the circle. The value of ρ r is computed for 10 different values of r varying from 10 −1/2 to 10 5 . As expected from Corollary 4.2 we observe that ρ r ∼ 1/r.

A large scale problem.
We now consider a large-scale NEP which arises in the study of waves traveling in a periodic medium [16,36]. More precisely, the waveguide eigenvalue problem, without the Cayley transformation, associated to the waveguide described in [16, Section 5.2] is considered. The problem is formulated as We choose the discretization parameters n x = 200 and n z = 201, which means that the size of the NEP is n = n x n z + 2n z = 40602. The matrix C T 2 and the second degree polynomials Q(λ) and C 1 (λ) are sparse, and the matrix P (λ) is dense and it is defined by nonlinear functions of λ involving square roots of polynomials. The matrix-vector product P (λ)w is efficiently computed using two Fast Fourier Transforms (FFTs) and a multiplication with a diagonal matrix. The linear systems involving the matrix M (σ) can be solved by precomputing a Schur complement. See [16] for a full description of the problem.
We compare QN1, QN2 and QN3 (residual inverse iteration) for approximating a specific eigenpair. The pair (λ, v) denotes the accurate approximation of the wanted eigenpair. The shift σ is selected close to the wanted eigenvalue λ, more precisely, |σ − λ| ≈ 0.42 (see Figure 5.8), and the initial guess x 0 is selected such that x 0 − v ≈ 10 −5 . The error is computed as the absolute value of the distance between the wanted eigenvalue and the current eigenvalue approximation, namely |µ k − λ|. If an initial guess of the eigenvector is provided, all the methods present similar convergence rate and QN2 and QN3 are slightly faster then QN1 (see Figure 5.9.) The block (1, 2) of the matrix A 1 defined in (3.2) has norm approximatively 10 −3 . This may suggest that the spectral radius of this matrix is close to the spectral radius of B given in (3.7). In particular the convergence rate of QN1 is expected to be close to the convergence rate of QN2 and QN3. This is consistent with the numerical simulation. With a random initial guess x 0 of the eigenvector, QN1 does not converge whereas QN2 and QN3 still converge with the same convergence rate but with they require more iterations since the initial error is larger. See Figure 5.9. This can be justified observing that, in this case, the block (1, 2) of the matrix A 1 defined in (3.2) has norm approximatively 10 −1 . Therefore, the convergence factor of QN1 is expected to be significantly different from QN2 and QN3 by using the previous reasoning.
6. Conclusions. We have here presented four iterative methods and showed how they can be analyzed with techniques of quasi-Newton methods, and Keldysh's theorem. We have also illustrated how two well-established methods can be interpreted as quasi-Newton methods. A secondary conclusion that we wish to stress here is that, in general, methods of this type can in an insightful way be analyzed in the framework of quasi-Newton methods. We have illustrated how linear, as well as higher order convergence, can be approached with quasi-Newton results. There are many results for quasi-Newton methods, which can potentially be applied to these methods, e.g., techniques which improve the convergence basin [9,11], scaling techniques [6], adaption for non-smooth (or almost non-smooth) problems [24] and other convergence results [44].
The method corresponding to keeping only the (1,1)-block constant, i.e., (1.7), is to our knowledge new method in the context of NEPs. Moreover, although it has several similarities with residual inverse iteration in terms of convergence and behavior for linear problems, it can be more attractive than residual inverse iteration. Note that unlike QN2, residual inverse iteration requires the solution to a nonlinear scalar problem. For some NEPs, the action of M (λ) is only implicitly available, e.g., in the form of a differential equation as in [28]. Therefore, the computation of a solution to the scalar nonlinear equation (2.9c) is computationally more demanding than computing the product M ′ (µ)r which is required in (2.8b).
We do have a negative conclusion in this paper. We have concluded that QN1 does not appear very competitive in practice since better convergence is usually achieved from QN2, although QN1 is based on the most common quasi-Newton approach, i.e., keeping the Jacobian matrix constant. There is another important Newton-like algorithm which we have not considered in this manuscript, the Jacobi-Davidson algorithm ( [30,32]). Although the Jacobi-Davidson algorithm does have interpretations in terms of Newton's method, as e.g., pointed out in [31, Section 6], our results here are not directly applicable. In [31, Section 6] the authors point out that the correction equation of the Jacobi-Davidson algorithm can be derived from Newton's method on the nonlinear equation G(x) = M (p(x))x where p(x) is the Rayleigh functional. Note that this nonlinear equation is different from our augmented system (1.3). Moreover, since the solution is a manifold and the solution (eigenvector) is not isolated in the standard sense which prevents us from directly applying results for quasi-Newton methods.
Finally, we wish to specifically stress that the value of the presented results may be of interested considerably beyond the scope of the presented methods. Several of the methods presented here form the basis of other state-of-the-art such as the subspace accelerated extensions of residual inverse iteration (the nonlinear Arnoldi method [39]), preconditioned versions [8] and inner-outer-iteration constructions as in [43]. Those extensions may possibly also be interpreted in a quasi-Newton setting, although the extension requires attention beyond this manuscript.