Abstract
A new iterative method for solving large-scale symmetric nonlinear eigenvalue problems is presented. We first derive an infinite dimensional symmetric linearization of the nonlinear eigenvalue problem, then we apply the indefinite Lanczos method to this specific linearization, resulting in a short-term recurrence. We show how, under a specific assumption on the starting vector, this method can be carried out in finite arithmetic, and how exploiting the problem structure leads to improvements in computation time. The eigenpair approximations are extracted with the nonlinear Rayleigh–Ritz procedure combined with a specific choice of the projection space. We illustrate how this extraction technique resolves the instability issues that may occur due to the loss of orthogonality in many standard Lanczos-type methods.
1 Introduction
We consider the nonlinear eigenvalue problem (NEP) which consists of computing \((\lambda , v) \in D \times {{\mathbb {C}}}^n {\setminus } \left\{ 0 \right\} \) such that
where
with \(D \subset {{\mathbb {C}}}\) open disk, \(f_m: D \rightarrow {{\mathbb {C}}}\) analytic functions, and \(A_m \in {{\mathbb {C}}}^{n \times n}\) for \(m=1, \dots , p\). In this work we focus on the symmetric NEP, namely we assume that
Equivalently, \(M(\lambda )\) can be expressed as (2) with symmetric matrices \(A_m^T = A_m\) for \(m=1, \dots , p\). Notice that the complex matrices \(A_m\) and \(M(\lambda )\) are assumed to be symmetric but not necessarily Hermitian. The NEP arises in many areas, such as stability analysis, control theory, and wave propagation, and it has been studied in various settings. See the review papers [21, 39], the PhD theses [15, 52], and the problem collection [6]. Specialized software for NEPs has been produced in recent years: the package NEP-PACK [26], the library SLEPc [23], and other open-source software. One approach for solving the NEP consists of constructing a linear eigenvalue problem (linearization) whose eigenvalues approximate, or correspond to, the eigenvalues of the original NEP [2, 3, 14, 20, 32, 35, 47]. When the NEP has a specific structure, such as being symmetric, Hermitian, Hamiltonian, or palindromic, it is preferable to construct a linearization that preserves this structure. Theoretical and algorithmic aspects of structured linearizations have been extensively analyzed [10, 12, 18, 36, 37, 41, 51]. In particular, it has been shown that methods based on structure-preserving linearizations are, in certain applications, more efficient than methods that do not take the structure into account [38, 40]. For the polynomial eigenvalue problem (PEP), i.e., the special case where the \(f_m(\lambda )\) in (1) are polynomials, symmetric linearizations are extensively characterized in [11, 24]. A well-established class of methods for solving symmetric eigenvalue problems are Lanczos-like methods. More precisely, the Lanczos method and its variants can be applied to symmetric generalized eigenvalue problems \(A x = \lambda B x\), where \(A,B \in {{\mathbb {C}}}^{n \times n}\) are symmetric matrices. The original approach [31] was developed for the case \(B=I\); a generalization for \(B\) positive definite is presented in [44, Ch.15, Sect.11].
A further extension of this approach, known as the indefinite Lanczos method, for the case where \(A\) and \(B\) are Hermitian or symmetric matrices, is discussed in [45] and [4, Section 8.6.1]. Lanczos methods belong to the class of Krylov methods. They exploit the fact that, during the construction of the orthogonal basis of the Krylov space, the symmetry of the problem allows the orthogonalization to be performed more efficiently with a three-term recurrence. In exact arithmetic, this approach reduces the complexity drastically. However, in floating-point arithmetic and without further specializations, the basis vectors are often affected by loss of orthogonality, resulting in slow convergence of the Ritz pairs and numerical instability [1, 46, 48, 55]. Even if Lanczos methods are in certain cases unstable, and in general less efficient in terms of number of iterations than Arnoldi methods, they can be efficiently implemented in a distributed computing framework. Arnoldi methods, on the other hand, are more difficult to parallelize due to the full orthogonalization procedure. Moreover, there are various ways to resolve the instability and improve the efficiency of Lanczos methods, such as selective and partial re-orthogonalization, deflation, and structured restart techniques. See [1, 46, 48, 55] and references therein.
In this work, we present a new symmetric linearization for the symmetric NEP, resulting in a symmetric, linear, and infinite dimensional eigenvalue problem. Symmetric generalized eigenvalue problems can be solved with the indefinite Lanczos method [4, Section 8.6.1]. We present a new method that corresponds to adapting, in an efficient and robust way, the indefinite Lanczos method to the derived linearization. In order to cure the slow convergence due to the loss of orthogonality, we extract the eigenpair approximations by using the nonlinear Rayleigh–Ritz procedure combined with a proper choice of the projection space, which is derived from the structure of the linearization. We also present numerical experiments showing that the proposed method is competitive, in terms of robustness and complexity, with Arnoldi-like methods for NEPs that perform the full orthogonalization.
The paper is organized as follows: in Sect. 2 we prove that the symmetric NEP is equivalent to a symmetric, linear, and infinite dimensional eigenvalue problem. In Sect. 3 we derive a method, in finite arithmetic, which consists of applying the indefinite Lanczos method to the derived linearization. In Sect. 4 we show how the computation time of the resulting method can be considerably reduced by exploiting additional NEP structures. In Sect. 5 we illustrate the performance of this new approach with numerical simulations by solving large and sparse NEPs. The simulations were carried out in the Julia programming language [9] with NEP-PACK [26], which is an open source Julia package for NEPs.
The method we derive can be seen as an Arnoldi-like method applied to an iteratively expanding linearization, or equivalently to an infinite dimensional linear operator. Other methods based on these ideas are: infinite Arnoldi [29] and its tensor variant [28], NLEIGS [22], and CORK [53]. There are also methods based on the bi-orthogonalization procedure, which also leads to a three-term recurrence, presented in [19, 33]. However, these methods and their variations, in the way they are presented and without further research, are not capable of taking advantage of the symmetry of the NEP.
In the rest of this work, vectors and matrices are denoted as \(v = [v_i]_{i=1}^{n}\) and \(A=[a_{i,j}]_{i,j=1}^{n,m}\) respectively, whereas bold letters represent block-vectors and block-matrices of infinite size, namely \(\varvec{v} = [v_i]_{i=1}^{\infty }\) with \(v_i \in {{\mathbb {C}}}^n\) and \(\varvec{A}=[A_{i,j}]_{i,j=1}^{\infty }\) with \(A_{i,j} \in {{\mathbb {C}}}^{n \times n}\). The matrix \([\varvec{A}]_k = [A_{i,j}]_{i,j=1}^{k} \in {{\mathbb {C}}}^{n k \times n k}\) is the main sub-matrix obtained by extracting the first k blocks. The Kronecker product and the Hadamard (element-wise) product are denoted by \(\otimes \) and \(\circ \) respectively. The vectors \(e_j\) and \(\varvec{e}_j\) have all elements equal to zero except a one in the j-th position, whereas e and \(\varvec{e}\) are the vectors with all ones. Without loss of generality, after a change of variables in (2), we assume that the region of interest \(D \subset {{\mathbb {C}}}\) is a disk centered at the origin. Unless otherwise specified, the matrices \(M_i\) denote the derivatives \(M^{(i)}(0)\). We denote by \(A^T\) the transpose (not conjugate transpose) of the matrix \(A \in {{\mathbb {C}}}^{n \times n}\).
2 Indefinite Lanczos method in infinite dimensional settings
In order to derive a symmetric linearization for NEPs, we first review a specific linearization technique for PEPs. This technique consists of symmetrizing the companion linearization and is presented in [41, Theorem 2.1], which we recall below. The approach was previously reported in [30, Ch.4 Sect. 2] for the scalar case.
Theorem 1
(Mehrmann and Watkins [41], Lancaster [30]) Consider the polynomial eigenvalue problem \(M(\lambda )v=0\) where \(M(\lambda )=\sum _{j=0}^k M_j \lambda ^j\) and \(M_k\) nonsingular. Then, the pencil \(A-\lambda B \in {{\mathbb {C}}}^{nk \times nk}\), where
has the same eigenvalues as \(M(\lambda )\). If \(M_j\) are symmetric, i.e., \(M(\lambda )\) is symmetric, then A and B are symmetric. If \(M(\lambda ) v = 0 \), then \([v^T, \lambda v^T, \dots , \lambda ^{k-1} v^T]^T\) is an eigenvector of \(A-\lambda B\).
The proof is based on the following argument. A pair \((\lambda ,x)\) that fulfills \(M(\lambda )x=0\) defines an eigenpair of the companion linearization [20], namely
Let us define the symmetrizer matrix S, which is nonsingular [41], as
To obtain (4) we multiply (5) on the left by the matrix (6).
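To make the construction concrete, the following sketch builds a Lancaster-type symmetric pencil for a matrix polynomial with symmetric coefficients. Since the pencil (4)-(6) is not reproduced above, the block structure used here (a block-Hankel B and a bordered block-Hankel A) is an assumption based on the classical construction in [30, 41], and may differ from (4) in block ordering and signs.

```python
import numpy as np
import scipy.linalg as sla

def symmetric_linearization(coeffs):
    """Build a Lancaster-type symmetric pencil (A, B) for the matrix
    polynomial P(lam) = sum_j coeffs[j] * lam**j with symmetric n x n
    coefficients. The eigenvector has the block structure
    [v, lam*v, ..., lam**(k-1)*v], as in Theorem 1."""
    k = len(coeffs) - 1                      # polynomial degree
    n = coeffs[0].shape[0]
    A = np.zeros((n * k, n * k))
    B = np.zeros((n * k, n * k))
    # B is the block-Hankel matrix of M_1, ..., M_k
    for i in range(k):
        for j in range(k - i):
            B[i*n:(i+1)*n, j*n:(j+1)*n] = coeffs[i + j + 1]
    # A has -M_0 in the leading block and a shifted block Hankel below
    A[:n, :n] = -coeffs[0]
    for i in range(1, k):
        for j in range(1, k - i + 1):
            A[i*n:(i+1)*n, j*n:(j+1)*n] = coeffs[i + j]
    return A, B
```

Both A and B inherit the symmetry of the coefficients \(M_j\), and every finite eigenvalue of the pencil makes \(P(\lambda )\) singular, in agreement with Theorem 1.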
The main disadvantage of using this linearization in practice is that the blocks forming the eigenvectors of the pencil, defined by (4), grow or decay exponentially, depending on the value of \(|\lambda |\). More precisely, the norm of the j-th block is \(|\lambda |^j \Vert x \Vert \) and, if the PEP has high degree, this leads to overflow or underflow when the eigenpairs of (4) are computed numerically. In order to resolve this issue, in this section we consider the scaled companion linearization presented in [29, Section 5.1]. We extend the ideas used in Theorem 1 to symmetrize the scaled companion linearization. Moreover, we consider the NEP in its general form (1), and therefore derive a linearization that involves vectors and matrices of infinite size.
2.1 An infinite dimensional symmetric linearization
The NEP (1) is equivalent to a linear and infinite dimensional eigenvalue problem, see [29, Section 5.1]. More precisely, if \((\lambda ,x)\) is an eigenpair of (1), the following relation between vectors and matrices of infinite size is fulfilled
The equation (7) defines a linear and infinite dimensional eigenvalue problem
where \(\varvec{A}, \varvec{B}, \varvec{x}\) are matrices and a vector of infinite size defined accordingly. Clearly, the linearization (7) is never symmetric. However, if the NEP is symmetric, i.e., if (3) holds, then it is possible to symmetrize (7) with a technique similar to that of Theorem 1. More precisely, since we consider a scaled and infinite companion linearization, in the following theorem we derive a scaled and infinite version of the matrix (6) that symmetrizes (7).
Theorem 2
(Symmetric linearization) Assume that the NEP (2) is symmetric, i.e., that (3) holds. Then there exists a unique matrix \(\varvec{C}\) such that
is a symmetrizer for (7), namely
is a symmetric eigenvalue problem. The vector \(\varvec{e}\) has infinite length with ones in all the entries. The coefficients of the matrix \(\varvec{C}\) fulfill the following relations
Proof
We start by observing that \(M_j\), for \(j \ge 0\), are symmetric matrices as a consequence of (3). The relations (11) uniquely define a matrix \(\varvec{C}\), since the first column is fixed by (11a) and the j-th column is computed from the \((j-1)\)-th column by (11b). We first show that the matrix \(\varvec{C}\) is symmetric. Let us consider \(i>j\), namely \(i=j+k\) for some positive integer k. By iteratively using (11b) we obtain the relations
that combined together give
that is, \(c_{i,j}=c_{j,i}\). The case \(i<j\) is analogous and we conclude that the matrix \(\varvec{C}\) is symmetric.
By multiplying (8) on the left by the matrix (9) we get
and
The matrix \(\varvec{S}\varvec{A}\) is symmetric because \(\varvec{C}\) and \(M_j\), for \(j \ge 0\), are symmetric. By using (11a) we get that the first block-row of \(\varvec{S}\varvec{B}\) is equal to its first block-column, whereas equation (11b) and the symmetry of \(\varvec{C}\) give the relation
which directly implies that the (i, j)-th and the (j, i)-th blocks of \(\varvec{S}\varvec{B}\) are equal. Hence the matrix \(\varvec{S}\varvec{B}\) is symmetric and (10) is a symmetric eigenvalue problem. \(\square \)
Remark 1
The relation between the eigenpairs of the original NEP (1) and the companion linearization (7) is presented in detail in [29], where the space in which the eigenvectors of (7) are defined is also analyzed. The eigenpairs of (7) are also eigenpairs of (10). In case the matrix \(\varvec{S}\) is singular, (10) may have spurious eigenvalues. Therefore, since our approach is based on solving (10), it is essential to compute the residual with respect to the original problem (1).
Remark 2
The eigenvalue problems (7) and (10) have the same eigenpairs if the symmetrizer (9) is nonsingular, namely \(\varvec{S}\varvec{x}= 0\) only for \(\varvec{x}= 0\). In the next section we assume that \([\varvec{S}]_{2N}\) is invertible for N large enough. This condition can be phrased in terms of the solvability of a specific matrix equation, as discussed in Observation 2.
The method that we refer to as infinite Lanczos, or shortly ILAN, consists of applying the indefinite Lanczos method (Algorithm 1), described in the next section, to the symmetric eigenvalue problem (10).
3 Infinite Lanczos method
3.1 Indefinite Lanczos method
Eigenpair approximations to the generalized eigenvalue problem \(Ax=\lambda Bx\), with \(A,B \in {{\mathbb {C}}}^{n \times n}\) symmetric matrices, not necessarily Hermitian, and A nonsingular, can be obtained by using the indefinite Lanczos method [4, Section 8.6.1], which is summarized in Algorithm 1. The method consists of computing an orthogonal basis of the Krylov space
by using, instead of the (standard) Euclidean scalar product, the indefinite scalar product defined by the matrix B, namely \(x^T B y\) is the B-product between \(x,y \in {{\mathbb {C}}}^n\). The fact that \(A^{-1} B\) is self-adjoint, with respect to this indefinite scalar product, leads to the property that the B-orthogonal basis of the Krylov space can be computed with a three-term recurrence. In particular, at the k-th iteration of Algorithm 1, the following relations are fulfilled
where the diagonal matrix \(\varOmega _{k+1}:={{\,\textrm{diag}\,}}(\omega _1, \dots , \omega _{k+1})\) and the tridiagonal matrix \(T_{k+1,k}=[t_{i,j}]_{i,j=1}^{k+1,k}\) contain the orthogonalization and normalization coefficients. The matrix \(Q_{k+1}\) is B-orthogonal in the sense of (15b) and its columns, generated with a three-term recurrence, span the Krylov space (14). The Ritz pairs of (15a), defined as follows,
provide an approximation to the eigenpairs of the original problem. Since the indefinite scalar product defined by B is in general degenerate, there may be cases of breakdown in Algorithm 1. We refer to [4, Section 8.6.1] and references therein for a detailed discussion of this issue.
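As an illustration, the following is a minimal sketch of the indefinite Lanczos recurrence of Algorithm 1, written in Python (the experiments in this paper use Julia). The B-products use the transpose, not the conjugate transpose, the Euclidean normalization of the new basis vector is one of several possible choices, and breakdown is only detected, not cured.

```python
import numpy as np
import scipy.linalg as sla

def indefinite_lanczos(A, B, q1, m):
    """Indefinite Lanczos sketch: builds a B-orthogonal basis of the
    Krylov space K_m(A^{-1}B, q1) with a three-term recurrence, using
    the bilinear form x^T B y (transpose, no conjugate). Returns Q
    with m+1 columns, the (m+1) x m tridiagonal T, and the diagonal
    entries omega of Q^T B Q."""
    n = q1.size
    Q = np.zeros((n, m + 1))
    T = np.zeros((m + 1, m))
    omega = np.zeros(m + 1)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    omega[0] = Q[:, 0] @ (B @ Q[:, 0])
    lu = sla.lu_factor(A)                    # factorize A once
    for k in range(m):
        w = sla.lu_solve(lu, B @ Q[:, k])    # w = A^{-1} B q_k
        z = B @ w                            # reused for all B-products
        T[k, k] = (z @ Q[:, k]) / omega[k]   # orthogonalize against q_k
        w = w - T[k, k] * Q[:, k]
        if k > 0:                            # ... and against q_{k-1}
            T[k - 1, k] = (z @ Q[:, k - 1]) / omega[k - 1]
            w = w - T[k - 1, k] * Q[:, k - 1]
        T[k + 1, k] = np.linalg.norm(w)
        if T[k + 1, k] < 1e-14:
            raise RuntimeError("Krylov space is invariant")
        Q[:, k + 1] = w / T[k + 1, k]
        omega[k + 1] = Q[:, k + 1] @ (B @ Q[:, k + 1])
    return Q, T, omega
```

In exact arithmetic the output satisfies the relations (15): \(A^{-1}B Q_m = Q_{m+1} T_{m+1,m}\) and \(Q^T B Q = \varOmega \); the eigenvalues of the leading part of T approximate eigenvalues of \(A^{-1}B\), i.e., reciprocals of the pencil's eigenvalues.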
3.2 Infinite Lanczos method in finite arithmetic
We now derive a method that consists of applying the indefinite Lanczos method (Algorithm 1) to the symmetric eigenvalue problem
obtained by extracting the main block sub-matrices from (10), where N is a non-fixed parameter greater than the number of iterations performed in Algorithm 1. The method we derive is independent of N and, under the assumption that \(\varvec{S}\) given in (9) is invertible, corresponds to applying Algorithm 1 directly to the linear infinite dimensional eigenvalue problem (10) with a specific starting vector. This equivalence is formally presented in Theorem 5 at the end of this section.
Algorithm 1 can be efficiently applied to (17) by exploiting the structure of the matrices in (17). We start with Step 2, which can be performed as stated in the following result.
Theorem 3
(Action of \( [\varvec{S}\varvec{A}]_N^{-1} [\varvec{S}\varvec{B}]_N \)) Assume that \([\varvec{S}]_{2N}\) is invertible, let \(q_k \in {{\mathbb {C}}}^{Nn}\) be such that only its first k blocks are nonzero, corresponding to the columns of \(Q_k:=[{\tilde{q}}_1, \dots , {\tilde{q}}_k]\). Then, only the first \(k+1\) blocks of \(w=[\varvec{S}\varvec{A}]_N^{-1} [\varvec{S}\varvec{B}]_N q_k\) are nonzero, corresponding to the columns of \(W:=[w_1, \dots , w_{k+1}]\) given by
where \(D \in {{\mathbb {R}}}^{k \times (k+1)}\) is a diagonal matrix with coefficients \(d_{j,j+1}=1/j\) and
Proof
By using the specific structure of the matrices (12) and (13), the nonzero blocks of w fulfill \({{\,\textrm{vec}\,}}(W) = ([\varvec{S}\varvec{A}]_N)^{-1} ([\varvec{S}\varvec{B}]_N) {{\,\textrm{vec}\,}}([Q_k, \ 0])\). We can then derive the following relations
Hence \({{\,\textrm{vec}\,}}(W) = ([\varvec{A}]_{N})^{-1} ([\varvec{B}]_{N}) {{\,\textrm{vec}\,}}([Q_k, \ 0])\), this directly implies (18) and (19), c.f., [29, Section 4.2]. \(\square \)
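The structured action of Theorem 3 can be sketched as follows. Since formulas (18) and (19) are not reproduced above, the sketch assumes the scaling of the companion form of [29, Sec. 4.2]: the trailing blocks of W are the columns of \(Q_k\) divided by j (the matrix D of Theorem 3), and the first block requires a single linear solve with \(M_0 = M(0)\).

```python
import numpy as np
import scipy.linalg as sla

def companion_action(M0_lu, M_derivs, Q):
    """Sketch of w = [SA]^{-1}[SB] q exploiting the companion structure
    (cf. Theorem 3 and [29, Sec. 4.2]). Q holds the k nonzero blocks of
    q as columns and M_derivs[j] = M^{(j)}(0); only one linear solve
    with M_0 is needed per application. The exact formulas (18)-(19)
    are assumed to coincide with this scaling."""
    n, k = Q.shape
    W = np.zeros((n, k + 1))
    W[:, 1:] = Q / np.arange(1, k + 1)      # w_{j+1} = q_j / j  (matrix D)
    rhs = -sum(M_derivs[j] @ W[:, j] for j in range(1, k + 1))
    W[:, 0] = sla.lu_solve(M0_lu, rhs)      # first block: one solve with M_0
    return W
```

As a sanity check: for a scalar delay NEP \(m(\lambda ) = -\lambda + a + b e^{-\lambda }\) with a known eigenvalue \(\lambda _*\), feeding in the blocks \(q_j = \lambda _*^{j-1}/(j-1)!\) returns, up to truncation, the same blocks scaled by \(1/\lambda _*\), consistent with \(\varvec{A}^{-1}\varvec{B}\varvec{x}= \lambda ^{-1} \varvec{x}\).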
By using the previous result, we conclude that in Algorithm 1, if \(q_1\) has only its first block nonzero, then \(q_k\) at the k-th iteration has k nonzero blocks. This is due to the fact that none of the steps, except Step 2, introduces fill-in in the vectors \(q_1, q_2, \dots , q_k\). In Step 4 of Algorithm 1 the products \(z^T q_{k}\), \(z^T q_{k-1}\) and \(z^T w\) are computed. Observe that the vectors multiplied by z have at most \(k+1\) nonzero blocks. Therefore, even if \(z=[\varvec{S}\varvec{B}]_N w\) is in general a full vector, only its first \(k+1\) blocks are required. These blocks can be computed as follows.
Theorem 4
(Action of \([\varvec{S}\varvec{B}]_N\)) Let us consider \(z:=[\varvec{S}\varvec{B}]_N w\), where w is given as in Theorem 3. Then the first \(k+1\) blocks of z, corresponding to the columns of \(Z:=[z_1, \dots , z_{k+1}]\), fulfill the relation
where \(G_{k+1} \in {{\mathbb {R}}}^{(k+1) \times (k+1)}\) has coefficients \(g_{j,1}=g_{1,j}=1/j\) for \(j=1, \dots , k+1\) and \(g_{i,j}=c_{i-1,j}/j\) and
Proof
Since w has only the first \(k+1\) blocks that are nonzero, we can express
By using (13) and that \(M_j = \displaystyle \sum _{m=1}^p f_m^{(j)}(0) A_m\), we can decompose
Equation (20) follows by combining (22) and (21), and by using the properties of the Kronecker product. \(\square \)
Observation 1
The scalar product between the vectorization of two matrices can be carried out directly in matrix form by using the Hadamard product as follows: \(({{\,\textrm{vec}\,}}Z)^T {{\,\textrm{vec}\,}}W = {\tilde{e}}^T (Z \circ W) e\) with \({\tilde{e}} \in {{\mathbb {R}}}^n\) and \(e \in {{\mathbb {R}}}^k\).
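Observation 1 amounts to the elementary identity \(({{\,\textrm{vec}\,}}Z)^T {{\,\textrm{vec}\,}}W = \sum _{i,j} z_{i,j} w_{i,j}\), evaluated in matrix form; a two-line numerical check:

```python
import numpy as np

rng = np.random.default_rng(3)
Z, W = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
lhs = Z.flatten(order="F") @ W.flatten(order="F")   # (vec Z)^T vec W
rhs = np.ones(4) @ (Z * W) @ np.ones(6)             # e~^T (Z o W) e
assert np.allclose(lhs, rhs)
```

This avoids forming the vectorizations explicitly and works on the matrices Z and W as stored in Algorithm 2.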
Observation 2
With the same reasoning as in Theorem 4, we can decompose (9) as
Therefore, we can relate the invertibility of \([\varvec{S}]_{2N}\) with the solvability of the following linear matrix equation
for any \(B \in {{\mathbb {C}}}^{2N \times n}\). Linear matrix equations are extensively studied in the recent literature; see the review paper [49] and references therein. In the numerical examples reported in Sect. 5 we never encountered a case in which \([\varvec{S}]_{2N}\) was singular. A case in which this matrix is obviously singular is when the NEP is defined by polynomial functions. Although the theory does not cover this case, we have successfully applied the method we are deriving without encountering any breakdown or instability.
Figure 1 illustrates the structure of the matrices and vectors, involved in Algorithm 1, when applied to (17) with a starting vector that has only the first block which is nonzero. At iteration k only the vectors \(q_{k-1}, q_{k}\) are needed, therefore they are the only vectors that need to be stored.
Algorithm 2 is the combination of the results presented in this section. More precisely, Algorithm 2 is the reformulation of Algorithm 1, applied to (17), where the nonzero blocks of the vectors \(q_k, q_{k-1}, w\), and the needed blocks of z, are stored as columns of the matrices \(Q_k, Q_{k-1}, W\) and Z. Moreover, the size of the linearization (17) is implicitly expanded at each iteration. Observe that at iteration k only the first \(2k+1\) derivatives \(f^{(j)}_m(0)\) for \(m=1, \dots , p\) and \(j=1, \dots ,2k+1\) are needed. We now conclude this section by showing the equivalence between Algorithm 1, directly applied to the infinite dimensional problem (10), and Algorithm 2.
Theorem 5
(Infinite dimensional equivalence) Assume that the matrix \(\varvec{S}\), given in (9), is invertible and let \(\varvec{q}_1\) be an infinite length vector with only the first block \(q_1\) nonzero. Then, Algorithm 1, with starting vector \(\varvec{q}_1\), is applicable to (10) and the matrices \(Q_k\), that have as columns the first k nonzero blocks of \(\varvec{q}_k\), \( T_k\), and \(\omega _{k}\) are equal to the homonymous matrices generated by Algorithm 2 with starting matrix \(Q_1=[q_1]\).
Proof
We denote by \(\varvec{q}_1, \varvec{q}_2, \dots , \varvec{q}_{k}\) the infinite-length vectors generated by Algorithm 1 and by \(Q_1, Q_2, \dots , Q_k\) the matrices generated by Algorithm 2. The proof is based on induction over the iteration count k. The result is trivial for \(k=1\). Suppose the result holds for some k. In Step 2 of Algorithm 1, by using that \(\varvec{S}\) is invertible, we have
By using the induction hypothesis, \(\varvec{q}_k\) has k nonzero blocks, corresponding to the columns of the matrix \(Q_k\) generated at the \((k-1)\)-th iteration of Algorithm 2. Because of the structure of the matrices (12) and (13), we get that \(\varvec{w}\) has only the first \(k+1\) blocks which are nonzero, corresponding to the columns of the matrix W that fulfills (18). Therefore this matrix corresponds to the matrix computed in Step 2 of Algorithm 2. In Step 3 of Algorithm 1 we compute \(\varvec{z} = \varvec{S}\varvec{B}\varvec{w}\). This vector is in general full. However, in Step 4 of Algorithm 1 the products \(\varvec{z}^T \varvec{q}_k\), \(\varvec{z}^T \varvec{q}_{k-1}\) and \(\varvec{z}^T \varvec{w}\) are computed. By the induction hypothesis, \(\varvec{q}_{k}\) and \(\varvec{q}_{k-1}\) have respectively k and \(k-1\) nonzero blocks, corresponding to the columns of the matrices \(Q_{k}\) and \(Q_{k-1}\) generated by Algorithm 2. Therefore, only the first \(k+1\) blocks of \(\varvec{z}\) are required, and they can be computed with the same reasoning as in Theorem 4. More precisely, the first \(k+1\) blocks of \(\varvec{z}\) are the columns of the matrix Z that fulfills (20). Therefore this matrix coincides with the matrix generated in Step 3 of Algorithm 2. In order to conclude that \(\varvec{q}_{k+1}\) has only the first \(k+1\) nonzero blocks, corresponding to the columns of the matrix \(Q_{k+1}\) generated by Algorithm 2, we only need to use the equality \(\Vert M \Vert _F = \Vert {{\,\textrm{vec}\,}}(M) \Vert _2\), which holds for every matrix M. \(\square \)
3.3 Robust extraction of eigenpair approximations
We propose to enhance Algorithm 2 as follows. We consider the projected NEP
where \(V \in {{\mathbb {C}}}^{n \times k}\) is an orthogonal matrix. Under the assumption that V possesses good approximation properties, eigenpair approximations to the NEP (1) are given by \((\lambda , Vz)\). This can be seen as a Galerkin projection method that uses the range of V as projection space. This technique for extracting eigenpair approximations, called the nonlinear Rayleigh–Ritz procedure or subspace acceleration, is often used to improve the properties of more basic algorithms, e.g., the nonlinear Arnoldi method [54], Jacobi–Davidson methods [7, 16], infinite Arnoldi [27], block preconditioned harmonic projection methods [56], and many more.
In our framework, there is a natural choice for the projection space. The matrix V is chosen as the orthogonal matrix whose columns span the subspace of vectors obtained by extracting the first column from each of \(Q_1, \dots , Q_k\) generated by Algorithm 2. This matrix has good approximation properties for the following reason. In the Ritz pair extraction described in Sect. 3.1, the eigenvector approximations of (7) (or equivalently of (10)) are given by the first block row of the Ritz vectors (16). Thus, the eigenvector approximations to the NEP (1) are also obtained from the first block of these Ritz vectors, and hence from the first block row of the Krylov basis, namely from the columns of the proposed matrix V.
The projected problem (23) has size k equal to the number of iterations, which is typically, in the context of Krylov methods, a small number, i.e., \(k \ll n\). Therefore, the projected problem (23) has small size and solving (23) is not the computationally dominating part of Algorithm 2. In the numerical experiments we have tested, for solving (23), the following methods: Beyn's contour integral method [8, Algorithm 1], NLEIGS [22] and IAR [29]. The choice of the method for solving (23) depends on the features of the original problem (1), and there is no favorite candidate. For example, one may want to exploit that the projected problem (23) is defined by the same nonlinear functions and inherits several features of the original NEP (1), e.g., being symmetric, palindromic, polynomial, etc. Such a problem can then be solved in a more robust way with structure-preserving methods.
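For concreteness, here is a sketch of Beyn's contour integral method [8, Algorithm 1], one of the candidates mentioned above for the small projected problem (23). The function `Mproj` evaluating the projected NEP, the trapezoidal rule on a circle, and the rank tolerance are illustrative choices, not the paper's implementation.

```python
import numpy as np

def beyn(Mproj, n, center, radius, npts=300, tol=1e-8):
    """Beyn's method, Algorithm 1 of [8]: eigenvalues of the (small)
    NEP Mproj(z) v = 0 inside the circle |z - center| = radius,
    assuming fewer than n eigenvalues are enclosed."""
    rng = np.random.default_rng(0)
    Vhat = rng.standard_normal((n, n))       # random probing matrix
    A0 = np.zeros((n, n), dtype=complex)
    A1 = np.zeros((n, n), dtype=complex)
    for j in range(npts):                    # trapezoidal rule on the circle
        phi = np.exp(2j * np.pi * j / npts)
        z = center + radius * phi
        S = np.linalg.solve(Mproj(z), Vhat)
        A0 += S * radius * phi / npts        # (1/2 pi i) int M(z)^{-1} V dz
        A1 += z * S * radius * phi / npts    # (1/2 pi i) int z M(z)^{-1} V dz
    U, s, Wh = np.linalg.svd(A0)
    r = int(np.sum(s > tol * s[0]))          # numerical rank = #eigenvalues
    B = (U[:, :r].conj().T @ A1 @ Wh[:r].conj().T) / s[:r]
    return np.linalg.eigvals(B)
```

On a diagonal test problem \(M(z) = {{\,\textrm{diag}\,}}(z - d_i)\), the method recovers exactly the \(d_i\) enclosed by the contour.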
4 Indefinite scalar product computation
Under the assumption that the linear systems with the matrix \(M_0\) can be efficiently solved, e.g., by exploiting sparsity, the dominating part of Algorithm 2 is Step 3, namely the computation of (20), which has complexity \({\mathcal {O}}(k^2n)\). In this section we derive efficient methods for computing this quantity.
4.1 Computation of step 3: general case
The following theorem provides an effective approximation to (20) without any specific assumption on the coefficients of (2).
Theorem 6
Let \(U,V \in {{\mathbb {R}}}^{(k+1) \times q}\) be the factors of the best rank q approximation, with respect to the Euclidean norm, to the matrix \(G_{k+1}\). Then
is such that
Proof
The approximation (24) is obtained by replacing \(G_{k+1}\) with \(U V^T\) in (20) and using \(u_j v_j^T \circ F_m = {{\,\textrm{diag}\,}}(u_j) F_m {{\,\textrm{diag}\,}}(v_j)\). The equation (25) follows from the triangle inequality and from the fact that the Frobenius norm is sub-multiplicative with respect to the Hadamard product. \(\square \)
The approximation (24) is effective, with q small, since the matrix \(G_k\), which is problem independent, has a fast decay in its singular values. In Fig. 2 the singular values of this matrix are displayed for different sizes k. Moreover, the computation of (24) requires less computation time than (20), since the products with the Hankel matrices \(F_j\) can be efficiently computed with FFTs [34, Section 4]. The complexity for computing (24) is \({\mathcal {O}}(n k \log k)\).
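The FFT-based Hankel product mentioned above can be sketched as follows: an \(n \times n\) Hankel matrix times a vector is a convolution with the reversed vector, so the product costs \({\mathcal {O}}(n \log n)\) instead of \({\mathcal {O}}(n^2)\) (cf. [34, Section 4]).

```python
import numpy as np

def hankel_matvec(h, x):
    """y_i = sum_j h[i+j] x_j for the n x n Hankel matrix whose
    anti-diagonals are h[0], ..., h[2n-2], computed via FFT-based
    convolution without forming the matrix."""
    n = x.size
    L = 1 << (3 * n).bit_length()            # FFT length >= 3n - 2
    y = np.fft.ifft(np.fft.fft(h, L) * np.fft.fft(x[::-1], L))
    return y[n - 1:2 * n - 1].real
```

Reversing x turns the Hankel product into a correlation, which the zero-padded FFTs evaluate as one linear convolution; the relevant entries are the middle n coefficients.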
4.2 Computation of step 3: delay eigenvalue problem
The stability analysis of delay systems of the form
is related to solving NEPs, referred to as delay eigenvalue problems, see [25, Ch. 2] and [42, Ch. 1 Sect. 1.2], defined as follows:
In this case the matrices \(F_m\) have at most rank one. More precisely, a direct computation leads to \(F_1=-e_1 e_1^T\), \(F_2=0\) and, for \(m \ge 3\), \(F_m=-\tau _m v v^T\) with \(v_j=(-\tau _m)^{j-1}\). Therefore, the computation time of (24) is much lower than that of (20) since it involves only products with low-rank matrices. By exploiting the low-rank structure, the complexity of computing (24) is reduced to \({\mathcal {O}}(n p k)\).
4.3 Computation of Step 3: polynomial plus low-rank structured NEPs
In certain applications (2) can be written as sum of a polynomial and a low-rank part. See, e.g., [15, Ch. 2 Sect. 4], [52, Ch. 1 Sect. 1.2], [6, gun problem] and [5, Sect. 6.2.2]. More precisely:
with \(U_m \in {{\mathbb {C}}}^{n \times r_m} \) and \(r_m \ll n\). In this case we split (20) into the polynomial and low-rank terms, namely \(Z=Z_p+Z_{lr}\) with
The term (29a) can be efficiently approximated as in (24) by exploiting the low-rank structure of the matrices \(F_m\). Moreover, since the \(U_m U_m^T\) in (29b) have low rank, the computation of Z with (29), respecting the order given by the parentheses in (29b), requires less computation time than (20). The complexity of (29) is \({\mathcal {O}}((d+r)n)\) where \(r=\displaystyle \max _{d+1 \le m \le p} r_m\).
5 Numerical simulations
In the following numerical experiments we use, as error measure, the relative residual norm defined as follows
An eigenpair approximation is marked as “converged” if \({{\,\textrm{Err}\,}}(\lambda ,x)<10^{-8}\). The software used in these simulations is implemented in the Julia programming language [9] and is publicly available in the Julia package NEP-PACK [26].
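Since the normalization in error measures like (30) varies across implementations, the sketch below uses one common choice, dividing the residual norm by \(\sum _m |f_m(\lambda )| \Vert A_m \Vert \); this is an assumption for illustration, not necessarily the exact formula used in the experiments.

```python
import numpy as np

def rel_residual(lam, x, fs, As):
    """Backward-error-style relative residual for the NEP
    M(lam) = sum_m fs[m](lam) * As[m]; the normalization by
    sum_m |f_m(lam)| ||A_m||_F is one common (assumed) choice."""
    M = sum(f(lam) * A for f, A in zip(fs, As))
    scale = sum(abs(f(lam)) * np.linalg.norm(A) for f, A in zip(fs, As))
    return np.linalg.norm(M @ x) / (scale * np.linalg.norm(x))
```

For an exact eigenpair this quantity is at the level of rounding errors, while perturbing \(\lambda \) makes it large, which is what the convergence threshold \(10^{-8}\) relies on.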
5.1 Delay eigenvalue problem
We consider the delay eigenvalue problem arising from the spatial discretization of the following partial delay differential equation
where \(a(\xi )=-\xi _1 \sin (\xi _1+\xi _2)\), resulting in a problem of the form (26) where all the matrices are real and symmetric. The spatial domain is partitioned with a uniform equispaced grid with N points in each direction. The Laplace operator is discretized by the 5-points stencil finite difference approximation, leading to a NEP of size \(n=N^2\), cf. [5, Section 6.2.1]. More precisely, the NEP is defined as \(M(\lambda )=-\lambda I + A_2 + e^{-\lambda } A_3\) where \(I \in {{\mathbb {R}}}^{n \times n}\) and the other matrix coefficients are given by
with \(h:=\pi /(N-1)\) the discretization step. We run 50 iterations of Algorithm 2. In Figs. 3 and 4 we illustrate the robustness of the strategy for extracting the eigenpair approximations presented in Sect. 3.3. More precisely, we compare the standard approach for extracting the eigenpair approximations, which is based on the computation of the Ritz pairs, with the more robust approach consisting of solving the projected NEP. Figure 3 displays the spectrum and the converged eigenvalues, computed with the Ritz approach and with the projected NEP approach respectively. In this example we solve the projected NEP with the Beyn contour integral method [8]. The error history is presented in Fig. 4. As expected, the convergence of Algorithm 2 with the Ritz pair approximation appears to be slower than with the more robust eigenpair extraction based on solving the projected NEP. Therefore, an important aspect that strongly influences the convergence of Algorithm 2 is the use of a robust algorithm for solving the projected NEP. Similar conclusions are reached in [13]; in our case this aspect is even more critical due to the loss of orthogonality of the Lanczos method on which Algorithm 2 is based.
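The discretization described above can be sketched as follows. The exact coefficient matrices in (30) are not reproduced here, so the sketch builds only the standard 5-point Laplacian part on the \(N \times N\) grid via a Kronecker sum (Dirichlet-style boundary treatment assumed).

```python
import numpy as np
import scipy.sparse as sp

N = 20                                   # grid points per direction
h = np.pi / (N - 1)                      # discretization step from the text
# 1-D second-difference operator on N points
D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(N, N)) / h**2
# 5-point stencil Laplacian on the N x N grid as a Kronecker sum
L = sp.kron(sp.eye(N), D2) + sp.kron(D2, sp.eye(N))
n = L.shape[0]                           # NEP size n = N^2
```

The resulting matrix is real, symmetric, and negative definite, consistent with the statement that all coefficient matrices of the delay NEP are real and symmetric.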
The performance of Algorithm 2 is affected by the method used for solving the projected NEP. In Table 1 we compare the time, and the number of computed eigenvalues, after 50 iterations of Algorithm 2 combined with three different NEP solvers for the projected problem: the Beyn contour integral method (with the same settings as before), IAR [29] (50 iterations), and IAR (100 iterations). For the latter variant of Algorithm 2, besides the 11 eigenvalues in the disk centered at zero with radius 4, which is the region used in the Beyn contour integral method, several eigenvalues outside this disk are also computed.
5.2 A benchmark problem
We now illustrate the performance of Algorithm 2 for solving a NEP that is symmetric but not Hermitian. We consider the gun problem that belongs to the problem collection [6]. This NEP has the following form
where \(\sigma _2 = 108.8774\). The matrices \(A_j \in {{\mathbb {R}}}^{9956 \times 9956}\), for \(j=1, \dots , 4\), are real and symmetric. The NEP (31) can be written in the form (28) since the matrix coefficients of the nonlinear part have low rank, namely \({{\,\textrm{rank}\,}}(A_3)=19\) and \({{\,\textrm{rank}\,}}(A_4)=65\). The eigenvalues of interest are located inside the closed disk centered at \(250^2\) and with radius \(5 \cdot 10^4\). Before applying Algorithm 2, the problem is shifted and scaled. We set the parameters to \(\lambda =\lambda _0 + \alpha {{\hat{\lambda }}}\) where \(\lambda _0=300^2\) and \(\alpha =(300-200)^2\). This problem has been solved with various methods [19, 22, 29, 33, 53] and, by numerical evidence, there are 21 eigenvalues in the region of interest. We now use IAR for solving the projected NEP. More precisely, we test two variants: IAR (50 iterations) and IAR (200 iterations). As shown in the numerical experiment in Sect. 5.1, the robustness of Algorithm 2 as a whole is affected by the choice of the method used for solving the projected NEP. In Fig. 5 we can see that more eigenvalues converge when we solve the projected problem more accurately. The error history is presented in Fig. 6. Solving the projected NEP more accurately is necessary to compute the outermost eigenvalues.
5.3 A random symmetrized problem
Finally, we illustrate how Algorithm 2 can be used for solving a nonsymmetric NEP. We introduce a symmetrization technique, consisting of doubling the problem size, based on the idea presented in [43, Sect. 5]. Namely, we define the symmetrized NEP as
Observe that if \(\left( \lambda , [y^T,x^T]^T \right) \) is an eigenpair of (32), then \((\lambda ,x)\) is an eigenpair of \(M(\lambda )\). We now consider the symmetrization, in the sense of (32), of the following artificially constructed NEP:
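The eigenpair relation above can be checked on a toy example. The sketch below is our own construction: the coefficients are hypothetical, and we assume the symmetrization of [43, Sect. 5] takes the block form \(T(\lambda ) = \begin{bmatrix} 0 &{} M(\lambda ) \\ M(\lambda )^T &{} 0 \end{bmatrix}\), which is complex symmetric whenever \(M(\lambda )\) is square, and whose null vectors \([y^T,x^T]^T\) satisfy \(M(\lambda )x=0\):

```python
import numpy as np

n = 5
rng = np.random.default_rng(0)
# Toy nonsymmetric NEP M(lam) = B0 + lam * B1 (hypothetical coefficients)
B0 = rng.standard_normal((n, n))
B1 = rng.standard_normal((n, n))
M = lambda lam: B0 + lam * B1

def symmetrized(lam):
    # Doubled-size symmetrization in the spirit of [43, Sect. 5]:
    #   T(lam) = [[0, M(lam)], [M(lam)^T, 0]]
    Z = np.zeros((n, n), dtype=complex)
    Mlam = M(lam)
    return np.block([[Z, Mlam], [Mlam.T, Z]])

T = symmetrized(0.3 + 0.1j)
assert np.allclose(T, T.T)             # complex symmetric, as required
assert not np.allclose(T, T.conj().T)  # but not Hermitian
```

If \([y^T,x^T]^T\) lies in the kernel of \(T(\lambda )\), reading off the top block row gives \(M(\lambda )x = 0\), i.e., the bottom half of the doubled eigenvector recovers an eigenvector of the original NEP.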
where \(A_j \in {{\mathbb {C}}}^{500 \times 500}\) are defined as follows: \(A_1\) is the matrix with elements equal to 500 in the upper and lower diagonals, \(A_2\) is the identity matrix, \(A_3=A_1/500\) and \(A_4\) is the matrix with elements equal to i (imaginary unit) in the lower diagonal. We perform 50 iterations of Algorithm 2 and solve the projected NEP with NLEIGS [22] by targeting the eigenvalues contained in the square with opposite vertices in \(-1.5-1.5 i\) and \(0.5+0.5 i\). Figure 7 illustrates the spectrum and the eigenvalues, in the region of interest, that converged after 50 iterations of Algorithm 2. The error history is illustrated in Fig. 8.
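Under our reading of the description above, the coefficient matrices of this test problem can be constructed as follows; note that \(A_4\) has only a subdiagonal and therefore breaks symmetry, which is why the symmetrization (32) is needed:

```python
import numpy as np

n = 500  # problem size from the text
# Coefficient matrices as described (our reading of the construction):
A1 = 500.0 * (np.eye(n, k=1) + np.eye(n, k=-1))  # super-/subdiagonal entries 500
A2 = np.eye(n)                                    # identity
A3 = A1 / 500.0                                   # rescaled copy of A1
A4 = 1j * np.eye(n, k=-1)                         # subdiagonal entries i

# A1, A2 and A3 are symmetric; A4 is not, so the NEP is nonsymmetric
# and Algorithm 2 is applied to the symmetrized problem (32).
assert np.allclose(A1, A1.T) and np.allclose(A3, A3.T)
assert not np.allclose(A4, A4.T)
```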
6 Conclusions and outlook
We have presented a method for solving symmetric NEPs. We have also illustrated how the problem structure, in particular the structure of the matrices and functions in (2), can be exploited in order to reduce the computation time. However, there are NEPs that cannot be written in the format (2), e.g., the waveguide eigenvalue problem [28], the reformulation of the Dirichlet eigenvalue problem with the boundary element method [17, 50], etc. For some of these problems, only a routine for computing \(M_k x\) is available. We believe that further research can potentially extend the applicability of Infinite Lanczos to such problems.
In the numerical experiment in Sect. 5.3, we have successfully solved a nonsymmetric NEP in the following way. Firstly we constructed a symmetric NEP (32) whose eigenvalues are also eigenvalues of the original NEP. Then we applied the infinite Lanczos method to the symmetrized problem (32). The matrices in (32) clearly have a well-defined block structure. We believe that infinite Lanczos can be further specialized for solving nonsymmetric NEPs by exploiting these structures. Finally, we believe that similar ideas can be extended to NEPs that are Hermitian, namely, \(M(\lambda )^H=M(\bar{\lambda })\), where \({{\bar{\lambda }}}\) denotes the complex conjugate of \(\lambda \in {{\mathbb {C}}}\) and \(M(\lambda )^H\) the Hermitian, or conjugate, transpose of the matrix \(M(\lambda )\).
Notes
The singular values are computed in BigFloat arithmetic using the package GenericSVD.
All simulations were carried out with an Intel octa-core i7-4770 CPU at 3.40 GHz and 24 GB RAM.
The scripts reproducing several of the presented examples are directly available at the web page: https://people.kth.se/~gmele/InfLan/
The disk of interest is centered at the origin with radius 4, using \(N=1000\) discretization points and \(\mathrm{tol}_{\mathrm{res}}=\mathrm{tol}_{\mathrm{rank}}=10^{-8}\).
We perform 50 iterations of IAR for solving the projected NEP and return the converged eigenvalues.
We used the static variant with Leja-Bagby points automatically generated from the polygonal target. The shift is set to zero and kept constant during the iterations.
References
Abdel-Rehim, A.M., Morgan, R.B., Nicely, D.A., Wilcox, W.: Deflated and restarted symmetric Lanczos methods for eigenvalues and linear equations with multiple right-hand sides. SIAM J. Sci. Comput. 32(1), 129–149 (2010)
Amiraslani, A., Corless, R.M., Lancaster, P.: Linearization of matrix polynomials expressed in polynomial bases. IMA J. Numer. Anal. 29(1), 141–157 (2008)
Antoniou, E.N., Vologiannidis, S.: A new family of companion forms of polynomial matrices. Electron. J. Linear Algebra 11(411), 78–87 (2004)
Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., van der Vorst, H.: Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, vol. 11. SIAM, Philadelphia (2000)
Betcke, M., Voss, H.: Restarting iterative projection methods for Hermitian nonlinear eigenvalue problems with minmax property. Numer. Math. 135(2), 397–430 (2017)
Betcke, T., Higham, N.J., Mehrmann, V., Schröder, C., Tisseur, F.: NLEVP: a collection of nonlinear eigenvalue problems. ACM Trans. Math. Softw. 39(2), 7 (2013)
Betcke, T., Voss, H.: A Jacobi-Davidson-type projection method for nonlinear eigenvalue problems. Future Gener. Comput. Syst. 20(3), 363–372 (2004)
Beyn, W.J.: An integral method for solving nonlinear eigenvalue problems. Linear Algebra Appl. 436(10), 3839–3863 (2012)
Bezanson, J., Edelman, A., Karpinski, S., Shah, V.B.: Julia: a fresh approach to numerical computing. SIAM Rev. 59(1), 65–98 (2017)
Bueno, M., Curlett, K., Furtado, S.: Structured strong linearizations from Fiedler pencils with repetition I. Linear Algebra Appl. 460, 51–80 (2014)
Bueno, M., Dopico, F., Furtado, S., Medina, L.: A block-symmetric linearization of odd degree matrix polynomials with optimal eigenvalue condition number and backward error. Calcolo 55(3), 32 (2018)
Bueno, M., Furtado, S.: Structured strong linearizations from Fiedler pencils with repetition II. Linear Algebra Appl. 463, 282–321 (2014)
Chen, H., Maeda, Y., Imakura, A., Sakurai, T., Tisseur, F.: Improving the numerical stability of the Sakurai-Sugiura method for quadratic eigenvalue problems. JSIAM Lett. 9, 17–20 (2017)
De Terán, F., Dopico, F.M., Mackey, D.S.: Fiedler companion linearizations and the recovery of minimal indices. SIAM J. Matrix Anal. Appl. 31(4), 2181–2204 (2010)
Effenberger, C.: Robust solution methods for nonlinear eigenvalue problems. Ph.D. Thesis, EPF Lausanne, Switzerland (2013)
Effenberger, C.: Robust successive computation of eigenpairs for nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl. 34(3), 1231–1256 (2013)
Effenberger, C., Kressner, D.: Chebyshev interpolation for nonlinear eigenvalue problems. BIT Numer. Math. 52(4), 933–951 (2012)
Faßbender, H., Saltenberger, P.: Block Kronecker ansatz spaces for matrix polynomials. Linear Algebra Appl. 542, 118–148 (2018)
Gaaf, S.W., Jarlebring, E.: The infinite bi-Lanczos method for nonlinear eigenvalue problems. SIAM J. Sci. Comput. 39(5), S898–S919 (2017)
Gohberg, I., Lancaster, P., Rodman, L.: Matrix Polynomials. SIAM, Philadelphia (2009)
Güttel, S., Tisseur, F.: The nonlinear eigenvalue problem. Acta Numer. 26, 1–94 (2017)
Güttel, S., Van Beeumen, R., Meerbergen, K., Michiels, W.: NLEIGS: A class of fully rational Krylov methods for nonlinear eigenvalue problems. SIAM J. Sci. Comput. 36(6), A2842–A2864 (2014)
Hernandez, V., Roman, J.E., Vidal, V.: SLEPc: a scalable and flexible toolkit for the solution of eigenvalue problems. ACM Trans. Math. Softw. 31(3), 351–362 (2005)
Higham, N.J., Mackey, D.S., Mackey, N., Tisseur, F.: Symmetric linearizations for matrix polynomials. SIAM J. Matrix Anal. Appl. 29(1), 143–159 (2006)
Jarlebring, E.: The spectrum of delay-differential equations: numerical methods, stability and perturbation. Ph.D. thesis, Inst. Comp. Math, TU Braunschweig (2008)
Jarlebring, E., Bennedich, M., Mele, G., Ringh, E., Upadhyaya, P.: NEP-PACK: a Julia package for nonlinear eigenvalue problems (2018). https://github.com/nep-pack
Jarlebring, E., Meerbergen, K., Michiels, W.: An Arnoldi method with structured starting vectors for the delay eigenvalue problem. IFAC Proc. Vol. 43(2), 57–62 (2010)
Jarlebring, E., Mele, G., Runborg, O.: The waveguide eigenvalue problem and the tensor infinite Arnoldi method. SIAM J. Sci. Comput. 39(3), A1062–A1088 (2017)
Jarlebring, E., Michiels, W., Meerbergen, K.: A linear eigenvalue algorithm for the nonlinear eigenvalue problem. Numer. Math. 122(1), 169–195 (2012)
Lancaster, P.: Lambda-Matrices and Vibrating Systems. Pergamon Press, Oxford (1966)
Lanczos, C.: An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. B 45, 255–282 (1950). https://doi.org/10.6028/jres.045.026
Lawrence, P.W., Van Barel, M., Van Dooren, P.: Backward error analysis of polynomial eigenvalue problems solved by linearization. SIAM J. Matrix Anal. Appl. 37(1), 123–144 (2016)
Lietaert, P., Meerbergen, K., Tisseur, F.: Compact two-sided Krylov methods for nonlinear eigenvalue problems. SIAM J. Sci. Comput. 40(5), A2801–A2829 (2018)
Luk, F.T., Qiao, S.: A fast eigenvalue algorithm for Hankel matrices. Linear Algebra Appl. 316(1–3), 171–182 (2000)
Mackey, D.S., Mackey, N., Mehl, C., Mehrmann, V.: Vector spaces of linearizations for matrix polynomials. SIAM J. Matrix Anal. Appl. 28(4), 971–1004 (2006)
Mackey, D.S., Mackey, N., Mehl, C., Mehrmann, V.: Numerical methods for palindromic eigenvalue problems: computing the anti-triangular Schur form. Numer. Linear Algebra Appl. 16(1), 63–86 (2009)
Mackey, D.S., Mackey, N., Mehl, C., Mehrmann, V.: Palindromic polynomial eigenvalue problems: good vibrations from good linearizations. Tech. rep., DFG Research Center Matheon, "Mathematics for key technologies" in Berlin, TU Berlin, Berlin, Germany (2005). http://www.matheon.de/
Mackey, D.S., Mackey, N., Mehl, C., Mehrmann, V.: Structured polynomial eigenvalue problems: good vibrations from good linearizations. SIAM J. Matrix Anal. Appl. 28(4), 1029–1051 (2006)
Mehrmann, V., Voss, H.: Nonlinear eigenvalue problems: a challenge for modern eigenvalue methods. GAMM-Mitteilungen 27(2), 121–152 (2004)
Mehrmann, V., Watkins, D.: Structure-preserving methods for computing eigenpairs of large sparse skew-Hamiltonian/Hamiltonian pencils. SIAM J. Sci. Comput. 22(6), 1905–1925 (2001)
Mehrmann, V., Watkins, D.: Polynomial eigenvalue problems with Hamiltonian structure. Electron. Trans. Numer. Anal. 13, 106–118 (2002)
Michiels, W., Niculescu, S.I.: Stability and Stabilization of Time-Delay Systems: An Eigenvalue-Based Approach. SIAM, Philadelphia (2007)
Nour-Omid, B.: Applications of the Lanczos method. Comput. Phys. Commun. 53(1–3), 157–168 (1989)
Parlett, B.N.: The Symmetric Eigenvalue Problem, vol. 20. SIAM, Philadelphia (1998)
Parlett, B.N., Chen, H.C.: Use of indefinite pencils for computing damped natural modes. Linear Algebra Appl. 140, 53–88 (1990)
Parlett, B.N., Scott, D.S.: The Lanczos algorithm with selective orthogonalization. Math. Comp. 33(145), 217–238 (1979)
Robol, L., Vandebril, R., Van Dooren, P.: A framework for structured linearizations of matrix polynomials in various bases. SIAM J. Matrix Anal. Appl. 38(1), 188–216 (2017)
Simon, H.D.: The Lanczos algorithm with partial reorthogonalization. Math. Comp. 42(165), 115–142 (1984)
Simoncini, V.: Computational methods for linear matrix equations. SIAM Rev. 58(3), 377–441 (2016)
Steinbach, O., Unger, G.: A boundary element method for the Dirichlet eigenvalue problem of the Laplace operator. Numer. Math. 113(2), 281–298 (2009)
Su, Y., Bai, Z.: Solving rational eigenvalue problems via linearization. SIAM J. Matrix Anal. Appl. 32(1), 201–216 (2011)
Van Beeumen, R.: Rational Krylov methods for nonlinear eigenvalue problems. Ph.D. thesis, KU Leuven (2015)
Van Beeumen, R., Meerbergen, K., Michiels, W.: Compact rational Krylov methods for nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl. 36(2), 820–838 (2015)
Voss, H.: An Arnoldi method for nonlinear eigenvalue problems. BIT Numer. Math. 44(2), 387–401 (2004)
Wu, K., Simon, H.: Thick-restart Lanczos method for large symmetric eigenvalue problems. SIAM J. Matrix Anal. Appl. 22(2), 602–616 (2000)
Xue, F.: A block preconditioned harmonic projection method for large-scale nonlinear eigenvalue problems. SIAM J. Sci. Comput. 40(3), A1809–A1835 (2018)
Acknowledgements
The author wishes to thank Sarah Gaaf (Utrecht University), who suggested a first approach to extend Theorem 1 to the scaled companion linearization. The author also thanks Elias Jarlebring (KTH Royal Institute of Technology), who provided feedback and help in the first stage of the project. The author gratefully acknowledges the support of the Swedish Research Council under Grant No. 621-2013-4640.
Funding
Open access funding provided by Royal Institute of Technology.
Mele, G. The infinite Lanczos method for symmetric nonlinear eigenvalue problems. Calcolo 60, 19 (2023). https://doi.org/10.1007/s10092-023-00511-x