Abstract
We are concerned with accurate eigenvalue decomposition of a real symmetric matrix A. In the previous paper (Ogita and Aishima in Jpn J Ind Appl Math 35(3): 1007–1035, 2018), we proposed an efficient refinement algorithm for improving the accuracy of all eigenvectors, which converges quadratically if a sufficiently accurate initial guess is given. However, since the accuracy of eigenvectors depends on the eigenvalue gap, it is difficult to provide such an initial guess to the algorithm in the case where A has clustered eigenvalues. To overcome this problem, we propose a novel algorithm that can refine approximate eigenvectors corresponding to clustered eigenvalues on the basis of the algorithm proposed in the previous paper. Numerical results are presented showing excellent performance of the proposed algorithm in terms of convergence rate and overall computational cost and illustrating an application to a quantum materials simulation.
1 Introduction
Let A be a real symmetric \(n \times n\) matrix. Since solving a standard symmetric eigenvalue problem \(Ax = \lambda x\), where \(\lambda \in {\mathbb {R}}\) is an eigenvalue of A and \(x \in {\mathbb {R}}^{n}\) is an eigenvector of A associated with \(\lambda \), is ubiquitous in scientific computing, it is important to develop reliable numerical algorithms for calculating eigenvalues and eigenvectors accurately. Excellent overviews on the symmetric eigenvalue problem can be found in references [20, 23].
We are concerned with the eigenvalue decomposition of A such that
where \({X}\) is an \(n \times n\) orthogonal matrix whose ith columns are eigenvectors \(x_{(i)}\) of A (called an eigenvector matrix) and \({D} = (d_{ij})\) is an \(n \times n\) diagonal matrix whose diagonal elements are the corresponding eigenvalues \(\lambda _{i} \in {\mathbb {R}}\), i.e., \(d_{ii} = \lambda _{i}\) for \(i = 1, \ldots , n\). Throughout the paper, we assume that
and the columns of \({X}\) are ordered correspondingly.
We here collect notation used in this paper. Let I and O denote the identity matrix and the zero matrix of appropriate size, respectively. Unless otherwise specified, \(\Vert \cdot \Vert \) means \(\Vert \cdot \Vert _{2}\), which denotes the Euclidean norm for vectors and the spectral norm for matrices. For legibility, if necessary, we distinguish between the approximate quantities and the computed results, e.g., for some quantity \(\alpha \) we write \(\widetilde{\alpha }\) and \(\widehat{\alpha }\) as an approximation of \(\alpha \) and a computed result for \(\alpha \), respectively.
The accuracy of an approximate eigenvector depends on the gap between the corresponding eigenvalue and its nearest neighbor (cf., e.g., [20, Theorem 11.7.1]). For simplicity, suppose all eigenvalues of A are simple. Let \(\widehat{X} \in {\mathbb {R}}^{n \times n}\) be an approximation of \({X}\). Let \(z_{(i)}:=\widehat{x}_{(i)}/\Vert \widehat{x}_{(i)}\Vert \) for \(i = 1, 2, \ldots , n\), where \(\widehat{x}_{(i)}\) is the ith column of \(\widehat{X}\). Moreover, for each i, suppose that the Ritz value \(\mu _{i}:=z_{(i)}^{\mathrm {T}}Az_{(i)}\) is closer to \(\lambda _{i}\) than to any other eigenvalue. Let \( gap (\mu _{i})\) denote the smallest difference between \(\mu _{i}\) and any other eigenvalue, i.e., \( gap (\mu _{i}) := \min _{j \ne i}|\mu _{i} - \lambda _{j}|\). Then, it holds for all i that
Suppose \(\widehat{X}\) is obtained by some backward stable algorithm with the relative rounding error unit \({\mathbf {u}}\) in floating-point arithmetic. For example, \({\mathbf {u}}= 2^{-53}\) for IEEE 754 binary64. Then, there exists \(\varDelta _{A}^{(i)}\) such that
which implies \(\Vert Az_{(i)} - \mu _{i}z_{(i)}\Vert = {\mathcal {O}}(\Vert A\Vert {\mathbf {u}})\), and hence, for all i,
The smaller the eigenvalue gap, the worse the accuracy of a computed eigenvector. Therefore, refinement algorithms for eigenvectors are useful for obtaining highly accurate results. For example, highly accurate computations of a few or all eigenvectors are crucial for large-scale electronic structure calculations in material physics [24, 25], in which specific interior eigenvalues with associated eigenvectors need to be computed. For related work on refinement algorithms for symmetric eigenvalue decomposition, see the previous paper [17] for details.
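The gap dependence in (2) can be observed numerically. The following sketch (a hypothetical setup; the helper name and the test matrix are ours, not from the paper) builds a symmetric matrix with a prescribed eigenvalue gap and measures the error of an eigenvector computed by a backward stable binary64 solver:

```python
import numpy as np

rng = np.random.default_rng(0)

def eigvec_error(gap):
    """Error of the computed eigenvector for the eigenvalue 2.0 of a
    random symmetric 4x4 matrix whose nearest other eigenvalue is at
    distance `gap` (illustrative helper, not from the paper)."""
    d = np.array([1.0, 2.0, 2.0 + gap, 3.0])       # prescribed spectrum
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    A = (Q * d) @ Q.T                              # A = Q diag(d) Q^T
    A = (A + A.T) / 2                              # enforce symmetry
    _, X = np.linalg.eigh(A)                       # backward stable solver
    x = X[:, 1] if Q[:, 1] @ X[:, 1] >= 0 else -X[:, 1]  # fix the sign
    return np.linalg.norm(x - Q[:, 1])
```

With a well-separated spectrum the error stays near \({\mathbf {u}}\), while shrinking the gap toward \({\mathbf {u}}\) inflates the error roughly like \(\Vert A\Vert {\mathbf {u}}/ gap \), in line with (2).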
In [17], we proposed a refinement algorithm for the eigenvalue decomposition of A, which works not for an individual eigenvector but for all eigenvectors. Since the algorithm is based on Newton’s method, it converges quadratically, provided that an initial guess is sufficiently accurate. In practice, although the algorithm refines computed eigenvectors corresponding to sufficiently separated simple eigenvalues, it cannot refine computed eigenvectors corresponding to “nearly” multiple eigenvalues. This is because it is difficult for standard numerical algorithms in floating-point arithmetic to provide sufficiently accurate initial approximate eigenvectors corresponding to nearly multiple eigenvalues, as shown in (2). The purpose of this paper is to remedy this problem, i.e., we aim to develop a refinement algorithm for the eigenvalue decomposition of a symmetric matrix with clustered eigenvalues.
We briefly explain the idea of our proposed algorithm. We focus on the so-called \(\sin \theta \) theorem by Davis–Kahan [5, Section 2] as follows. For an index set \({\mathcal {J}}\) with \(|{\mathcal {J}}|=\ell < n\), let \(X_{{\mathcal {J}}} \in {\mathbb {R}}^{n\times \ell }\) denote the eigenvector matrix comprising \(x_{(j)}\) for all \(j \in {\mathcal {J}}\). For \(1\le k \le \ell \), let \(\mu _{k}\) denote the Ritz values for the subspace spanned by some given vectors with \(\mu _{1}\le \cdots \le \mu _{\ell }\), and let \(z_{k}\) be the corresponding normalized Ritz vectors. Assume that the eigenvalues \(\lambda _{i}\) for all \(i \not \in {\mathcal {J}}\) lie entirely outside of \([\mu _{1},\mu _{\ell }]\). Let \( Gap \) denote the smallest difference between the Ritz values \(\mu _{k}\), \(1\le k \le \ell \), and the eigenvalues \(\lambda _{i}\) for all \(i \not \in {\mathcal {J}}\), i.e., \( Gap := \min \{|\mu _{k} - \lambda _{i}|~:~1 \le k \le \ell , \ i \not \in {\mathcal {J}}\}\). Moreover, let \(Z_{{\mathcal {J}}}:=[z_{1},\ldots , z_{\ell }] \in {\mathbb {R}}^{n\times \ell }\). Then, we obtain
This indicates that the subspace spanned by eigenvectors associated with the clustered eigenvalues is not very sensitive to perturbations, provided that the gap between the clustered eigenvalues and the others is sufficiently large. That means backward stable algorithms can provide a sufficiently accurate initial guess of the “subspace” corresponding to the clustered eigenvalues. To extract eigenvectors from the subspace correctly, relatively larger gaps are necessary between the clustered eigenvalues as can be seen from (2). Thus, we first apply the algorithm (Algorithm 1: \(\mathsf {RefSyEv}\)) in the previous paper [17] to the initial approximate eigenvector matrix for improving the subspace corresponding to the clustered eigenvalues. Then, we divide the entire problem into subproblems, each of which corresponds to each cluster of eigenvalues. Finally, we expand eigenvalue gaps in each subproblem by using a diagonal shift and compute eigenvectors of each subproblem, which can be used for refining approximate eigenvectors corresponding to clustered eigenvalues in the entire problem.
One might notice that the above procedure is similar to the classical shift-invert technique to transform eigenvalue distributions. In addition, the MRRR algorithm [6] also employs a shift strategy to increase relative gaps between clustered eigenvalues for computing the associated eigenvectors. In other words, it is well known that a diagonal shift is useful for solving eigenvalue problems accurately. Our contribution is to show its effectiveness on the basis of appropriate error analysis with the adaptive use of higher precision arithmetic, which leads to the derivation of the proposed algorithm.
In the same spirit as the previous paper [17], our proposed algorithm primarily comprises matrix multiplications, which account for the majority of the computational cost. Therefore, we can utilize higher precision matrix multiplication efficiently. For example, XBLAS [13] and other efficient algorithms [16, 19, 22] based on so-called error-free transformations for accurate matrix multiplication are available for practical implementation.
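As a minimal illustration of the error-free transformation idea behind such libraries (a sketch of a compensated dot product in the style of [19]; not the XBLAS implementation itself), the following code evaluates an inner product as if in roughly twofold working precision. The constant \(2^{27}+1\) is Dekker's splitting factor for binary64:

```python
def two_sum(a, b):
    """Error-free transformation: a + b = s + e exactly in binary64."""
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)

def split(a):
    """Dekker's split of a binary64 number into two short halves."""
    c = 134217729.0 * a          # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Error-free transformation: a * b = p + e exactly in binary64."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    return p, ((ah * bh - p) + ah * bl + al * bh) + al * bl

def dot2(x, y):
    """Compensated dot product: result as if computed in ~2-fold precision."""
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)
        p, q = two_sum(p, h)
        s += q + r
    return p + s
```

For instance, `dot2([1e16, 1.0, -1e16], [1.0, 1.0, 1.0])` returns 1.0, whereas a plain binary64 accumulation loses the middle term entirely.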
The remainder of the paper is organized as follows. In Sect. 2, we recall the refinement algorithm (Algorithm 1) proposed in the previous paper [17] together with its convergence theory. For practical use, we present a rounding error analysis of Algorithm 1 in finite precision arithmetic in Sect. 3, which is useful for setting working precision and shows achievable accuracy of approximate eigenvectors obtained by using Algorithm 1. In Sect. 4, we show the behavior of Algorithm 1 for clustered eigenvalues, which explains the effect of nearly multiple eigenvalues on computed results and leads to the derivation of the proposed algorithm. On the basis of Algorithm 1, we propose a refinement algorithm (Algorithm 2: \(\mathsf {RefSyEvCL}\)) that can also be applied to matrices with clustered eigenvalues in Sect. 5. In Sect. 6, we present some numerical results showing the behavior and performance of the proposed algorithm together with an application to a quantum materials simulation as a realworld problem.
For simplicity, we basically handle only real matrices. As mentioned in the previous paper [17], the discussions in this paper can also be extended to generalized symmetric (Hermitian) definite eigenvalue problems.
2 Basic algorithm and its convergence theory
In this section, we introduce the refinement algorithm proposed in the previous paper [17], which is the basis of the algorithm proposed in this paper.
Let \(A = A^{\mathrm {T}} \in {\mathbb {R}}^{n \times n}\). The eigenvalues of A are denoted by \(\lambda _{i} \in {\mathbb {R}}\), \(i = 1, \ldots , n\). Then \(\Vert A\Vert = \max _{1 \le i \le n}|\lambda _{i}| = \max (|\lambda _{1}|,|\lambda _{n}|)\). Let \({X} \in {\mathbb {R}}^{n \times n}\) denote an orthogonal eigenvector matrix comprising normalized eigenvectors of A, and let \(\widehat{X}\) denote an approximation of \({X}\) with \(\widehat{X}\) being nonsingular. In addition, define \(E \in {\mathbb {R}}^{n \times n}\) such that
In the previous paper, we presented the following algorithm for the eigenvalue decomposition of A, which is designed to be applied iteratively. For later use in Sect. 5, the algorithm also allows the case where an input \(\widehat{X}\) is rectangular, i.e., \(\widehat{X} \in {\mathbb {R}}^{n \times \ell }\), \(\ell < n\).
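Algorithm 1 itself is stated in [17] and only referenced in this excerpt. The following NumPy sketch reconstructs one refinement step from the quantities used throughout this section (\(R\), \(S\), \(\widetilde{\lambda }_{i}\), and \(\widetilde{E}\)); it assumes a square \(\widehat{X}\), and it takes the threshold \(\omega \) as an input parameter, whereas Algorithm 1 computes \(\omega \) from the data (a simplification on our part):

```python
import numpy as np

def ref_syev(A, Xh, omega):
    """One refinement step in the spirit of RefSyEv [17] (a sketch).

    Returns X' = Xh(I + E~), the approximate eigenvalues, and E~.
    The clustering threshold `omega` is supplied by the caller here,
    whereas Algorithm 1 derives it from norms of R and S.
    """
    n = Xh.shape[1]
    R = np.eye(n) - Xh.T @ Xh           # departure from orthogonality
    S = Xh.T @ A @ Xh                   # departure from diagonality
    lam = np.diag(S) / (1.0 - np.diag(R))
    E = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if abs(lam[i] - lam[j]) > omega:    # well-separated pair
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])
            else:                               # clustered pair or i == j
                E[i, j] = R[i, j] / 2.0
    return Xh @ (np.eye(n) + E), lam, E
```

Starting from a moderately accurate guess, one such step reduces the orthogonality and diagonality residuals roughly quadratically, as Theorem 1 below predicts, until rounding in the working precision takes over.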
In [17, Theorem 1], we presented the following theorem that states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given \(\widehat{X}\) is sufficiently close to X.
Theorem 1
(Ogita–Aishima [17]) Let A be a real symmetric \(n \times n\) matrix with simple eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix \(X \in {\mathbb {R}}^{n\times n}\). For a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in real arithmetic, and \(X'\) is the quantity calculated in Algorithm 1. Define E and \(E'\) such that \(X=\widehat{X}(I+E)\) and \(X=X'(I+E')\), respectively. If
then we have
In the following, we review the discussion in [17, §3.2] for exactly multiple eigenvalues. If \(\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}\) corresponding to multiple eigenvalues \(\lambda _{i}=\lambda _{j}\), we compute \(\widetilde{e}_{ij}=\widetilde{e}_{ji}={r}_{ij}/2\) for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \).
To investigate the above exceptional process, define the index sets \({\mathcal {M}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {M}}}\), for multiple eigenvalues \(\{ {\lambda }_{i} \}_{i \in {\mathcal {M}}_{k}}\) satisfying the following conditions:
Note that the eigenvectors corresponding to multiple eigenvalues are not unique. Hence, using the above index sets, let Y be an eigenvector matrix defined such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) corresponding to \(\{\lambda _{i}\}_{i \in {\mathcal {M}}_{k}}\) are symmetric and positive definite. Since Y is then unique, define F such that \(Y=\widehat{X}(I+F)\). Define \(R:=I-\widehat{X}^{\mathrm {T}}\widehat{X}\) and \(S:=\widehat{X}^{\mathrm {T}}A\widehat{X}\). Then, using the orthogonality \(Y^{\mathrm {T}}Y = I\) and the diagonality \(Y^{\mathrm {T}}AY = D\), we have
where \(\epsilon := \Vert F\Vert \) and
The above equations can be obtained in the same manner as in our previous paper [17, Eqs. (7) and (11)] by replacing E with F in the equations.
In a similar way to Newton’s method (cf., e.g., [3, p. 236]), dropping the second-order terms in (7) and (8) yields Algorithm 1, and the following convergence theorem is provided in [17, Theorem 2].
Theorem 2
(Ogita–Aishima [17]) Let A be a real symmetric \(n \times n\) matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \({\mathcal {M}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {M}}}\), satisfying (6). Let \({\mathcal {V}}\) be the set of \(n \times n\) orthogonal eigenvector matrices of A. For a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in real arithmetic, and \(X'\) and \(\omega \) are the quantities calculated in Algorithm 1. Let \(Y, Y' \in {\mathcal {V}}\) be defined such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) and \((X')^{-1}Y'\) corresponding to \(\{\lambda _{i}\}_{i \in {\mathcal {M}}_{k}}\) are symmetric and positive definite. Define F and \(F'\) such that \(Y=\widehat{X}(I+F)\) and \(Y'=X'(I+F')\), respectively. Furthermore, suppose that
Then, we obtain
On the basis of the above convergence theorems, let us consider the iterative refinement using Algorithm 1:
Then, \(X^{(\nu + 1)} = X^{(\nu )}(I + \widetilde{E}^{(\nu )})\) for \(\nu = 0, 1, \ldots \), where \(\widetilde{E}^{(\nu )} = (\widetilde{e}_{ij}^{(\nu )})\) are the quantities calculated in line 7 of Algorithm 1. In practice, it is likely that ordinary precision floating-point arithmetic, such as IEEE 754 binary32 or binary64, is used for calculating an approximation \(\widehat{X}\) to an eigenvector matrix \({X}\) of a given symmetric matrix A by some backward stable algorithm. It is natural to use such \(\widehat{X}\) as an initial guess \(X^{(0)}\) in Algorithm 1. However, if A has nearly multiple eigenvalues, it is difficult to obtain a sufficiently accurate \(X^{(0)}\) in ordinary precision floating-point arithmetic such that Algorithm 1 works well. To overcome this problem, we develop a practical algorithm for clustered eigenvalues, which is proposed as Algorithm 2 in Sect. 5.
3 Rounding error analysis for basic algorithm
If Algorithm 1 is performed in finite precision arithmetic with the relative rounding error unit \({\mathbf {u}}_{h}\), the accuracy of a refined eigenvector matrix \(X'\) is restricted by \({\mathbf {u}}_{h}\). Since \(\widehat{X}\) is improved quadratically when using real arithmetic, \({\mathbf {u}}_{h}\) must correspond to \(\Vert E\Vert ^{2}\) to preserve the convergence property of Algorithm 1. We explain the details in the following. For simplicity, we consider the real case. The extension to the complex case is obvious.
Let \({\mathbb {F}}_{h}\) be a set of floating-point numbers with the relative rounding error unit \({\mathbf {u}}_{h}\). We define the rounding operator \( fl _{h}\) such that \( fl _{h}: {\mathbb {R}}\rightarrow {\mathbb {F}}_{h}\) and assume the use of the following standard floating-point arithmetic model [11]. For \(a, b \in {\mathbb {F}}_{h}\) and \(\circ \in \{ +, -, \times , / \}\), it holds that
For example, this model is satisfied by IEEE 754 floating-point arithmetic, barring overflow and underflow.
Suppose all elements of A and \(\widehat{X}\) are exactly representable in \({\mathbb {F}}_{h}\), i.e., \(A, \widehat{X} \in {\mathbb {F}}_{h}^{n \times n}\), and \(\Vert \widehat{x}_{(i)}\Vert \approx 1\) for all i. Let \(\widehat{R}\), \(\widehat{S}\), and \(\widehat{E}\) denote the computed results of R, S, and \(\widetilde{E}\) in Algorithm 1, respectively. Define \(\varDelta _{R}\), \(\varDelta _{S}\), and \(\varDelta _{E}\) such that
From a standard rounding error analysis as in [11], we obtain
For the computed results \(\widehat{\lambda }_{i}\) of \(\widetilde{\lambda }_{i}\), \(i = 1, 2, \ldots , n\), in Algorithm 1,
For all (i, j) satisfying \(|\widehat{\lambda }_{i} - \widehat{\lambda }_{j}| > \widehat{\omega }\), where \(\widehat{\omega }\) is an approximation of \(\omega \) computed in floating-point arithmetic in Algorithm 1,
Then
For other (i, j), we have
Then
In summary, we obtain
where \(\beta \) is the reciprocal of the minimum gap between the eigenvalues normalized by \(\Vert A\Vert \). For the computed result \(\widehat{X}'\) of \(X' = \widehat{X}(I + \widetilde{E})\) in Algorithm 1,
and, using (10),
Thus, if a given \(\widehat{X}\) is sufficiently close to X in such a way that the assumption (3) holds, combining (5) and (11) yields
If A has nearly multiple eigenvalues and (3) does not hold, then the convergence of Algorithm 1 to an eigenvector matrix of A is guaranteed neither in real arithmetic nor in finite precision arithmetic regardless of the value of \({\mathbf {u}}_{h}\). We will deal with such an illconditioned case in Sect. 5.
Remark 1
As can be seen from (12), with a fixed \({\mathbf {u}}_{h}\), iterative use of Algorithm 1 eventually computes an approximate eigenvector matrix that is accurate to \({\mathcal {O}}(\beta {\mathbf {u}}_{h})\), provided that the assumption (3) in Theorem 1 holds in each iteration. This will be confirmed numerically in Sect. 6. \(\square \)
Let us consider the most likely scenario where \(\widehat{X}\) is computed by some backward stable algorithm in ordinary precision floating-point arithmetic with the relative rounding error unit \({\mathbf {u}}\). From (2), we have
under the assumption that \(\beta \approx \Vert A\Vert /\min _{1 \le i \le n} gap (\mu _{i})\). Thus,
From (12), we obtain
Therefore, \({\mathbf {u}}_{h}\) should be less than \(\beta ^{2}{\mathbf {u}}^{2}\) in order to preserve convergence speed for the first iteration by Algorithm 1.
Suppose that \(\Vert \widehat{X}  X\Vert = c\beta {\mathbf {u}}\) and \(\Vert \widehat{X}'  X\Vert = c'\beta ^{3}{\mathbf {u}}^{2}\) where c and \(c'\) are some constants. If \(c''\beta ^{2}{\mathbf {u}}< 1\) for \(c'' := c'/c\), then an approximation of X is improved in the sense that \(\Vert \widehat{X}'  X\Vert < \Vert \widehat{X}  X\Vert \). In other words, if \(\beta \) is too large such that \(c''\beta ^{2}{\mathbf {u}}\ge 1\), Algorithm 1 may not work well.
In general, define \(E^{(\nu )} \in {\mathbb {R}}^{n \times n}\) such that \(X = \widehat{X}^{(\nu )}(I + E^{(\nu )})\) for \(\nu = 0, 1, \ldots \), where \(\widehat{X}^{(0)}\) is an initial guess and \(\widehat{X}^{(\nu )}\) is the result of the \(\nu \)th iteration of Algorithm 1 with working precision \({\mathbf {u}}_{h}^{(\nu )}\) for \(\nu = 1, 2, \ldots \). To preserve the convergence speed, we need to set \({\mathbf {u}}_{h}^{(\nu )}\) satisfying \({\mathbf {u}}_{h}^{(\nu )} < \Vert E^{(\nu - 1)}\Vert ^{2}\) as can be seen from (12). Although we do not know \(\Vert E^{(\nu - 1)}\Vert \), we can estimate it by \(\Vert \widetilde{E}^{(\nu - 1)}\Vert \), where \(\widetilde{E}^{(\nu - 1)}\) is computed at the \((\nu - 1)\)st iteration of Algorithm 1.
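This rule can be made concrete as a small helper that converts the available estimate \(\Vert \widetilde{E}^{(\nu -1)}\Vert \) into a decimal working precision for the next iteration; the binary64 floor of 16 digits and the two guard digits are our illustrative choices, not values prescribed by the paper:

```python
import math

def required_digits(e_norm, guard=2):
    """Decimal digits for the next iteration so that u_h < ||E||^2.

    With u_h = 10**(-digits), the requirement u_h < e_norm**2 becomes
    digits > -2*log10(e_norm); `guard` adds a few safety digits, and the
    result never drops below binary64's ~16 digits (illustrative choices).
    """
    if not 0.0 < e_norm < 1.0:
        raise ValueError("expected a norm estimate in (0, 1)")
    return max(16, math.ceil(-2.0 * math.log10(e_norm)) + guard)
```

For example, once \(\Vert \widetilde{E}\Vert \approx 10^{-8}\), the next step needs roughly 18 decimal digits, and \(\Vert \widetilde{E}\Vert \approx 10^{-16}\) calls for roughly 34.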
4 Effect of nearly multiple eigenvalues in basic algorithm
In general, a given matrix A in floating-point format does not have exactly multiple eigenvalues. It is thus necessary to discuss the behavior of Algorithm 1 for A with some nearly multiple eigenvalues \(\lambda _{i}\approx \lambda _{j}\) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \) in line 7. We basically discuss the behavior in real arithmetic. The effect of rounding errors is briefly explained in Remark 2 at the end of this section.
For simplicity, we assume \(\widetilde{\lambda }_{1} \le \widetilde{\lambda }_{2} \le \cdots \le \widetilde{\lambda }_{n}\). In the following analysis, define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) with
which means that the clustered eigenvalues of \(A_{\omega }\) are all multiple in each cluster. Then, \(A_{\omega }\) is a perturbed matrix such that
Throughout this section, we assume that
Importantly, although each individual eigenvector associated with the nearly multiple eigenvalues is very sensitive to perturbations, the subspace spanned by such eigenvectors is not sensitive. Thus, \(\widehat{X}\) computed by a backward stable algorithm is sufficiently close to an eigenvector matrix of \(A_{\omega }\). Below, we show that Algorithm 1 computes \(\widetilde{E}\) such that \(\widehat{X}(I+\widetilde{E})\) approximates an exact eigenvector matrix \(Y_{\omega }\) defined in the same manner as Y in Sect. 2. Note that \(Y_{\omega }\) is the eigenvector matrix of the above \(A_{\omega }\) close to A, where \(A_{\omega }\) has exactly multiple eigenvalues.
Recall that the submatrices of \(\widehat{X}^{-1}Y_{\omega }\) corresponding to the multiple eigenvalues of \(A_{\omega }\) are symmetric and positive definite. Then, we see that Algorithm 1 computes an approximation of \(Y_{\omega }\) as follows. Define R and \(S_{\omega }\) as
corresponding to \(A_{\omega }\). Note that \(S_{\omega }\) can be regarded as a perturbation of \(S := \widehat{X}^{\mathrm {T}}A\widehat{X}\), and that, in Algorithm 1, \(\widetilde{E}\) is computed with R and S. Here, we introduce an ideal matrix \(\widetilde{E}_{\omega }\) computed with R and \(S_{\omega }\), where \(\widetilde{E}_{\omega }\) is quadratically convergent to \(F_{\omega }\). In the following, we estimate \(\widetilde{E}_{\omega }-\widetilde{E}\) due to the above perturbation. To this end, we estimate each element of \(S_{\omega }-S\) as in the following lemma.
Lemma 1
Let A be a real symmetric \(n \times n\) matrix with eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) as in (14), and
In addition, define \(S_{\omega }:=\widehat{X}^{\mathrm {T}}A_{\omega }\widehat{X}\). Then, we have
for all (i, j), where
Proof
Define Q such that \(X=Y_{\omega }Q\). Then, Q is a block diagonal matrix. More precisely, for \(Q=(q_{ij})\), we have
It is easy to see that
Let \(D_{Q}:=Q(D-D_{\omega })Q^{\mathrm {T}}\). In a similar way to (8), we have
where
Then (17) follows. Moreover, noting \(D_{Q}\) is a block diagonal matrix, we obtain (18). \(\square \)
For the perturbation analysis of \(\widetilde{E}\), the next lemma is crucial.
Lemma 2
Let A be a real symmetric \(n \times n\) matrix with eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) as in (14). Assume that (15) is satisfied. Define \(R = (r_{ij})\) and \(S_{\omega } = (s^{(\omega )}_{ij})\) such that \(R:=I-\widehat{X}^{\mathrm {T}}\widehat{X}\) and \(S_{\omega }:=\widehat{X}^{\mathrm {T}}A_{\omega }\widehat{X}\). Suppose positive numbers \(\omega _{1}\) and \(\omega _{2}\) satisfy
We assume that, for all (i, j) in line 7 of Algorithm 1, the formulas of \(\widetilde{e}^{(\omega )}_{ij}\) are the same as those of \(\widetilde{e}_{ij}\), i.e.,
where \(\widetilde{\lambda }^{(\omega )}_{i}=s^{(\omega )}_{ii}/(1-r_{ii})\) for \(i = 1, 2, \ldots , n\), as in line 4. Moreover, let
Then, for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \), we have
Moreover, for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|> \omega \), we have
Proof
For (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \), since we see
we have (20). Next, for \(\widetilde{\lambda }^{(\omega )}_{i}\), \(i=1,\ldots ,n\), we have
Thus, from (19), we have
For (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| > \omega \), since we see
we evaluate the errors based on the following inequalities:
On the right-hand side, using (22) and (23), we see
Therefore, we obtain (21). \(\square \)
In the following, we estimate \(\omega _{1}\) and \(\omega _{2}\) in Lemma 2. On the right-hand sides of (17) and (18) in Lemma 1, we see
If D, F, S in (8) are replaced with \(D_{\omega },F_{\omega },S_{\omega }\), respectively, we see \(s^{(\omega )}_{ij}={\mathcal {O}}(\Vert A_{\omega }\Vert \epsilon _{\omega })\ (i\not =j)\) as \(\epsilon _{\omega } \rightarrow 0\). In addition, \(r_{ij}={\mathcal {O}}(\epsilon _{\omega })\ (i\not =j)\) as \(\epsilon _{\omega } \rightarrow 0\) from (7). Hence, letting \(\omega _{1}=\delta _{\omega }\) and \(\omega _{2}=2\epsilon _{\omega }\delta _{\omega }\) in Lemma 2, we obtain
Since we suppose \(\delta _{\omega }={\mathcal {O}}(\Vert A\Vert \epsilon _{\omega })\) in the situation where \(\widehat{X}\) is computed by a backward stable algorithm, we have
Therefore, \(\widetilde{E}\) is sufficiently close to \(\widetilde{E}_{\omega }\) and \(F_{\omega }\) under the above mild assumptions. Although \(Y_{\omega }\) can be very far from any eigenvector matrix of A, the subspace spanned by the columns of \(Y_{\omega }\) corresponding to the clustered eigenvalues adequately approximates that spanned by the exact eigenvectors whenever \(\Vert A_{\omega }-A\Vert \) is sufficiently small. In the following, we derive an algorithm for clustered eigenvalues using this important feature.
Remark 2
In this section, we proved that \(\widetilde{E}\) is sufficiently close to \(F_{\omega }\) under the mild assumptions. In Sect. 3, the effect of rounding errors on \(\widetilde{E}\) was evaluated as in (10), i.e., \(\varDelta _{E}:=\widehat{E}-\widetilde{E}\) is sufficiently small, where \(\widehat{E}\) is computed in finite precision arithmetic. The rounding error analysis is independent of the perturbation analysis to \(F_{\omega }\) in this section. Thus, \(\Vert \widehat{E}-F_{\omega }\Vert \le \Vert \widehat{E}-\widetilde{E}\Vert +\Vert \widetilde{E}-F_{\omega }\Vert \) simply holds by combining the individual estimations of the two errors \(\Vert \widehat{E}-\widetilde{E}\Vert \) and \(\Vert \widetilde{E}-F_{\omega }\Vert \), and hence, the computed \(\widehat{E}\) is sufficiently close to \(F_{\omega }\) corresponding to \(A_{\omega }\). \(\square \)
5 Proposed algorithm for nearly multiple eigenvalues
On the basis of the basic algorithm (Algorithm 1), we propose a practical version of an algorithm for improving the accuracy of computed eigenvectors of symmetric matrices that can also deal with nearly multiple eigenvalues.
Recall that, in Algorithm 1, we choose \(\widetilde{e}_{ij}\) for all (i, j) as
where \(\omega \) is defined in line 6 of the algorithm.
5.1 Observation
First, we show the drawback of Algorithm 1 concerning clustered eigenvalues. For this purpose, we take
as an example, where \(\lambda _{2}\) and \(\lambda _{3}\) are nearly double eigenvalues for small \(\varepsilon \). We set \(\varepsilon = 2^{-50} \approx 10^{-15}\) and adopt the MATLAB built-in function \(\mathsf {eig}\) in IEEE 754 binary64 arithmetic to obtain \(X^{(0)} := \widehat{X}\). Then, we apply Algorithm 1 iteratively to A and \(X^{(\nu )}\) beginning from \(\nu = 0\). To check the accuracy of \(X^{(\nu )}\) with respect to orthogonality and diagonality, we display \(R^{(\nu )} := I - (X^{(\nu )})^{\mathrm {T}}X^{(\nu )}\) and \(S^{(\nu )} := (X^{(\nu )})^{\mathrm {T}}AX^{(\nu )}\).
For \(X^{(0)}\) obtained by \(\mathsf {eig}\), we obtain the following results.
The following shows the results of two iterations of Algorithm 1 in real arithmetic.
In the first iteration, \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| \approx 1.77 \cdot 10^{-15}\) and \(\omega \approx 2.17 \cdot 10^{-15}\), so that \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| < \omega \) and Algorithm 1 regards \(\widetilde{\lambda }_{2}\) and \(\widetilde{\lambda }_{3}\) as clustered eigenvalues. Then, the diagonality corresponding to \(\lambda _{2}\) and \(\lambda _{3}\) is not improved due to the choice (24), while the orthogonality of \(X^{(1)}\) is refined due to the choice (25). In the second iteration, \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| \approx 1.77 \cdot 10^{-15}\) and \(\omega \approx 1.66 \cdot 10^{-16}\), so that \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| > \omega \) and Algorithm 1 regards \(\widetilde{\lambda }_{2}\) and \(\widetilde{\lambda }_{3}\) as separated eigenvalues. However, \(\Vert E\Vert \approx 4.69\cdot 10^{-2} > 1/100\), i.e., the assumption (3) in Theorem 1 is not satisfied. As a result, the orthogonality of \(X^{(2)}\) corresponding to \(\lambda _{2}\) and \(\lambda _{3}\) is badly broken, and the refinement of the diagonality stagnates with respect to the nearly double eigenvalues \(\lambda _{2}\) and \(\lambda _{3}\).
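This behavior is easy to reproduce. The concrete \(4 \times 4\) matrix used above is not displayed in this excerpt, so the following stand-in (an assumption on our part: a random orthogonal similarity with prescribed eigenvalues \(\{1, 2, 2+\varepsilon , 3\}\), \(\varepsilon = 2^{-50}\)) only mimics its structure; it shows that the computed invariant subspace of the cluster is accurate to working precision while the individual eigenvectors are not:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 2.0 ** -50
d = np.array([1.0, 2.0, 2.0 + eps, 3.0])          # nearly double pair
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = (Q * d) @ Q.T                                 # A = Q diag(d) Q^T
A = (A + A.T) / 2                                 # enforce symmetry
_, X0 = np.linalg.eigh(A)                         # backward stable solver

# Individual eigenvector of the cluster: error O(u/eps), i.e. large.
x = X0[:, 1] if Q[:, 1] @ X0[:, 1] >= 0 else -X0[:, 1]
vec_err = np.linalg.norm(x - Q[:, 1])

# The spanned subspace, compared via orthogonal projectors: error O(u/Gap).
P_exact = Q[:, 1:3] @ Q[:, 1:3].T
P_comp = X0[:, 1:3] @ X0[:, 1:3].T
sub_err = np.linalg.norm(P_exact - P_comp)
```

Typically `vec_err` is many orders of magnitude larger than `sub_err`, which is exactly the observation that motivates working with the cluster subspace rather than with individual eigenvectors.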
In the following, we overcome such a problem for general symmetric matrices.
5.2 Outline of the proposed algorithm
As mentioned in Sect. 1, the \(\sin \theta \) theorem by Davis–Kahan suggests that backward stable algorithms can provide a sufficiently accurate initial guess of a subspace spanned by eigenvectors associated with clustered eigenvalues for each cluster. We explain how to refine approximate eigenvectors by extracting them from the subspace correctly.
Suppose that Algorithm 1 is applied to \(A = A^{\mathrm {T}} \in {\mathbb {R}}^{n \times n}\) and its approximate eigenvector matrix \(\widehat{X} \in {\mathbb {R}}^{n \times n}\). Then, we obtain \(X'\), \(\widetilde{\lambda }\), and \(\omega \) where \(X' \in {\mathbb {R}}^{n \times n}\) is a refined approximate eigenvector matrix, \(\widetilde{\lambda }_{i}\), \(i = 1, 2, \ldots , n\), are approximate eigenvalues, and \(\omega \in {\mathbb {R}}\) is the criterion that determines whether \(\widetilde{\lambda }_{i}\) are clustered. Using \(\widetilde{\lambda }\) and \(\omega \), we can easily obtain the index sets \({\mathcal {J}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\), for the clusters \(\{\widetilde{\lambda }_{i}\}_{i \in {\mathcal {J}}_{k}}\) of the approximate eigenvalues satisfying all the following conditions (see also Fig. 1).
Now the problem is how to refine \(X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\), which denotes the matrix comprising approximate eigenvectors corresponding to the clustered approximate eigenvalues \(\{\widetilde{\lambda }_{i}\}_{i \in {\mathcal {J}}_{k}}\).
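The conditions on \({\mathcal {J}}_{k}\) are given in a display not reproduced in this excerpt; under their natural reading, i.e., maximal groups of sorted approximate eigenvalues whose consecutive gaps are at most \(\omega \), the index sets can be computed as follows (a sketch with illustrative naming):

```python
def cluster_indices(lam_sorted, omega):
    """Index sets J_k: maximal runs of sorted approximate eigenvalues
    whose consecutive gaps are at most omega (the grouping rule is our
    reading of the conditions, which are not displayed in this excerpt)."""
    clusters, current = [], [0]
    for i in range(1, len(lam_sorted)):
        if lam_sorted[i] - lam_sorted[i - 1] <= omega:
            current.append(i)          # same cluster as its predecessor
        else:
            clusters.append(current)   # gap exceeds omega: close cluster
            current = [i]
    clusters.append(current)
    return clusters
```

For example, `cluster_indices([1.0, 2.0, 2.0 + 1e-15, 3.0], 1e-12)` yields `[[0], [1, 2], [3]]`, i.e., one genuine cluster and two singletons.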
From the observation about the numerical results in the previous section, we develop the following procedure for the refinement.

1.
Find clusters of approximate eigenvalues of A and obtain the index sets \({\mathcal {J}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\) for those clusters.

2.
Define \(V_{k} := X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\) where \(n_{k} := {\mathcal {J}}_{k}\).

3.
Compute \(T_{k} = V_{k}^{\mathrm {T}}(A  \mu _{k}I)V_{k}\) where \(\mu _{k} := (\min _{i \in {\mathcal {J}}_{k}}\widetilde{\lambda }_{i} + \max _{i \in {\mathcal {J}}_{k}}\widetilde{\lambda }_{i})/2\).

4.
Perform the following procedure for each \(T_{k} \in {\mathbb {R}}^{n_{k} \times n_{k}}\).

(i)
Compute an eigenvector matrix \(W_{k}\) of \(T_{k}\).

(ii)
Update \(X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\) by \(V_{k}W_{k}\).

This procedure is interpreted as follows. We first apply an approximate similarity transformation to A using the refined eigenvector matrix \(X'\), i.e., \(S' := (X')^{\mathrm {T}}AX'\). Then, we divide the problem for \(S' \in {\mathbb {R}}^{n \times n}\) into subproblems for \(S'_{k} \in {\mathbb {R}}^{n_{k} \times n_{k}}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\), corresponding to the clusters. We then apply a diagonal shift to \(S'_{k}\), namely \(T_{k} := S'_{k} - \mu _{k}I\), to relatively separate the clustered eigenvalues around \(\mu _{k}\). In practice, rather than forming \(S'\) explicitly to obtain \(T_{k}\), we perform steps 2 and 3 in view of computational efficiency and accuracy. Finally, we update the columns of \(X'\) corresponding to \({\mathcal {J}}_{k}\) by \(V_{k}W_{k}\), using an eigenvector matrix \(W_{k}\) of \(T_{k}\).
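Steps 2–4 above can be sketched in NumPy (the paper's experiments use MATLAB; `refine_cluster` and its variable names are ours, and a working-precision eigensolver stands in for the higher-precision computations of the actual algorithm):

```python
import numpy as np

def refine_cluster(A, Xp, Jk, lam):
    """Steps 2-4: extract V_k, form the shifted projected matrix
    T_k = V_k^T (A - mu_k I) V_k, diagonalize it, and rotate the
    corresponding columns of the approximate eigenvector matrix."""
    Vk = Xp[:, Jk]                                    # step 2
    mu = (min(lam[i] for i in Jk) + max(lam[i] for i in Jk)) / 2
    Tk = Vk.T @ (A - mu * np.eye(A.shape[0])) @ Vk    # step 3
    _, Wk = np.linalg.eigh(Tk)                        # step 4(i)
    Xp[:, Jk] = Vk @ Wk                               # step 4(ii)
    return Xp
```

The shift \(\mu _{k}\) moves the cluster near the origin, so the clustered eigenvalues of \(T_{k}\) are relatively well separated and \(W_{k}\) can be computed accurately.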
5.3 Proposed algorithm
Here, we present a practical version of a refinement algorithm for eigenvalue decomposition of a real symmetric matrix A, which can also be applied to the case where A has clustered eigenvalues.
In Algorithm 2, the function \({\mathsf {f}}{\mathsf {l}}(C)\) rounds an input matrix \(C \in {\mathbb {R}}^{n \times n}\) to a matrix \(T \in {\mathbb {F}}^{n \times n}\), where \({\mathbb {F}}\) is a set of floating-point numbers in ordinary precision, such as the IEEE 754 binary64 format. Here, “round-to-nearest” rounding is not required; however, some faithful rounding, such as chopping, is desirable. Moreover, the function \(\mathsf {eig}(T)\), similar to the MATLAB function, computes all approximate eigenvectors of an input matrix \(T \in {\mathbb {F}}^{n \times n}\) in working precision arithmetic. It is expected to adopt some backward stable algorithm, as implemented in the LAPACK routine xSYEV [2]. In lines 13–17 of Algorithm 2, we aim to obtain sufficiently accurate approximate eigenvectors \(X'(:,{\mathcal {J}}_{k})\) of A, where the columns of \(X'(:,{\mathcal {J}}_{k})\) correspond to \({\mathcal {J}}_{k}\). For this purpose, we iteratively apply Algorithm 1 (\(\mathsf {RefSyEv}\)) to \(A - \mu _{k}I\) and \(V_{k}^{(\nu )}\) until \(V_{k}^{(\nu )}\) for some \(\nu \) becomes as accurate as the other eigenvectors associated with well-separated eigenvalues. Note that the spectral norms \(\Vert \widetilde{E}\Vert _{2}\) and \(\Vert \widetilde{E}_{k}\Vert _{2}\) can be replaced by the Frobenius norms \(\Vert \widetilde{E}\Vert _{\mathrm {F}}\) and \(\Vert \widetilde{E}_{k}\Vert _{\mathrm {F}}\).
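Algorithm 1 itself is specified in [17] and is not reproduced in this excerpt. The following NumPy sketch implements a simplified first-order correction of the same flavor (our own derivation, not the paper's exact algorithm): writing \(X \leftarrow X(I+E)\), the conditions \(X^{\mathrm {T}}X \approx I\) and \(X^{\mathrm {T}}AX \approx \) diagonal give \(E + E^{\mathrm {T}} = R\) and \(e_{ij} = (s_{ij} + \widetilde{\lambda }_{j}r_{ij})/(\widetilde{\lambda }_{j} - \widetilde{\lambda }_{i})\) for \(i \ne j\). The cluster criterion \(\omega \) and the higher-precision residual computations are omitted here.

```python
import numpy as np

def ref_sy_ev_step(A, X):
    """One simplified refinement sweep: from the residuals
    R = I - X^T X and S = X^T A X, build a correction E with
    E + E^T = R (orthogonality) and off-diagonal entries chosen so
    that X(I + E) better diagonalizes A. Assumes well-separated
    approximate eigenvalues lam = diag(S)."""
    n = A.shape[0]
    R = np.eye(n) - X.T @ X
    S = X.T @ A @ X
    lam = np.diag(S).copy()
    E = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                E[i, i] = R[i, i] / 2
            else:
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])
    return X + X @ E, lam
```

Repeating this step from a moderately accurate initial guess converges rapidly when the eigenvalue gaps are large; when two \(\widetilde{\lambda }\) nearly coincide, the denominator blows up, which is exactly the failure mode Algorithm 2 repairs.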
For the example (26), we apply Algorithm 2 (\(\mathsf {RefSyEvCL}\)) to A and the same initial guess \(X^{(0)}\) as before. The results of two iterations are as follows.
Thus, Algorithm 2 works well for this example, i.e., the approximate eigenvectors corresponding to the nearly double eigenvalues \(\lambda _{2}\) and \(\lambda _{3}\) are improved in terms of both orthogonality and diagonality.
Remark 3
For a generalized symmetric definite eigenvalue problem \(Ax = \lambda Bx\) where A and B are real symmetric with B being positive definite, we can modify the algorithms as follows.

In Algorithm 1 called at line 2 in Algorithm 2, replace \(R \leftarrow I - \widehat{X}^{\mathrm {T}}\widehat{X}\) with \(R \leftarrow I - \widehat{X}^{\mathrm {T}}B\widehat{X}\).

Replace \(A_{k} \leftarrow A - \mu _{k}I\) with \(A_{k} \leftarrow A - \mu _{k}B\) in line 8 of Algorithm 2.
Note that B does not appear in Algorithm 1 called at line 14 in Algorithm 2. \(\square \)
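The modified residual reflects the fact that for \(Ax = \lambda Bx\) the exact eigenvector matrix is B-orthonormal, \(X^{\mathrm {T}}BX = I\), rather than orthogonal. A small NumPy check (reduction via the Cholesky factor; variable names ours):

```python
import numpy as np

# For Ax = lam*Bx with B symmetric positive definite, the eigenvector
# matrix X satisfies X^T B X = I, hence the residual R = I - X^T B X.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)          # symmetric positive definite
K = rng.standard_normal((n, n))
A = (K + K.T) / 2                    # symmetric
# Reduce to a standard symmetric problem via B = L L^T.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
w, Y = np.linalg.eigh(Linv @ A @ Linv.T)
X = Linv.T @ Y                       # generalized eigenvectors
print(np.linalg.norm(np.eye(n) - X.T @ B @ X))  # small: B-orthonormal
```

The inverse is formed explicitly only for this tiny illustration; in practice one would solve triangular systems with L instead.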
6 Numerical results
We present numerical results to demonstrate the effectiveness of the proposed algorithm (Algorithm 2: RefSyEvCL). All numerical experiments discussed in this section were conducted using MATLAB R2016b on our workstation with two CPUs (3.0 GHz Intel Xeon E5-2687W v4, 12 cores each) and 1 TB of main memory, unless otherwise specified. Let \({\mathbf {u}}\) denote the relative rounding error unit (\({\mathbf {u}} = 2^{-24}\) for IEEE binary32 and \({\mathbf {u}} = 2^{-53}\) for binary64). To realize multiple-precision arithmetic, we adopt the Advanpix Multiprecision Computing Toolbox version 4.2.3 [1], which utilizes well-known, fast, and reliable multiple-precision arithmetic libraries including GMP and MPFR. We also use multiple-precision arithmetic with sufficiently long precision to simulate real arithmetic. In all cases, we use the MATLAB function norm to compute the spectral norms \(\Vert R\Vert \) and \(\Vert S - \widetilde{D}\Vert \) in Algorithm 1 in binary64 arithmetic, and we approximate \(\Vert A\Vert \) by \(\max (|\widetilde{\lambda }_{1}|,|\widetilde{\lambda }_{n}|)\). We conducted numerical experiments for dozens of seeds of the random number generator, and all results were similar to those provided in this section. Therefore, to ensure reproducibility, we adopt the default seed as a typical example using the MATLAB command rng('default').
6.1 Convergence property
Here, we confirm the convergence property of the proposed algorithm for various eigenvalue distributions.
6.1.1 Various eigenvalue distributions
In the same way as in the previous paper [17], we again generate real symmetric positive definite matrices using the MATLAB function randsvd from Higham's test matrices [11].
The eigenvalue distribution and condition number of A can be controlled by the input arguments \(\texttt {mode} \in \{1,2,3,4,5\}\) and \(\texttt {cnd} =: \alpha \ge 1\), as follows:

1.
one large: \(\lambda _{1} \approx 1\), \(\lambda _{i} \approx \alpha ^{-1}\), \(i = 2,\ldots ,n\)

2.
one small: \(\lambda _{n} \approx \alpha ^{-1}\), \(\lambda _{i} \approx 1\), \(i = 1,\ldots ,n-1\)

3.
geometrically distributed: \(\lambda _{i} \approx \alpha ^{-(i - 1)/(n - 1)}\), \(i = 1,\ldots ,n\)

4.
arithmetically distributed: \(\lambda _{i} \approx 1 - (1 - \alpha ^{-1})(i - 1)/(n - 1)\), \(i = 1,\ldots ,n\)

5.
random with uniformly distributed logarithm: \(\lambda _{i} \approx \alpha ^{-r(i)}\), \(i = 1,\ldots ,n\), where r(i) are pseudorandom values drawn from the standard uniform distribution on (0, 1).
Here, \(\kappa (A) \approx \texttt {cnd}\) for \(\texttt {cnd} < {\mathbf {u}}^{-1} \approx 10^{16}\). As shown in [17], for \(\texttt {mode} \in \{1,2\}\), there is a cluster of nearly multiple eigenvalues, so that Algorithm 1 (RefSyEv) does not work effectively.
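The five distributions above can be mimicked in NumPy as follows (a sketch only; `randsvd_like` and its argument order are ours, not Higham's interface, and mode 5 typically yields \(\kappa (A)\) somewhat below cnd):

```python
import numpy as np

def randsvd_like(n, mode, cnd, rng):
    """Symmetric positive definite test matrix whose eigenvalue
    distribution mimics randsvd's five modes (condition ~ cnd)."""
    a = float(cnd)
    i = np.arange(n)
    if mode == 1:                          # one large
        lam = np.full(n, 1.0 / a); lam[0] = 1.0
    elif mode == 2:                        # one small
        lam = np.ones(n); lam[-1] = 1.0 / a
    elif mode == 3:                        # geometric
        lam = a ** (-i / (n - 1))
    elif mode == 4:                        # arithmetic
        lam = 1.0 - (1.0 - 1.0 / a) * i / (n - 1)
    else:                                  # mode 5: log-uniform random
        lam = a ** -rng.uniform(0.0, 1.0, n)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(lam) @ Q.T

rng = np.random.default_rng(0)
A = randsvd_like(10, 3, 1e8, rng)
print(np.linalg.cond(A))  # roughly 1e8
```

For modes 1 and 2, \(n-1\) eigenvalues coincide up to rounding, which is exactly the clustered situation targeted by Algorithm 2.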
As in [17], we set \(n = 10\) and \(\texttt {cnd} = 10^{8}\) to generate moderately ill-conditioned problems in binary64, and we regard the results computed using multiple-precision arithmetic with sufficiently long precision as the exact eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). We compute an initial approximate eigenvector matrix \(X^{(0)}\) using the MATLAB function eig in binary64 arithmetic.
In the previous paper [17], we observed the quadratic convergence of Algorithm 1 in the case of \(\texttt {mode} \in \{3,4,5\}\), while Algorithm 1 failed to improve the accuracy of the initial approximate eigenvectors in the case of \(\texttt {mode} \in \{1, 2\}\), since the test matrices for \(\texttt {mode} \in \{1, 2\}\) have nearly multiple eigenvalues. To confirm the behavior of Algorithm 2, we apply it to the same examples. The results are shown in Fig. 2, which provides \(\max _{1 \le i \le n}|\widehat{\lambda }_{i} - \lambda _{i}|/|\lambda _{i}|\) as the maximum relative error of the computed eigenvalues \(\widehat{\lambda }_{i}\), \(\Vert \mathrm {offdiag}(\widehat{X}^{\mathrm {T}}A\widehat{X})\Vert /\Vert A\Vert \) as the diagonality of \(\widehat{X}^{\mathrm {T}}A\widehat{X}\), \(\Vert I - \widehat{X}^{\mathrm {T}}\widehat{X}\Vert \) as the orthogonality of the computed eigenvector matrix \(\widehat{X}\), and \(\Vert \widehat{E}\Vert \), where \(\widehat{E}\) is a computed result of \(\widetilde{E}\) in Algorithm 1. Here, \(\mathrm {offdiag}(\cdot )\) denotes the off-diagonal part of a given matrix. The horizontal axis shows the number of iterations \(\nu \) of Algorithm 2. As can be seen from the results, Algorithm 2 works very well even in the case of \(\texttt {mode} \in \{1, 2\}\).
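These three error measures are straightforward to evaluate; a NumPy sketch (`accuracy_metrics` is our own helper, and in the experiments the \(\widehat{\lambda }_{i}\) are produced by the algorithm itself rather than read off \(\widehat{X}^{\mathrm {T}}A\widehat{X}\)):

```python
import numpy as np

def accuracy_metrics(A, Xhat, lam_exact):
    """Quantities reported in Fig. 2: maximum relative eigenvalue
    error, diagonality of Xhat^T A Xhat, and orthogonality of Xhat."""
    S = Xhat.T @ A @ Xhat
    lam_hat = np.diag(S)
    rel_err = np.max(np.abs(lam_hat - lam_exact) / np.abs(lam_exact))
    offdiag = S - np.diag(lam_hat)
    diagonality = np.linalg.norm(offdiag, 2) / np.linalg.norm(A, 2)
    orthogonality = np.linalg.norm(np.eye(len(lam_hat)) - Xhat.T @ Xhat, 2)
    return rel_err, diagonality, orthogonality
```

Note that, computed in binary64, these measures cannot resolve errors much below \({\mathbf {u}}\); the paper therefore evaluates them against multiple-precision reference values.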
6.1.2 Clustered eigenvalues
As an example of clustered eigenvalues, we show the results for the Wilkinson matrix [23], which is symmetric and tridiagonal with pairs of nearly equal eigenvalues. The Wilkinson matrix \(W_{n} = (w_{ij}) \in {\mathbb {R}}^{n \times n}\) has diagonal entries \(w_{ii} := |n - 2i + 1|/2\), \(i = 1, 2, \ldots , n\), and super- and subdiagonal entries all equal to one. We apply Algorithm 2 to the Wilkinson matrix with \(n = 21\). The results are displayed in Fig. 3. As can be seen, Algorithm 2 works well.
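The construction and its near-degenerate spectrum are easy to reproduce; a NumPy sketch (`wilkinson` is our own helper):

```python
import numpy as np

def wilkinson(n):
    """Wilkinson matrix W_n: tridiagonal, ones on the off-diagonals,
    diagonal entries |n - 2i + 1| / 2 for i = 1, ..., n (1-based)."""
    i = np.arange(1, n + 1)
    d = np.abs(n - 2 * i + 1) / 2
    return (np.diag(d)
            + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1))

W = wilkinson(21)
lam = np.linalg.eigvalsh(W)
print(lam[-2:], lam[-1] - lam[-2])  # largest pair is nearly equal
```

For \(n = 21\) the two largest eigenvalues agree to roughly fourteen decimal digits, so the gap-dependent bound of Algorithm 1 degrades severely on this pair.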
Next, we show the convergence behavior of Algorithm 2 under limited computational precision for larger matrices with various \(\beta \), which denotes the reciprocal of the minimum gap between the eigenvalues normalized by \(\Vert A\Vert \), as defined in (10). If \(\beta \) is so large that \(\beta ^{2}{\mathbf {u}}\ge 1\), as mentioned at the end of Sect. 3, we cannot expect to improve approximate eigenvectors by Algorithm 1. We generate test matrices as follows. Set \(k \in {\mathbb {N}}\) with \(k \le n - 2\). Let \(A = QDQ^{\mathrm {T}}\), where Q is an orthogonal matrix and D is a diagonal matrix where
Then, k eigenvalues are clustered close to 1 with the gap \(\beta ^{-1}\), and \(n - k\) eigenvalues are distributed equally in \([-1,-\frac{1}{2}]\). We compute \(A \approx \widetilde{Q}\widetilde{D}\widetilde{Q}^{\mathrm {T}}\) in IEEE 754 binary64 arithmetic, where \(\widetilde{Q}\) is a pseudorandom approximately orthogonal matrix and \(\widetilde{D}\) is a floating-point approximation of D. We fix \(n = 100\) and \(k = 10\) and vary \(\beta \) between \(10^{2}\) and \(10^{14}\). To make the results more illustrative, we provide a less accurate initial guess \(X^{(0)}\) using binary32 arithmetic. In the algorithm, we adopt binary128 (so-called quadruple precision) for high-precision arithmetic. Then, the best attainable relative accuracy of the computed results is limited to \({\mathbf {u}}_{h} = 2^{-113} \approx 10^{-34}\). For binary128 arithmetic, we use the multiple-precision toolbox, in which binary128 arithmetic is supported as a special case via the command mp.Digits(34). The results are shown in Fig. 4. As can be seen, Algorithm 2 refines the computed eigenvalues until their relative accuracy reaches approximately \({\mathbf {u}}_{h}\). Both the orthogonality and diagonality of the computed eigenvectors are improved to approximately \(\beta {\mathbf {u}}_{h}\). This result is consistent with Remark 1. For \(\beta \in \{10^{8}, 10^{14}\}\), Algorithm 1 alone cannot work because \(X^{(0)}\) is insufficiently accurate and the assumption (3) is not satisfied. We confirm that this problem is resolved by Algorithm 2.
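The display equation defining D is not reproduced in this excerpt; the following NumPy sketch is one plausible construction consistent with the surrounding description (k eigenvalues near 1 with consecutive gap \(\beta ^{-1}\), the rest spread evenly over the stated interval) and should be read as an assumption, not the paper's exact definition:

```python
import numpy as np

def clustered_sym(n, k, beta, rng):
    """Symmetric test matrix with k eigenvalues clustered near 1
    (consecutive gap 1/beta) and the remaining n-k eigenvalues spread
    evenly in [-1, -1/2]; one plausible reading of the description."""
    d = np.empty(n)
    d[:k] = 1.0 - np.arange(k) / beta          # cluster near 1
    d[k:] = np.linspace(-1.0, -0.5, n - k)     # well separated
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(d) @ Q.T, d
```

Orthogonalizing a Gaussian matrix by QR plays the role of the pseudorandom \(\widetilde{Q}\), so the computed product is only an approximation \(A \approx \widetilde{Q}\widetilde{D}\widetilde{Q}^{\mathrm {T}}\), exactly as in the text.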
6.2 Computational speed
To evaluate the computational speed of the proposed algorithm (Algorithm 2), we first compare the computing time of Algorithm 2 to that of an approach that uses multiple-precision arithmetic (MP-approach). Note that the timings should be regarded as indicative, since the computing time for Algorithm 2 strongly depends on the implementation of accurate matrix multiplication. Thus, we adopt an efficient method proposed by Ozaki et al. [19] that utilizes fast matrix multiplication routines such as xGEMM in BLAS. To simulate multiple-precision numbers and arithmetic in Algorithm 2, we represent \(\widehat{X} = \widehat{X}_{1} + \widehat{X}_{2} + \cdots + \widehat{X}_{m}\) with \(\widehat{X}_{k}\), \(k = 1, 2, \dots , m\), being floating-point matrices in working precision, analogously to the “double-double” (\(m = 2\)) and “quad-double” (\(m = 4\)) precision formats [10], and use the concept of error-free transformations [16].
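The basic error-free transformation underlying such multi-term representations is Knuth's TwoSum, which recovers the rounding error of a floating-point addition exactly; a minimal Python sketch:

```python
import numpy as np

def two_sum(a, b):
    """Error-free transformation of addition (Knuth): returns (s, e)
    with s = fl(a + b) and a + b = s + e exactly (barring overflow)."""
    s = a + b
    t = s - a
    e = (a - (s - t)) + (b - t)
    return s, e

# An unevaluated sum x1 + x2 carries roughly twice the working
# precision, as in the representation X^ = X1 + X2 + ... above.
s, e = two_sum(1.0, 1e-17)
print(s, e)  # s = 1.0, e = 1e-17: the tiny addend is captured exactly
```

Applying such transformations entrywise, with the Ozaki et al. splitting [19], reduces accurate matrix products to a few ordinary xGEMM calls.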
In the multiple-precision toolbox [1], the MRRR algorithm [6] combined with Householder reduction is implemented efficiently in parallel to solve symmetric eigenvalue problems.
For comparison of timing, we generate a pseudorandom real symmetric \(n \times n\) matrix with clustered eigenvalues in a similar way to Sect. 6.1.2. To construct several eigenvalue clusters, we change diagonal elements of D to
Then, there are c clusters close to \(1/c, 2/c, \ldots , 1\), each containing k eigenvalues with the gap \(\beta ^{-1}\), and \(n - ck\) eigenvalues are distributed equally in \([-1,-\frac{1}{2}]\). We set \(n = 1000\), \(c = 5\), \(k = 100\), and \(\beta = 10^{12}\); i.e., the generated \(1000 \times 1000\) matrix has five clusters, each with 100 eigenvalues and the gap \(10^{-12}\). We compare the measured computing time of Algorithm 2 to that of the MP-approach, which is shown in Table 1 together with \(\Vert \widehat{E}\Vert \) and \(n_{{\mathcal {J}}}\), where \(n_{{\mathcal {J}}}\) is the number of eigenvalue clusters identified in Algorithm 2. At \(\nu = 2\), Algorithm 2 successfully identifies the five eigenvalue clusters, i.e., \(n_{{\mathcal {J}}} = 5\), corresponding to \(c = 5\).
For comparison of timing on a lower-performance computer, we also conducted numerical experiments using MATLAB R2017b on our laptop PC with a 2.5 GHz Intel Core i7-7660U (2 cores) CPU and 16 GB of main memory. In a similar way to the previous example, we set \(n = 500\), \(c = 5\), \(k = 10\), and \(\beta = 10^{12}\); i.e., the generated \(500 \times 500\) matrix has five clusters, each with 10 eigenvalues and the gap \(10^{-12}\). As can be seen from Table 2, the result is similar to that in Table 1.
Next, we address larger-scale problems. The test matrices are generated using the MATLAB function randn with \(n \in \{2000, 5000, 10{,}000\}\), as in B = randn(n) and A = B + B'. We aim to compute all the eigenvectors of a given real symmetric \(n \times n\) matrix A with the maximum accuracy allowed by the binary64 format. To make the results more illustrative, we provide a less accurate initial guess \(X^{(0)}\) using eig in binary32, and we then refine \(X^{(\nu )}\) by Algorithm 2. For efficiency, we use binary64 arithmetic for \(\nu = 1, 2\) and accurate matrix multiplication based on error-free transformations [19] for \(\nu = 3\). As numerical results, we provide \(\Vert \widehat{E}\Vert \), \(n_{{\mathcal {J}}}\), and the measured computing time. The results are shown in Table 3. As can be seen, Algorithm 2 improves the accuracy of the computed results up to the limit of binary64 (\({\mathbf {u}} = 2^{-53} \approx 10^{-16}\)). For \(n \in \{5000, 10{,}000\}\), Algorithm 2 requires much more computing time in total than eig in binary32 as \(n_{{\mathcal {J}}}\) increases. This is because the problems generally become more ill-conditioned for larger n. In fact, for the minimum gap between eigenvalues, \(\beta = 2.71 \cdot 10^{5}\) for \(n = 2000\), \(\beta = 2.31 \cdot 10^{6}\) for \(n = 5000\), and \(\beta = 2.26 \cdot 10^{6}\) for \(n = 10{,}000\). Thus, it is likely that binary32 arithmetic cannot provide a sufficiently accurate initial guess \(X^{(0)}\) for a large-scale random matrix.
6.3 Application to a realworld problem
Finally, we apply the proposed algorithm to a quantum materials simulation that aims to understand electronic structures in material physics. The problems can be reduced to generalized eigenvalue problems, where eigenvalues and eigenvectors correspond to electronic energies and wave functions, respectively. To understand properties of materials correctly, it is crucial to determine the order of eigenvalues [12] and to obtain accurate eigenvectors [24, 25].
We deal with a generalized eigenvalue problem \(Ax = \lambda Bx\) arising from a vibrating carbon nanotube within a supercell with s, p, d atomic orbitals [4]. The matrices A and B are taken from the ELSES matrix library [7] as VCNT22500, where A and B are real symmetric \(n \times n\) matrices with B being positive definite and \(n = 22500\). Our goal is to compute accurate eigenvectors and to separate all the eigenvalues of the problem so as to determine their order. To this end, we use a numerical verification method in [14] based on the Gershgorin circle theorem (cf. e.g. [9, Theorem 7.2.2] and [23, pp. 71ff]), which can rigorously check whether all the eigenvalues are separated and determine an interval containing each eigenvalue.
Let \(\varLambda (B^{-1}A)\) be the set of the eigenvalues of \(B^{-1}A\). Here, all the eigenvalues of \(B^{-1}A\) are real from the assumptions on A and B. Let \(\widehat{X} \in {\mathbb {R}}^{n \times n}\) be an approximate eigenvector matrix of \(B^{-1}A\) with \(\widehat{X}\) being nonsingular. Then, it is expected that \(C := \widehat{X}^{-1}B^{-1}A\widehat{X}\) is nearly diagonal. Although it is not possible, in general, to calculate \(C = (c_{ij})\) exactly in finite precision arithmetic, we can efficiently obtain an enclosure of C. Note that we compute neither an enclosure of \(B^{-1}\) nor that of \(\widehat{X}^{-1}\) explicitly. Instead, we compute an approximate solution \(\widehat{C}\) of the linear systems \((B\widehat{X})C = A\widehat{X}\) and then verify the accuracy of \(\widehat{C}\) using Yamamoto’s method [26] with matrix-based interval arithmetic [18, 21] to obtain an enclosure of C. Suppose \(\widehat{D} = \mathrm {diag}(\widehat{\lambda }_{i})\) is a midpoint matrix and \(G = (g_{ij})\) is a radius matrix with \(g_{ij} \ge 0\) satisfying
Then, the Gershgorin circle theorem implies
It can also be shown that if all the disks \([\widehat{\lambda }_{i} - e_{i}, \widehat{\lambda }_{i} + e_{i}]\) are isolated, then all the eigenvalues are separated, i.e., each disk contains precisely one eigenvalue of \(B^{-1}A\) [23, pp. 71ff].
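The display equations above are not reproduced in this excerpt; a NumPy sketch of the resulting check, under the assumption that the radii are taken as \(e_{i} = \sum _{j} g_{ij}\) (the row sums of G), which is the standard Gershgorin choice:

```python
import numpy as np

def gershgorin_separated(lam_hat, G):
    """Assuming |C - diag(lam_hat)| <= G entrywise, every eigenvalue
    lies in the union of [lam_hat[i] - e_i, lam_hat[i] + e_i] with
    e_i the i-th row sum of G; if these intervals are pairwise
    disjoint, each contains exactly one eigenvalue."""
    lam_hat = np.asarray(lam_hat)
    e = np.asarray(G).sum(axis=1)
    order = np.argsort(lam_hat)
    lo = lam_hat[order] - e[order]
    hi = lam_hat[order] + e[order]
    return e, bool(np.all(hi[:-1] < lo[1:]))
```

Since the intervals lie on the real line, it suffices to check that consecutive intervals in sorted order do not touch. Note that a rigorous implementation must also compute G itself with directed rounding, which this sketch does not do.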
We first computed an approximate eigenvector matrix \(\widehat{X}\) of \(B^{-1}A\) using the MATLAB function \(\mathsf {eig}(A,B)\) in binary64 arithmetic as an initial guess; \(\widehat{X}\) was obtained in 235.17 s. We then obtained \(\max _{1 \le i \le n}e_{i} = 2.75 \times 10^{-7}\), and 10 eigenvalues in 5 clusters could not be separated due to relatively small eigenvalue gaps. We next applied Algorithm 2 to A, B, and \(\widehat{X}\) in higher precision arithmetic in a similar way to Sect. 6.2 and obtained a refined approximate eigenvector matrix \(\widehat{X}'\) in 597.52 s. Finally, we obtained \(\max _{1 \le i \le n}e_{i} = 1.58 \times 10^{-14}\) and confirmed that all the eigenvalues are successfully separated.
References
Advanpix: Multiprecision Computing Toolbox for MATLAB, Code and documentation. http://www.advanpix.com/ (2016)
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users’ Guide, 3rd edn. SIAM, Philadelphia (1999)
Atkinson, K., Han, W.: Theoretical Numerical Analysis, 3rd edn. Springer, New York (2009)
Cerdá, J., Soria, F.: Accurate and transferable extended Hückel-type tight-binding parameters. Phys. Rev. B 61, 7965–7971 (2000)
Davis, C., Kahan, W.M.: The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7, 1–46 (1970)
Dhillon, I.S., Parlett, B.N.: Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algebra Appl. 387, 1–28 (2004)
ELSES matrix library, Data and documentation. http://www.elses.jp/matrix/ (2018)
GMP: GNU Multiple Precision Arithmetic Library, Code and documentation. http://gmplib.org/ (2018)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)
Hida, Y., Li, X. S., Bailey, D. H.: Algorithms for quaddouble precision floating point arithmetic. In: Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pp. 155–162. IEEE Computer Society Press (2001)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002)
Lee, D., Hoshi, T., Sogabe, T., Miyatake, Y., Zhang, S.-L.: Solution of the \(k\)th eigenvalue problem in large-scale electronic structure calculations. J. Comput. Phys. 371, 618–632 (2018)
Li, X.S., Demmel, J.W., Bailey, D.H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kang, S.Y., Kapur, A., Martin, M.C., Thompson, B.J., Tung, T., Yoo, D.: Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Softw. 28, 152–205 (2002)
Miyajima, S.: Numerical enclosure for each eigenvalue in generalized eigenvalue problem. J. Comput. Appl. Math. 236, 2545–2552 (2012)
MPFR: The GNU MPFR Library, Code and documentation. http://www.mpfr.org/ (2018)
Ogita, T., Rump, S.M., Oishi, S.: Accurate sum and dot product. SIAM J. Sci. Comput. 26, 1955–1988 (2005)
Ogita, T., Aishima, K.: Iterative refinement for symmetric eigenvalue decomposition. Jpn. J. Ind. Appl. Math. 35(3), 1007–1035 (2018)
Oishi, S.: Fast enclosure of matrix eigenvalues and singular values via rounding mode controlled computation. Linear Algebra Appl. 324, 133–146 (2001)
Ozaki, K., Ogita, T., Oishi, S., Rump, S.M.: Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numer. Algorithms 59, 95–118 (2012)
Parlett, B.N.: The Symmetric Eigenvalue Problem. Classics in Applied Mathematics, vol. 20, 2nd edn. SIAM, Philadelphia (1998)
Rump, S.M.: Fast and parallel interval arithmetic. BIT Numer. Math. 39, 534–554 (1999)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation part II: sign, \(K\)-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31, 1269–1302 (2008)
Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Clarendon Press, Oxford (1965)
Yamamoto, S., Fujiwara, T., Hatsugai, Y.: Electronic structure of charge and spin stripe order in \({\rm La}_{2-x}\,{\rm Sr}_{x}\,{\rm NiO}_{4}\) (\(x = \frac{1}{3}, \frac{1}{2}\)). Phys. Rev. B 76, 165114 (2007)
Yamamoto, S., Sogabe, T., Hoshi, T., Zhang, S.-L., Fujiwara, T.: Shifted conjugate-orthogonal-conjugate-gradient method and its application to double orbital extended Hubbard model. J. Phys. Soc. Jpn. 77, 114713 (2008)
Yamamoto, T.: Error bounds for approximate solutions of systems of equations. Jpn. J. Appl. Math. 1, 157–171 (1984)
Acknowledgements
The first author would like to express his sincere thanks to Professor Chen Greif at the University of British Columbia for his valuable comments and helpful suggestions.
This study was partially supported by CREST, JST and JSPS KAKENHI Grant numbers 16H03917, 25790096.
Ogita, T., Aishima, K.: Iterative refinement for symmetric eigenvalue decomposition II: clustered eigenvalues. Japan J. Indust. Appl. Math. 36, 435–459 (2019). https://doi.org/10.1007/s13160-019-00348-4
Keywords
 Accurate numerical algorithm
 Iterative refinement
 Symmetric eigenvalue decomposition
 Clustered eigenvalues