Iterative refinement for symmetric eigenvalue decomposition II: clustered eigenvalues

We are concerned with accurate eigenvalue decomposition of a real symmetric matrix A. In the previous paper (Ogita and Aishima in Jpn J Ind Appl Math 35(3): 1007–1035, 2018), we proposed an efficient refinement algorithm for improving the accuracy of all eigenvectors, which converges quadratically if a sufficiently accurate initial guess is given. However, since the accuracy of eigenvectors depends on the eigenvalue gap, it is difficult to provide such an initial guess to the algorithm in the case where A has clustered eigenvalues. To overcome this problem, we propose a novel algorithm that can refine approximate eigenvectors corresponding to clustered eigenvalues on the basis of the algorithm proposed in the previous paper. Numerical results are presented showing excellent performance of the proposed algorithm in terms of convergence rate and overall computational cost and illustrating an application to a quantum materials simulation.


Introduction
Let A be a real symmetric n × n matrix. Since solving a standard symmetric eigenvalue problem Ax = λx, where λ ∈ R is an eigenvalue of A and x ∈ R^n is an eigenvector of A associated with λ, is ubiquitous in scientific computing, it is important to develop reliable numerical algorithms for calculating eigenvalues and eigenvectors accurately. Excellent overviews of the symmetric eigenvalue problem can be found in references [20,23].
We are concerned with the eigenvalue decomposition of A such that

A = X D X^T,  (1)

where X is an n × n orthogonal matrix whose ith column is an eigenvector x_(i) of A (called an eigenvector matrix) and D = (d_ij) is an n × n diagonal matrix whose diagonal elements are the corresponding eigenvalues λ_i ∈ R, i.e., d_ii = λ_i for i = 1, . . . , n. Throughout the paper, we assume that λ_1 ≤ λ_2 ≤ · · · ≤ λ_n, and the columns of X are ordered correspondingly. We here collect notation used in this paper. Let I and O denote the identity matrix and the zero matrix of appropriate size, respectively. Unless otherwise specified, ‖·‖ means ‖·‖_2, which denotes the Euclidean norm for vectors and the spectral norm for matrices. For legibility, if necessary, we distinguish between approximate quantities and computed results, e.g., for some quantity α we write α̃ for an approximation of α and α̂ for a computed result for α, respectively.
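As a small illustration of the decomposition (1), the following Python/NumPy sketch computes an eigendecomposition with numpy.linalg.eigh, whose ascending eigenvalue ordering matches the convention λ_1 ≤ · · · ≤ λ_n; the random 5 × 5 matrix is our own toy example.

```python
import numpy as np

# Eigendecomposition A = X D X^T of a symmetric matrix with
# numpy.linalg.eigh; eigenvalues come out in ascending order,
# matching the convention lambda_1 <= ... <= lambda_n.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2  # symmetrize

lam, X = np.linalg.eigh(A)  # columns of X are orthonormal eigenvectors
D = np.diag(lam)

orth_err = np.linalg.norm(X.T @ X - np.eye(5), 2)  # || I - X^T X ||
decomp_err = np.linalg.norm(A - X @ D @ X.T, 2)    # || A - X D X^T ||
print(orth_err, decomp_err)  # both at roughly machine-precision level
```
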
The accuracy of an approximate eigenvector depends on the gap between the corresponding eigenvalue and its nearest neighbor eigenvalue (cf., e.g., [20, Theorem 11.7.1]). For simplicity, suppose all eigenvalues of A are simple. Let X̃ ∈ R^{n×n} be an approximation of X. Let z_(i) := x̃_(i)/‖x̃_(i)‖ for i = 1, 2, . . . , n, where x̃_(i) are the ith columns of X̃. Moreover, for each i, suppose that the Ritz value μ_i := z_(i)^T A z_(i) is closer to λ_i than to any other eigenvalue. Let gap(μ_i) denote the smallest difference between μ_i and any other eigenvalue, i.e., gap(μ_i) := min_{j≠i} |μ_i − λ_j|. Then, it holds for all i that

sin ∠(z_(i), x_(i)) ≤ ‖A z_(i) − μ_i z_(i)‖ / gap(μ_i).  (2)

The smaller the eigenvalue gap, the worse the accuracy of a computed eigenvector. Therefore, refinement algorithms for eigenvectors are useful for obtaining highly accurate results. For example, highly accurate computations of a few or all eigenvectors are crucial for large-scale electronic structure calculations in material physics [24,25], in which specific interior eigenvalues with associated eigenvectors need to be computed. On related work on refinement algorithms for symmetric eigenvalue decomposition, see the previous paper [17] for details.
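The gap dependence can be checked numerically. The following sketch (our own toy example) builds a symmetric matrix with a known spectrum containing one close pair, perturbs an eigenvector for an isolated and for a clustered eigenvalue, and verifies a residual bound of the form (2); a small slack accounts for floating-point effects.

```python
import numpy as np

# Check the residual bound: for a unit vector z with Ritz value
# mu = z^T A z, sin(angle(z, x)) <= ||A z - mu z|| / gap(mu).
rng = np.random.default_rng(1)
n = 6
lam = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 5.0 + 1e-3])  # last two clustered
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T  # symmetric matrix with known spectrum

checks = []
for i in (1, 4):  # an isolated and a clustered eigenvalue
    z = Q[:, i] + 1e-6 * rng.standard_normal(n)  # perturbed eigenvector
    z /= np.linalg.norm(z)
    mu = z @ A @ z                               # Ritz value
    resid = np.linalg.norm(A @ z - mu * z)
    gap = np.min(np.abs(np.delete(lam, i) - mu))
    sin_theta = np.sqrt(max(0.0, 1.0 - (Q[:, i] @ z) ** 2))
    checks.append(sin_theta <= resid / gap + 1e-9)  # slack for rounding
print(checks)
```

For the clustered eigenvalue the quotient resid/gap is much larger, reflecting the loss of attainable accuracy described in the text.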
Our approach relies on the fact that, although the individual eigenvectors associated with clustered eigenvalues are ill-conditioned, a backward stable algorithm provides a sufficiently accurate approximation of the invariant subspace corresponding to the clustered eigenvalues. Then, we divide the entire problem into subproblems, each of which corresponds to a cluster of eigenvalues. Finally, we expand the eigenvalue gaps in each subproblem by using a diagonal shift and compute the eigenvectors of each subproblem, which can be used for refining the approximate eigenvectors corresponding to clustered eigenvalues in the entire problem. One might notice that the above procedure is similar to the classical shift-invert technique for transforming eigenvalue distributions. In addition, the MRRR algorithm [6] also employs a shift strategy to increase the relative gaps between clustered eigenvalues when computing the associated eigenvectors. In other words, it is well known that a diagonal shift is useful for solving eigenvalue problems accurately. Our contribution is to show its effectiveness on the basis of appropriate error analysis with the adaptive use of higher precision arithmetic, which leads to the derivation of the proposed algorithm.
In the same spirit of the previous paper [17], our proposed algorithm primarily comprises matrix multiplication, which accounts for the majority of the computational cost. Therefore, we can utilize higher precision matrix multiplication efficiently. For example, XBLAS [13] and other efficient algorithms [16,19,22] based on so-called error-free transformations for accurate matrix multiplication are available for practical implementation.
The remainder of the paper is organized as follows. In Sect. 2, we recall the refinement algorithm (Algorithm 1) proposed in the previous paper [17] together with its convergence theory. For practical use, we present a rounding error analysis of Algorithm 1 in finite precision arithmetic in Sect. 3, which is useful for setting working precision and shows achievable accuracy of approximate eigenvectors obtained by using Algorithm 1. In Sect. 4, we show the behavior of Algorithm 1 for clustered eigenvalues, which explains the effect of nearly multiple eigenvalues on computed results and leads to the derivation of the proposed algorithm. On the basis of Algorithm 1, we propose a refinement algorithm (Algorithm 2: RefSyEvCL) that can also be applied to matrices with clustered eigenvalues in Sect. 5. In Sect. 6, we present some numerical results showing the behavior and performance of the proposed algorithm together with an application to a quantum materials simulation as a real-world problem.
For simplicity, we basically handle only real matrices. As mentioned in the previous paper [17], the discussions in this paper can also be extended to generalized symmetric (Hermitian) definite eigenvalue problems.

Basic algorithm and its convergence theory
In this section, we introduce the refinement algorithm proposed in the previous paper [17], which is the basis of the algorithm proposed in this paper.
Let A = A^T ∈ R^{n×n}. The eigenvalues of A are denoted by λ_i ∈ R, i = 1, . . . , n. Then ‖A‖ = max_{1≤i≤n} |λ_i| = max(|λ_1|, |λ_n|). Let X ∈ R^{n×n} denote an orthogonal eigenvector matrix comprising normalized eigenvectors of A, and let X̃ denote an approximation of X with X̃ being nonsingular. In addition, define E ∈ R^{n×n} such that X̃ = X(I + E). In the previous paper, we presented the following algorithm for the eigenvalue decomposition of A, which is designed to be applied iteratively. For later use in Sect. 5, the algorithm also allows the case where the input X̃ is rectangular, i.e., X̃ ∈ R^{n×ℓ}, ℓ < n.

Algorithm 1 RefSyEv: Refinement of approximate eigenvectors of a real symmetric matrix. Higher-precision arithmetic is required for all the computations except line 6.

1: function [X′, D̃, ω] = RefSyEv(A, X̃)
2: R ← I − X̃^T X̃
3: S ← X̃^T A X̃
4: λ̃_i ← s_ii/(1 − r_ii) for i = 1, 2, . . . , n
5: D̃ ← diag(λ̃_i)
6: ω ← 2(‖R‖‖A‖ + ‖S − D̃‖)
7: e_ij ← (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i) if |λ̃_i − λ̃_j| > ω, and e_ij ← r_ij/2 otherwise
8: X′ ← X̃(I + E)
9: end function
In [17, Theorem 1], we presented the following theorem, which states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given X̃ is sufficiently close to X.

Theorem 1 (Ogita-Aishima [17]) Let A be a real symmetric n × n matrix with simple eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X ∈ R^{n×n}. For a given nonsingular X̃ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̃ in real arithmetic, and X′ is the quantity calculated in Algorithm 1. Define E and E′ such that X̃ = X(I + E) and X′ = X(I + E′), respectively. If ‖E‖ is sufficiently small relative to the minimum eigenvalue gap (condition (3)), then we have

lim sup_{‖E‖→0} ‖E′‖/‖E‖² < ∞,  (5)

i.e., X′ converges quadratically to X.

In the following, we review the discussion in [17, §3.2] for exactly multiple eigenvalues. If λ̃_i ≈ λ̃_j corresponding to multiple eigenvalues λ_i = λ_j, we compute e_ij = e_ji = r_ij/2 for (i, j) such that |λ̃_i − λ̃_j| ≤ ω.
To investigate the above exceptional process, define the index sets M_k, k = 1, 2, . . . , n_M, for multiple eigenvalues {λ_i}_{i∈M_k} satisfying the following conditions: each M_k is a maximal set of n_k indices such that λ_i = λ_j for all i, j ∈ M_k, and λ_i ≠ λ_j for i ∈ M_k, j ∉ M_k. (6)

Note that the eigenvectors corresponding to multiple eigenvalues are not unique. Hence, using the above index sets, let Y be an eigenvector matrix defined such that, for all k, the n_k × n_k submatrices of X̃^{-1}Y corresponding to {λ_i}_{i∈M_k} are symmetric and positive definite. Since Y is then unique, define F such that Y = X̃(I + F). Define R := I − X̃^T X̃ and S := X̃^T A X̃. Then, using the orthogonality Y^T Y = I and the diagonality Y^T A Y = D, we have

F + F^T = R + Δ_1,  (7)
DF + F^T D = D − S + Δ_2,  (8)

where ε := ‖F‖ and ‖Δ_1‖ = O(ε²), ‖Δ_2‖ = O(ε²‖A‖). The above equations can be obtained in the same manner as in our previous paper [17, Eqs. (7) and (11)] by replacing E with F in the equations. In a similar way to Newton's method (cf. e.g., [3, p. 236]), dropping the second-order terms in (7) and (8) yields Algorithm 1, and the next convergence theorem is provided [17, Theorem 2].
Theorem 2 (Ogita-Aishima [17]) Let A be a real symmetric n × n matrix with the eigenvalues λ_i, i = 1, 2, . . . , n. Suppose A has multiple eigenvalues with index sets M_k, k = 1, 2, . . . , n_M, satisfying (6). Let V be the set of n × n orthogonal eigenvector matrices of A. For a given nonsingular X̃ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̃ in real arithmetic, and X′ and ω are the quantities calculated in Algorithm 1. Let Y, Y′ ∈ V be defined such that, for all k, the n_k × n_k submatrices of X̃^{-1}Y and (X′)^{-1}Y′ corresponding to {λ_i}_{i∈M_k} are symmetric and positive definite. Define F and F′ such that Y = X̃(I + F) and Y′ = X′(I + F′), respectively. Furthermore, suppose that ‖F‖ is sufficiently small in the sense of [17, Theorem 2]. Then, we obtain ‖F′‖ = O(‖F‖²), i.e., quadratic convergence also holds in the presence of multiple eigenvalues.

On the basis of the above convergence theorems, let us consider the iterative refinement using Algorithm 1: X^(ν+1) ← RefSyEv(A, X^(ν)) for ν = 0, 1, 2, . . . , where E^(ν) = (e_ij^(ν)) are the quantities calculated in line 7 of Algorithm 1. In practice, it is likely that ordinary precision floating-point arithmetic, such as IEEE 754 binary32 or binary64, is used for calculating an approximation X̃ to an eigenvector matrix X of a given symmetric matrix A by some backward stable algorithm. It is natural to use such X̃ as an initial guess X^(0) in Algorithm 1. However, if A has nearly multiple eigenvalues, it is difficult to obtain a sufficiently accurate X^(0) in ordinary precision floating-point arithmetic such that Algorithm 1 works well. To overcome this problem, we develop a practical algorithm for clustered eigenvalues, which is proposed as Algorithm 2 in Sect. 5.
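For concreteness, a minimal Python/NumPy sketch of one refinement step is given below, assuming the first-order updates obtained by dropping the second-order terms, i.e., λ̃_i = s_ii/(1 − r_ii), e_ij = (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i) for well-separated approximate eigenvalues, and e_ij = r_ij/2 otherwise; plain binary64 plays the role of higher precision here, and the constant in the clustering criterion ω is our assumption.

```python
import numpy as np

def ref_sy_ev(A, Xt):
    # One refinement step following the first-order updates described in
    # the text (sketch only; the paper performs the products in higher
    # precision, while plain binary64 is used here).
    n = A.shape[0]
    R = np.eye(n) - Xt.T @ Xt
    S = Xt.T @ A @ Xt
    lam = np.diag(S) / (1.0 - np.diag(R))   # approximate eigenvalues
    D = np.diag(lam)
    # Clustering criterion; the factor 2 is our assumption.
    omega = 2.0 * (np.linalg.norm(R, 2) * np.linalg.norm(A, 2)
                   + np.linalg.norm(S - D, 2))
    E = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            if abs(lam[i] - lam[j]) > omega:
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])
            else:
                E[i, j] = R[i, j] / 2.0     # clustered (or i == j) case
    return Xt @ (np.eye(n) + E), lam

# Well-separated eigenvalues 1..8: one step sharply improves a rough guess.
rng = np.random.default_rng(2)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T
Xt = Q + 1e-4 * rng.standard_normal((n, n))       # rough initial guess
X1, lam = ref_sy_ev(A, Xt)
err0 = np.linalg.norm(np.eye(n) - Xt.T @ Xt, 2)   # orthogonality before
err1 = np.linalg.norm(np.eye(n) - X1.T @ X1, 2)   # orthogonality after
print(err0, err1)
```

With binary64 only, the attainable accuracy is limited as discussed in Sect. 3; with higher-precision matrix products the step can be iterated with quadratic convergence.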

Rounding error analysis for basic algorithm
If Algorithm 1 is performed in finite precision arithmetic with the relative rounding error unit u_h, the accuracy of a refined eigenvector matrix X′ is restricted by u_h. Since X̃ is improved quadratically when using real arithmetic, u_h must correspond to ‖E‖² to preserve the convergence property of Algorithm 1. We explain the details in the following. For simplicity, we consider the real case. The extension to the complex case is straightforward.
Let F_h be a set of floating-point numbers with the relative rounding error unit u_h. We define the rounding operator fl_h such that fl_h : R → F_h and assume the use of the following standard floating-point arithmetic model [11]. For a, b ∈ F_h and ∘ ∈ {+, −, ×, /}, it holds that

fl_h(a ∘ b) = (a ∘ b)(1 + δ), |δ| ≤ u_h.

For example, this is satisfied in IEEE 754 floating-point arithmetic barring overflow and underflow. Suppose all elements of A and X̃ are exactly representable in F_h, i.e., A, X̃ ∈ F_h^{n×n}, and ‖x̃_(i)‖ ≈ 1 for all i. Let R̂, Ŝ, and Ê denote the computed results of R, S, and E in Algorithm 1, respectively. Define Δ_R, Δ_S, and Δ_E such that

R̂ = R + Δ_R, Ŝ = S + Δ_S, Ê = E + Δ_E.

From a standard rounding error analysis as in [11], we obtain ‖Δ_R‖ = O(nu_h) and ‖Δ_S‖ = O(nu_h‖A‖). The computed results λ̂_i of λ̃_i, i = 1, 2, . . . , n, in Algorithm 1 then satisfy |λ̂_i − λ̃_i| = O(nu_h‖A‖). For (i, j) such that |λ̂_i − λ̂_j| ≤ ω̂, where ω̂ is an approximation of ω computed in floating-point arithmetic in Algorithm 1, we have ê_ij = r̂_ij/2, so the corresponding elements of Δ_E are of order O(u_h). For the other (i, j), the division by λ̂_j − λ̂_i amplifies the rounding errors by at most the reciprocal of the eigenvalue gap. In summary, we obtain

‖Δ_E‖ = O(βnu_h),  (10)

where β is the reciprocal of the minimum gap between the eigenvalues normalized by ‖A‖. For the computed result X̂′ of X′ = X̃(I + E) in Algorithm 1, the additional rounding errors are of order O(nu_h), and, using (10), the overall error in the computed correction is of order O(βnu_h) (11). Thus, if a given X̃ is sufficiently close to X in such a way that the assumption (3) holds, combining (5) and (11) yields

‖E′‖ = O(β(‖E‖² + nu_h)).  (12)

If A has nearly multiple eigenvalues and (3) does not hold, then the convergence of Algorithm 1 to an eigenvector matrix of A is guaranteed neither in real arithmetic nor in finite precision arithmetic, regardless of the value of u_h. We will deal with such an ill-conditioned case in Sect. 5.

Remark 1
As can be seen from (12), with a fixed u h , iterative use of Algorithm 1 eventually computes an approximate eigenvector matrix that is accurate to O(βu h ), provided that the assumption (3) in Theorem 1 holds in each iteration. This will be confirmed numerically in Sect. 6.
Let us consider the most likely scenario where X̃ is computed by some backward stable algorithm in ordinary precision floating-point arithmetic with the relative rounding error unit u. From (2), we have ‖E‖ = O(βu) under the assumption that β ≈ ‖A‖/min_{1≤i≤n} gap(μ_i). Thus, the initial guess X^(0) := X̃ is accurate to O(βu). From (12), we obtain ‖E′‖ = O(β³u² + βu_h). Therefore, u_h should be less than β²u² in order to preserve the convergence speed for the first iteration of Algorithm 1.
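The rule u_h < β²u² translates directly into a working-precision estimate. The following back-of-envelope sketch (our own illustration) converts it into a number of decimal digits for a few values of β:

```python
import math

# Convert the requirement u_h < beta^2 * u^2 into decimal digits
# (u: binary64 unit roundoff), for a few gap parameters beta.
u = 2.0 ** -53
digits_needed = {}
for beta in (1e2, 1e6, 1e10):
    uh = beta ** 2 * u ** 2          # required working-precision unit
    digits_needed[beta] = math.ceil(-math.log10(uh))
print(digits_needed)  # fewer digits are needed as beta grows
```
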
Suppose that ‖X̃ − X‖ = cβu and ‖X′ − X‖ = c′β³u², where c and c′ are some constants. If c″β²u < 1 for c″ := c′/c, then the approximation of X is improved in the sense that ‖X′ − X‖ < ‖X̃ − X‖. In other words, if β is so large that c″β²u ≥ 1, Algorithm 1 may not work well.

Effect of nearly multiple eigenvalues in basic algorithm
In general, a given matrix A in floating-point format does not have exactly multiple eigenvalues. It is therefore necessary to discuss the behavior of Algorithm 1 for A with some nearly multiple eigenvalues λ̃_i ≈ λ̃_j such that |λ̃_i − λ̃_j| ≤ ω in line 7. We basically discuss the behavior in real arithmetic. The effect of the rounding errors is briefly explained in Remark 2 at the end of this section.
For simplicity, we assume λ_1 ≤ λ_2 ≤ · · · ≤ λ_n. In the following analysis, define

A_ω := X D_ω X^T,  (14)

where D_ω is the diagonal matrix obtained from D by replacing the eigenvalues in each cluster, i.e., the eigenvalues lying within ω of their neighbors, by one common value per cluster, which means that the clustered eigenvalues of A_ω are all multiple in each cluster. Then, A_ω is a perturbed matrix of A such that ‖A_ω − A‖ is of the order of the cluster widths. Throughout this section, we assume that the distance between different clusters is sufficiently large compared with ω (15). Importantly, although each individual eigenvector associated with the nearly multiple eigenvalues is very sensitive to perturbations, the subspace spanned by such eigenvectors is not sensitive. Thus, X̃ computed by a backward stable algorithm is sufficiently close to an eigenvector matrix of A_ω. Below, we show that Algorithm 1 computes E that approximates an exact eigenvector matrix Y_ω defined in the same manner as Y in Sect. 2. Note that Y_ω is the eigenvector matrix of the above A_ω close to A, where A_ω has exactly multiple eigenvalues.
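The construction of A_ω can be sketched as follows (our own illustration: the common value per cluster, here the cluster mean, is an assumption, and the clustering uses a simple neighbor-gap rule):

```python
import numpy as np

def make_A_omega(lam_sorted, Q, omega):
    # Replace each cluster (neighboring eigenvalues within omega) by a
    # common value, so the clustered eigenvalues become exactly multiple.
    lam_s = np.asarray(lam_sorted, dtype=float)
    lam_w = lam_s.copy()
    start = 0
    for i in range(1, len(lam_s) + 1):
        if i == len(lam_s) or lam_s[i] - lam_s[i - 1] > omega:
            lam_w[start:i] = lam_s[start:i].mean()  # cluster mean (our choice)
            start = i
    return Q @ np.diag(lam_w) @ Q.T, lam_w

rng = np.random.default_rng(4)
n = 5
lam = np.array([1.0, 2.0, 2.0 + 1e-10, 2.0 + 2e-10, 4.0])  # one tight cluster
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T
A_omega, lam_w = make_A_omega(lam, Q, 1e-8)
pert = np.linalg.norm(A_omega - A, 2)  # on the order of the cluster width
print(lam_w, pert)
```
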
Recall that the submatrices of X̃^{-1}Y_ω corresponding to the multiple eigenvalues of A_ω are symmetric and positive definite. Then, we see that Algorithm 1 computes an approximation of Y_ω as follows. Define R and S_ω := X̃^T A_ω X̃ corresponding to A_ω. We see that S_ω can be considered a perturbed matrix of S := X̃^T A X̃. Note that, in Algorithm 1, E is computed from R and S. Here, we introduce an ideal matrix E_ω computed from R and S_ω, where E_ω is quadratically convergent to F_ω. In the following, we estimate ‖E_ω − E‖ due to the above perturbation. To this end, we estimate each element of S_ω − S as in the following lemma.
Lemma 1 Let A be a real symmetric n × n matrix with eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular X̃ ∈ R^{n×n}, define A_ω as in (14). In addition, define S_ω := X̃^T A_ω X̃. Then, the elementwise bounds (17) and (18) on S_ω − S hold.

Proof It is easy to see that S_ω − S = X̃^T (A_ω − A) X̃. In a similar way to (8), we bound the elements of this difference, and (17) follows. Moreover, noting that D_Q is a block diagonal matrix, we obtain (18).
For the perturbation analysis of E, the next lemma is crucial.

Lemma 2 Let A be a real symmetric n × n matrix with eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular X̃ ∈ R^{n×n}, define A_ω as in (14). Assume that (15) is satisfied. Define R = (r_ij) and S_ω = (s_ij^(ω)) such that R := I − X̃^T X̃ and S_ω := X̃^T A_ω X̃. Suppose positive numbers ω_1 and ω_2 satisfy the separation condition (19). We assume that, for all (i, j) in line 7 of Algorithm 1, the formulas for e_ij^(ω) are the same as those for e_ij. Then, the bounds (20) and (21) on the elements of E_ω − E hold.

Proof For (i, j) such that |λ̃_i − λ̃_j| ≤ ω, the elements e_ij^(ω) and e_ij are both computed as r_ij/2, and we have (20). Next, for the approximate eigenvalues λ_i^(ω) computed from S_ω, (19) yields the estimates (22) and (23). For (i, j) such that |λ̃_i − λ̃_j| > ω, we evaluate the difference between e_ij^(ω) and e_ij through the corresponding quotients in line 7 of Algorithm 1; bounding the right-hand side using (22) and (23), we obtain (21).
Since we suppose δ_ω = O(‖A‖ω) in the situation where X̃ is computed by a backward stable algorithm, ‖E_ω − E‖ is sufficiently small. Therefore, E is sufficiently close to E_ω and F_ω under the above mild assumptions. Although Y_ω can be very far from any eigenvector matrix of A, the subspace spanned by the columns of Y_ω corresponding to the clustered eigenvalues adequately approximates that spanned by the exact eigenvectors whenever ‖A_ω − A‖ is sufficiently small. In the following, we derive an algorithm for clustered eigenvalues using this important feature.

Remark 2
In this section, we proved that E is sufficiently close to F_ω under the mild assumptions. In Sect. 3, the effect of rounding errors on E is evaluated as in (10), i.e., Δ_E := Ê − E is sufficiently small, where Ê is computed in finite precision arithmetic.
The rounding error analysis is independent of the perturbation analysis with respect to F_ω in this section. Thus, the triangle inequality ‖Ê − F_ω‖ ≤ ‖Ê − E‖ + ‖E − F_ω‖ simply combines the individual estimates of the two errors, and hence, the computed Ê is sufficiently close to F_ω corresponding to A_ω.

Proposed algorithm for nearly multiple eigenvalues
On the basis of the basic algorithm (Algorithm 1), we propose a practical version of an algorithm for improving the accuracy of computed eigenvectors of symmetric matrices that can also deal with nearly multiple eigenvalues.
Recall that, in Algorithm 1, we choose e_ij for all (i, j) as

e_ij = (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i) if |λ̃_i − λ̃_j| > ω, and e_ij = r_ij/2 otherwise,

where ω is defined in line 6 of the algorithm.
For (i, j) inside a cluster, the choice e_ij = r_ij/2 only restores orthogonality and cannot resolve the individual eigenvectors within the cluster. In the following, we overcome this problem for general symmetric matrices.

Outline of the proposed algorithm
As mentioned in Sect. 1, the sin θ theorem by Davis-Kahan suggests that backward stable algorithms can provide a sufficiently accurate initial guess of the subspace spanned by the eigenvectors associated with clustered eigenvalues for each cluster. We explain how to refine approximate eigenvectors by extracting them from the subspace correctly. Suppose that Algorithm 1 is applied to A = A^T ∈ R^{n×n} and its approximate eigenvector matrix X̃ ∈ R^{n×n}. Then, we obtain X′, λ̃, and ω, where X′ ∈ R^{n×n} is a refined approximate eigenvector matrix, λ̃_i, i = 1, 2, . . . , n, are approximate eigenvalues, and ω ∈ R is the criterion that determines whether the λ̃_i are clustered. Using λ̃ and ω, we can easily obtain the index sets J_k, k = 1, 2, . . . , n_J, for the clusters {λ̃_i}_{i∈J_k} of the approximate eigenvalues satisfying the following conditions (see also Fig. 1): each J_k consists of consecutive indices, neighboring approximate eigenvalues within J_k differ by at most ω, and the distance between different clusters exceeds ω. (27)
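A simple grouping rule consistent with this description can be sketched as follows (our own illustration): consecutive sorted approximate eigenvalues are assigned to the same cluster whenever their gap is at most ω.

```python
import numpy as np

def cluster_indices(lam_sorted, omega):
    # Group consecutive sorted approximate eigenvalues: same cluster
    # when the gap to the previous one is at most omega.
    clusters, current = [], [0]
    for i in range(1, len(lam_sorted)):
        if lam_sorted[i] - lam_sorted[i - 1] <= omega:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    return clusters

lam = np.array([0.999999, 1.000001, 1.5, 2.0, 2.0000005])
print(cluster_indices(lam, 1e-4))  # -> [[0, 1], [2], [3, 4]]
```
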
Now the problem is how to refine X′(:, J_k) ∈ R^{n×n_k}, which denotes the matrix comprising the approximate eigenvectors corresponding to the clustered approximate eigenvalues {λ̃_i}_{i∈J_k}.
From the observation about the numerical results in the previous section, we develop the following procedure for the refinement.
1. Find clusters of approximate eigenvalues of A and obtain the index sets J_k, k = 1, 2, . . . , n_J, for those clusters.
2. Define V_k := X′(:, J_k) ∈ R^{n×n_k} and a shift constant μ_k near the cluster {λ̃_i}_{i∈J_k} for each k.
3. Compute T_k := V_k^T (A − μ_k I) V_k ∈ R^{n_k×n_k}.
4. Perform the following procedure for each T_k ∈ R^{n_k×n_k}: compute an eigenvector matrix W_k of T_k and update X′(:, J_k) ← V_k W_k.
This procedure is interpreted as follows. We first apply an approximate similarity transformation to A using the refined eigenvector matrix X′, such as S′ := (X′)^T A X′. Then, we divide the problem for S′ ∈ R^{n×n} into subproblems for S_k ∈ R^{n_k×n_k}, k = 1, 2, . . . , n_J, corresponding to the clusters. We then apply a diagonal shift to S_k, such as T_k := S_k − μ_k I, to relatively separate the clustered eigenvalues around μ_k. Rather than using these to obtain T_k, we perform steps 2 and 3 in view of computational efficiency and accuracy. Finally, we update the columns of X′ corresponding to J_k using an eigenvector matrix W_k of T_k by V_k W_k.
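The per-cluster step can be sketched as follows (our own toy example in Python/NumPy, with eigh standing in for the small solver): extract a basis V_k for the cluster subspace, form the shifted projected matrix T_k = V_k^T(A − μ_k I)V_k, diagonalize it, and update by V_k W_k. After the shift, the within-cluster gaps are large relative to ‖T_k‖, so the small problem is well conditioned.

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta = 8, 1e-10
lam = np.array([1.0, 2.0, 3.0, 3.0 + delta, 3.0 + 2 * delta, 5.0, 6.0, 7.0])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(lam) @ Q.T          # cluster of three eigenvalues near 3

Jk = [2, 3, 4]                      # indices of the clustered eigenvalues
M = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]])
Vk = np.linalg.qr(Q[:, Jk] @ M)[0]  # accurate subspace, mixed-up vectors
mu = 3.0 + delta                    # shift near the cluster center
Tk = Vk.T @ (A - mu * np.eye(n)) @ Vk   # small shifted subproblem
w, Wk = np.linalg.eigh(Tk)          # n_k x n_k problem, well separated now
Vk_new = Vk @ Wk                    # refined eigenvectors for the cluster

resid = np.linalg.norm(A @ Vk_new - Vk_new @ np.diag(w + mu), 2)
print(resid)
```

The residual is tiny even though the cluster gaps are only 1e-10, illustrating why the shifted subproblem resolves the individual eigenvectors.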

Proposed algorithm
Here, we present a practical version of a refinement algorithm for eigenvalue decomposition of a real symmetric matrix A, which can also be applied to the case where A has clustered eigenvalues.
In Algorithm 2, the function fl(C) rounds an input matrix C ∈ R^{n×n} to a matrix T ∈ F^{n×n}, where F is a set of floating-point numbers in ordinary precision, such as the IEEE 754 binary64 format. Here, round-to-nearest rounding is not required; however, some faithful rounding, such as chopping, is desirable. Moreover, the function eig(T) is similar to the MATLAB function, which computes all approximate eigenvectors of an input matrix T ∈ F^{n×n} in working precision arithmetic. This is expected to adopt some backward stable algorithm as implemented in the LAPACK routine xSYEV [2]. In lines 13-17 of Algorithm 2, we aim to obtain sufficiently accurate approximate eigenvectors X′(:, J_k) of A, where the columns of X′(:, J_k) correspond to J_k. For this purpose, we iteratively apply Algorithm 1 (RefSyEv) to A − μ_k I and V_k until, for some ν, X′(:, J_k) becomes as accurate as the other eigenvectors associated with well-separated eigenvalues. Note that the spectral norms ‖E‖_2 and ‖E_k‖_2 can be replaced by the Frobenius norms ‖E‖_F and ‖E_k‖_F.
For the example (26), we apply Algorithm 2 (RefSyEvCL) to A and the same initial guess X^(0) as before. The relevant steps of Algorithm 2 are the following:

Determine the index sets J_k, k = 1, . . . , n_J, as in (27) for the eigenvalue clusters using D̃ = diag(λ̃_i) and ω. (n_J: the number of clusters.)
5: for k ← 1, 2, . . . , n_J do  ▷ Refine eigenvectors for each cluster.
6: Determine the shift constant μ_k.
8: A_k ← A − μ_k I  ▷ Shift A for separating clustered eigenvalues.
▷ Conversion from higher-precision to ordinary precision.
11: Compute eigenvectors of C_k in ordinary precision.
12: if ‖E_k‖_2 ≥ 1 then return  ▷ Improvement cannot be expected.
16: Update the columns of X′ corresponding to J_k.
19: end for
20: end function

After two iterations, the approximate eigenvectors corresponding to the nearly double eigenvalues λ_2 and λ_3 are improved in terms of both orthogonality and diagonality. Thus, Algorithm 2 works well for this example.

Remark 3
For a generalized symmetric definite eigenvalue problem Ax = λBx where A and B are real symmetric with B being positive definite, we can modify the algorithms as follows.
- In Algorithm 1 called at line 2 in Algorithm 2, replace the computation R := I − X̃^T X̃ with R := I − X̃^T B X̃, so that the refined eigenvectors become B-orthogonal. Note that B does not appear in Algorithm 1 called at line 14 in Algorithm 2.

Numerical results
We present numerical results to demonstrate the effectiveness of the proposed algorithm (Algorithm 2: RefSyEvCL). All numerical experiments discussed in this section were conducted using MATLAB R2016b on our workstation with two CPUs (3.0 GHz Intel Xeon E5-2687W v4 (12 cores)) and 1 TB of main memory, unless otherwise specified. Let u denote the relative rounding error unit (u = 2^{-24} for IEEE binary32 and u = 2^{-53} for binary64). To realize multiple-precision arithmetic, we adopt the Advanpix Multiprecision Computing Toolbox version 4.2.3 [1], which utilizes well-known, fast, and reliable multiple-precision arithmetic libraries including GMP and MPFR. We also use multiple-precision arithmetic with sufficiently long precision to simulate real arithmetic. In all cases, we use the MATLAB function norm for computing the spectral norms ‖R‖ and ‖S − D̃‖ in Algorithm 1 in binary64 arithmetic, and we approximate ‖A‖ by max(|λ̃_1|, |λ̃_n|). We conducted the numerical experiments for dozens of seeds of the random number generator, and all results were similar to those provided in this section. Therefore, we adopt the default seed as a typical example using the MATLAB command rng('default') to ensure the reproducibility of the problems.

Convergence property
Here, we confirm the convergence property of the proposed algorithm for various eigenvalue distributions.

Various eigenvalue distributions
In the same way as in the previous paper [17], we again generate real symmetric and positive definite matrices using the MATLAB function randsvd from Higham's test matrices [11] by the following MATLAB command:

A = gallery('randsvd', n, -cnd, mode);

Here, κ(A) ≈ cnd for cnd < u^{-1} ≈ 10^{16}. As shown in [17], for mode ∈ {1, 2}, there is a cluster of nearly multiple eigenvalues, so that Algorithm 1 (RefSyEv) does not work effectively. As in [17], we set n = 10 and cnd = 10^8 to generate moderately ill-conditioned problems in binary64 and consider the computed results obtained using multiple-precision arithmetic with sufficiently long precision as the exact eigenvalues λ_i, i = 1, 2, . . . , n. We compute X^(0) as an initial approximate eigenvector matrix using the MATLAB function eig in binary64 arithmetic.
In the previous paper [17], we observed the quadratic convergence of Algorithm 1 in the case of mode ∈ {3, 4, 5}, while Algorithm 1 failed to improve the accuracy of the initial approximate eigenvectors in the case of mode ∈ {1, 2}, since the test matrices for mode ∈ {1, 2} have nearly multiple eigenvalues. The behavior of Algorithm 2 is shown in Fig. 2, which provides max_{1≤i≤n} |λ̂_i − λ_i|/|λ_i| as the maximum relative error of the computed eigenvalues λ̂_i, ‖offdiag(X̂^T A X̂)‖/‖A‖ as the diagonality of X̂^T A X̂, ‖I − X̂^T X̂‖ as the orthogonality of a computed eigenvector matrix X̂, and ‖Ê‖, where Ê is a computed result of E in Algorithm 1. Here, offdiag(·) denotes the off-diagonal part of a given matrix. The horizontal axis shows the number of iterations ν of Algorithm 2. As can be seen from the results, Algorithm 2 works very well even in the case of mode ∈ {1, 2}.

Clustered eigenvalues
As an example of clustered eigenvalues, we show the results for the Wilkinson matrix [23], which is symmetric and tridiagonal with pairs of nearly equal eigenvalues. The Wilkinson matrix W_n = (w_ij) ∈ R^{n×n} consists of diagonal entries w_ii := |n − 2i + 1|/2, i = 1, 2, . . . , n, and super- and sub-diagonal entries all equal to one. We apply Algorithm 2 to the Wilkinson matrix with n = 21. The results are displayed in Fig. 3. As can be seen, Algorithm 2 works well.
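The Wilkinson matrix and its close eigenvalue pairs are easy to reproduce (a Python/NumPy sketch of the construction above):

```python
import numpy as np

# Wilkinson matrix W_n: diagonal |n - 2i + 1| / 2, unit off-diagonals.
n = 21
d = np.abs(n - 2 * np.arange(1, n + 1) + 1) / 2.0
W = np.diag(d) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

lam = np.linalg.eigvalsh(W)
print(lam[-4:])             # the largest eigenvalues occur in close pairs
print(lam[-1] - lam[-2])    # the top pair is nearly identical
```
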
Next, we show the convergence behavior of Algorithm 2 with limited computational precision for larger matrices with various β, which denotes the reciprocal of the minimum gap between the eigenvalues normalized by ‖A‖ as defined in (10). If β is too large such that β²u ≥ 1, as mentioned at the end of Sect. 3, we cannot expect to improve approximate eigenvectors by Algorithm 1. We generate test matrices as follows. Set k ∈ N with k ≤ n − 2. Let A = QDQ^T, where Q is an orthogonal matrix and D is a diagonal matrix whose diagonal elements are chosen so that A has a cluster of k eigenvalues with the gap β^{-1} together with well-separated remaining eigenvalues. We compute A ≈ Q̂D̂Q̂^T in IEEE 754 binary64 arithmetic, where Q̂ is a pseudo-random approximate orthogonal matrix and D̂ is a floating-point approximation of D. We fix n = 100 and k = 10 and vary β between 10² and 10^{14}. To make the results more illustrative, we provide a less accurate initial guess X^(0) using binary32 arithmetic. In the algorithm, we adopt binary128 (so-called quadruple precision) for high-precision arithmetic. Then, the maximum relative accuracy of the computed results is limited to u_h = 2^{-113} ≈ 10^{-34}. For binary128 arithmetic, we use the multiple-precision toolbox, in which binary128 arithmetic is supported as a special case by the command mp.Digits(34). The results are shown in Fig. 4. As can be seen, Algorithm 2 can refine the computed eigenvalues until their relative accuracy reaches approximately u_h. Both the orthogonality and diagonality of the computed eigenvectors are improved to approximately βu_h. This result is consistent with Remark 1. For β ∈ {10^8, 10^{14}}, Algorithm 1 cannot work because X^(0) is insufficiently accurate and the assumption (3) is not satisfied. We confirm that this problem can be resolved by Algorithm 2.

Computational speed
To evaluate the computational speed of the proposed algorithm (Algorithm 2), we first compare the computing time of Algorithm 2 to that of an approach that uses multiple-precision arithmetic (MP-approach). Note that the timings should be regarded as a reference because the computing time for Algorithm 2 strongly depends on the implementation of accurate matrix multiplication. We adopt an efficient method proposed by Ozaki et al. [19] that utilizes fast matrix multiplication routines such as xGEMM in BLAS. To simulate multiple-precision numbers and arithmetic in Algorithm 2, we represent X = X_1 + X_2 + · · · + X_m with X_k, k = 1, 2, . . . , m, being floating-point matrices in working precision, such as in the "double-double" (m = 2) and "quad-double" (m = 4) precision formats [10], and use the concept of error-free transformations [16]. For the MP-approach, in the multiple-precision toolbox [1], the MRRR algorithm [6] via Householder reduction is implemented sophisticatedly with parallelism to solve symmetric eigenvalue problems.
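The error-free transformations mentioned above can be illustrated with Knuth's TwoSum, which recovers exactly the rounding error of a floating-point addition; pairs (s, e) of this kind are the building blocks of double-double-style formats:

```python
# Knuth's TwoSum: s = fl(a + b) and a + b = s + e exactly,
# with both s and e representable floating-point numbers.
def two_sum(a, b):
    s = a + b
    bp = s - a           # the part of s that came from b
    e = (a - (s - bp)) + (b - bp)
    return s, e

s, e = two_sum(1.0, 2.0 ** -60)
print(s, e)  # s = 1.0, e = 2**-60: the rounded-off part is recovered exactly
```
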
For the comparison of timing, we generate a pseudo-random real symmetric n × n matrix with clustered eigenvalues in a similar way to Sect. 6.1.2. To construct several eigenvalue clusters, we change the diagonal elements of D so that there are c clusters close to 1/c, 2/c, . . . , 1 with k eigenvalues and the gap β^{-1} in each cluster, and n − ck eigenvalues distributed equally in [−1, −1/2]. We set n = 1000, c = 5, k = 100, and β = 10^{12}, i.e., the generated 1000 × 1000 matrix has five clusters with 100 eigenvalues and the gap 10^{-12} in each cluster. We compare the measured computing time of Algorithm 2 to that of the MP-approach, which is shown in Table 1 together with ‖Ê‖ and n_J, where n_J is the number of eigenvalue clusters identified by Algorithm 2. At ν = 2, Algorithm 2 successfully identifies the five eigenvalue clusters, i.e., n_J = 5 corresponding to c = 5.
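The test-matrix construction just described can be sketched as follows (our own reading of the construction; the exact placement of the well-separated eigenvalues is an assumption):

```python
import numpy as np

def clustered_test_matrix(n, c, k, beta, seed=0):
    # c clusters of k eigenvalues each near (j+1)/c with internal gap
    # 1/beta; remaining eigenvalues spread over [-1, -0.5] (assumed).
    rng = np.random.default_rng(seed)
    d = np.empty(n)
    for j in range(c):
        d[j * k:(j + 1) * k] = (j + 1) / c + np.arange(k) / beta
    d[c * k:] = np.linspace(-1.0, -0.5, n - c * k)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q @ np.diag(d) @ Q.T, np.sort(d)

A, d = clustered_test_matrix(200, 5, 10, 1e12)
lam = np.linalg.eigvalsh(A)
print(np.max(np.abs(lam - d)))  # spectrum matches the target closely
```
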
For the comparison of timing on a lower-performance computer, we also conducted numerical experiments using MATLAB R2017b on our laptop PC with a 2.5 GHz Intel Core i7-7660U (2 cores) CPU and 16 GB of main memory. In a similar way to the previous example, we set n = 500, c = 5, k = 10, and β = 10^{12}, i.e., the generated 500 × 500 matrix has five clusters with 10 eigenvalues and the gap 10^{-12} in each cluster. As can be seen from Table 2, the results are similar to those in Table 1.
Next, we address larger-scale problems. The test matrices are generated using the MATLAB function randn with n ∈ {2000, 5000, 10,000}, such as B = randn(n) and A = B + B'. We aim to compute all the eigenvectors of a given real symmetric n × n matrix A with the maximum accuracy allowed by the binary64 format. To make the results more illustrative, we provide a less accurate initial guess X^(0) using eig in binary32, and we then refine X^(ν) by Algorithm 2. For efficiency, we use binary64 arithmetic for ν = 1, 2, and accurate matrix multiplication based on error-free transformations [19] for ν = 3. As numerical results, we provide ‖Ê‖, n_J, and the measured computing time. The results are shown in Table 3. As can be seen, Algorithm 2 improves the accuracy of the computed results up to the limit of binary64 (u = 2^{-53} ≈ 10^{-16}). For n ∈ {5000, 10,000}, Algorithm 2 requires much more computing time in total compared with eig in binary32 as n_J increases. This is because the problems generally become more ill-conditioned for larger n. In fact, the minimum gap between the eigenvalues normalized by ‖A‖ is 2.71 · 10^{-5} for n = 2000, 2.31 · 10^{-6} for n = 5000, and 2.26 · 10^{-6} for n = 10,000. Thus, it is likely that binary32 arithmetic cannot provide a sufficiently accurate initial guess X^(0) for a large-scale random matrix.

Table 1: Results for a pseudo-random real symmetric matrix with clustered eigenvalues on our workstation: n = 1000, 5 clusters, 100 eigenvalues, and β = 10^{12} in each cluster.
Table 2: Results for a pseudo-random real symmetric matrix with clustered eigenvalues on our laptop PC: n = 500, 5 clusters, 10 eigenvalues, and β = 10^{12} in each cluster.

Application to a real-world problem
Finally, we apply the proposed algorithm to a quantum materials simulation that aims to understand electronic structures in material physics. The problems can be reduced to generalized eigenvalue problems, where eigenvalues and eigenvectors correspond to electronic energies and wave functions, respectively. To understand properties of materials correctly, it is crucial to determine the order of eigenvalues [12] and to obtain accurate eigenvectors [24,25]. We deal with a generalized eigenvalue problem Ax = λBx arising from a vibrating carbon nanotube within a supercell with s, p, d atomic orbitals [4]. The matrices A and B are taken from ELSES matrix library [7] as VCNT22500, where A and B are real symmetric n × n matrices with B being positive definite and n = 22500. Our goal is to compute accurate eigenvectors and separate all the eigenvalues of the problem for determining their order. To this end, we use a numerical verification method in [14] based on the Gershgorin circle theorem (cf. e.g. [9, Theorem 7.2.2] and [23, pp. 71ff]), which can rigorously check whether all eigenvalues are separated and determine an existing range of each eigenvalue.
Let Λ(B^{-1}A) be the set of the eigenvalues of B^{-1}A. Here, all the eigenvalues of B^{-1}A are real by the assumptions on A and B. Let X̃ ∈ R^{n×n} be an approximate eigenvector matrix of B^{-1}A with X̃ being nonsingular. Then, it is expected that C := X̃^{-1}B^{-1}AX̃ is nearly diagonal. Although it is not possible, in general, to calculate C = (c_ij) exactly in finite precision arithmetic, we can efficiently obtain an enclosure of C. Note that we compute neither an enclosure of B^{-1} nor that of X̃^{-1} explicitly. Instead, we compute an approximate solution C̃ of the linear systems (BX̃)C = AX̃ and then verify the accuracy of C̃ using Yamamoto's method [26] with matrix-based interval arithmetic [18,21] for obtaining an enclosure of C. Suppose D̃ = diag(λ̃_i) is a midpoint matrix and G = (g_ij) is a radius matrix with g_ij ≥ 0 satisfying |c_ii − λ̃_i| ≤ g_ii and |c_ij| ≤ g_ij otherwise for all (i, j).
Then, the Gershgorin circle theorem implies

Λ(B^{-1}A) ⊆ ⋃_{i=1}^n [λ̃_i − e_i, λ̃_i + e_i], e_i := Σ_{j=1}^n g_ij.
It can also be shown that if all the intervals [λ̃_i − e_i, λ̃_i + e_i] are pairwise disjoint, then all the eigenvalues are separated, i.e., each interval contains precisely one eigenvalue of B^{-1}A [23, pp. 71ff].
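A toy sketch of this separation check (our own illustration, with a small artificial near-diagonal matrix in place of the verified enclosure; the radii here come from the off-diagonal magnitudes only):

```python
import numpy as np

def gershgorin_separated(C):
    # Midpoints from the diagonal of C, radii e_i from the off-diagonal
    # magnitudes; check that all intervals [lam_i - e_i, lam_i + e_i]
    # are pairwise disjoint.
    lam = np.diag(C).copy()
    G = np.abs(C)
    np.fill_diagonal(G, 0.0)
    e = G.sum(axis=1)
    order = np.argsort(lam)
    lo, hi = lam[order] - e[order], lam[order] + e[order]
    return bool(np.all(hi[:-1] < lo[1:]))

C = np.diag([1.0, 2.0, 3.0]) + 1e-8 * np.ones((3, 3))  # nearly diagonal
print(gershgorin_separated(C))  # True: radii ~2e-8, gaps ~1
```

If two midpoints are closer than the sum of their radii, the check fails, which is exactly the situation that the refinement by Algorithm 2 resolves in the experiment below.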
We first computed an approximate eigenvector matrix X̃ of B^{-1}A using the MATLAB function eig(A, B) in binary64 arithmetic as an initial guess; X̃ was obtained in 235.17 s. Then, we had max_{1≤i≤n} e_i = 2.75 × 10^{-7}, and 10 eigenvalues in 5 clusters could not be separated due to relatively small eigenvalue gaps. We next applied Algorithm 2 to A, B, and X̃ in higher precision arithmetic in a similar way to Sect. 6.2 and obtained a refined approximate eigenvector matrix X′ in 597.52 s. Finally, we obtained max_{1≤i≤n} e_i = 1.58 × 10^{-14} and confirmed that all the eigenvalues were successfully separated.