Iterative refinement for symmetric eigenvalue decomposition

An efficient refinement algorithm is proposed for symmetric eigenvalue problems. The structure of the algorithm is straightforward, primarily comprising matrix multiplications. We show that the proposed algorithm converges quadratically if a modestly accurate initial guess is given, including in the case of multiple eigenvalues. Our convergence analysis can be extended to Hermitian matrices. Numerical results demonstrate excellent performance of the proposed algorithm in terms of convergence rate and overall computational cost, and show that it is considerably faster than a standard approach using multiple-precision arithmetic.


Introduction
Let A be a real symmetric n × n matrix. We are concerned with the standard symmetric eigenvalue problem Ax = λx, where λ ∈ R is an eigenvalue of A and x ∈ R^n is an eigenvector of A associated with λ. Solving this problem is important because it plays a significant role in scientific computing. For example, highly accurate computations of a few or all eigenvectors are crucial for large-scale electronic structure calculations in material physics [31,32], in which specific interior eigenvalues with associated eigenvectors need to be computed. Excellent overviews of the symmetric eigenvalue problem can be found in references [25,30].
Throughout this paper, I and O denote the identity and the zero matrices of appropriate size, respectively. For matrices, ‖·‖₂ and ‖·‖_F denote the spectral norm and the Frobenius norm, respectively. Unless otherwise specified, ‖·‖ means ‖·‖₂. For legibility, if necessary, we distinguish between approximate quantities and computed results, e.g., for some quantity α we write α̃ for an approximation of α and α̂ for a computed result for α. This paper aims to develop a method to compute an accurate result of the eigenvalue decomposition

A = XDXᵀ, (1)

where X is an n × n orthogonal matrix whose i-th column x^(i) is an eigenvector of A (called an eigenvector matrix) and D is an n × n diagonal matrix whose diagonal elements are the corresponding eigenvalues λ_i ∈ R, i.e., D_ii = λ_i for i = 1, . . . , n.
For this purpose, we discuss an iterative refinement algorithm for (1) together with a convergence analysis. Throughout the paper, we assume that λ_1 ≤ λ_2 ≤ · · · ≤ λ_n, and the columns of X are ordered correspondingly. Several efficient numerical algorithms for computing (1) have been developed, such as the bisection method with inverse iteration, the QR algorithm, the divide-and-conquer algorithm or the MRRR (multiple relatively robust representations) algorithm via Householder reduction, and the Jacobi algorithm. For details, see [10,11,14,15,25,30] and the references cited therein. Since such algorithms have been studied actively in numerical linear algebra for decades, there are highly reliable implementations of them, such as LAPACK routines [4]. We stress that we do not intend to compete with such existing algorithms; rather, we aim to develop an algorithm that improves the results obtained by any of them. Such a refinement algorithm is useful if the quality of the results is not satisfactory. In other words, our proposed algorithm can be regarded as a supplement to existing algorithms for computing (1).
For symmetric matrices, backward stable algorithms usually provide accurate approximate eigenvalues, as can be seen from perturbation theory (cf., e.g., [25, p. 208]). However, they do not necessarily give accurate approximate eigenvectors, since eigenvectors associated with clustered eigenvalues are very sensitive to perturbations (cf., e.g., [25, Theorem 11.7.1]). To see this, we take a small 3 × 3 example depending on a parameter ε, with eigenvalues λ_1, λ_2, λ_3 and a normalized eigenvector matrix X = [x^(1), x^(2), x^(3)], where λ_2 and λ_3 are nearly double eigenvalues for small ε. We set ε = 2⁻²⁵ ≈ 2.98 × 10⁻⁸ and adopt the MATLAB built-in function eig in IEEE 754 binary64 floating-point arithmetic, which adopts the LAPACK routine DSYEV, to obtain an approximate eigenvector matrix X̂ = [x̂^(1), x̂^(2), x̂^(3)] by some backward stable algorithm. Then, the relative rounding error unit is 2⁻⁵³ ≈ 1.11 × 10⁻¹⁶, and we have ‖X̂ − X‖ ≈ 5.46 × 10⁻⁹, which means 7 or 8 decimal digits are lost. More precisely, we have ‖x̂^(1) − x^(1)‖ ≈ 3.42 × 10⁻¹⁶ and ‖x̂^(2) − x^(2)‖ ≈ ‖x̂^(3) − x^(3)‖ ≈ 5.46 × 10⁻⁹. Therefore, we still have room for improvement of the approximate eigenvectors x̂^(2) and x̂^(3) in the range of binary64. To improve the accuracy of approximate eigenvectors efficiently, we treat all eigenvectors together, since they correlate with one another through the orthogonality XᵀX = I and the diagonality XᵀAX = D.
There exist several refinement algorithms for eigenvalue problems that are based on Newton's method for nonlinear equations (cf., e.g., [3,7,12,29]). Since this sort of algorithm is designed to improve eigenpairs (λ, x) ∈ R × R^n individually, applying such a method to all eigenpairs requires O(n⁴) arithmetic operations. To reduce the computational cost, one may consider preconditioning by Householder reduction of A in ordinary floating-point arithmetic, such as T ≈ ĤᵀAĤ, where T is a tridiagonal matrix and Ĥ is an approximately orthogonal matrix involving rounding errors. However, this is not a similarity transformation; thus, the original problem is slightly perturbed. Assuming Ĥ = H + ΔH for some exactly orthogonal matrix H, the computed reduction corresponds to the exact reduction of a perturbed matrix A + ΔA, where ΔA can be considered a backward error, and the accuracy of eigenpairs obtained using Ĥ is limited by its lack of orthogonality.
A possible approach to achieve an accurate eigenvalue decomposition is to use multiple-precision arithmetic for Householder reduction and the subsequent algorithm. In general, however, we do not know in advance how much arithmetic precision is sufficient to obtain results with desired accuracy. Moreover, the use of such multipleprecision arithmetic for entire computations is often much more time-consuming than ordinary floating-point arithmetic such as IEEE 754 binary64. Therefore, we prefer the iterative refinement approach rather than simply using multiple-precision arithmetic.
Simultaneous iteration or Grassmann–Rayleigh quotient iteration [1] can potentially be used to refine eigenvalue decompositions. However, such methods require higher-precision arithmetic for the orthogonalization of approximate eigenvectors; thus, we cannot restrict the higher-precision arithmetic to matrix multiplication. Wilkinson [30, Chapter 9, pp. 637–647] explained the refinement of eigenvalue decompositions for general square matrices with reference to Jahn's method [6,19]. Such methods rely on a similarity transformation C := X̂⁻¹AX̂ computed with high accuracy for a computed result X̂ for X, which requires an accurate solution of the linear system X̂C = AX̂ for C, and slightly breaks the symmetry of A due to the nonorthogonality of X̂. Davies and Modi [9] proposed a direct method to complete the symmetric eigenvalue decomposition of nearly diagonal matrices. The Davies–Modi algorithm assumes that A is preconditioned to a nearly diagonal matrix such as X̂ᵀAX̂, where X̂ is an approximately orthogonal matrix involving rounding errors. Again, as mentioned above, this is not a similarity transformation, and a problem similar to that of the preconditioning by Householder reduction remains unsolved, i.e., the Davies–Modi algorithm does not remedy the nonorthogonality of X̂. See the Appendix for details regarding the relationship between the Davies–Modi algorithm and ours.
Given this background, we try to derive a simple and efficient iterative refinement algorithm to simultaneously improve the accuracy of all eigenvectors with quadratic convergence, which requires O(n 3 ) operations for each iteration. The proposed algorithm can be regarded as a variant of Newton's method, and therefore, its quadratic convergence is naturally derived.
For a computed eigenvector matrix X̂ for (1), define E ∈ R^{n×n} such that X = X̂(I + E). Here, the existence of E requires that X̂ be nonsingular, which is usually satisfied in practice. We assume that X̂ is modestly accurate, e.g., obtained by some backward stable algorithm. Then, we aim to compute a sufficiently precise approximation Ẽ of E using the two relations XᵀX = I and XᵀAX = D.
After obtaining Ẽ, we can update X̂ := X̂(I + Ẽ). If necessary, we iterate the process as X̂^(ν+1) := X̂^(ν)(I + Ẽ^(ν)). Under some conditions, we prove Ẽ^(ν) → O and X̂^(ν) → X, where the convergence rates are quadratic. Our analysis provides a sufficient condition for the convergence of the iterations. The main benefit of our algorithm is its adaptivity, i.e., it allows the adaptive use of high-precision arithmetic until the desired accuracy of the computed results is achieved. In other words, the use of high-precision arithmetic is mandatory; however, it is primarily restricted to matrix multiplication, which accounts for most of the computational cost. Note that a multiple-precision arithmetic library, such as MPFR [21] with GMP [13], can be used for this purpose. With approaches such as quad-double precision arithmetic [16] and arbitrary precision arithmetic [26], multiple-precision arithmetic can be simulated using ordinary floating-point arithmetic. There are also more specific approaches to high-precision matrix multiplication. For example, XBLAS (extra precise BLAS) [20] and other accurate and efficient algorithms for dot products [22,27,28] and matrix products [24] based on error-free transformations are available for practical implementation.
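The reason extra precision is needed precisely in the matrix products can be seen on a toy example (ours, not the paper's implementation): residual-type quantities such as I − X̂ᵀX̂ suffer heavy cancellation, so an ordinary-precision product loses them in rounding noise. Here exact rational arithmetic stands in for a higher-precision matrix product.

```python
from fractions import Fraction

# Toy illustration (ours, not the paper's implementation): the residual
# R = I - X^T X of a nearly orthogonal X is dominated by cancellation,
# which is why such products are evaluated in higher precision.
def gram_residual(X, exact):
    conv = Fraction if exact else float
    Xc = [[conv(v) for v in row] for row in X]
    n = len(Xc)
    return [[(1 if i == j else 0)
             - sum(Xc[k][i] * Xc[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# The stored entries 0.6 and 0.8 are binary64 roundings of 3/5 and 4/5,
# so the true residual of this "rotation" is tiny but nonzero.
X = [[0.6, 0.8], [-0.8, 0.6]]
r_exact = gram_residual(X, exact=True)[0][0]   # exact: about -4.4e-17
r_float = gram_residual(X, exact=False)[0][0]  # buried in rounding noise
```

The exact residual is of the same order as the binary64 rounding unit, so a binary64 evaluation of it carries O(1) relative error; evaluating only this product in higher precision recovers it fully, which is the adaptivity exploited above.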
The remainder of the paper is organized as follows. In Sect. 2, we present a refinement algorithm for the symmetric eigenvalue decomposition. In Sect. 3, we provide a convergence analysis of the proposed algorithm. In Sect. 4, we present some numerical results showing the behavior and performance of the proposed algorithm. Finally, we conclude the paper in Sect. 5. To put our results into context, we review existing work in Appendix.
For simplicity, we mainly handle real matrices. The discussions in this paper can be extended to Hermitian matrices. Moreover, the discussion of the standard symmetric eigenvalue problem can readily be extended to the generalized symmetric (or Hermitian) definite eigenvalue problem Ax = λBx, where A and B are real symmetric (or Hermitian) with B positive definite.

Proposed algorithm
Let A = Aᵀ ∈ R^{n×n}. The eigenvalues of A are denoted by λ_i ∈ R, i = 1, . . . , n. Then ‖A‖ = max_{1≤i≤n} |λ_i|. Let X ∈ R^{n×n} denote an orthogonal eigenvector matrix comprising normalized eigenvectors of A, and let X̂ denote an approximation of X computed by some numerical algorithm.
In the following, we derive our proposed algorithm. Define E ∈ R^{n×n} such that X = X̂(I + E). The problem is how to derive a method for computing E. Here, we assume that

‖E‖ =: ε < 1, (3)

which implies that I + E is nonsingular, and so is X̂. First, we consider the orthogonality of the eigenvectors, that is, XᵀX = I. From this, we obtain I = XᵀX = (I + E)ᵀX̂ᵀX̂(I + E) and

X̂ᵀX̂ = (I + E)⁻ᵀ(I + E)⁻¹. (4)

Using the Neumann series expansion, we obtain

(I + E)⁻¹ = I − E + E² − E³ + · · · . (5)

Here, it follows from (3) that

‖(I + E)⁻¹‖ ≤ 1/(1 − ε). (6)

Substituting (5) into (4) yields

X̂ᵀX̂ = I − E − Eᵀ + Δ₁, (7)

where

Δ₁ := (I + E)⁻ᵀ(I + E)⁻¹ − (I − E − Eᵀ) (8)

consists of the terms of second and higher order in E. Here, it follows from (3) and (6) that ‖Δ₁‖ = O(ε²). In a similar way to Newton's method (cf., e.g., [5, p. 236]), dropping the second order term Δ₁ from (7) for linearization yields the following matrix equation for Ẽ = (ẽ_ij) ∈ R^{n×n}:

Ẽ + Ẽᵀ = I − X̂ᵀX̂. (9)

Next, we consider the diagonalization of A such that XᵀAX = D. From this, we obtain D = XᵀAX = (I + E)ᵀX̂ᵀAX̂(I + E) and

X̂ᵀAX̂ = (I + E)⁻ᵀD(I + E)⁻¹. (10)

Substitution of (5) into (10) yields

X̂ᵀAX̂ = D − EᵀD − DE + Δ₂, (11)

where

Δ₂ := (I + E)⁻ᵀD(I + E)⁻¹ − (D − EᵀD − DE) (12)

again consists of the terms of second and higher order in E. Here, it follows from (3) and (6) that ‖Δ₂‖ = O(‖A‖ε²). Similar to the derivation of (9), omission of the second order term Δ₂ from (11) yields the following matrix equation for Ẽ = (ẽ_ij) ∈ R^{n×n} and D̃ = (d̃_ij) ∈ R^{n×n}:

D̃ − D̃Ẽ − ẼᵀD̃ = X̂ᵀAX̂. (13)

Here, we restrict D̃ to be diagonal, which implies d̃_ij = 0 for i ≠ j. We put λ̃_i := d̃_ii as approximate eigenvalues. Hereafter, we assume that

λ̃_1 ≤ λ̃_2 ≤ · · · ≤ λ̃_n

for the sake of simplicity. If this is not the case, we can find an appropriate permutation τ such that λ̃_{τ(1)} ≤ λ̃_{τ(2)} ≤ · · · ≤ λ̃_{τ(n)}, and we redefine λ̃_j := λ̃_{τ(j)} and x̂^(j) := x̂^(τ(j)) for j = 1, 2, . . . , n. Note that the eigenvalue ordering is not essential in this paper. Summarizing the above discussion of (9) and (13), we obtain the system of matrix equations for Ẽ = (ẽ_ij) and D̃ = (d̃_ij):

Ẽ + Ẽᵀ = R,  D̃ − D̃Ẽ − ẼᵀD̃ = S, (14)

where R = (r_ij) and S = (s_ij) are defined as R := I − X̂ᵀX̂ and S := X̂ᵀAX̂, respectively. In fact, (14) is surprisingly easy to solve. Written elementwise, (14) reads

ẽ_ij + ẽ_ji = r_ij,  λ̃_i ẽ_ij + λ̃_j ẽ_ji = d̃_ij − s_ij,  1 ≤ i, j ≤ n, (15)

with d̃_ij = 0 for i ≠ j and d̃_ii = λ̃_i. First, we focus on the diagonal parts of Ẽ. From the first equation in (15), it follows that ẽ_ii = r_ii/2 for 1 ≤ i ≤ n. Moreover, the second equation in (15) also yields (1 − 2ẽ_ii)λ̃_i = (1 − r_ii)λ̃_i = s_ii.
If r_ii ≠ 1, then we have

λ̃_i = s_ii/(1 − r_ii). (16)

Here, |r_ii| ≪ 1 usually holds due to the assumption that the columns of X̂ are approximately normalized. Note that (16) is equivalent to the Rayleigh quotient ((x̂^(i))ᵀAx̂^(i))/((x̂^(i))ᵀx̂^(i)), where x̂^(i) is the i-th column of X̂. Next, we focus on the off-diagonal parts of Ẽ. The combination of (15) and (16) yields, for i ≠ j,

ẽ_ij + ẽ_ji = r_ij,  λ̃_i ẽ_ij + λ̃_j ẽ_ji = −s_ij,

which are simply 2 × 2 linear systems. Theoretically, for (i, j) satisfying λ̃_i ≠ λ̃_j, the linear systems have the unique solutions

ẽ_ij = (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i),  ẽ_ji = r_ij − ẽ_ij. (17)

Thus, for sufficiently small ‖E‖, all |ẽ_ij| for i ≠ j as in (17) are also small whenever λ_i ≠ λ_j for i ≠ j. However, multiple eigenvalues require some care. For multiple eigenvalues λ_i = λ_j with i ≠ j, noting the first equation in (15) and adding the artificial condition ẽ_ij = ẽ_ji, we choose ẽ_ij as

ẽ_ij = ẽ_ji = r_ij/2. (18)

Similar to the Newton–Schulz iteration, which is discussed in the Appendix, the choice (18) is quite reasonable for improving the orthogonality of the columns x̂^(i) and x̂^(j) of X̂. Moreover, the accuracy of x̂^(i) and x̂^(j) is improved, as shown in Sect. 3.2. It remains to detect multiple eigenvalues. According to the perturbation theory in [23, Theorem 2], the distance from each λ̃_i to some exact eigenvalue of A is bounded by ‖S − D̃‖ + ‖A‖‖R‖. In our algorithm, we set

δ := 2(‖S − D̃‖ + ‖A‖‖R‖) (19)

and regard λ̃_i and λ̃_j with |λ̃_i − λ̃_j| ≤ δ as approximations of the same multiple eigenvalue. In other words, we can identify the multiple eigenvalues using δ in (19) associated with X̂ in some neighborhood of X, which is rigorously proven in Sect. 3.
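The closed-form solution of these 2 × 2 systems can be checked directly. In the notation of this section, for λ̃_i ≠ λ̃_j the system ẽ_ij + ẽ_ji = r_ij, λ̃_i ẽ_ij + λ̃_j ẽ_ji = −s_ij is solved by ẽ_ij = (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i) and ẽ_ji = r_ij − ẽ_ij. A quick numerical sanity check with arbitrary (hypothetical) numbers of our own:

```python
# Sanity check (our own, with arbitrary small residuals) of the
# closed-form solution of the 2x2 systems arising in this section:
#   e_ij + e_ji = r_ij,   lam_i*e_ij + lam_j*e_ji = -s_ij   (i != j).
lam_i, lam_j = 1.5, -0.25      # distinct approximate eigenvalues
r_ij, s_ij = 3e-8, 2e-8        # small residual entries

e_ij = (s_ij + lam_j * r_ij) / (lam_j - lam_i)
e_ji = r_ij - e_ij

assert abs(e_ij + e_ji - r_ij) < 1e-20
assert abs(lam_i * e_ij + lam_j * e_ji + s_ij) < 1e-20
```

Both residuals vanish up to rounding, and the corrections are of the same order as r_ij and s_ij divided by the gap |λ̃_i − λ̃_j|, which is why the gap test against δ is needed before using this branch.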
Finally, we present a refinement algorithm for the eigenvalue decomposition of A in Algorithm 1, which is designed to be applied iteratively.
Let us consider the iterative refinement using Algorithm 1.

Algorithm 1 RefSyEv: Refinement of approximate eigenvectors of a real symmetric matrix. Higher-precision arithmetic is required for all the computations except line 6.
1: function X̂′ = RefSyEv(A, X̂)
2:  R ← I − X̂ᵀX̂
3:  S ← X̂ᵀAX̂
4:  λ̃_i ← s_ii/(1 − r_ii), i = 1, . . . , n, as in (16)
5:  D̃ ← diag(λ̃_i)
6:  δ ← 2(‖S − D̃‖ + ‖A‖‖R‖), as in (19)
7:  ẽ_ij ← (s_ij + λ̃_j r_ij)/(λ̃_j − λ̃_i) if |λ̃_i − λ̃_j| > δ, and ẽ_ij ← r_ij/2 otherwise, as in (17) and (18)
8:  X̂′ ← X̂(I + Ẽ)
9: end function

We iterate X̂^(ν+1) := X̂^(ν)(I + Ẽ^(ν)) for ν = 0, 1, 2, . . . , where Ẽ^(ν) = (ẽ_ij^(ν)) are the quantities calculated at line 7 in Algorithm 1. Let E^(ν) be defined such that X = X̂^(ν)(I + E^(ν)). Then, E^(ν) corresponds to the true error of X̂^(ν) for each ν. Under some condition (the assumption (29) in Lemma 1), we obtain the quadratic error reduction ‖E^(ν+1)‖ = O(‖E^(ν)‖²), which will be shown in Sect. 3.1.
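One refinement step can be sketched in a few dozen lines. The following pure-Python sketch is ours, not the authors' implementation, and reflects our reading of Sect. 2: ordinary binary64 floats stand in for the higher-precision matrix products, the Frobenius norm stands in for the spectral norm (an overestimate; cf. Remark 1), and the function name refsyev is our own.

```python
import math

# Rough sketch of one refinement step (our reading of Sect. 2, not the
# authors' code). Floats stand in for higher-precision products.
def t(M):
    return [list(r) for r in zip(*M)]

def mul(A, B):
    Bt = t(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def fro(M):
    return math.sqrt(sum(x * x for row in M for x in row))

def refsyev(A, X):
    n = len(A)
    G = mul(t(X), X)
    R = [[(1.0 if i == j else 0.0) - G[i][j] for j in range(n)] for i in range(n)]
    S = mul(t(X), mul(A, X))
    lam = [S[i][i] / (1.0 - R[i][i]) for i in range(n)]           # Rayleigh quotients
    SD = [[S[i][j] - (lam[i] if i == j else 0.0) for j in range(n)] for i in range(n)]
    delta = 2.0 * (fro(SD) + max(abs(v) for v in lam) * fro(R))   # gap tolerance
    E = [[0.0] * n for _ in range(n)]
    for i in range(n):
        E[i][i] = R[i][i] / 2.0
        for j in range(i + 1, n):
            if abs(lam[i] - lam[j]) > delta:       # well-separated pair
                E[i][j] = (S[i][j] + lam[j] * R[i][j]) / (lam[j] - lam[i])
                E[j][i] = R[i][j] - E[i][j]
            else:                                  # treated as multiple
                E[i][j] = E[j][i] = R[i][j] / 2.0
    IE = [[(1.0 if i == j else 0.0) + E[i][j] for j in range(n)] for i in range(n)]
    return mul(X, IE), lam

# Demo: A has eigenvalues 1 and 3; start from an exact eigenvector
# matrix rotated by 1e-3 and refine twice.
A = [[2.0, 1.0], [1.0, 2.0]]
iv = 1.0 / math.sqrt(2.0)
c, s = math.cos(1e-3), math.sin(1e-3)
X0 = mul([[iv, iv], [-iv, iv]], [[c, s], [-s, c]])
X1, lam1 = refsyev(A, X0)
X2, lam2 = refsyev(A, X1)
```

On this toy problem the eigenvalue errors drop from about 10⁻⁶ to roughly the square of that after one more step, consistent with the quadratic convergence proved in Sect. 3; in a serious implementation the products at lines corresponding to R, S, and the update would be carried out in higher precision.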

Remark 1
On the practical computation of δ at line 6 in Algorithm 1, one may prefer to use the Frobenius norm rather than the spectral norm, since the former is much easier to compute than the latter. For any real n × n matrix C, it is known (cf., e.g., [14, p. 72]) that ‖C‖₂ ≤ ‖C‖_F ≤ √n ‖C‖₂. Thus, using the Frobenius norm may cause some overestimate of δ and affect the behavior of the algorithm; see Remark 3 below.
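The inequality can be checked on any small example; for a symmetric 2 × 2 matrix the spectral norm is the largest absolute eigenvalue, so both norms are available in closed form (our own illustration):

```python
import math

# Check ||C||_2 <= ||C||_F <= sqrt(n)*||C||_2 for a symmetric 2x2 C,
# where ||C||_2 = max |eigenvalue| is available in closed form.
C = [[3.0, 1.0], [1.0, -2.0]]
m = 0.5 * (C[0][0] + C[1][1])
r = math.hypot(0.5 * (C[0][0] - C[1][1]), C[0][1])
spec = max(abs(m - r), abs(m + r))                      # ||C||_2
frob = math.sqrt(sum(x * x for row in C for x in row))  # ||C||_F
assert spec <= frob <= math.sqrt(2.0) * spec
```

Here the overestimate factor is at most √2; in general it grows only like √n, so the resulting δ remains a modest overestimate.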
Remark 2 For the generalized symmetric definite eigenvalue problem Ax = λBx, where A and B are real symmetric with B positive definite, a similar algorithm can readily be derived by replacing line 2 in Algorithm 1 with R ← I − X̂ᵀBX̂.
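As a small self-contained check of the generalized problem itself (our own 2 × 2 example, using the classical Cholesky reduction rather than Algorithm 1): the eigenvalues of the pencil (A, B) coincide with those of the standard symmetric matrix C = L⁻¹AL⁻ᵀ, where B = LLᵀ.

```python
import math

# Our own 2x2 example (not from the paper): Ax = lambda*Bx with
# B = L L^T (Cholesky) has the same eigenvalues as the standard
# symmetric problem for C = inv(L) A inv(L)^T.
def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[4.0, 2.0], [2.0, 5.0]]   # symmetric positive definite

l11 = math.sqrt(B[0][0])
l21 = B[1][0] / l11
l22 = math.sqrt(B[1][1] - l21 * l21)
Linv = [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]
LinvT = [[Linv[0][0], Linv[1][0]], [Linv[0][1], Linv[1][1]]]
C = matmul2(matmul2(Linv, A), LinvT)

# Eigenvalues of the symmetric 2x2 matrix C
m = 0.5 * (C[0][0] + C[1][1])
r = math.hypot(0.5 * (C[0][0] - C[1][1]), C[0][1])
lam = (m - r, m + r)   # here: 0.5 and 0.625

# Direct check against det(A - lambda*B) = 0
for v in lam:
    d = ((A[0][0] - v * B[0][0]) * (A[1][1] - v * B[1][1])
         - (A[0][1] - v * B[0][1]) ** 2)
    assert abs(d) < 1e-12
```

Remark 2 avoids this explicit reduction: replacing R by I − X̂ᵀBX̂ refines B-orthogonality of the eigenvectors directly.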

Convergence analysis
In this section, we prove quadratic convergence of Algorithm 1 under the assumption that the approximate solutions are modestly close to the exact solutions. Our analysis is divided into two parts. First, if we assume that A does not have multiple eigenvalues, then quadratic convergence is proven. Next, we consider a general analysis for any A.
Recall that the error of the approximate solution is expressed as X − X̂ = X̂E in view of X = X̂(I + E). The refined approximate solution is X̂′ := X̂(I + Ẽ). It then follows that the error of the refined solution is expressed as X − X̂′ = X̂′E′, where E′ is defined such that X = X̂′(I + E′). In addition, recall that Ẽ is the solution of the equations

ẽ_ij + ẽ_ji = r_ij, (21)

λ̃_i ẽ_ij + λ̃_j ẽ_ji = −s_ij  (i ≠ j), (22)

where, if |λ̃_i − λ̃_j| ≤ δ with δ defined as in (19), then (22) is not reflected in the computation of ẽ_ij and ẽ_ji. In this case, we choose ẽ_ij = ẽ_ji = r_ij/2 from (21). Such an exceptional case is considered later in the subsection on multiple eigenvalues. Briefly, our goal is to prove the quadratic convergence ‖E′‖ = O(‖E‖²) as X̂ → X, that is, as ‖E‖ → 0. To investigate the relationship between E and Ẽ, let ε := ‖E‖ be as in (3), and let Δ₁ and Δ₂ be defined as in (7)–(8) and (11)–(12), respectively. Then, we see from (7) and (8) that Ẽ + Ẽᵀ = R = E + Eᵀ − Δ₁. In addition, we have λ_i e_ij + λ_j e_ji = −s_ij + Δ₂(i, j) from (11) and (12), where Δ₂(i, j) denotes the (i, j) element of Δ₂.

Simple eigenvalues
We focus on the situation where the eigenvalues of A are all simple and a given X̂ is sufficiently close to an orthogonal eigenvector matrix X. First, we derive a sufficient condition under which (17) is chosen for all (i, j), i ≠ j, in Algorithm 1.

Lemma 1
Let A be a real symmetric n × n matrix with simple eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X ∈ R^{n×n}. For a given nonsingular X̂ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̂ in exact arithmetic, and D̃ = diag(λ̃_i), R, S, and δ are the quantities calculated in Algorithm 1. Define E such that X = X̂(I + E). If E satisfies the smallness assumption (29) (which in particular requires ε := ‖E‖ < 1/100 and ε small relative to the minimum gap of the eigenvalues of A), then we obtain

min_{i≠j} |λ̃_i − λ̃_j| > δ. (30)

Proof First, it is easy to see (31) from (21), (23), and (27). Hence, we obtain (32) from the diagonal elements in (31). From (22) and (28), we have (33). For the first term on the right-hand side, we see the corresponding bound from (28). Moreover, for the second term, the bound follows from (24), (28), and (32). In addition, we see (34). Combining this with (33), we obtain (35). Hence, noting the definition of δ as in (19), we derive (36), where the second inequality is due to (27), (28), and (35), and the last inequality is due to (29). In addition, from (26), ε < 1/100 in (29), and (34), we see (37). Thus, we find that, for all (i, j), i ≠ j, the gap bound (38) holds, where the first inequality is due to (35), the second inequality is due to (29), the third inequality is due to the second inequality in (37), the fourth inequality is due to (37) and ε < 1/100 as in (29), and the last inequality is due to (36). Thus, we obtain (30).
The assumption (29) is crucial for the first iteration in the iterative process (20). In the following, monotone convergence of ‖E^(ν)‖ is proven under the assumption (29) for a given initial guess X̂ = X̂^(0) and E = E^(0), so that ‖E^(ν+1)‖ < ‖E^(ν)‖ for ν = 0, 1, . . . . Thus, in the iterative refinement using Algorithm 1, Lemma 1 ensures that (30) is consecutively satisfied for X̂^(ν) in the iterative process. In addition, recall that our aim is to prove the quadratic convergence in the asymptotic regime. To this end, we derive a key lemma that shows (25).

Lemma 2 Let A be a real symmetric n × n matrix with simple eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X ∈ R^{n×n}. For a given nonsingular X̂ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̂ in exact arithmetic, and Ẽ is the quantity calculated in Algorithm 1. Define E such that X = X̂(I + E). Under the assumption (29) in Lemma 1, we have (39).

Proof Let ε, χ(·), and η(·) be defined as in Lemma 1, (26), and (34), respectively. Note that the diagonal elements of Ẽ − E are estimated as in (32). In the following, we estimate the off-diagonal elements of Ẽ − E. To this end, define the quantity in (40). Noting (28), (35), and (40), we see that the off-diagonal elements of |Δ̃₂ − Δ₂| are less than 2η(ε)‖A‖ε³. In other words, (41) holds for i ≠ j from (28), where Δ₂(i, j) are the (i, j) elements of Δ₂. In addition, from (22), it follows that (42). From (31), (41), and (42), we have (43). It then follows that (44). Therefore, using (35), we obtain (45) together with the diagonal estimate in (32). Combining these with χ(0) = 3 proves (39). Moreover, we have (46) from (29) and (37).
Using the above lemmas, we obtain a main theorem that states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given X̂ is sufficiently close to X.

Theorem 1 Let A be a real symmetric n × n matrix with simple eigenvalues λ_i, i = 1, 2, . . . , n, and a corresponding orthogonal eigenvector matrix X ∈ R^{n×n}. For a given nonsingular X̂ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̂ in exact arithmetic, and X̂′ is the quantity calculated in Algorithm 1. Define E and E′ such that X = X̂(I + E) and X = X̂′(I + E′), respectively. Under the assumption (29) in Lemma 1, we have

lim sup_{‖E‖→0} ‖E′‖/‖E‖² < ∞, (48)

i.e., quadratic convergence.

Proof Noting X̂′(I + E′) = X̂(I + E) (= X) and X̂′ = X̂(I + Ẽ), where Ẽ is the quantity calculated in Algorithm 1, we have (I + Ẽ)(I + E′) = I + E. Therefore, we obtain

E′ = (I + Ẽ)⁻¹(E − Ẽ). (49)

Noting (46) and the bound on ‖(I + Ẽ)⁻¹‖ obtained from (29) and (38), we have (50). Finally, using (49) and (39), we obtain (48).
Our analysis indicates that Algorithm 1 may not be convergent for very large n. However, in practice, n is much smaller than 1/ε for ε := ‖E‖ when the initial guess X̂ is computed by some backward stable algorithm, e.g., in IEEE 754 binary64 arithmetic, unless A has nearly multiple eigenvalues. In such a situation, the iterative refinement works well.

Remark 3
For any δ̄ ≥ δ, we can replace δ by δ̄ in Algorithm 1. For example, such a case arises when the Frobenius norm is used for calculating δ instead of the spectral norm, as mentioned in Remark 1. In such cases, the quadratic convergence of the algorithm can also be proven in a similar way as in this subsection by replacing the assumption (29) with (51), where ρ := δ̄/δ ≥ 1. More specifically, in the convergence analysis, (36) is replaced with a corresponding bound in which the last inequality is due to the assumption (51). Therefore, |λ̃_i − λ̃_j| > δ̄ also holds for i ≠ j, in the same manner as in the proof of Lemma 1. As a result, (38) and (39) are also established even if δ is replaced by δ̄.

Multiple eigenvalues
Multiple eigenvalues require some care. If λ̃_i ≈ λ̃_j, corresponding to multiple eigenvalues λ_i = λ_j, we might not be able to solve the linear system given by (21) and (22) stably. Therefore, we use equation (21) only, i.e., ẽ_ij = ẽ_ji = r_ij/2 if |λ̃_i − λ̃_j| ≤ δ.
The key idea of the proof of quadratic convergence below is to define an orthogonal eigenvector matrix Y ∈ V as follows. For any X_α ∈ V, splitting X̂⁻¹X_M into the first p rows V_α ∈ R^{p×p} and the remaining (n − p) rows W_α ∈ R^{(n−p)×p}, we have (52) in view of the polar decomposition V_α = CQ_α, where C = (V_α V_αᵀ)^{1/2} ∈ R^{p×p} is symmetric and positive semidefinite and Q_α ∈ R^{p×p} is orthogonal. Note that, although X_M Q for any orthogonal matrix Q ∈ R^{p×p} is also an eigenvector matrix, the symmetric and positive semidefinite matrix C is independent of Q. In other words, we have (53), where the last equality implies the polar decomposition of V_α Q. In addition, if V_α in (52) is nonsingular, the orthogonal matrix Q_α is uniquely determined by the polar decomposition for a fixed X_α ∈ V. To investigate the features of V_α, we suppose that X̂ is an exact eigenvector matrix. Then, noting X̂⁻¹ = X̂ᵀ, we see that V_α is an orthogonal matrix, with C = I and V_α = Q_α in the polar decomposition of V_α in (52). Thus, for any fixed X̂ in some neighborhood of V, the above polar decomposition V_α = CQ_α is unique, where the nonsingular matrix V_α depends on X_α ∈ V. In the following, we consider any X̂ in such a neighborhood of V.
Recall that, for any orthogonal matrix Q, the last equality in (53) is due to the polar decomposition V_α Q = C(Q_α Q). Hence, we have an eigenvector matrix X_M Q_αᵀ, which is independent of Q. Thus, we define the unique matrix Y := [X_M Q_αᵀ, X_S] for all matrices in V, where Y depends only on X̂. Then, the corresponding error term F = (f_ij) is uniquely determined as in (54), which implies f_ij = f_ji for (i, j) corresponding to the multiple eigenvalues λ_i = λ_j; that is, the corresponding submatrices of F are symmetric. Now, let us consider the situation where X̂ is an exact eigenvector matrix. In (52), noting X̂⁻¹ = X̂ᵀ, we have W_α = O and C = I in the polar decomposition of V_α. Combining these features with (54), we see F = O for the exact eigenvector matrix X̂. Our aim is to prove F → O in the iterative refinement for X̂ ≈ Y ∈ V, where Y depends on X̂. To this end, for the refined X̂′ as a result of Algorithm 1, we also define an eigenvector matrix Y′ ∈ V and F′ := (X̂′)⁻¹Y′ − I such that the submatrices of (X̂′)⁻¹Y′ corresponding to the multiple eigenvalues are symmetric and positive definite. Note that the eigenvector matrix Y is changed to Y′, corresponding to X̂′, after the refinement.
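The invariance of the symmetric polar factor can be checked numerically (our own 2 × 2 illustration, using the closed-form square root of a symmetric positive definite 2 × 2 matrix):

```python
import math

# Illustration (our own 2x2 example): in the polar decomposition
# V = C Q with C = (V V^T)^(1/2) symmetric positive semidefinite and
# Q orthogonal, C is unchanged if V is replaced by V Q' for any
# orthogonal Q', since (V Q')(V Q')^T = V V^T.
def mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sqrtm_spd_2x2(M):  # principal square root of a symmetric PD 2x2 M
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])  # sqrt(det M)
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t],
            [M[1][0] / t, (M[1][1] + s) / t]]

V = [[1.0, 0.3], [-0.2, 0.9]]
th = 0.7
Qp = [[math.cos(th), math.sin(th)], [-math.sin(th), math.cos(th)]]

C1 = sqrtm_spd_2x2(mul(V, [list(r) for r in zip(*V)]))
VQ = mul(V, Qp)
C2 = sqrtm_spd_2x2(mul(VQ, [list(r) for r in zip(*VQ)]))
```

C1 and C2 agree to rounding error, so the symmetric factor C, and hence the matrix Y built from it, is well defined independently of the arbitrary rotation within an eigenspace.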
On the basis of the above observations, we consider general eigenvalue distributions. First of all, define the index sets M k , k = 1, 2, . . . , M for multiple eigenvalues {λ i } i∈M k satisfying the following conditions: (56) Using the above definitions, we obtain the following key lemma to prove quadratic convergence.

Lemma 3
Let A be a real symmetric n × n matrix with the eigenvalues λ_i, i = 1, 2, . . . , n. Suppose A has multiple eigenvalues with index sets M_k, k = 1, 2, . . . , M, satisfying (56). Let V be the set of n × n orthogonal eigenvector matrices of A. For a given nonsingular X̂ ∈ R^{n×n}, define the set E of error matrices as in (57). In addition, define Y ∈ V such that, for all k, the n_k × n_k submatrices of X̂⁻¹Y corresponding to {λ_i}_{i∈M_k} are symmetric and positive semidefinite. Moreover, define F ∈ E such that Y = X̂(I + F). Then, for any E_α ∈ E, (58) holds. Furthermore, if there exists some Y such that ‖F‖ < 1, then Y ∈ V is uniquely determined.
Proof For any E_α = (e_ij^(α)) ∈ E, let E_diag denote the block diagonal part of E_α, where the n_k × n_k blocks of E_diag correspond to the n_k multiple eigenvalues {λ_i}_{i∈M_k}, i.e., the (i, j) element of E_diag equals e_ij^(α) if λ_i = λ_j and 0 otherwise, for all 1 ≤ i, j ≤ n, where λ_1 ≤ · · · ≤ λ_n. Here, we consider the polar decomposition

I + E_diag = HU_α, (59)

where H is a symmetric and positive semidefinite matrix and U_α is an orthogonal matrix. Note that, similarly to C in (52), H is unique and independent of the choice of X_α ∈ V that satisfies X_α = X̂(I + E_α), whereas U_α is not always uniquely determined. Then, we have (61) from the definition of Y and (60), where the first, second, third, and fourth equalities are consequences of the definition of F, (60), (57), and (59), respectively. Here, we see (62) and (63). Therefore, we obtain (58) from (61), (62), and (63). Finally, we prove that Y is unique if ‖F‖ < 1. In the above discussion, if X_α is replaced with some Y ∈ V, then E_α = F, and thus ‖E_diag‖ ≤ ‖E_α‖ = ‖F‖ < 1 in (59). Therefore, U_α = I in (59) due to the uniqueness of the polar decomposition of the nonsingular matrix I + E_diag, which implies the uniqueness of Y from (60).
Moreover, we have the next lemma, corresponding to Lemmas 1 and 2.
Lemma 4 Let A be a real symmetric n × n matrix with the eigenvalues λ_i, i = 1, 2, . . . , n. Suppose A has multiple eigenvalues with index sets M_k, k = 1, 2, . . . , M, satisfying (56). For a given nonsingular X̂ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̂ in exact arithmetic, and D̃ = diag(λ̃_i), Ẽ, and δ are the quantities calculated in Algorithm 1. Let F be defined as in Lemma 3. Assume that (64) holds. Then, we obtain (65), where χ(·) and η(·) are defined as in (26) and (34), respectively.
Proof First, we see that, for (i, j) corresponding to simple gaps λ_i ≠ λ_j, the estimate holds similar to the proof of (45) in Lemma 2. Concerning the multiple eigenvalues λ_i = λ_j, noting |λ̃_i − λ̃_j| ≤ δ and (55), we obtain the corresponding estimate. Note that, for (i, j) corresponding to λ_i = λ_j, both of the above inequalities bound the elements of F − Ẽ. Therefore, we have (65), similar to the proof in Lemma 2.
On the basis of Lemmas 3 and 4 and Theorem 1, we see the quadratic convergence for a real symmetric matrix A that has multiple eigenvalues.
Theorem 2 Let A be a real symmetric n × n matrix with the eigenvalues λ_i, i = 1, 2, . . . , n. Suppose A has multiple eigenvalues with index sets M_k, k = 1, 2, . . . , M, satisfying (56). Let V be the set of n × n orthogonal eigenvector matrices of A. For a given nonsingular X̂ ∈ R^{n×n}, suppose that Algorithm 1 is applied to A and X̂ in exact arithmetic, and X̂′ and δ are the quantities calculated in Algorithm 1. Let Y, Y′ ∈ V be defined such that, for all k, the n_k × n_k submatrices of X̂⁻¹Y and (X̂′)⁻¹Y′ corresponding to {λ_i}_{i∈M_k} are symmetric and positive definite. Define F and F′ such that Y = X̂(I + F) and Y′ = X̂′(I + F′), respectively. Furthermore, suppose that (64) in Lemma 4 is satisfied for F. Then, we obtain (66) and (67).

Proof Let Ẽ and λ̃_i, i = 1, 2, . . . , n, be the quantities calculated in Algorithm 1. First, note that (65) in Lemma 4 is established. Next, we define G := (X̂′)⁻¹Y − I. Then, we have G = (I + Ẽ)⁻¹(F − Ẽ), similar to (49). Moreover, similar to (46), we have a bound on ‖Ẽ‖ from (65) and (64). Therefore, we see the corresponding estimate in a similar manner as in the proof of (50). Using (58), we have (68). Therefore, we obtain (66). Since we see (69) from (68) and (65), we obtain (67) from (69).
In the iterative refinement, Theorem 2 shows that the error term F is quadratically convergent to zero. Note that X̂ is also convergent to some fixed eigenvector matrix X ∈ V, because Theorem 2 and (65) imply ‖Ẽ‖/‖F‖ → 1 as ‖F‖ → 0 in X̂′ := X̂(I + Ẽ), where F is quadratically convergent to zero.

Complex case
For a Hermitian matrix A ∈ C^{n×n}, we must note that, for any unitary diagonal matrix U, XU is also an eigenvector matrix, i.e., there is a continuum of normalized eigenvector matrices, in contrast to the real case. Related to this, note that R := I − X̂ᴴX̂ and S := X̂ᴴAX̂, and the first equation in (14) is replaced with E + Eᴴ = R in the complex case; thus, the diagonal elements e_ii for i = 1, . . . , n are not uniquely determined in C. Now, select ẽ_ii = r_ii/2 ∈ R for i = 1, . . . , n. Then, we can prove quadratic convergence using the polar decomposition in the same way as in the discussion of multiple eigenvalues in the real case. More precisely, we define a normalized eigenvector matrix Y as follows. First, we focus on the situation where all eigenvalues are simple. For a given nonsingular X̂, let Y be defined such that all diagonal elements of X̂⁻¹Y are positive real numbers. In addition, let F := X̂⁻¹Y − I. Then, we see the quadratic convergence of F in the same way as in Theorem 2.

Corollary 1
Let A ∈ C^{n×n} be a Hermitian matrix whose eigenvalues λ_i, i = 1, 2, . . . , n, are all simple. Let V be the set of n × n unitary eigenvector matrices of A. For a given nonsingular X̂ ∈ C^{n×n}, suppose that Algorithm 1 is applied to A and X̂, and a nonsingular X̂′ is obtained. Define Y, Y′ ∈ V such that all the diagonal elements of X̂⁻¹Y and (X̂′)⁻¹Y′ are positive real numbers. Furthermore, define F and F′ such that Y = X̂(I + F) and Y′ = X̂′(I + F′), respectively. If the assumption (70) holds, then we obtain the corresponding quadratic convergence of F.

For a general Hermitian matrix having multiple eigenvalues, we define Y in the same manner as in Theorem 2, resulting in the following corollary.

Corollary 2
Let A ∈ C^{n×n} be a Hermitian matrix with the eigenvalues λ_i, i = 1, 2, . . . , n. Suppose A has multiple eigenvalues with index sets M_k, k = 1, 2, . . . , M, satisfying (56). For a given nonsingular X̂ ∈ C^{n×n}, let Y, Y′, F, F′, and δ be defined as in Corollary 1. Suppose that, for all k, the n_k × n_k submatrices of X̂⁻¹Y and (X̂′)⁻¹Y′ corresponding to {λ_i}_{i∈M_k} are Hermitian and positive definite. Furthermore, suppose that (70) is satisfied. Then, we obtain the corresponding quadratic convergence of F. Note that X̂ in Corollaries 1 and 2 converges to some fixed eigenvector matrix X of A in the same manner as in the real case.

Numerical results
Here, we present numerical results to demonstrate the effectiveness of the proposed algorithm (Algorithm 1). The numerical experiments discussed in this section were conducted using MATLAB R2016b on a PC with two CPUs (3.0 GHz Intel Xeon E5-2687W v4) and 1 TB of main memory. Let u denote the relative rounding error unit (u = 2⁻²⁴ for IEEE binary32 and u = 2⁻⁵³ for binary64). To realize multiple-precision arithmetic, we adopt the Advanpix Multiprecision Computing Toolbox version 4.2.3 [2], which utilizes well-known, fast, and reliable multiple-precision arithmetic libraries including GMP and MPFR. In all cases, we use the MATLAB function norm for computing the spectral norms ‖R‖ and ‖S − D̃‖ in Algorithm 1 in binary64 arithmetic, and approximate ‖A‖ by max_{1≤i≤n} |λ̃_i|. We conducted numerical experiments for several dozen seeds of the random number generator, and all results are similar to those provided in this section. Therefore, we adopt the default seed as a typical example, using the MATLAB command rng('default') for reproducibility.

Convergence property
Here, we confirm the convergence property of the proposed algorithm for various eigenvalue distributions.

Various eigenvalue distributions
We first generate real symmetric and positive definite matrices using the MATLAB function randsvd from Higham's test matrices [17] via the following MATLAB command.

>> A = gallery('randsvd',n,-cnd,mode);

The eigenvalue distribution and condition number of A can be controlled by the input arguments mode ∈ {1, 2, 3, 4, 5} and cnd ≥ 1 as follows:

mode = 1: one large eigenvalue,
mode = 2: one small eigenvalue,
mode = 3: geometrically distributed eigenvalues,
mode = 4: arithmetically distributed eigenvalues,
mode = 5: random eigenvalues with uniformly distributed logarithm.

Here, κ(A) ≈ cnd for cnd < u^{-1} ≈ 10^16. Note that for mode ∈ {1, 2}, there is a cluster of eigenvalues that are not exactly but nearly multiple, owing to rounding errors incurred when A is generated by randsvd; i.e., all clustered eigenvalues are slightly separated. We start with small examples such as n = 10, since they suffice to illustrate the convergence behavior of the algorithm. Moreover, we set cnd = 10^8 to generate moderately ill-conditioned problems in binary64. We compute X^{(0)} as an initial approximate eigenvector matrix using the MATLAB function eig for the eigenvalue decomposition in binary64 arithmetic, which adopts the LAPACK routine DSYEV; therefore, X^{(0)} suffers from rounding errors. To observe the behavior of Algorithm 1 precisely, we use multiple-precision arithmetic with sufficiently long precision to simulate exact arithmetic in Algorithm 1. We then expect that Algorithm 1 (RefSyEv) works effectively for mode ∈ {3, 4, 5}, but not for mode ∈ {1, 2}. We also use the built-in function eig in the multiple-precision toolbox to compute the eigenvalues λ_i, i = 1, 2, …, n, for evaluation. The results are shown in Fig. 1, which provides max_{1≤i≤n} |λ̃_i − λ_i|/|λ_i| as the maximum relative error of the computed eigenvalues λ̃_i, ‖offdiag(X̃^T A X̃)‖/‖A‖ as the diagonality of X̃^T A X̃, ‖I − X̃^T X̃‖ as the orthogonality of a computed eigenvector matrix X̃, and ‖Ẽ‖, where Ẽ is the computed result for E in Algorithm 1. Here, offdiag(·) denotes the off-diagonal part. The horizontal axis shows the number of iterations ν of Algorithm 1.
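As a concrete illustration of the diagonality and orthogonality measures used above, the following sketch (in Python/NumPy rather than MATLAB; the random test matrix and the tolerance are our own illustrative assumptions, not the paper's test data) computes them for a binary64 eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
B = rng.standard_normal((n, n))
A = B + B.T  # real symmetric test matrix

lam, X = np.linalg.eigh(A)  # binary64 analogue of MATLAB's eig

S = X.T @ A @ X
offdiag = S - np.diag(np.diag(S))

# diagonality of X^T A X and orthogonality of X, measured in the spectral norm
diagonality = np.linalg.norm(offdiag, 2) / np.linalg.norm(A, 2)
orthogonality = np.linalg.norm(np.eye(n) - X.T @ X, 2)

print(diagonality, orthogonality)  # both at roughly the binary64 rounding level
```

For the initial guess X^{(0)} computed by a backward-stable solver, both quantities are of the order of u; each iteration of the refinement then drives them toward the rounding level of the working (higher) precision.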
In the case of mode ∈ {3, 4, 5}, all the quantities decrease quadratically in every iteration, i.e., we observe the quadratic convergence of Algorithm 1, as expected. On the other hand, in the case of mode ∈ {1, 2}, the algorithm fails to improve the accuracy of the initial approximate eigenvectors because the test matrices for mode ∈ {1, 2} have nearly multiple eigenvalues and the assumption (29) for the convergence of Algorithm 1 is not satisfied for these eigenvalues.

Multiple eigenvalues
We now deal with the case where A has exactly multiple eigenvalues. It is not trivial to generate such matrices using floating-point arithmetic, because rounding errors slightly perturb the eigenvalues and break their multiplicity. To avoid this, we utilize a special construction. We set n = 256 and k = 10, where A is exactly representable in binary64 format, and compute X^{(0)} using eig in binary64 arithmetic. We then apply Algorithm 1 to A and X^{(ν)} for ν = 0, 1, 2, …. The results are shown in Fig. 2. As can be seen, Algorithm 1 converges quadratically for matrices with multiple eigenvalues, which is consistent with our convergence analysis (Theorem 2).
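The paper's exact construction is not reproduced in this excerpt, but one way to obtain a symmetric matrix with exactly multiple eigenvalues and exactly representable binary64 entries (a sketch under our own assumptions, not necessarily the construction used in the experiments) is to conjugate a small-integer diagonal matrix by a scaled Sylvester–Hadamard matrix:

```python
import numpy as np

def sylvester_hadamard(m):
    """Hadamard matrix of order 2**m with entries +-1 (Sylvester construction)."""
    H = np.array([[1.0]])
    for _ in range(m):
        H = np.block([[H, H], [H, -H]])
    return H

n, k = 256, 10                 # n is a power of two; k distinct eigenvalues
H = sylvester_hadamard(8)      # 2^8 = 256
rng = np.random.default_rng(0)
d = rng.integers(1, k + 1, size=n).astype(float)  # small integers, hence repeats

# H/sqrt(n) is orthogonal, so A = (H/sqrt(n)) diag(d) (H/sqrt(n))^T has exactly
# the eigenvalues in d. All intermediate values are integers below 2^53 and
# n = 2^8, so every arithmetic operation below is exact in binary64.
A = (H * d) @ H.T / n          # H * d scales column j of H by d[j]

assert np.array_equal(A, A.T)
vals = np.linalg.eigvalsh(A)
```

Since only k distinct integer values are distributed over n diagonal entries, each eigenvalue is exactly multiple, and no rounding occurs when forming A.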

Computational speed
To evaluate the computational speed of the proposed algorithm (Algorithm 1), we compare its computing time to that of an approach using multiple-precision arithmetic, called the "MP-approach". We also compare the results of Algorithm 1 to those of a method combining the Davies–Modi algorithm [9] with the Newton–Schulz iteration [18, Section 8.3], called the "NS+DM method" and explained in the Appendix. Note that the timings should be regarded as reference values, because the computing times for Algorithm 1 and the NS+DM method strongly depend on the implementation of accurate matrix multiplication. We adopt an efficient method proposed by Ozaki et al. [24] that utilizes fast matrix multiplication routines such as xGEMM in BLAS. In the multiple-precision toolbox, the MRRR algorithm [11] via Householder reduction is implemented in a sophisticated, parallel fashion to solve symmetric eigenvalue problems.

Table 2: Results for a pseudo-random real symmetric matrix, n = 500.

As test matrices, we generate pseudo-random real symmetric n × n matrices with n ∈ {100, 500, 1000} using the MATLAB function randn, such as B = randn(n) and A = B + B'. We use eig in binary64, and iteratively refine the computed eigenvectors using Algorithm 1 three times; we proceed similarly for the NS+DM method. In the multiple-precision toolbox, we can control the arithmetic precision d in decimal digits using the command mp.Digits(d). Corresponding to the results of Algorithm 1, we adjust d for ν = 1, 2, 3. For timing fairness, we adopt d = max(d, 34), because the case d = 34 is faster than those with d < 34. In Tables 1, 2 and 3, we show ‖Ẽ‖ and the measured computing time. For the NS+DM method, we show ‖I − Ũ‖, where Ũ is the correction term in the Davies–Modi algorithm, so that ‖I − Ũ‖ is comparable to ‖Ẽ‖ in Algorithm 1. As shown by ‖Ẽ‖, Algorithm 1 quadratically improves the accuracy of the computed eigenvectors.
Judging by ‖I − Ũ‖, the behavior of the NS+DM method is similar to that of Algorithm 1. As expected, Algorithm 1 is much faster than the NS+DM method. The accuracy of the results obtained using the MP-approach corresponds to the arithmetic precision d. As can be seen, Algorithm 1 is considerably faster than the MP-approach for larger n.

Conclusion
We have proposed a refinement algorithm (Algorithm 1) for eigenvalue decomposition of real symmetric matrices that can be applied iteratively. Quadratic convergence of the proposed algorithm was proven for a sufficiently accurate initial guess, similar to Newton's method. The proposed algorithm benefits from the availability of efficient matrix multiplication in higher-precision arithmetic. The numerical results demonstrate the excellent performance of the proposed algorithm in terms of convergence rate and measured computing time.
In practice, it is likely that ordinary floating-point arithmetic, such as IEEE 754 binary32 or binary64, is used to calculate an approximation X̃ to an eigenvector matrix X of a given symmetric matrix A. As in the numerical experiments, we can use X̃ as an initial guess X^{(0)} in Algorithm 1. However, if A has nearly multiple eigenvalues, it is difficult to obtain in ordinary floating-point arithmetic an X^{(0)} sufficiently accurate for Algorithm 1 to work well. Our future work is to overcome this problem.

Appendix A: Relation to previous work
First, we briefly explain the Davies–Modi algorithm [9], which is relevant to this study. The Davies–Modi algorithm assumes that a real symmetric matrix A has been transformed into a nearly diagonal symmetric matrix S = X̃^T A X̃ by some nearly orthogonal matrix X̃. Under that assumption, the method aims to compute an accurate eigenvector matrix U of S. The idea can be traced back to Jahn's method [6,19], as shown by Davies and Smith [8], who derive an SVD algorithm based on the Davies–Modi algorithm. To demonstrate the relationship between the Davies–Modi algorithm and the algorithm proposed in this work, we explain the former in the same manner as in reference [8]. Note that U^T can be written as U^T = e^Y for some skew-symmetric matrix Y. To compute Y with high accuracy, we define a diagonal matrix D_S whose diagonal elements are the eigenvalues of S. Then, e^Y S e^{-Y} = D_S. Here we define S_0 = diag(s_11, …, s_nn), and S_1 = S − S_0 corresponds to the off-diagonal entries of S. Under a mild assumption ‖S_1‖ = O(‖Y‖), we see e^Y S e^{-Y} = S_0 + S_1 + Y S_0 − S_0 Y + O(‖Y‖^2). For the first-order approximation, we would like to solve S_1 + Y_1 S_0 − S_0 Y_1 = O. To construct an orthogonal matrix e^{-Y_1}, suppose that Y_1 is a skew-symmetric matrix. Then, (Y_1)_{ij} = −(Y_1)_{ji} = s_{ij}/(s_{ii} − s_{jj}) for i ≠ j. To compute U with high accuracy, we construct the second-order approximation U^T = I + Y_1 + Y_2 + Y_1^2/2 for skew-symmetric matrices Y_1 and Y_2. To compute Y_2, letting T_1 denote the remaining second-order term, we solve T_1 + Y_2 S_0 − S_0 Y_2 = O for a skew-symmetric matrix Y_2. Note that Y_2 is computed in the same manner as Y_1 in (74). Thus, we obtain U^T, whose second-order accuracy is proven rigorously. Since the Davies–Modi algorithm is based on matrix multiplication, it requires O(n^3) operations.
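The first-order step can be sketched as follows (a NumPy sketch with an artificial nearly diagonal S; the perturbation scale 1e-4 and the problem size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
P = rng.standard_normal((n, n))
# nearly diagonal symmetric S: well-separated diagonal, small off-diagonal part
S = np.diag(np.arange(1.0, n + 1)) + 1e-4 * (P + P.T)

# first-order Davies-Modi correction: (Y1)_ij = s_ij / (s_ii - s_jj) for i != j
s = np.diag(S)
gaps = s[:, None] - s[None, :]
np.fill_diagonal(gaps, 1.0)          # avoid division by zero on the diagonal
Y1 = S / gaps
np.fill_diagonal(Y1, 0.0)            # Y1 is skew-symmetric since S is symmetric

U_T = np.eye(n) + Y1                 # first-order approximation of U^T = e^{Y1}
S_new = U_T @ S @ U_T.T

def offnorm(M):
    return np.linalg.norm(M - np.diag(np.diag(M)), 2)

# the off-diagonal part shrinks from O(eps) to O(eps^2)
print(offnorm(S), offnorm(S_new))
```

One such step reduces ‖offdiag(S)‖ quadratically, provided the diagonal elements of S are well separated; the second-order correction Y_2 sharpens this further.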
In the following, we discuss the refinement of X̃. Suppose X̃ is computed using ordinary floating-point arithmetic. The main problem is that, since the Davies–Modi algorithm is applied to S = X̃^T A X̃ to compute the eigenvalue decomposition of S, X̃ is assumed to be an orthogonal matrix. In general, however, this is not the case, because X̃ suffers from rounding errors. Therefore, even if the computation of X̃^T A X̃ is performed in exact arithmetic, A and S are not similar unless X̃ is orthogonal. The eigenvalues of A are then slightly perturbed, which may cause a significant change in the eigenvectors associated with clustered eigenvalues. To overcome this problem, we must refine the orthogonality of X̃ as a preconditioning step in higher-precision arithmetic. Recall that we intend to restrict the use of higher-precision arithmetic to matrix multiplications for better computational efficiency. Hence, we may use the Newton–Schulz iteration [18, Section 8.3]

Z_{k+1} := (1/2) Z_k (3I − Z_k^T Z_k), Z_0 := X̃,

in which all the singular values of Z_k converge quadratically to 1. After this reorthogonalization, for an eigenvector matrix U of T = Z^T A Z, the columns of X̂ := ZU are expected to be sufficiently accurate approximations of the eigenvectors of A. Of course, in practice, we must provide a method to compute an approximate eigenvector matrix Ũ of T. If T is nearly diagonal, it is effective to apply a diagonal shift to T. Similarly, the Jacobi and Davies–Modi algorithms would be able to compute Ũ accurately using higher-precision arithmetic. As a result, we can obtain a sufficiently accurate eigenvector matrix X̂ := ZŨ. It is worth noting that our approach can reproduce the first-order approximation step of the Davies–Modi algorithm as follows. Since S = e^{-Y} D_S e^Y, as we see from (71), we obtain S = D_S + D_S Y_1 − Y_1 D_S as the linearization process, in the same manner as (13). It is easy to see that D_S = diag(s_11, …, s_nn) and that Y_1 is equal to the solution in (74).
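The Newton–Schulz reorthogonalization Z ← ½ Z(3I − Z^T Z) is easy to demonstrate; the following is a minimal sketch (the size of the non-orthogonality perturbation is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = Q + 1e-3 * rng.standard_normal((n, n))   # nearly orthogonal initial matrix

Z = X.copy()
for _ in range(5):
    # Newton-Schulz step: all singular values converge quadratically to 1
    Z = Z @ (3.0 * np.eye(n) - Z.T @ Z) / 2.0

ortho_before = np.linalg.norm(np.eye(n) - X.T @ X, 2)
ortho_after = np.linalg.norm(np.eye(n) - Z.T @ Z, 2)
print(ortho_before, ortho_after)
```

Since each step consists only of matrix multiplications, the iteration fits the stated design goal of restricting higher-precision arithmetic to matrix products; convergence requires the singular values of X to lie in (0, √3), which holds for any reasonably accurate X̃.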
Hence, if X̃ in Algorithm 1 is an orthogonal matrix and the diagonal elements of S are sufficiently separated, then E is equal to the skew-symmetric matrix Y_1, in view of R = O. In other words, from the viewpoint of the Davies–Modi algorithm, the second-order approximation step is removed and the skew-symmetry of E is not assumed in Algorithm 1. Instead, condition (9) is incorporated to improve orthogonality. If we assume E = E^T in (9), we have E = (I − X̃^T X̃)/2, so that X̃(I + E) = (1/2) X̃(3I − X̃^T X̃), which is equivalent to the Newton–Schulz iteration (76). In other words, if all eigenvalues are considered clustered, X̃ is refined by the Newton–Schulz iteration. For orthogonality, we require only relation (9), whereas an iteration function for the polar decomposition must be an odd function, as is the Newton–Schulz iteration.
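The equivalence with the Newton–Schulz iteration can be spelled out in a few lines (we read condition (9) in its linearized form E + Eᵀ = I − XᵀX; this reading is our interpretation, since (9) is stated earlier in the paper):

```latex
% Assuming the linearized orthogonality condition (9): E + E^T = I - X^T X.
% If in addition E = E^T, then
E = \tfrac{1}{2}\bigl(I - X^{T}X\bigr),
% and the update X' = X(I + E) becomes
X' = X\Bigl(I + \tfrac{1}{2}\bigl(I - X^{T}X\bigr)\Bigr)
   = \tfrac{1}{2}\,X\bigl(3I - X^{T}X\bigr),
% which is exactly one Newton--Schulz step (76).
```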
In summary, by carefully combining the Newton–Schulz iteration and the Davies–Modi algorithm and removing unnecessary conditions from both, we arrive at Algorithm 1, which can be considered an ideal method for refining orthogonality and diagonality simultaneously.