# Iterative refinement for symmetric eigenvalue decomposition

## Abstract

An efficient refinement algorithm is proposed for symmetric eigenvalue problems. The structure of the algorithm is straightforward, primarily comprising matrix multiplications. We show that the proposed algorithm converges quadratically if a modestly accurate initial guess is given, including the case of multiple eigenvalues. Our convergence analysis can be extended to Hermitian matrices. Numerical results demonstrate excellent performance of the proposed algorithm in terms of convergence rate and overall computational cost, and show that the proposed algorithm is considerably faster than a standard approach using multiple-precision arithmetic.

## Introduction

Let A be a real symmetric $$n \times n$$ matrix. We are concerned with the standard symmetric eigenvalue problem $$Ax = \lambda x$$, where $$\lambda \in \mathbb {R}$$ is an eigenvalue of A and $$x \in \mathbb {R}^{n}$$ is an eigenvector of A associated with $$\lambda$$. Solving this problem is important because it plays a significant role in scientific computing. For example, highly accurate computations of a few or all eigenvectors are crucial for large-scale electronic structure calculations in material physics [31, 32], in which specific interior eigenvalues with associated eigenvectors need to be computed. Excellent overviews on the symmetric eigenvalue problem can be found in references [25, 30].

Throughout this paper, I and O denote the identity and the zero matrices of appropriate size, respectively. For matrices, $$\Vert \cdot \Vert _{2}$$ and $$\Vert \cdot \Vert _{\mathrm {F}}$$ denote the spectral norm and the Frobenius norm, respectively. Unless otherwise specified, $$\Vert \cdot \Vert$$ means $$\Vert \cdot \Vert _{2}$$. For legibility, if necessary, we distinguish between the approximate quantities and the computed results, e.g., for some quantity $$\alpha$$ we write $$\widetilde{\alpha }$$ and $$\widehat{\alpha }$$ as an approximation of $$\alpha$$ and a computed result for $$\alpha$$, respectively.

This paper aims to develop a method to compute an accurate result of the eigenvalue decomposition

\begin{aligned} A = {X}{D}{X}^{\mathrm {T}}, \end{aligned}
(1)

where $${X}$$ is an $$n \times n$$ orthogonal matrix whose i-th columns are eigenvectors $$x_{(i)}$$ of A (called an eigenvector matrix) and $${D}$$ is an $$n \times n$$ diagonal matrix whose diagonal elements are the corresponding eigenvalues $$\lambda _{i} \in \mathbb {R}$$, i.e., $$D_{ii} = \lambda _{i}$$ for $$i = 1, \ldots , n$$. For this purpose, we discuss an iterative refinement algorithm for (1) together with a convergence analysis. Throughout the paper, we assume that

\begin{aligned} \lambda _{1} \le \lambda _{2} \le \cdots \le \lambda _{n}, \end{aligned}

and the columns of $${X}$$ are ordered correspondingly.

Several efficient numerical algorithms for computing (1) have been developed such as the bisection method with inverse iteration, the QR algorithm, the divide-and-conquer algorithm or the MRRR (multiple relatively robust representations) algorithm via Householder reduction, and the Jacobi algorithm. For details, see [10, 11, 14, 15, 25, 30] and references cited therein. Since such algorithms have been studied actively in numerical linear algebra for decades, there are highly reliable implementations of them, such as the LAPACK routines. We stress that we do not intend to compete with such existing algorithms, i.e., we aim to develop an algorithm to improve the results obtained by any of them. Such a refinement algorithm is useful if the quality of the results is not satisfactory. In other words, our proposed algorithm can be regarded as a supplement to existing algorithms for computing (1).

For symmetric matrices, backward stable algorithms usually provide accurate approximate eigenvalues, which can be seen by the perturbation theory (cf., e.g., [25, p.208]). However, they do not necessarily give accurate approximate eigenvectors, since eigenvectors associated with clustered eigenvalues are very sensitive to perturbations (cf., e.g., [25, Theorem 11.7.1]). To see this, we take a small example

\begin{aligned} A = \begin{bmatrix} 1+\varepsilon & 1 & 1+\varepsilon \\ 1 & 1 & -1 \\ 1+\varepsilon & -1 & 1+\varepsilon \end{bmatrix} . \end{aligned}

For any $$\varepsilon$$, the eigenvalues and a normalized eigenvector matrix of A are

\begin{aligned} \left\{ \begin{array}{l} \lambda _{1} = -1 \\ \lambda _{2} = 2 \\ \lambda _{3} = 2 + 2\varepsilon \end{array}\right. ,\quad {X} = [x_{(1)},x_{(2)},x_{(3)}] = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ -1/\sqrt{3} & 2/\sqrt{6} & 0 \\ -1/\sqrt{3} & -1/\sqrt{6} & 1/\sqrt{2} \end{bmatrix}, \end{aligned}

where $$\lambda _{2}$$ and $$\lambda _{3}$$ are nearly double eigenvalues for small $$\varepsilon$$. We set $$\varepsilon = 2^{-25} \approx 2.98\times 10^{-8}$$ and adopt the MATLAB built-in function $$\mathsf {eig}$$ in IEEE 754 binary64 floating-point arithmetic, which adopts the LAPACK routine DSYEV, to obtain an approximate eigenvector matrix $$\widehat{X} = [\widehat{x}_{(1)},\widehat{x}_{(2)},\widehat{x}_{(3)}]$$ by some backward stable algorithm. Then, the relative rounding error unit is $$2^{-53} \approx 1.11\times 10^{-16}$$, and we have $$\Vert \widehat{X} - X\Vert \approx 5.46\times 10^{-9}$$, which means 7 or 8 decimal digits are lost. More precisely, we have $$\Vert \widehat{x}_{(1)} - x_{(1)}\Vert \approx 3.42\times 10^{-16}$$ and $$\Vert \widehat{x}_{(2)} - x_{(2)}\Vert \approx \Vert \widehat{x}_{(3)} - x_{(3)}\Vert \approx 5.46\times 10^{-9}$$. Therefore, we still have room for improvement of approximate eigenvectors $$\widehat{x}_{(2)}$$ and $$\widehat{x}_{(3)}$$ in the range of binary64. To improve the accuracy of approximate eigenvectors efficiently, we treat all eigenvectors, since all of them correlate with each other in terms of orthogonality $$X^{\mathrm {T}}X = I$$ and diagonality $$X^{\mathrm {T}}AX = D$$.
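This experiment is easy to reproduce, e.g., with NumPy (`np.linalg.eigh` calls a backward stable LAPACK driver, though not necessarily DSYEV, so the observed error magnitudes may differ slightly from those reported above):

```python
import numpy as np

eps = 2.0 ** -25
A = np.array([[1 + eps, 1.0, 1 + eps],
              [1.0,     1.0, -1.0],
              [1 + eps, -1.0, 1 + eps]])

# exact eigenvalues and eigenvectors from the formulas above
lam_exact = np.array([-1.0, 2.0, 2 + 2 * eps])
X_exact = np.array([[ 1 / np.sqrt(3),  1 / np.sqrt(6), 1 / np.sqrt(2)],
                    [-1 / np.sqrt(3),  2 / np.sqrt(6), 0.0],
                    [-1 / np.sqrt(3), -1 / np.sqrt(6), 1 / np.sqrt(2)]])

lam, Xh = np.linalg.eigh(A)                  # binary64, backward stable
Xh *= np.sign(np.sum(Xh * X_exact, axis=0))  # fix the sign of each column
err = np.linalg.norm(Xh - X_exact, axis=0)   # per-eigenvector errors
print(err)  # first column near machine precision; clustered pair far less accurate
```

The first eigenvector (well-separated eigenvalue) is accurate to nearly full precision, while the two eigenvectors of the clustered pair lose roughly half of the digits, consistent with the perturbation bound involving the gap $$2\varepsilon$$.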

There exist several refinement algorithms for eigenvalue problems that are based on Newton’s method for nonlinear equations (cf. e.g., [3, 7, 12, 29]). Since this sort of algorithm is designed to improve eigenpairs $$(\lambda , x) \in \mathbb {R}\times \mathbb {R}^{n}$$ individually, applying such a method to all eigenpairs requires $$\mathcal {O}(n^{4})$$ arithmetic operations. To reduce the computational cost, one may consider preconditioning by Householder reduction of A in ordinary floating-point arithmetic such as $$T \approx \widehat{H}^{\mathrm {T}}A\widehat{H}$$, where T is a tridiagonal matrix, and $$\widehat{H}$$ is an approximate orthogonal matrix involving rounding errors. However, this is not a similarity transformation; thus, the original problem is slightly perturbed. Assuming $$\widehat{H} = H + \varDelta _{H}$$ for some orthogonal matrix H with $$\epsilon _{H} = \Vert \varDelta _{H}\Vert \ll 1$$, we have $$\widetilde{T} = \widehat{H}^{\mathrm {T}}A\widehat{H} = H^{\mathrm {T}}(A + \varDelta _{A})H$$ with $$\Vert \varDelta _{A}\Vert = \mathcal {O}(\Vert A\Vert \epsilon _{H})$$ and $$\epsilon _{H} \approx \Vert I - \widehat{H}^{\mathrm {T}}\widehat{H}\Vert$$, where $$\varDelta _{A}$$ can be considered a backward error, and the accuracy of eigenpairs using $$\widehat{H}$$ is limited by its lack of orthogonality.

A possible approach to achieve an accurate eigenvalue decomposition is to use multiple-precision arithmetic for Householder reduction and the subsequent algorithm. In general, however, we do not know in advance how much arithmetic precision is sufficient to obtain results with desired accuracy. Moreover, the use of such multiple-precision arithmetic for entire computations is often much more time-consuming than ordinary floating-point arithmetic such as IEEE 754 binary64. Therefore, we prefer the iterative refinement approach rather than simply using multiple-precision arithmetic.

Simultaneous iteration or Grassmann–Rayleigh quotient iteration can potentially be used to refine eigenvalue decompositions. However, such methods require higher-precision arithmetic for the orthogonalization of approximate eigenvectors. Thus, we cannot restrict the higher-precision arithmetic to matrix multiplication. Wilkinson [30, Chapter 9, pp.637–647] explained the refinement of eigenvalue decompositions for general square matrices with reference to Jahn’s method [6, 19]. Such methods rely on a similarity transformation $$C := \widehat{X}^{-1}A\widehat{X}$$ with high accuracy for a computed result $$\widehat{X}$$ for $${X}$$, which requires an accurate solution of the linear system $$\widehat{X}C = A\widehat{X}$$ for C, and slightly breaks the symmetry of A due to nonorthogonality of $$\widehat{X}$$. Davies and Modi proposed a direct method to complete the symmetric eigenvalue decomposition of nearly diagonal matrices. The Davies–Modi algorithm assumes that A is preconditioned to a nearly diagonal matrix such as $$\widehat{X}^{\mathrm {T}}A\widehat{X}$$, where $$\widehat{X}$$ is an approximate orthogonal matrix involving rounding errors. Again, as mentioned above, this is not a similarity transformation, and a problem similar to the preconditioning by Householder reduction remains unsolved, i.e., the Davies–Modi algorithm does not refine the nonorthogonality of $$\widehat{X}$$. See Appendix for details regarding the relationship between the Davies–Modi algorithm and ours.

Given this background, we try to derive a simple and efficient iterative refinement algorithm to simultaneously improve the accuracy of all eigenvectors with quadratic convergence, which requires $$\mathcal {O}(n^{3})$$ operations for each iteration. The proposed algorithm can be regarded as a variant of Newton’s method, and therefore, its quadratic convergence is naturally derived.

For a computed eigenvector matrix $$\widehat{X}$$ for (1), define $$E \in \mathbb {R}^{n\times n}$$ such that $${X} = \widehat{X}(I + E)$$. Here the existence of E requires that $$\widehat{X}$$ is nonsingular, which is usually satisfied in practice. We assume that $$\widehat{X}$$ is modestly accurate, e.g., it is obtained by some backward stable algorithm. Then, we aim to compute a sufficiently precise approximation $$\widetilde{E}$$ of E using the following two relations.

\begin{aligned} \left\{ \begin{array}{ll} {X}^{\mathrm {T}}{X} = I &{} \ \text {(orthogonality)} \\ {X}^{\mathrm {T}}A{X} = {D} &{} \ \text {(diagonality)} \\ \end{array}\right. \end{aligned}
(2)

After obtaining $$\widetilde{E}$$, we can update $$X' := \widehat{X}(I + \widetilde{E})$$. If necessary, we iterate the process as $$X^{(\nu + 1)} := X^{(\nu )}(I + \widetilde{E}^{(\nu )})$$. Under some conditions, we prove $$\widetilde{E}^{(\nu )} \rightarrow O$$ and $$X^{(\nu )} \rightarrow X$$, where the convergence rates are quadratic. Our analysis provides a sufficient condition for the convergence of the iterations.

The main benefit of our algorithm is its adaptivity, i.e., it allows the adaptive use of high-precision arithmetic until the desired accuracy of the computed results is achieved. The use of high-precision arithmetic is mandatory; however, it is primarily restricted to matrix multiplication, which accounts for most of the computational cost. Note that a multiple-precision arithmetic library, such as MPFR with GMP, could be used for this purpose. With approaches such as quad-double precision arithmetic and arbitrary precision arithmetic, multiple-precision arithmetic can be simulated using ordinary floating-point arithmetic. There are more specific approaches for high-precision matrix multiplication. For example, XBLAS (extra precise BLAS) and other accurate and efficient algorithms for dot products [22, 27, 28] and matrix products based on error-free transformations are available for practical implementation.

The remainder of the paper is organized as follows. In Sect. 2, we present a refinement algorithm for the symmetric eigenvalue decomposition. In Sect. 3, we provide a convergence analysis of the proposed algorithm. In Sect. 4, we present some numerical results showing the behavior and performance of the proposed algorithm. Finally, we conclude the paper in Sect. 5. To put our results into context, we review existing work in Appendix.

For simplicity, we mainly treat real matrices. The discussions in this paper can be extended to Hermitian matrices. Moreover, the discussion of the standard symmetric eigenvalue problem readily extends to the generalized symmetric (or Hermitian) definite eigenvalue problem $$Ax = \lambda Bx$$, where A and B are real symmetric (or Hermitian) with B positive definite.

## Proposed algorithm

Let $$A = A^{\mathrm {T}} \in \mathbb {R}^{n \times n}$$. The eigenvalues of A are denoted by $$\lambda _{i} \in \mathbb {R}$$, $$i = 1, \ldots , n$$. Then $$\Vert A\Vert = \max _{1 \le i \le n}|\lambda _{i}|$$. Let $${X} \in \mathbb {R}^{n \times n}$$ denote an orthogonal eigenvector matrix comprising normalized eigenvectors of A, and let $$\widehat{X}$$ denote an approximation of $${X}$$ computed by some numerical algorithm.

In the following, we derive our proposed algorithm. Define $$E \in \mathbb {R}^{n \times n}$$ such that $${X} = \widehat{X}(I + E)$$. The problem is how to derive a method for computing E. Here, we assume that

\begin{aligned} \epsilon := \Vert E\Vert < 1, \end{aligned}
(3)

which implies that $$I + E$$ is nonsingular and so is $$\widehat{X}$$.

First, we consider the orthogonality of the eigenvectors, that is, $${X}^{\mathrm {T}}{X} = I$$. From this, we obtain $$I = {X}^{\mathrm {T}}{X} = (I + E)^{\mathrm {T}}\widehat{X}^{\mathrm {T}}\widehat{X}(I + E)$$ and

\begin{aligned} (I + E)^{\mathrm {-T}}(I + E)^{-1} = \widehat{X}^{\mathrm {T}}\widehat{X} . \end{aligned}
(4)

Using Neumann series expansion, we obtain

\begin{aligned} (I + E)^{-1} = I - E + \varDelta _{E}, \quad \varDelta _{E} := \sum _{k = 2}^{\infty }(-E)^{k} . \end{aligned}
(5)

Here, it follows from (3) that

\begin{aligned} \Vert \varDelta _{E}\Vert \le \frac{\epsilon ^{2}}{1 - \epsilon } . \end{aligned}
(6)

Substituting (5) into (4) yields

\begin{aligned} E + E^{\mathrm {T}} = I - \widehat{X}^{\mathrm {T}}\widehat{X} + \varDelta _{1}, \end{aligned}
(7)

where $$\varDelta _{1} := \varDelta _{E} + \varDelta ^{\mathrm {T}}_{E} + (E - \varDelta _{E})^{\mathrm {T}}(E - \varDelta _{E})$$. Here, it follows from (3) and (6) that

\begin{aligned} \Vert \varDelta _{1}\Vert \le \frac{(3 - 2\epsilon )\epsilon ^{2}}{(1 - \epsilon )^{2}} . \end{aligned}
(8)
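Indeed, the bound (8) follows directly from the triangle inequality, submultiplicativity, and (3) with (6):

\begin{aligned} \Vert \varDelta _{1}\Vert \le 2\Vert \varDelta _{E}\Vert + (\Vert E\Vert + \Vert \varDelta _{E}\Vert )^{2} \le \frac{2\epsilon ^{2}}{1-\epsilon } + \left( \epsilon + \frac{\epsilon ^{2}}{1-\epsilon }\right) ^{2} = \frac{2\epsilon ^{2}}{1-\epsilon } + \frac{\epsilon ^{2}}{(1-\epsilon )^{2}} = \frac{(3-2\epsilon )\epsilon ^{2}}{(1-\epsilon )^{2}} . \end{aligned}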

In a similar way to Newton’s method (cf. e.g., [5, p. 236]), dropping the second-order term $$\varDelta _{1}$$ from (7) for linearization yields the following matrix equation for $$\widetilde{E} = (\widetilde{e}_{ij}) \in \mathbb {R}^{n \times n}$$:

\begin{aligned} \widetilde{E} + \widetilde{E}^{\mathrm {T}} = I - \widehat{X}^{\mathrm {T}}\widehat{X} . \end{aligned}
(9)

Next, we consider the diagonalization of A such that $${X}^{\mathrm {T}}A{X} = {D}$$. From this, we obtain $${D} = {X}^{\mathrm {T}}A{X} = (I + E)^{\mathrm {T}}\widehat{X}^{\mathrm {T}}A\widehat{X}(I + E)$$ and

\begin{aligned} (I + E)^{\mathrm {-T}}{D}(I + E)^{-1} = \widehat{X}^{\mathrm {T}}A\widehat{X} . \end{aligned}
(10)

Substitution of (5) into (10) yields

\begin{aligned} {D} - {D}E - E^{\mathrm {T}}{D} = \widehat{X}^{\mathrm {T}}A\widehat{X} + \varDelta _{2}, \end{aligned}
(11)

where $$\varDelta _{2} := -{D}\varDelta _{E} - \varDelta ^{\mathrm {T}}_{E}{D} - (E - \varDelta _{E})^{\mathrm {T}}D(E - \varDelta _{E})$$. Here, it follows from (3) and (6) that

\begin{aligned} \Vert \varDelta _{2}\Vert \le \frac{(3 - 2\epsilon )\epsilon ^{2}}{(1 - \epsilon )^{2}}\Vert D\Vert = \frac{(3 - 2\epsilon )\epsilon ^{2}}{(1 - \epsilon )^{2}}\Vert A\Vert . \end{aligned}
(12)

Similar to the derivation of (9), omission of the second-order term $$\varDelta _{2}$$ from (11) yields the following matrix equation for $$\widetilde{E} = (\widetilde{e}_{ij}) \in \mathbb {R}^{n \times n}$$ and $$\widetilde{D} = (\widetilde{d}_{ij}) \in \mathbb {R}^{n \times n}$$:

\begin{aligned} \widetilde{D} - \widetilde{D}\widetilde{E} - \widetilde{E}^{\mathrm {T}}\widetilde{D} = \widehat{X}^{\mathrm {T}}A\widehat{X} . \end{aligned}
(13)

Here, we restrict $$\widetilde{D}$$ to be diagonal, which implies $$\widetilde{d}_{ij} = 0$$ for $$i \ne j$$. We put $$\widetilde{\lambda }_{i} := \widetilde{d}_{ii}$$ as approximate eigenvalues. Hereafter, we assume that

\begin{aligned} \widetilde{\lambda }_{1} \le \widetilde{\lambda }_{2} \le \cdots \le \widetilde{\lambda }_{n} \end{aligned}

for the sake of simplicity. If this is not the case, we can find an appropriate permutation $$\tau$$ such that

\begin{aligned} \widetilde{\lambda }_{\tau (1)} \le \widetilde{\lambda }_{\tau (2)} \le \cdots \le \widetilde{\lambda }_{\tau (n)}, \end{aligned}

and we redefine $$\widetilde{\lambda }_{j} := \widetilde{\lambda }_{\tau (j)}$$ and $$\widehat{x}_{(j)} := \widehat{x}_{(\tau (j))}$$ for $$j = 1, 2, \ldots , n$$. Note that the eigenvalue ordering is not essential in this paper.

Summarizing the above discussion of (9) and (13), we obtain the system of matrix equations for $$\widetilde{E} = (\widetilde{e}_{ij})$$ and $$\widetilde{D} = (\widetilde{d}_{ij}) = \mathrm {diag}(\widetilde{\lambda }_{i})$$ such that

\begin{aligned}&\left\{ \begin{array}{l} \widetilde{E} + \widetilde{E}^{\mathrm {T}} = R \\ \widetilde{D} - \widetilde{D}\widetilde{E} - \widetilde{E}^{\mathrm {T}}\widetilde{D} = S \\ \end{array}\right. \end{aligned}
(14)
\begin{aligned}&\quad \Leftrightarrow \left\{ \begin{array}{l} \widetilde{e}_{ij} + \widetilde{e}_{ji} = r_{ij} \\ \widetilde{d}_{ij} - \widetilde{\lambda }_{i}\widetilde{e}_{ij} - \widetilde{\lambda }_{j}\widetilde{e}_{ji} = s_{ij} \end{array}\right. \quad \text {for} \; \ 1 \le i, j \le n, \end{aligned}
(15)

where $$R = (r_{ij})$$ and $$S = (s_{ij})$$ are defined as $$R := I - \widehat{X}^{\mathrm {T}}\widehat{X}$$ and $$S := \widehat{X}^{\mathrm {T}}A\widehat{X}$$, respectively.

In fact, (14) is surprisingly easy to solve. First, we focus on the diagonal parts of $$\widetilde{E}$$. From the first equation in (15), it follows that $$\widetilde{e}_{ii} = r_{ii}/2$$ for $$1 \le i \le n$$. Moreover, the second equation in (15) also yields $$(1 - 2\widetilde{e}_{ii})\widetilde{\lambda }_{i} = (1 - r_{ii})\widetilde{\lambda }_{i} = s_{ii}$$. If $$r_{ii} \ne 1$$, then we have

\begin{aligned} \widetilde{\lambda }_{i} = \frac{s_{ii}}{1 - r_{ii}} \quad \text {for} \; \ 1 \le i \le n . \end{aligned}
(16)

Here, $$|r_{ii}| \ll 1$$ usually holds because the columns of $$\widehat{X}$$ are approximately normalized. Note that (16) is equivalent to the Rayleigh quotient $$\widetilde{\lambda }_{i} = (\widehat{x}_{(i)}^{\mathrm {T}}A\widehat{x}_{(i)}) / (\widehat{x}_{(i)}^{\mathrm {T}}\widehat{x}_{(i)})$$, where $$\widehat{x}_{(i)}$$ is the i-th column of $$\widehat{X}$$.
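The equivalence with the Rayleigh quotient is immediate, since $$s_{ii} = \widehat{x}_{(i)}^{\mathrm {T}}A\widehat{x}_{(i)}$$ and $$1 - r_{ii} = \widehat{x}_{(i)}^{\mathrm {T}}\widehat{x}_{(i)}$$. A quick numerical check (random symmetric matrix and arbitrary vector, illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2              # random symmetric test matrix (illustrative)
x = rng.standard_normal(5)     # an arbitrary (non-normalized) approximate eigenvector

s_ii = x @ A @ x               # diagonal entry of S = Xhat^T A Xhat
r_ii = 1.0 - x @ x             # diagonal entry of R = I - Xhat^T Xhat
lam_i = s_ii / (1.0 - r_ii)    # formula (16)
rayleigh = (x @ A @ x) / (x @ x)
print(lam_i, rayleigh)         # agree up to rounding
```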

Next, we focus on the off-diagonal parts of $$\widetilde{E}$$. The combination of (15) and (16) yields

\begin{aligned} \left\{ \begin{array}{l} \widetilde{e}_{ij} + \widetilde{e}_{ji} = r_{ij} \\ \widetilde{\lambda }_{i}\widetilde{e}_{ij} + \widetilde{\lambda }_{j}\widetilde{e}_{ji} = -s_{ij} \\ \end{array}\right. \quad \text {for} \; \ 1 \le i, j \le n, \ i \ne j, \end{aligned}

which are simply $$2 \times 2$$ linear systems. Theoretically, for (i, j) satisfying $$\widetilde{\lambda }_{i} \ne \widetilde{\lambda }_{j}$$, the linear systems have unique solutions

\begin{aligned} \widetilde{e}_{ij} = \frac{s_{ij} + \widetilde{\lambda }_{j}r_{ij}}{\widetilde{\lambda }_{j} - \widetilde{\lambda }_{i}} . \end{aligned}
(17)
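One can check directly that (17), together with the same formula with i and j interchanged (recall that R and S are symmetric, so $$r_{ji} = r_{ij}$$ and $$s_{ji} = s_{ij}$$), solves the $$2 \times 2$$ system above. A numerical illustration with made-up values:

```python
import numpy as np

rng = np.random.default_rng(2)
li, lj = 1.0, 3.0                    # two distinct approximate eigenvalues (made up)
r_ij, s_ij = rng.standard_normal(2)  # arbitrary residual entries

e_ij = (s_ij + lj * r_ij) / (lj - li)  # formula (17)
e_ji = (s_ij + li * r_ij) / (li - lj)  # (17) with i and j interchanged

print(e_ij + e_ji, r_ij)               # first equation of the 2x2 system
print(li * e_ij + lj * e_ji, -s_ij)    # second equation
```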

It is easy to see that, for $$i\not = j$$, $$s_{ij} \rightarrow 0$$, $$r_{ij} \rightarrow 0$$, $$\widetilde{\lambda }_{i} \rightarrow \lambda _{i}$$, and $$\widetilde{\lambda }_{j} \rightarrow \lambda _{j}$$ as $$\Vert E \Vert \rightarrow 0$$. Thus, for sufficiently small $$\Vert E \Vert$$, all $$|\widetilde{e}_{ij}|$$ for $$i\not = j$$ as in (17) are also small whenever $$\lambda _{i} \ne \lambda _{j}$$ for $$i\not = j$$. However, multiple eigenvalues require some care. For multiple eigenvalues $$\lambda _{i}=\lambda _{j}$$ with $$i\not = j$$, noting the first equation in (15) and adding the artificial condition $$\widetilde{e}_{ij} = \widetilde{e}_{ji}$$, we choose $$\widetilde{e}_{ij}$$ as

\begin{aligned} \widetilde{e}_{ij} = \frac{r_{ij}}{2} . \end{aligned}
(18)

Similar to the Newton–Schulz iteration, which is discussed in Appendix, the choice (18) is quite reasonable for improving the orthogonality of the i-th and j-th columns $$\widehat{x}_{(i)}$$ and $$\widehat{x}_{(j)}$$ of $$\widehat{X}$$. Moreover, the accuracy of $$\widehat{x}_{(i)}$$ and $$\widehat{x}_{(j)}$$ is improved as shown in Sect. 3.2.

It remains to detect multiple eigenvalues. According to the perturbation theory in [23, Theorem 2], we have

\begin{aligned} |{\lambda }_{i} - \widetilde{\lambda }_{i}| \le \Vert S - \widetilde{D}\Vert _{2} + \Vert A\Vert _{2}\Vert R\Vert _{2} \quad \text {for} \; \ i=1,\ldots ,n . \end{aligned}

In our algorithm, we set

\begin{aligned} \delta := 2(\Vert S - \widetilde{D}\Vert _{2} + \Vert A\Vert _{2}\Vert R\Vert _{2}). \end{aligned}
(19)

For all $$i\not = j$$, $$|\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| \le \delta$$ whenever $$\lambda _{i}=\lambda _{j}$$. In addition, for a sufficiently small $$\Vert E \Vert$$, we see that, for all $$i\not = j$$, $$|\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| > \delta$$ whenever $$\lambda _{i}\not =\lambda _{j}$$ in view of $$\delta \rightarrow 0$$ as $$\Vert E \Vert \rightarrow 0$$. In other words, we can identify the multiple eigenvalues using $$\delta$$ in (19) associated with $$\widehat{X}$$ in some neighborhood of $${X}$$, which is rigorously proven in Sect. 3.

Finally, we present a refinement algorithm for the eigenvalue decomposition of A in Algorithm 1, which is designed to be applied iteratively.
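As a concrete illustration, one refinement step assembled from the formulas above can be sketched in NumPy as follows. This is a working-precision sketch, not the authors' implementation: the function name mirrors $$\mathsf {RefSyEv}$$ in (20), Frobenius norms are used in $$\delta$$ as suggested in Remark 1, and in an actual implementation the residuals R and S must be computed in higher precision than binary64 for the refinement to make progress beyond working accuracy.

```python
import numpy as np

def ref_syev(A, X):
    """One refinement step for an approximate eigenvector matrix X of
    symmetric A, following (16)-(19) (working-precision sketch)."""
    n = A.shape[0]
    R = np.eye(n) - X.T @ X                   # orthogonality residual, cf. (9)
    S = X.T @ A @ X                           # near-diagonal matrix, cf. (13)
    lam = np.diag(S) / (1.0 - np.diag(R))     # approximate eigenvalues, (16)
    D = np.diag(lam)
    # threshold (19) for detecting (nearly) multiple eigenvalues;
    # Frobenius norms replace spectral norms as suggested in Remark 1
    delta = 2.0 * (np.linalg.norm(S - D, 'fro')
                   + np.linalg.norm(A, 2) * np.linalg.norm(R, 'fro'))
    E = np.empty_like(R)
    for i in range(n):
        for j in range(n):
            if i == j:
                E[i, j] = R[i, j] / 2.0       # diagonal: e_ii = r_ii / 2
            elif abs(lam[j] - lam[i]) > delta:
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])  # (17)
            else:
                E[i, j] = R[i, j] / 2.0       # (18) for clustered eigenvalues
    return X @ (np.eye(n) + E), lam
```

Iterating `X, lam = ref_syev(A, X)` as in (20) contracts both residuals; in sufficiently precise arithmetic the contraction is quadratic, as shown in Sect. 3, until the limit of the arithmetic in use is reached.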

Let us consider the iterative refinement using Algorithm 1:

\begin{aligned} X^{(0)} \leftarrow \widehat{X} \in \mathbb {R}^{n \times n}, \quad X^{(\nu + 1)} \leftarrow \mathsf {RefSyEv}(A, X^{(\nu )}) \quad \text {for} \; \ \nu = 0, 1, \ldots \ . \end{aligned}
(20)

Then, $$X^{(\nu + 1)} = X^{(\nu )}(I + \widetilde{E}^{(\nu )})$$ for $$\nu = 0, 1, \ldots$$, where $$\widetilde{E}^{(\nu )} = (\widetilde{e}_{ij}^{(\nu )})$$ are the quantities calculated at line 7 of Algorithm 1. Let $$E^{(\nu )}$$ be defined such that $$X = X^{(\nu )}(I + E^{(\nu )})$$. Then, $$E^{(\nu )}$$ corresponds to the true error of $$X^{(\nu )}$$ for each $$\nu$$. Under some condition (the assumption (29) in Lemma 1), we obtain

\begin{aligned} \left\{ \begin{array}{ll} \Vert E^{(\nu + 1)}\Vert < \Vert E^{(\nu )}\Vert &{} \ \text {(monotone convergence)}\\ \Vert E^{(\nu + 1)}\Vert = \mathcal {O}(\Vert E^{(\nu )}\Vert ^{2}) &{} \ \text {(quadratic convergence)} \end{array}\right. , \end{aligned}

which will be shown in Sect. 3.1.

### Remark 1

On the practical computation of $$\delta$$ at line 6 of Algorithm 1, one may prefer the Frobenius norm to the spectral norm, since the former is much easier to compute. For any real $$n \times n$$ matrix C, it is known (cf., e.g., [14, p.72]) that $$\Vert C\Vert _{2} \le \Vert C\Vert _{\mathrm {F}} \le \sqrt{n}\Vert C\Vert _{2}$$. Thus, using the Frobenius norm may overestimate $$\delta$$ somewhat and affect the behavior of the algorithm. $$\square$$

### Remark 2

For the generalized symmetric definite eigenvalue problem $$Ax = \lambda Bx$$ where A and B are real symmetric with B being positive definite, a similar algorithm can readily be derived by replacing line 2 in Algorithm 1 with $$R \leftarrow I - \widehat{X}^{\mathrm {T}}B\widehat{X}$$. $$\square$$

## Convergence analysis

In this section, we prove quadratic convergence of Algorithm 1 under the assumption that the approximate solution is modestly close to the exact one. Our analysis is divided into two parts. First, assuming that A has no multiple eigenvalues, we prove quadratic convergence. Next, we consider a general analysis for arbitrary A.

Recall that the error of the approximate solution is expressed as $$\Vert \widehat{X} - X\Vert =\Vert \widehat{X}E\Vert$$ in view of $$X = \widehat{X}(I + E)$$. The refined approximate solution is $$X' := \widehat{X}(I + \widetilde{E})$$. It then follows that the error of the refined solution is expressed as follows:

\begin{aligned} \Vert \widehat{X}(I + \widetilde{E}) - X\Vert = \Vert \widehat{X}(\widetilde{E} - E)\Vert . \end{aligned}

In addition, recall that $$\widetilde{E}$$ is the solution of the following equations:

\begin{aligned}&\widetilde{E}+\widetilde{E}^{\mathrm {T}}=R, \end{aligned}
(21)
\begin{aligned}&\widetilde{D}-\widetilde{D}\widetilde{E}-\widetilde{E}^{\mathrm {T}}\widetilde{D}=S, \end{aligned}
(22)

where

\begin{aligned} R&:=I-\widehat{X}^{\mathrm {T}}\widehat{X}, \end{aligned}
(23)
\begin{aligned} S&:=\widehat{X}^{\mathrm {T}}A\widehat{X}. \end{aligned}
(24)

However, if $$\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}$$ in the sense that $$|\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| \le \delta$$, where $$\delta$$ is defined as in (19), then (22) is not used in the computation of $$\widetilde{e}_{ij}$$ and $$\widetilde{e}_{ji}$$. In this case, we choose $$\widetilde{e}_{ij}=\widetilde{e}_{ji}=r_{ij}/2$$ from (21). Such an exceptional case is considered later in the subsection on multiple eigenvalues.

Briefly, our goal is to prove quadratic convergence

\begin{aligned} \Vert \widehat{X}(I+\widetilde{E})-X\Vert =\mathcal {O}(\Vert \widehat{X}-X\Vert ^2), \end{aligned}

which corresponds to

\begin{aligned} \Vert \widehat{X}(\widetilde{E}-E)\Vert =\mathcal {O}(\Vert \widehat{X}E\Vert ^2), \end{aligned}

as $$\widehat{X}\rightarrow X$$. We would like to prove that

\begin{aligned} \Vert \widetilde{E}-E\Vert =\mathcal {O}(\Vert E\Vert ^2) \end{aligned}
(25)

as $$\Vert E\Vert \rightarrow 0$$.

To investigate the relationship between E and $$\widetilde{E}$$, let $$\epsilon$$ be defined as in (3) and

\begin{aligned} \chi (\epsilon ):=\frac{3-2\epsilon }{(1-\epsilon )^2}. \end{aligned}
(26)

Then, we see that

\begin{aligned} E+E^{\mathrm {T}}=R+\varDelta _{1}, \quad \Vert \varDelta _{1}\Vert \le \chi (\epsilon )\epsilon ^{2} \end{aligned}
(27)

from (7) and (8). In addition, we have

\begin{aligned} D-DE-E^{\mathrm {T}}D=S+\varDelta _{2}, \quad \Vert \varDelta _{2} \Vert \le \chi (\epsilon )\Vert A\Vert \epsilon ^{2} \end{aligned}
(28)

from (11) and (12).

### Simple eigenvalues

We focus on the situation where the eigenvalues of A are all simple and a given $$\widehat{X}$$ is sufficiently close to an orthogonal eigenvector matrix X. First, we derive a sufficient condition that (17) is chosen for all (i, j), $$i \ne j$$, in Algorithm 1.

### Lemma 1

Let A be a real symmetric $$n \times n$$ matrix with simple eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$ and a corresponding orthogonal eigenvector matrix $$X \in \mathbb {R}^{n \times n}$$. For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and $$\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})$$, R, S, and $$\delta$$ are the quantities calculated in Algorithm 1. Define E such that $$X=\widehat{X}(I+E)$$. If

\begin{aligned} \epsilon := \Vert E\Vert < \min \left( \frac{\min _{i \not = j}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}
(29)

then we obtain

\begin{aligned} \min _{i \ne j}|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| > \delta (= 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert )). \end{aligned}
(30)

### Proof

First, it is easy to see that

\begin{aligned} (E-\widetilde{E}) + ({E}-\widetilde{E})^{\mathrm {T}} = \varDelta _{1}, \quad \Vert \varDelta _{1}\Vert \le \chi (\epsilon )\epsilon ^2 \end{aligned}
(31)

from (21), (23), and (27). Hence, we obtain

\begin{aligned} |\widetilde{e}_{ii}-{e}_{ii}|\le \frac{\chi (\epsilon )}{2}\epsilon ^2 \quad \text {for} \; \ i=1,\ldots , n \end{aligned}
(32)

from the diagonal elements in (31). From (22) and (28), $$\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})$$ and $$D = \mathrm {diag}(\lambda _{i})$$ are determined as $$\widetilde{\lambda }_{i}=s_{ii}/(1-2\widetilde{e}_{ii})$$, $$\lambda _{i}=(s_{ii} + \varDelta _{2}(i,i))/(1-2e_{ii})$$. Thus, we have

\begin{aligned} \widetilde{\lambda }_{i}-{\lambda }_{i}&= \frac{s_{ii}(1-2e_{ii}) -(s_{ii}+\varDelta _{2}(i,i))(1-2\widetilde{e}_{ii})}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\\ &= -\frac{(1-2\widetilde{e}_{ii})\varDelta _{2}(i,i) + 2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\\ &= -\frac{\varDelta _{2}(i,i)}{1-2e_{ii}} - \frac{2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}. \end{aligned}
(33)

For the first term on the right-hand side, we see

\begin{aligned} \left| \frac{\varDelta _{2}(i,i)}{1-2e_{ii}} \right| \le \frac{\chi (\epsilon )}{1-2e_{ii}}\Vert A\Vert \epsilon ^2 \end{aligned}

from (28). Moreover, for the second term,

\begin{aligned} \left| \frac{2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})} \right| \le \frac{(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\Vert A\Vert \epsilon ^{2} \end{aligned}

from (24), (28), and (32). In addition, we see

\begin{aligned}&\frac{\chi (\epsilon )}{1-2e_{ii}} + \frac{(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1-2e_{ii})(1-2\widetilde{e}_{ii})} \nonumber \\&\quad =\frac{(1 - 2\widetilde{e}_{ii}) + (1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\chi (\epsilon )\nonumber \\&\quad \le \frac{2(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1 - 2\epsilon )(1 - 2\epsilon - \chi (\epsilon )\epsilon ^{2})} =: \eta (\epsilon ) . \end{aligned}
(34)

Combining this with (33), we obtain

\begin{aligned} |\widetilde{\lambda }_{i}-{\lambda }_{i}| \le \eta (\epsilon )\Vert A\Vert \epsilon ^2 \quad \text {for} \; \ i=1, \ldots , n. \end{aligned}
(35)

Hence, noting the definition of $$\delta$$ as in (30), we derive

\begin{aligned} \delta&\le 2(\Vert S-D\Vert +\Vert A\Vert \Vert R\Vert +\Vert D-\widetilde{D}\Vert ) \\ &\le 2(2(2\Vert A\Vert \epsilon +\chi (\epsilon )\Vert A\Vert \epsilon ^{2}) +\eta (\epsilon )\Vert A\Vert \epsilon ^{2})\\ &\le 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon ) \epsilon )\Vert A\Vert \epsilon \\ &< 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}, \end{aligned}
(36)

where the second inequality is due to (27), (28), and (35), and the last inequality is due to (29). In addition, from (26), $$\epsilon < 1/100$$ in (29), and (34), we see

\begin{aligned} \chi (\epsilon )=\frac{3-2\epsilon }{(1-\epsilon )^2}< 3.05, \quad \eta (\epsilon ) = \frac{2(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1 - 2\epsilon )(1 - 2\epsilon - \chi (\epsilon )\epsilon ^{2})} < 7. \end{aligned}
(37)
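Both constants in (37) are easy to confirm numerically. Since $$\chi$$ and $$\eta$$ are increasing in $$\epsilon$$ on $$[0, 1/100]$$, it suffices to evaluate them at the extreme value $$\epsilon = 1/100$$ allowed by (29):

```python
eps = 1.0 / 100                         # extreme value allowed by (29)
chi = (3 - 2 * eps) / (1 - eps) ** 2    # chi(eps) as in (26)
eta = (2 * (1 + 2 * eps + chi * eps**2) * chi
       / ((1 - 2 * eps) * (1 - 2 * eps - chi * eps**2)))  # eta(eps) as in (34)
print(chi, eta)  # approximately 3.04 and 6.46
```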

Thus, we find that, for all (i, j), $$i \ne j$$,

\begin{aligned} |\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}|&\ge |{\lambda }_{i}-{\lambda }_{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2\\ &> \min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|-2\eta (\epsilon )\cdot \frac{\min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|}{10n} \cdot \frac{1}{100}\\ &> \left( 10-\frac{14}{100} \right) \cdot \frac{\min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|}{10n}\\ &> 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon ) \cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}\\ &> \delta \ (= 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert )), \end{aligned}

where the first inequality is due to (35), the second inequality is due to (29), the third inequality is due to the second inequality in (37), the fourth inequality is due to (37) and $$\epsilon < 1/100$$ as in (29), and the last inequality is due to (36), respectively. Thus, we obtain (30). $$\square$$
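The numerical constants in (37) can be checked directly. The following sketch (the function names `chi` and `eta` are ours) evaluates both quantities on a grid over the admissible range $$\epsilon \in (0, 1/100]$$; since both functions are increasing there, the grid check confirms the stated bounds:

```python
def chi(eps):
    """chi(eps) = (3 - 2*eps) / (1 - eps)^2, as defined in (26)."""
    return (3.0 - 2.0 * eps) / (1.0 - eps) ** 2

def eta(eps):
    """eta(eps) from (34), built on chi(eps)."""
    c = chi(eps)
    return (2.0 * (1.0 + 2.0 * eps + c * eps ** 2) * c
            / ((1.0 - 2.0 * eps) * (1.0 - 2.0 * eps - c * eps ** 2)))

# Both functions are increasing on (0, 1/100], so checking a grid up to
# eps = 1/100 confirms chi(eps) < 3.05 and eta(eps) < 7, as stated in (37).
for k in range(1, 101):
    eps = k / 10000.0
    assert chi(eps) < 3.05
    assert eta(eps) < 7.0
```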

The assumption (29) is crucial for the first iteration of the iterative process (20). In the following, monotone convergence of $$\Vert E^{(\nu )}\Vert$$ is proven under the assumption (29) for a given initial guess $$\widehat{X} = X^{(0)}$$ with $$E = E^{(0)}$$, i.e., $$\Vert E^{(\nu + 1)}\Vert < \Vert E^{(\nu )}\Vert$$ for $$\nu = 0, 1, \ldots$$ . Thus, in the iterative refinement using Algorithm 1, Lemma 1 ensures that the separation conditions $$|\widetilde{\lambda }_{i}^{(\nu )} - \widetilde{\lambda }_{j}^{(\nu )}| > \delta ^{(\nu )}$$ for all $$(i,j)$$, $$i \ne j$$, as in (30), are satisfied consecutively for $$X^{(\nu )}$$ throughout the iteration. In addition, recall that our aim is to prove quadratic convergence in the asymptotic regime. To this end, we derive a key lemma that shows (25).
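For reference, one refinement step of the kind analyzed in this section can be sketched in NumPy. This is our reconstruction, not the authors' code: the approximate eigenvalues are taken as $$\widetilde{\lambda }_{i} = s_{ii}/(1-r_{ii})$$, the diagonal corrections as $$\widetilde{e}_{ii} = r_{ii}/2$$, the off-diagonal corrections solve the 2-by-2 systems formed by (21) and (22) when the pair is well separated, with the fallback $$\widetilde{e}_{ij} = r_{ij}/2$$ otherwise; the name `refine_step` and the right-hand-side signs are inferred from the identities (40)-(44).

```python
import numpy as np

def refine_step(A, X):
    """One refinement iteration (a sketch; not the authors' code).

    Returns the refined eigenvector matrix X' = X(I + E~) and the
    approximate eigenvalues lambda~_i."""
    n = A.shape[0]
    R = np.eye(n) - X.T @ X                  # orthogonality residual
    S = X.T @ A @ X                          # near-diagonalization of A
    lam = np.diag(S) / (1.0 - np.diag(R))    # lambda~_i = s_ii / (1 - r_ii)
    # separation threshold delta = 2(||S - D~|| + ||A|| ||R||)
    delta = 2.0 * (np.linalg.norm(S - np.diag(lam), 2)
                   + np.linalg.norm(A, 2) * np.linalg.norm(R, 2))
    dif = lam[:, None] - lam[None, :]        # dif[i, j] = lambda~_i - lambda~_j
    sep = np.abs(dif) > delta                # well-separated pairs (false on diagonal)
    # off the diagonal: solve e_ij + e_ji = r_ij and
    # lam_i e_ij + lam_j e_ji = -s_ij; otherwise fall back to e_ij = r_ij / 2
    E = np.where(sep,
                 (S + lam[None, :] * R) / np.where(sep, -dif, 1.0),
                 R / 2.0)
    return X @ (np.eye(n) + E), lam
```

In binary64 the attainable residual is limited by rounding errors, so only the first few iterations show the quadratic error reduction; observing it over many iterations, as in the paper's experiments, requires evaluating the step in higher precision.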

### Lemma 2

Let A be a real symmetric $$n \times n$$ matrix with simple eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$ and a corresponding orthogonal eigenvector matrix $$X \in \mathbb {R}^{n \times n}$$. For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and $$\widetilde{E}$$ is the quantity calculated in Algorithm 1. Define E such that $$X=\widehat{X}(I+E)$$. Under the assumption (29) in Lemma 1, we have

\begin{aligned}&\Vert \widetilde{E}-E\Vert < \frac{7}{10}\Vert E\Vert , \end{aligned}
(38)
\begin{aligned}&\limsup _{\Vert E\Vert \rightarrow 0} \frac{\Vert \widetilde{E}-E \Vert }{\Vert E \Vert ^{2}}\le \frac{6n\Vert A\Vert }{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|}. \end{aligned}
(39)

### Proof

Let $$\epsilon$$, $$\chi (\cdot )$$, and $$\eta (\cdot )$$ be defined as in Lemma 1, (26), and (34), respectively. Note that the diagonal elements of $$\widetilde{E}-{E}$$ are estimated as in (32). In the following, we estimate the off-diagonal elements of $$\widetilde{E}-{E}$$. To this end, define

\begin{aligned} \widetilde{\varDelta }_{2} := \widetilde{D}-\widetilde{D}E-E^{\mathrm {T}}\widetilde{D}-S. \end{aligned}
(40)

Noting (28), (35), and (40), we see that the off-diagonal elements of $$|\varDelta _{2}-\widetilde{\varDelta }_{2}|$$ are less than $$2\eta (\epsilon )\Vert A\Vert \epsilon ^3$$. Combining this with the bound (28) yields

\begin{aligned} |\widetilde{\varDelta }_{2}(i,j)|\le & {} (\chi (\epsilon )+2\eta (\epsilon )\epsilon )\Vert A\Vert \epsilon ^2 \end{aligned}
(41)

for $$i\not =j$$, where $$\widetilde{\varDelta }_{2}(i,j)$$ denotes the $$(i,j)$$ element of $$\widetilde{\varDelta }_{2}$$. In addition, from (22), it follows that

\begin{aligned} \widetilde{D}(E-\widetilde{E})+(E-\widetilde{E})^{\mathrm {T}}\widetilde{D} = -\widetilde{\varDelta }_{2}. \end{aligned}
(42)

From (31), (41) and (42), we have

\begin{aligned} (e_{ij}-\widetilde{e}_{ij})+(e_{ji}-\widetilde{e}_{ji})= & {} \epsilon _{1}, \quad |\epsilon _{1}|\le \chi (\epsilon )\epsilon ^{2}, \end{aligned}
(43)
\begin{aligned} {\widetilde{\lambda }_{i}}(e_{ij}-\widetilde{e}_{ij})+{\widetilde{\lambda }_{j}} (e_{ji}-\widetilde{e}_{ji})= & {} \epsilon _{2}, \quad |\epsilon _{2}|\le (\chi (\epsilon )+2\eta (\epsilon )\epsilon )\Vert A\Vert \epsilon ^{2}. \end{aligned}
(44)

It then follows that

\begin{aligned} e_{ij}-\widetilde{e}_{ij}=\frac{\epsilon _{2}-\widetilde{\lambda }_{j}\epsilon _{1}}{\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}},\quad e_{ji}-\widetilde{e}_{ji}=\frac{\epsilon _{2}-\widetilde{\lambda }_{i}\epsilon _{1}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}}. \end{aligned}

Therefore, using (35), we obtain

\begin{aligned} |\widetilde{e}_{ij}-e_{ij}|\le & {} \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|}\nonumber \\\le & {} \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2}. \end{aligned}
(45)

Note that $$\Vert \widetilde{E}-E\Vert ^{2} \le \Vert \widetilde{E}-E\Vert _{\mathrm {F}}^{2} = \sum _{i,j}|\widetilde{e}_{ij}-e_{ij}|^2$$ and

\begin{aligned} \frac{\chi (\epsilon )}{2}\epsilon ^2 \le \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2} \quad (i\not = j) \end{aligned}

in (32) and (45); that is, the diagonal bound (32) is dominated by the off-diagonal bound (45). Therefore, we obtain

\begin{aligned} \Vert \widetilde{E}-E\Vert \le \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})n\Vert A\Vert \epsilon ^2}{\min _{i\not =j}| \lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2}. \end{aligned}

Combining this with $$\chi (0)=3$$ proves (39). Moreover, we have

\begin{aligned} \Vert \widetilde{E}-E\Vert< \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\epsilon }{\frac{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|}{n\Vert A\Vert \epsilon }-\frac{2\eta (\epsilon ) \epsilon }{n}}< \frac{6.4\epsilon }{10-\frac{14}{100n}} < \frac{7}{10}\epsilon \end{aligned}
(46)

from (29) and (37). $$\square$$

Using the above lemmas, we obtain a main theorem that states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given $$\widehat{X}$$ is sufficiently close to X.

### Theorem 1

Let A be a real symmetric $$n \times n$$ matrix with simple eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$ and a corresponding orthogonal eigenvector matrix $$X \in \mathbb {R}^{n\times n}$$. For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and $$X'$$ is the quantity calculated in Algorithm 1. Define E and $$E'$$ such that $$X=\widehat{X}(I+E)$$ and $$X=X'(I+E')$$, respectively. Under the assumption (29) in Lemma 1, we have

\begin{aligned}&\Vert E'\Vert < \frac{5}{7}\Vert E\Vert , \end{aligned}
(47)
\begin{aligned}&\limsup _{\Vert E\Vert \rightarrow 0}\frac{\Vert E'\Vert }{\Vert E\Vert ^{2}} \le \frac{6n\Vert A\Vert }{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|} . \end{aligned}
(48)

### Proof

Noting $$X'(I+E')=\widehat{X}(I+E) \ (=X)$$ and $$X' = \widehat{X}(I + \widetilde{E})$$, where $$\widetilde{E}$$ is the quantity calculated in Algorithm 1, we have

\begin{aligned} X'E'=\widehat{X}(E-\widetilde{E}) . \end{aligned}

Therefore, we obtain

\begin{aligned} E'=(I+\widetilde{E})^{-1}(E - \widetilde{E}). \end{aligned}
(49)

Noting (46) and

\begin{aligned} \Vert \widetilde{E}\Vert \le \Vert \widetilde{E}-E\Vert +\Vert E\Vert \le \frac{17}{10} \Vert E\Vert < \frac{1}{50} \end{aligned}

from (29) and (38), we have

\begin{aligned} \Vert E'\Vert \le \frac{\Vert \widetilde{E} - E\Vert }{1 - \Vert \widetilde{E}\Vert } < \frac{\frac{7}{10}\Vert E\Vert }{1-\frac{1}{50}}=\frac{5}{7}\Vert E\Vert . \end{aligned}
(50)

Finally, using (49) and (39), we obtain (48). $$\square$$

Our analysis indicates that Algorithm 1 may not be convergent for very large n. However, in practice, n is much smaller than $$1/\epsilon$$ for $$\epsilon :=\Vert E\Vert$$ when the initial guess $$\widehat{X}$$ is computed by some backward stable algorithm, e.g., in IEEE 754 binary64 arithmetic, unless A has nearly multiple eigenvalues. In such a situation, the iterative refinement works well.

### Remark 3

For any $$\widetilde{\delta } \ge \delta$$, we can replace $$\delta$$ by $$\widetilde{\delta }$$ in Algorithm 1. For example, such cases arise when the Frobenius norm is used for calculating $$\delta$$ instead of the spectral norm as mentioned in Remark 1. In such cases, the quadratic convergence of the algorithm can also be proven in a similar way as in this subsection by replacing the assumption (29) by

\begin{aligned} \epsilon := \Vert E\Vert < \min \left( \frac{1}{\rho }\cdot \frac{\min _{i \not = j}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}
(51)

where $$\rho := \widetilde{\delta }/\delta \ge 1$$. More specifically, in the convergence analysis, (36) is replaced with

\begin{aligned} \widetilde{\delta }= & {} \rho \cdot 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert ) \le \rho \cdot 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\Vert A\Vert \epsilon \\< & {} 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}, \end{aligned}

where the last inequality is due to the assumption (51). Therefore, $$|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|>\widetilde{\delta }$$ also holds for $$i\not =j$$ in the same manner as in the proof of Lemma 1. As a result, (38) and (39) are also established even if $$\delta$$ is replaced by $$\widetilde{\delta }$$. $$\square$$

### Multiple eigenvalues

Multiple eigenvalues require some care. If $$\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}$$ corresponding to multiple eigenvalues $$\lambda _{i}=\lambda _{j}$$, we might not be able to solve the linear system given by (21) and (22). Therefore, we use equation (21) only, i.e., $$\widetilde{e}_{ij}=\widetilde{e}_{ji}={r}_{ij}/2$$ if $$|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \delta$$.
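The degeneracy can be seen directly from the coefficient matrix of the linear system given by (21) and (22) for a pair $$(i,j)$$: its determinant is $$\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}$$, so the system becomes singular as the two approximate eigenvalues coalesce. A small numerical illustration (the helper name `pair_condition` is ours):

```python
import numpy as np

def pair_condition(li, lj):
    # Coefficient matrix of the 2x2 system from (21)-(22) for a pair (i, j):
    # unknowns (e_ij, e_ji), with one row from e_ij + e_ji = r_ij and one row
    # weighted by the approximate eigenvalues; det = lj - li.
    M = np.array([[1.0, 1.0], [li, lj]])
    return np.linalg.cond(M)

# well-separated approximate eigenvalues: a benign system
assert pair_condition(1.0, 2.0) < 1e2
# nearly multiple approximate eigenvalues: numerically singular, which
# motivates the fallback e_ij = e_ji = r_ij / 2
assert pair_condition(1.0, 1.0 + 1e-12) > 1e10
```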

To investigate the above exceptional process, let us consider a simple case as follows. Suppose $$\lambda _{i}$$, $$i \in \mathcal {M} := \{1, 2, \ldots , p\}$$ are multiple, i.e., $$\lambda _{1}=\cdots =\lambda _{p}<\lambda _{p+1}<\cdots < \lambda _{n}$$. Then, the eigenvectors corresponding to $$\lambda _{i}$$, $$1\le i \le p$$ are not unique. Suppose $$X = [x_{(1)},\ldots ,x_{(n)}] \in \mathbb {R}^{n\times n}$$ is an orthogonal eigenvector matrix of A, where $$x_{(i)}$$ are the normalized eigenvectors corresponding to $$\lambda _{i}$$ for $$i = 1, \ldots , n$$. Define $$X_{\mathcal {M}} := [x_{(1)},\ldots ,x_{(p)}] \in \mathbb {R}^{n\times p}$$ and $$X_{\mathcal {S}} := [x_{(p+1)},\ldots ,x_{(n)}] \in \mathbb {R}^{n\times (n - p)}$$. Then, the columns of $$X_{\mathcal {M}}Q$$ are also the eigenvectors of A for any orthogonal matrix $$Q \in \mathbb {R}^{p\times p}$$. Thus, let $$\mathcal {V}$$ be the set of $$n \times n$$ orthogonal eigenvector matrices of A and $$\mathcal {E}:= \{ \widehat{X}^{-1}X-I : X \in \mathcal {V}\}$$ for a given nonsingular $$\widehat{X}$$.

The key idea of the proof of quadratic convergence below is to define an orthogonal eigenvector matrix $$Y \in \mathcal {V}$$ as follows. For any $$X_{\alpha } \in \mathcal {V}$$, splitting $$\widehat{X}^{-1}X_{\mathcal {M}}$$ into the first p rows $$V_{\alpha } \in \mathbb {R}^{p \times p}$$ and the remaining $$(n - p)$$ rows $$W_{\alpha } \in \mathbb {R}^{(n - p) \times p}$$, we have

\begin{aligned} \widehat{X}^{-1}X_{\mathcal {M}} = \left[ \begin{array}{c} V_{\alpha } \\ W_{\alpha } \\ \end{array}\right] = \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] Q_{\alpha } \end{aligned}
(52)

in view of the polar decomposition $$V_{\alpha } = CQ_{\alpha }$$, where $$C=\sqrt{V_{\alpha }V_{\alpha }^{\mathrm {T}}} \in \mathbb {R}^{p \times p}$$ is symmetric and positive semidefinite and $$Q_{\alpha } \in \mathbb {R}^{p \times p}$$ is orthogonal. Note that, although $$X_{\mathcal {M}}Q$$ for any orthogonal matrix $$Q \in \mathbb {R}^{p \times p}$$ is also an eigenvector matrix, the symmetric and positive semidefinite matrix C is independent of Q. In other words, we have

\begin{aligned} \widehat{X}^{-1}(X_{\mathcal {M}}Q) = (\widehat{X}^{-1}X_{\mathcal {M}})Q = \left[ \begin{array}{c} V_{\alpha }Q \\ W_{\alpha }Q \\ \end{array}\right] = \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] Q_{\alpha }Q, \end{aligned}
(53)

where the last equality implies the polar decomposition of $$V_{\alpha }{Q}$$. In addition, if $$V_{\alpha }$$ in (52) is nonsingular, the orthogonal matrix $$Q_{\alpha }$$ is uniquely determined by the polar decomposition for a fixed $$X_{\alpha } \in \mathcal {V}$$. To investigate the features of $$V_{\alpha }$$, we suppose that $$\widehat{X}$$ is an exact eigenvector matrix. Then, noting $$\widehat{X}^{-1}=\widehat{X}^\mathrm{T}$$, we see that $$V_{\alpha }$$ is an orthogonal matrix and $$C=I$$ and $$V_{\alpha }=Q_{\alpha }$$ in the polar decomposition of $$V_{\alpha }$$ in (52). Thus, for any fixed $$\widehat{X}$$ in some neighborhood of $$\mathcal {V}$$, the above polar decomposition $$V_{\alpha } = CQ_{\alpha }$$ is unique, where the nonsingular matrix $$V_{\alpha }$$ depends on $$X_{\alpha } \in \mathcal {V}$$. In the following, we consider any $$\widehat{X}$$ in such a neighborhood of $$\mathcal {V}$$.
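The invariance of C under right-multiplication of $$V_{\alpha }$$ by an orthogonal factor, claimed around (52) and (53), is easy to confirm numerically. A sketch computing the left polar decomposition $$V = CQ$$ via the SVD (the helper name `left_polar` is ours):

```python
import numpy as np

def left_polar(V):
    """Left polar decomposition V = C Q with C = sqrt(V V^T) symmetric
    positive semidefinite and Q orthogonal, computed via the SVD."""
    U, s, Wt = np.linalg.svd(V)
    C = (U * s) @ U.T          # C = U diag(s) U^T
    Q = U @ Wt
    return C, Q

rng = np.random.default_rng(3)
p = 5
V = np.eye(p) + 0.1 * rng.standard_normal((p, p))   # a nonsingular V_alpha
C1, Q1 = left_polar(V)

# replacing X_M by X_M Q multiplies V_alpha by an orthogonal factor on the
# right; the polar factor C is unchanged
Qr, _ = np.linalg.qr(rng.standard_normal((p, p)))
C2, Q2 = left_polar(V @ Qr)
```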

Recall that, for any orthogonal matrix Q, the last equality in (53) is due to the polar decomposition $$V_{\alpha }{Q}= C(Q_{\alpha }Q)$$. Hence, we have an eigenvector matrix

\begin{aligned} (X_{\mathcal {M}}Q)(Q_{\alpha }Q)^\mathrm{T} = X_{\mathcal {M}}Q_{\alpha }^\mathrm{T} = \widehat{X} \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] , \end{aligned}

which is independent of Q. Thus, we define the unique matrix $$Y := [X_{\mathcal {M}}Q_{\alpha }^{\mathrm {T}}, X_{\mathcal {S}}]$$, which is common to all eigenvector matrices in $$\mathcal {V}$$ and depends only on $$\widehat{X}$$. Then, the corresponding error term $$F = (f_{ij})$$ is uniquely determined as

\begin{aligned} F&:= \widehat{X}^{-1}Y-I = [\widehat{X}^{-1}X_{\mathcal {M}}Q_{\alpha }^{\mathrm {T}}, * \ ] - I \nonumber \\&= \left[ \begin{array}{cc} C - I &{} * \\ * &{} * \\ \end{array}\right] , \end{aligned}
(54)

which implies $$f_{ij}=f_{ji}$$ corresponding to the multiple eigenvalues $$\lambda _{i}=\lambda _{j}$$. Therefore,

\begin{aligned} f_{ij}=f_{ji}=\frac{r_{ij}+\varDelta _{1}(i,j)}{2} \end{aligned}
(55)

from (27), where $$\varDelta _{1}(i,j)$$ denotes the $$(i,j)$$ element of $$\varDelta _{1}$$. Now, let us consider the situation where $$\widehat{X}$$ is an exact eigenvector matrix. In (52), noting $$\widehat{X}^{-1}=\widehat{X}^{\mathrm {T}}$$, we have $$W_{\alpha }=O$$ and $$C=I$$ in the polar decomposition of $$V_{\alpha }$$. Combining these facts with (54), we see $$F=O$$ for the exact eigenvector matrix $$\widehat{X}$$.

Our aim is to prove $$\Vert F\Vert \rightarrow 0$$ in the iterative refinement for $$\widehat{X}\approx Y \in \mathcal {V}$$, where Y depends on $$\widehat{X}$$. To this end, for the refined $$X'$$ as a result of Algorithm 1, we also define an eigenvector matrix $$Y'\in \mathcal {V}$$ and $$F':=(X')^{-1}Y'-I$$ such that the submatrices of $$(X')^{-1}Y'$$ corresponding to the multiple eigenvalues are symmetric and positive definite. Note that the eigenvector matrix Y is changed to $$Y'$$ corresponding to $$X'$$ after the refinement.

On the basis of the above observations, we consider general eigenvalue distributions. First of all, define the index sets $$\mathcal {M}_{k}$$, $$k = 1, 2, \ldots , M$$ for multiple eigenvalues $$\{ {\lambda }_{i} \}_{i \in \mathcal {M}_{k}}$$ satisfying the following conditions:

\begin{aligned} \left\{ \begin{array}{l} \text {(a)} \ \mathcal {M}_{k} \subseteq \{1,2,\ldots ,n\} \ \text {with} \ n_{k} := |\mathcal {M}_{k}| \ge 2 \\ \text {(b)} \ {\lambda }_{i} = {\lambda }_{j}, \ \forall i,j \in \mathcal {M}_{k} \\ \text {(c)} \ {\lambda }_{i} \not = {\lambda }_{j}, \ \forall i \in \mathcal {M}_{k}, \ \forall j \in \{1,2,\ldots ,n\}\,\backslash \, \mathcal {M}_{k} \end{array}\right. . \end{aligned}
(56)

Using the above definitions, we obtain the following key lemma to prove quadratic convergence.

### Lemma 3

Let A be a real symmetric $$n \times n$$ matrix with the eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$. Suppose A has multiple eigenvalues with index sets $$\mathcal {M}_{k}$$, $$k = 1, 2, \ldots , M$$, satisfying (56). Let $$\mathcal {V}$$ be the set of $$n \times n$$ orthogonal eigenvector matrices of A. For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, define $$\mathcal {E}$$ as

\begin{aligned} \mathcal {E}:= \{ \widehat{X}^{-1}X-I : X\in \mathcal {V}\}. \end{aligned}
(57)

In addition, define $$Y\in \mathcal {V}$$ such that, for all k, the $$n_{k}\times n_{k}$$ submatrices of $$\widehat{X}^{-1}Y$$ corresponding to $$\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}$$ are symmetric and positive semidefinite. Moreover, define $$F \in \mathcal {E}$$ such that $$Y=\widehat{X}(I+F)$$. Then, for any $$E_{\alpha } \in \mathcal {E}$$,

\begin{aligned} \Vert F\Vert \le 3\Vert E_{\alpha }\Vert . \end{aligned}
(58)

Furthermore, if there exists some Y such that $$\Vert F\Vert <1$$, then $$Y\in \mathcal {V}$$ is uniquely determined.

### Proof

For any $$E_{\alpha } = (e_{ij}^{(\alpha )}) \in \mathcal {E}$$, let $$E_{\mathrm {diag}}$$ denote the block diagonal part of $$E_{\alpha }$$, where the $$n_{k} \times n_{k}$$ blocks of $$E_{\mathrm {diag}}$$ correspond to $$n_{k}$$ multiple eigenvalues $$\{ {\lambda }_{i} \}_{i \in \mathcal {M}_{k}}$$, i.e.,

\begin{aligned} E_{\mathrm {diag}}(i,j):= \left\{ \begin{array}{ll} e_{ij}^{(\alpha )} &{} \ \text {if} \ \lambda _{i}=\lambda _{j} \\ 0 &{} \ \text {otherwise} \end{array}\right. \end{aligned}

for all $$1\le i,j \le n$$, where $$\lambda _{1} \le \cdots \le \lambda _{n}$$. Here, we consider the polar decomposition

\begin{aligned} I+E_{\mathrm {diag}}=:HU_{\alpha }, \end{aligned}
(59)

where H is a symmetric and positive semidefinite matrix and $$U_{\alpha }$$ is an orthogonal matrix. Note that, similarly to C in (52), H is unique and independent of the choice of $$X_{\alpha } \in \mathcal {V}$$ that satisfies $$X_{\alpha }=\widehat{X}(I+E_{\alpha })$$, whereas $$U_{\alpha }$$ is not always uniquely determined. Then, we have

\begin{aligned} Y=X_{\alpha }U_{\alpha }^{\mathrm {T}} \end{aligned}
(60)

from the definition of Y and

\begin{aligned} F= & {} \widehat{X}^{-1}Y-I\nonumber \\= & {} \widehat{X}^{-1}X_{\alpha }U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }+I)U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }-E_{\mathrm {diag}}+HU_{\alpha })U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }-E_{\mathrm {diag}})U_{\alpha }^{\mathrm {T}}+H-I, \end{aligned}
(61)

where the first, second, third, and fourth equalities are consequences of the definition of F, (60), (57), and (59), respectively. Here, we see that

\begin{aligned} \Vert H-I\Vert \le \Vert E_{\mathrm {diag}}\Vert \end{aligned}
(62)

because all the eigenvalues of H are the singular values of $$HU_{\alpha }$$ in (59), which lie in the interval $$[1-\Vert E_{\mathrm {diag}}\Vert , 1+\Vert E_{\mathrm {diag}}\Vert ]$$ by Weyl's inequality for singular values. In addition, note that

\begin{aligned} \Vert E_{\mathrm {diag}}\Vert \le \Vert E_{\alpha }\Vert . \end{aligned}
(63)

Therefore, we obtain

\begin{aligned} \Vert F\Vert =\Vert (E_{\alpha }-E_{\mathrm {diag}})U_{\alpha }^{\mathrm {T}}+(H-I)\Vert \le 3\Vert E_{\alpha }\Vert \end{aligned}

from (61), (62), and (63), giving us (58).

Finally, we prove that Y is unique if $$\Vert F\Vert <1$$. In the above discussion, if $$X_{\alpha }$$ is replaced with some $$Y \in \mathcal {V}$$, then $$E_{\alpha }=F$$, and thus $$\Vert E_{\mathrm {diag}}\Vert \le \Vert E_{\alpha }\Vert = \Vert F\Vert <1$$ in (59). Therefore, $$U_{\alpha }=I$$ in (59) due to the uniqueness of the polar decomposition of the nonsingular matrix $$I+E_\mathrm{diag}$$, which implies the uniqueness of Y from (60). $$\square$$

Moreover, we have the next lemma, corresponding to Lemmas 1 and 2.

### Lemma 4

Let A be a real symmetric $$n \times n$$ matrix with the eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$. Suppose A has multiple eigenvalues with index sets $$\mathcal {M}_{k}$$, $$k = 1, 2, \ldots , M$$, satisfying (56). For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and $$\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})$$, $$\widetilde{E}$$, and $$\delta$$ are the quantities calculated in Algorithm 1. Let F be defined as in Lemma 3. Assume that

\begin{aligned} \epsilon _{F} := \Vert F\Vert < \frac{1}{3}\cdot \min \left( \frac{\min _{\lambda _{i}\not = \lambda _{j}}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) . \end{aligned}
(64)

Then, we obtain

\begin{aligned} \Vert F-\widetilde{E}\Vert \le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})n\Vert A\Vert \epsilon _{F}^2}{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2}, \end{aligned}
(65)

where $$\chi (\cdot )$$ and $$\eta (\cdot )$$ are defined as in (26) and (34), respectively.

### Proof

First, we see that, for $$i\not =j$$ corresponding to $$\lambda _{i}\not =\lambda _{j}$$,

\begin{aligned}&|\widetilde{e}_{ij}-f_{ij}|\le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\Vert A\Vert \epsilon _{F}^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2}, \end{aligned}

similar to the proof of (45) in Lemma 2. Concerning the multiple eigenvalues $$\lambda _{i}=\lambda _{j}$$, noting $$| \widetilde{\lambda }_{i} - \widetilde{\lambda }_{j} | \le \delta$$ and (55), we have

\begin{aligned} |\widetilde{e}_{ij}-f_{ij}|\le \frac{\chi (\epsilon _{F})}{2}\epsilon _{F}^2\ \text {for }(i,j)\text { corresponding to }\lambda _{i}=\lambda _{j}. \end{aligned}

Note that, for $$(i,j)$$ corresponding to $$\lambda _{i}\not =\lambda _{j}$$,

\begin{aligned} \frac{\chi (\epsilon _{F})}{2}\epsilon _{F}^2 \le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\Vert A\Vert \epsilon _{F}^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2} \end{aligned}

in the above two inequalities for the elements of $$F-\widetilde{E}$$. Therefore, we have

\begin{aligned} \Vert F-\widetilde{E}\Vert\le & {} \Vert F-\widetilde{E}\Vert _{\mathrm {F}} = \sqrt{\sum _{1\le i,j \le n}|f_{ij}-\widetilde{e}_{ij}|^2}\\\le & {} \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})n\Vert A\Vert \epsilon _{F}^2}{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2} \end{aligned}

similar to the proof in Lemma 2. $$\square$$

On the basis of Lemmas 3 and 4 and Theorem 1, we see the quadratic convergence for a real symmetric matrix A that has multiple eigenvalues.

### Theorem 2

Let A be a real symmetric $$n \times n$$ matrix with the eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$. Suppose A has multiple eigenvalues with index sets $$\mathcal {M}_{k}$$, $$k = 1, 2, \ldots , M$$, satisfying (56). Let $$\mathcal {V}$$ be the set of $$n \times n$$ orthogonal eigenvector matrices of A. For a given nonsingular $$\widehat{X} \in \mathbb {R}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and $$X'$$ and $$\delta$$ are the quantities calculated in Algorithm 1. Let $$Y, Y' \in \mathcal {V}$$ be defined such that, for all k, the $$n_{k}\times n_{k}$$ submatrices of $$\widehat{X}^{-1}Y$$ and $$(X')^{-1}Y'$$ corresponding to $$\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}$$ are symmetric and positive definite. Define F and $$F'$$ such that $$Y=\widehat{X}(I+F)$$ and $$Y'=X'(I+F')$$, respectively. Furthermore, suppose that (64) in Lemma 4 is satisfied for $$\epsilon _{F} := \Vert F\Vert$$. Then, we obtain

\begin{aligned}&\Vert F'\Vert < \frac{5}{7}\Vert F\Vert , \end{aligned}
(66)
\begin{aligned}&\limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}} \le 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}| \lambda _{i}-\lambda _{j}|}\right) . \end{aligned}
(67)

### Proof

Let $$\widetilde{E}$$ and $$\widetilde{\lambda }_{i}$$, $$i = 1, 2, \ldots , n$$, be the quantities calculated in Algorithm 1. First, note that (65) in Lemma 4 is established. Next, we define $$G:=(X')^{-1}Y-I$$. Then, we have

\begin{aligned} G=(I+\widetilde{E})^{-1}(F-\widetilde{E}) \end{aligned}
(68)

similar to (49). Moreover, similar to (46), we have

\begin{aligned} \Vert \widetilde{E}-F \Vert< & {} \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\epsilon _{F}}{\frac{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}{n\Vert A\Vert \epsilon _{F}}-\frac{2\eta (\epsilon _{F}) \epsilon _{F}}{n}}\\< & {} \frac{6.4\epsilon _{F}}{3(10-\frac{14}{100n})} < \frac{1}{3}\cdot \frac{7}{10} \epsilon _{F} \end{aligned}

from (65) and (64). Therefore, we see

\begin{aligned} \Vert G\Vert <\frac{1}{3}\cdot \frac{5}{7}\epsilon _{F} \end{aligned}

in a similar manner as the proof of (50). Using (58), we have

\begin{aligned} \Vert F'\Vert \le 3\Vert G\Vert . \end{aligned}
(69)

Therefore, we obtain (66). Since we see

\begin{aligned} \limsup _{\epsilon _{F} \rightarrow 0}\frac{\Vert G\Vert }{\Vert F\Vert ^{2}} \le \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|} \end{aligned}

from (68) and (65), we obtain (67) from (69). $$\square$$

In the iterative refinement, Theorem 2 shows that the error term $$\Vert F\Vert$$ converges quadratically to zero. Note that $$\widehat{X}$$ also converges to some fixed eigenvector matrix X, because Theorem 2 and (65) imply $$\Vert \widetilde{E}\Vert /\Vert F\Vert \rightarrow 1$$ as $$\Vert F\Vert \rightarrow 0$$ in $$X':=\widehat{X}(I+\widetilde{E})$$.

### Complex case

For a Hermitian matrix $$A\in \mathbb {C}^{n \times n}$$, we must note that, for any unitary diagonal matrix U, XU is also an eigenvector matrix, i.e., there is a continuum of normalized eigenvector matrices in contrast to the real case. Related to this, note that $$R:=I-\widehat{X}^{\mathrm {H}}\widehat{X}$$ and $$S:=\widehat{X}^{\mathrm {H}}A\widehat{X}$$, and (14) is replaced with $$\widetilde{E}+\widetilde{E}^{\mathrm {H}}=R$$ in the complex case; thus, the diagonal elements $$\widetilde{e}_{ii}$$ for $$i=1,\ldots ,n$$ are not uniquely determined in $$\mathbb {C}$$. Now, select $$\widetilde{e}_{ii}=r_{ii}/2\in \mathbb {R}$$ for $$i=1,\ldots ,n$$. Then, we can prove quadratic convergence using the polar decomposition in the same way as in the discussion of multiple eigenvalues in the real case. More precisely, we define a normalized eigenvector matrix Y as follows. First, we focus on the situation where all eigenvalues are simple. For a given nonsingular $$\widehat{X}$$, let Y be defined such that all diagonal elements of $$\widehat{X}^{-1}Y$$ are positive real numbers. In addition, let $$F:=\widehat{X}^{-1}Y-I$$. Then, we see the quadratic convergence of F in the same way as in Theorem 2.
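The complex analogue only replaces transposes by conjugate transposes and fixes $$\widetilde{e}_{ii}=r_{ii}/2\in \mathbb {R}$$. A sketch under the same assumptions as the real case (the name `refine_step_hermitian`, the choice $$\widetilde{\lambda }_{i} = \mathrm{Re}(s_{ii})/(1-\mathrm{Re}(r_{ii}))$$, and the right-hand-side signs are our reconstruction, not the authors' code):

```python
import numpy as np

def refine_step_hermitian(A, X):
    """One refinement iteration for a Hermitian A (a sketch): conjugate
    transposes replace transposes, and the diagonal correction is fixed
    as e_ii = r_ii / 2, which is real since R is Hermitian."""
    n = A.shape[0]
    R = np.eye(n) - X.conj().T @ X
    S = X.conj().T @ A @ X
    lam = np.real(np.diag(S)) / (1.0 - np.real(np.diag(R)))
    delta = 2.0 * (np.linalg.norm(S - np.diag(lam), 2)
                   + np.linalg.norm(A, 2) * np.linalg.norm(R, 2))
    dif = lam[:, None] - lam[None, :]
    sep = np.abs(dif) > delta
    # same elementwise formulas as in the real case, now over C
    E = np.where(sep,
                 (S + lam[None, :] * R) / np.where(sep, -dif, 1.0),
                 R / 2.0)
    return X @ (np.eye(n) + E), lam
```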

### Corollary 1

Let $$A\in \mathbb {C}^{n\times n}$$ be a Hermitian matrix whose eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$, are all simple. Let $$\mathcal {V}$$ be the set of $$n \times n$$ unitary eigenvector matrices of A. For a given nonsingular $$\widehat{X} \in \mathbb {C}^{n\times n}$$, suppose that Algorithm 1 is applied to A and $$\widehat{X}$$ in exact arithmetic, and a nonsingular $$X'$$ is obtained. Define $$Y,Y' \in \mathcal {V}$$ such that all the diagonal elements of $$\widehat{X}^{-1}Y$$ and $$(X')^{-1}Y'$$ are positive real numbers. Furthermore, define F and $$F'$$ such that $$Y=\widehat{X}(I+F)$$ and $$Y'=X'(I+F')$$, respectively. If

\begin{aligned} \Vert F\Vert < \frac{1}{3}\min \left( \frac{\min _{\lambda _{i}\not = \lambda _{j}}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}
(70)

then we obtain

\begin{aligned} \Vert F'\Vert< & {} \frac{5}{7}\Vert F\Vert , \\ \limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}}\le & {} 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}\right) . \end{aligned}

For a general Hermitian matrix having multiple eigenvalues, we define Y in the same manner as in Theorem 2, resulting in the following corollary.

### Corollary 2

Let $$A\in \mathbb {C}^{n\times n}$$ be a Hermitian matrix with the eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$. Suppose A has multiple eigenvalues with index sets $$\mathcal {M}_{k}$$, $$k = 1, 2, \ldots , M$$, satisfying (56). For a given nonsingular $$\widehat{X} \in \mathbb {C}^{n\times n}$$, let Y, $$Y'$$, F, $$F'$$, and $$\delta$$ be defined as in Corollary 1. Suppose that, for all k, the $$n_{k}\times n_{k}$$ submatrices of $$\widehat{X}^{-1}Y$$ and $$(X')^{-1}Y'$$ corresponding to $$\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}$$ are Hermitian and positive definite. Furthermore, suppose that (70) is satisfied. Then, we obtain

\begin{aligned}&\Vert F'\Vert < \frac{5}{7}\Vert F\Vert , \\&\limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}} \le 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}\right) . \end{aligned}

Note that $$\widehat{X}$$ in Corollaries 1 and 2 is convergent to some fixed eigenvector matrix X of A in the same manner as in the real case.

## Numerical results

Here, we present numerical results to demonstrate the effectiveness of the proposed algorithm (Algorithm 1). The numerical experiments discussed in this section were conducted using MATLAB R2016b on a PC with two CPUs (3.0 GHz Intel Xeon E5-2687W v4) and 1 TB of main memory. Let $$\mathbf {u}$$ denote the relative rounding error unit ($$\mathbf {u}= 2^{-24}$$ for IEEE binary32, and $$\mathbf {u}= 2^{-53}$$ for binary64). To realize multiple-precision arithmetic, we adopt the Advanpix Multiprecision Computing Toolbox version 4.2.3, which utilizes well-known, fast, and reliable multiple-precision arithmetic libraries including GMP and MPFR. In all cases, we use the MATLAB function norm for computing the spectral norms $$\Vert R\Vert$$ and $$\Vert S - \widetilde{D}\Vert$$ in Algorithm 1 in binary64 arithmetic, and approximate $$\Vert A\Vert$$ by $$\max _{1 \le i \le n}|\widetilde{\lambda }_{i}|$$. We conducted numerical experiments for several dozen seeds of the random number generator, and all results were similar to those provided in this section. Therefore, we adopt the default seed as a typical example, set via the MATLAB command rng('default') for reproducibility.

### Convergence property

Here, we confirm the convergence property of the proposed algorithm for various eigenvalue distributions.

#### Various eigenvalue distributions

We first generate real symmetric and positive definite matrices using the MATLAB function randsvd from Higham's collection of test matrices.

The eigenvalue distribution and condition number of A can be controlled by the input arguments $$\mathtt {mode} \in \{1,2,3,4,5\}$$ and $$\mathtt {cnd} =: \alpha \ge 1$$, as follows:

1. one large: $$\lambda _{1} \approx 1$$, $$\lambda _{i} \approx \alpha ^{-1}$$, $$i = 2,\ldots ,n$$
2. one small: $$\lambda _{n} \approx \alpha ^{-1}$$, $$\lambda _{i} \approx 1$$, $$i = 1,\ldots ,n-1$$
3. geometrically distributed: $$\lambda _{i} \approx \alpha ^{-(i - 1)/(n - 1)}$$, $$i = 1,\ldots ,n$$
4. arithmetically distributed: $$\lambda _{i} \approx 1 - (1 - \alpha ^{-1})(i - 1)/(n - 1)$$, $$i = 1,\ldots ,n$$
5. random with uniformly distributed logarithm: $$\lambda _{i} \approx \alpha ^{-r(i)}$$, $$i = 1,\ldots ,n$$, where r(i) are pseudo-random values drawn from the standard uniform distribution on (0, 1).

Here, $$\kappa (A) \approx \mathtt {cnd}$$ for $$\mathtt {cnd} < \mathbf {u}^{-1} \approx 10^{16}$$. Note that for $$\mathtt {mode} \in \{1,2\}$$, there is a cluster of eigenvalues that are not exactly but nearly multiple due to rounding errors when A is generated using randsvd, i.e., all clustered eigenvalues are slightly separated.
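For illustration, these distributions are easy to mimic outside MATLAB. The following NumPy sketch (our own stand-in, not Higham's randsvd; the function name randsvd_like is hypothetical) forms $$A = Q \Lambda Q^{\mathrm {T}}$$ with the prescribed eigenvalues and a random orthogonal Q:

```python
import numpy as np

def randsvd_like(n, cnd, mode, rng):
    """Random symmetric positive definite matrix whose eigenvalue
    distribution mimics randsvd's modes 1-5 (illustrative stand-in)."""
    a = float(cnd)
    i = np.arange(1, n + 1)
    if mode == 1:    # one large eigenvalue
        lam = np.full(n, 1.0 / a); lam[0] = 1.0
    elif mode == 2:  # one small eigenvalue
        lam = np.ones(n); lam[-1] = 1.0 / a
    elif mode == 3:  # geometrically distributed
        lam = a ** (-(i - 1) / (n - 1))
    elif mode == 4:  # arithmetically distributed
        lam = 1.0 - (1.0 - 1.0 / a) * (i - 1) / (n - 1)
    else:            # mode 5: uniformly distributed logarithm
        lam = a ** (-rng.uniform(size=n))
    # random orthogonal Q from the QR factorization of a Gaussian matrix
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (Q * lam) @ Q.T  # Q @ diag(lam) @ Q.T

rng = np.random.default_rng(0)
A = randsvd_like(10, 1e8, 3, rng)
```

Since the eigenvalues are imposed before the orthogonal similarity, $$\kappa (A) \approx \mathtt {cnd}$$ up to rounding, as in the experiments above.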

We start with small examples such as $$n = 10$$, since they are sufficiently illustrative to observe the convergence behavior of the algorithm. Moreover, we set $$\mathtt {cnd} = 10^{8}$$ to generate moderately ill-conditioned problems in binary64. We compute $$X^{(0)}$$ as an initial approximate eigenvector matrix using the MATLAB function eig for the eigenvalue decomposition in binary64 arithmetic, which adopts the LAPACK routine DSYEV. Therefore, $$X^{(0)}$$ suffers from rounding errors. To see the behavior of Algorithm 1 precisely, we use multiple-precision arithmetic with sufficiently long precision to simulate the exact arithmetic in Algorithm 1. Then, we expect that Algorithm 1 (RefSyEv) works effectively for $$\mathtt {mode} \in \{3,4,5\}$$, but does not for $$\mathtt {mode} \in \{1,2\}$$. We also use the built-in function eig in the multiple-precision toolbox to compute the eigenvalues $$\lambda _{i}$$, $$i = 1, 2, \ldots , n$$ for evaluation. The results are shown in Fig. 1, which provides $$\max _{1 \le i \le n}|\widehat{\lambda }_{i} - \lambda _{i}|/|\lambda _{i}|$$ as the maximum relative error of the computed eigenvalues $$\widehat{\lambda }_{i}$$, $$\Vert \mathrm {offdiag}(\widehat{X}^{\mathrm {T}}A\widehat{X})\Vert /\Vert A\Vert$$ as the diagonality of $$\widehat{X}^{\mathrm {T}}A\widehat{X}$$, $$\Vert I - \widehat{X}^{\mathrm {T}}\widehat{X}\Vert$$ as the orthogonality of a computed eigenvector matrix $$\widehat{X}$$, and $$\Vert \widehat{E}\Vert$$, where $$\widehat{E}$$ is a computed result of $$\widetilde{E}$$ in Algorithm 1. Here, $$\mathrm {offdiag}(\cdot )$$ denotes the off-diagonal part. The horizontal axis shows the number of iterations $$\nu$$ of Algorithm 1.

In the case of $$\mathtt {mode} \in \{3,4,5\}$$, all the quantities decrease quadratically in every iteration, i.e., we observe the quadratic convergence of Algorithm 1, as expected. On the other hand, in the case of $$\mathtt {mode} \in \{1, 2\}$$, the algorithm fails to improve the accuracy of the initial approximate eigenvectors because the test matrices for $$\mathtt {mode} \in \{1, 2\}$$ have nearly multiple eigenvalues and the assumption (29) for the convergence of Algorithm 1 is not satisfied for these eigenvalues.

#### Multiple eigenvalues

We deal with the case where A has exactly multiple eigenvalues. It is not trivial to generate such matrices using floating-point arithmetic because rounding errors slightly perturb the eigenvalues, destroying the multiplicity. To avoid this, we utilize a Hadamard matrix H of order n whose elements are 1 or $$-1$$ with $$H^{\mathrm {T}}H = nI$$. For a given $$k < n$$, let $$A = \frac{1}{n}HDH^{\mathrm {T}}$$ where

\begin{aligned} D = \mathrm {diag}(\overbrace{-1,\ldots ,-1}^{k},1,2,\ldots ,n-k) . \end{aligned}

Then A has k-fold eigenvalues $$\lambda _{i} = -1$$, $$i = 1, \ldots , k$$, and $$\lambda _{i+k} = i$$, $$i = 1, \ldots , n - k$$. We set $$n = 256$$ and $$k = 10$$, where A is exactly representable in binary64 format, and compute $$X^{(0)}$$ using eig in binary64 arithmetic. We apply Algorithm 1 to A and $$X^{(\nu )}$$ for $$\nu = 0, 1, 2, \ldots$$ . The results are shown in Fig. 2. As can be seen, Algorithm 1 converges quadratically for matrices with multiple eigenvalues, which is consistent with our convergence analysis (Theorem 2).
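This construction is straightforward to reproduce. The sketch below (illustrative; it uses the Sylvester recursion as one concrete choice of Hadamard matrix) builds $$A = \frac{1}{n}HDH^{\mathrm {T}}$$ with $$n = 256$$ and $$k = 10$$ and checks its spectrum:

```python
import numpy as np

k, p = 10, 8
n = 2 ** p  # n = 256

# Sylvester construction: H_{2m} = [[H_m, H_m], [H_m, -H_m]]
H = np.array([[1.0]])
for _ in range(p):
    H = np.block([[H, H], [H, -H]])
assert np.allclose(H.T @ H, n * np.eye(n))

# D = diag(-1, ..., -1, 1, 2, ..., n - k) with k leading entries -1
d = np.concatenate([-np.ones(k), np.arange(1, n - k + 1)])
A = (H * d) @ H.T / n  # (1/n) H diag(d) H^T; entries exactly representable

lam = np.sort(np.linalg.eigvalsh(A))
# the k smallest eigenvalues form the k-fold eigenvalue -1
```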

### Computational speed

To evaluate the computational speed of the proposed algorithm (Algorithm 1), we compare the computing time of Algorithm 1 to that of an approach using multiple-precision arithmetic, called the "MP-approach". We also compare the results of Algorithm 1 to those of a method combining the Davies–Modi algorithm [9] with the Newton–Schulz iteration [18, Section 8.3], called the "NS$$+$$DM method" and explained in the Appendix. Note that the timings should be regarded as indicative, since the computing times for Algorithm 1 and the NS$$+$$DM method strongly depend on the implementation of accurate matrix multiplication. We adopt an efficient method proposed by Ozaki et al. [24] that utilizes fast matrix multiplication routines such as xGEMM in BLAS. In the multiple-precision toolbox, the MRRR algorithm [11] via Householder reduction is implemented efficiently with parallelism to solve symmetric eigenvalue problems.

As test matrices, we generate pseudo-random real symmetric $$n \times n$$ matrices with $$n \in \{100, 500, 1000\}$$ using the MATLAB function randn via B = randn(n) and A = B + B'. We use eig in binary64, and iteratively refine the computed eigenvectors using Algorithm 1 three times. We proceed similarly for the NS$$+$$DM method. In the multiple-precision toolbox, we can control the arithmetic precision d in decimal digits using the command mp.Digits(d). Corresponding to the results of Algorithm 1, we adjust d for $$\nu = 1, 2, 3$$. For timing fairness, we adopt $$\mathtt {d} = \max (\mathtt {d},34)$$, since computations with $$\mathtt {d} = 34$$ are faster than those with $$\mathtt {d} < 34$$. In Tables 1, 2 and 3, we show $$\Vert \widehat{E}\Vert$$ and the measured computing time. For the NS$$+$$DM method, we show $$\Vert I - \widehat{U}\Vert$$, where $$\widehat{U}$$ is the correction term in the Davies–Modi algorithm, so that $$\Vert I - \widehat{U}\Vert$$ is comparable to $$\Vert \widehat{E}\Vert$$ in Algorithm 1. As shown by $$\Vert \widehat{E}\Vert$$, Algorithm 1 quadratically improves the accuracy of the computed eigenvectors. As indicated by $$\Vert I - \widehat{U}\Vert$$, the behavior of the NS$$+$$DM method is similar to that of Algorithm 1; as expected, however, Algorithm 1 is much faster than the NS$$+$$DM method. The accuracy of the results obtained using the MP-approach corresponds to the arithmetic precision d. As can be seen, Algorithm 1 is considerably faster than the MP-approach for larger n.

## Conclusion

We have proposed a refinement algorithm (Algorithm 1) for eigenvalue decomposition of real symmetric matrices that can be applied iteratively. Quadratic convergence of the proposed algorithm was proven for a sufficiently accurate initial guess, similar to Newton’s method.

The proposed algorithm benefits from the availability of efficient matrix multiplication in higher-precision arithmetic. The numerical results demonstrate the excellent performance of the proposed algorithm in terms of convergence rate and measured computing time.

In practice, it is likely that ordinary floating-point arithmetic, such as IEEE 754 binary32 or binary64, is used for calculating an approximation $$\widehat{X}$$ to an eigenvector matrix $${X}$$ of a given symmetric matrix A. As done in the numerical experiments, we can use $$\widehat{X}$$ as an initial guess $$X^{(0)}$$ in Algorithm 1. However, if A has nearly multiple eigenvalues, it is difficult to obtain a sufficiently accurate $$X^{(0)}$$ in ordinary floating-point arithmetic such that Algorithm 1 works well. Our future work is to overcome this problem.

## References

1. Absil, P.A., Mahony, R., Sepulchre, R., Van Dooren, P.: A Grassmann–Rayleigh quotient iteration for computing invariant subspaces. SIAM Rev. 44, 57–73 (2006)

2. Advanpix: Multiprecision Computing Toolbox for MATLAB, Code and documentation. http://www.advanpix.com/

3. Ahues, M., Largillier, A., D'Almeida, F.D., Vasconcelos, P.B.: Spectral refinement on quasi-diagonal matrices. Linear Algebra Appl. 401, 109–117 (2005)

4. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK Users' Guide, 3rd edn. SIAM, Philadelphia (1999)

5. Atkinson, K., Han, W.: Theoretical Numerical Analysis, 3rd edn. Springer, New York (2009)

6. Collar, A.R.: Some notes on Jahn's method for the improvement of approximate latent roots and vectors of a square matrix. Quart. J. Mech. Appl. Math. 1, 145–148 (1948)

7. Davies, P.I., Higham, N.J., Tisseur, F.: Analysis of the Cholesky method with iterative refinement for solving the symmetric definite generalized eigenproblem. SIAM J. Matrix Anal. Appl. 23, 472–493 (2001)

8. Davies, P.I., Smith, M.I.: Updating the singular value decomposition. J. Comput. Appl. Math. 170, 145–167 (2004)

9. Davies, R.O., Modi, J.J.: A direct method for completing eigenproblem solutions on a parallel computer. Linear Algebra Appl. 77, 61–74 (1986)

10. Demmel, J.W.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997)

11. Dhillon, I.S., Parlett, B.N.: Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algebra Appl. 387, 1–28 (2004)

12. Dongarra, J.J., Moler, C.B., Wilkinson, J.H.: Improving the accuracy of computed eigenvalues and eigenvectors. SIAM J. Numer. Anal. 20, 23–45 (1983)

13. GMP: GNU Multiple Precision Arithmetic Library, Code and documentation. http://gmplib.org/ (2015)

14. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

15. Gu, M., Eisenstat, S.C.: A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM J. Matrix Anal. Appl. 16, 172–191 (1995)

16. Hida, Y., Li, X.S., Bailey, D.H.: Algorithms for quad-double precision floating point arithmetic. In: Proceedings of the 15th IEEE Symposium on Computer Arithmetic, pp. 155–162. IEEE Computer Society Press (2001)

17. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002)

18. Higham, N.J.: Functions of Matrices: Theory and Computation. SIAM, Philadelphia (2008)

19. Jahn, H.A.: Improvement of an approximate set of latent roots and modal columns of a matrix by methods akin to those of classical perturbation theory. Quart. J. Mech. Appl. Math. 1, 131–144 (1948)

20. Li, X.S., Demmel, J.W., Bailey, D.H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kang, S.Y., Kapur, A., Martin, M.C., Thompson, B.J., Tung, T., Yoo, D.: Design, implementation and testing of extended and mixed precision BLAS. ACM Trans. Math. Softw. 28, 152–205 (2002)

21. MPFR: The GNU MPFR Library, Code and documentation. http://www.mpfr.org/ (2013)

22. Ogita, T., Rump, S.M., Oishi, S.: Accurate sum and dot product. SIAM J. Sci. Comput. 26, 1955–1988 (2005)

23. Oishi, S.: Fast enclosure of matrix eigenvalues and singular values via rounding mode controlled computation. Linear Algebra Appl. 324, 133–146 (2001)

24. Ozaki, K., Ogita, T., Oishi, S., Rump, S.M.: Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications. Numer. Algorithms 59, 95–118 (2012)

25. Parlett, B.N.: The Symmetric Eigenvalue Problem. Classics in Applied Mathematics, vol. 20. SIAM, Philadelphia (1998)

26. Priest, D.M.: Algorithms for arbitrary precision floating point arithmetic. In: Proceedings of the 10th Symposium on Computer Arithmetic, pp. 132–145. IEEE Computer Society Press (1991)

27. Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation part I: faithful rounding. SIAM J. Sci. Comput. 31, 189–224 (2008)

28. Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation part II: sign, $$K$$-fold faithful and rounding to nearest. SIAM J. Sci. Comput. 31, 1269–1302 (2008)

29. Tisseur, F.: Newton's method in floating point arithmetic and iterative refinement of generalized eigenvalue problems. SIAM J. Matrix Anal. Appl. 22, 1038–1057 (2001)

30. Wilkinson, J.H.: The Algebraic Eigenvalue Problem. Clarendon Press, Oxford (1965)

31. Yamamoto, S., Fujiwara, T., Hatsugai, Y.: Electronic structure of charge and spin stripe order in $$\text{ La }_{2-x}\,\text{ Sr }_{x}\,\text{ Ni }\,\text{ O }_{4}$$ ($$x =\frac{1}{3}, \frac{1}{2}$$). Phys. Rev. B 76, 165114 (2007)

32. Yamamoto, S., Sogabe, T., Hoshi, T., Zhang, S.-L., Fujiwara, T.: Shifted conjugate-orthogonal-conjugate-gradient method and its application to double orbital extended Hubbard model. J. Phys. Soc. Jpn. 77, 114713 (2008)

## Acknowledgements

The first author would like to express his sincere thanks to Professor Chen Greif at the University of British Columbia for his valuable comments and helpful suggestions.

## Author information


### Corresponding author

Correspondence to Takeshi Ogita.

This study was partially supported by CREST, JST, and JSPS KAKENHI Grant numbers 16H03917, 17K14143.

## Appendix: Relation to previous work


First, we briefly explain the Davies–Modi algorithm [9], which is relevant to this study. The Davies–Modi algorithm assumes that a real symmetric matrix A is transformed into a nearly diagonal symmetric matrix $$S = \widehat{X}^{\mathrm {T}}A\widehat{X}$$ by some nearly orthogonal matrix $$\widehat{X}$$. Under that assumption, the method aims to compute an accurate eigenvector matrix $${U}$$ of S. The idea goes back to Jahn's method [6, 19], as shown by Davies and Smith [8], who derive an SVD algorithm based on the Davies–Modi algorithm. To clarify the relationship between the Davies–Modi algorithm and the algorithm proposed in this work, we restate the former in our notation. Note that $${U}^{\mathrm {T}}$$ is written as

\begin{aligned} {U}^{\mathrm {T}} = e^{Y} = I + Y + \frac{1}{2}Y^{2} + \frac{1}{6}Y^{3} + \cdots \end{aligned}
(71)

for some skew-symmetric matrix Y. To compute Y with high accuracy, we define a diagonal matrix $$D_{S}$$ whose diagonal elements are the eigenvalues of S. Then,

\begin{aligned} D_{S} = {U}^{\mathrm {T}}S{U} = S + {Y}S - S{Y} + \frac{1}{2}(Y^{2}S+SY^{2})-YSY + \mathcal {O}(\Vert Y\Vert ^{3}). \end{aligned}
(72)

Here we define $$S_{0}={\mathrm {diag}}(s_{11},\ldots ,s_{nn})$$, and $$S_{1}=S-S_{0}$$ corresponds to the off-diagonal entries of S. Under a mild assumption $$S_{1}=\mathcal {O}(\Vert Y\Vert )$$, we see

\begin{aligned} D_{S} ={}& S_{0} + (S_{1} + {Y}S_{0} - S_{0}{Y}) \\ &+ \left( {Y}S_{1} - S_{1}{Y} + \frac{1}{2}(Y^{2}S_{0}+S_{0}Y^{2})-YS_{0}Y \right) + \mathcal {O}(\Vert Y\Vert ^{3}). \end{aligned}
(73)

For the first-order approximation, we would like to solve

\begin{aligned} S_{1} + \widetilde{Y}_{1}S_{0} - S_{0}\widetilde{Y}_{1}=O. \end{aligned}
(74)

To construct an orthogonal matrix $$e^{-\widetilde{Y}_{1}}$$, suppose that $$\widetilde{Y}_{1}$$ is a skew-symmetric matrix. Then, $$(\widetilde{Y}_{1})_{ij}=-(\widetilde{Y}_{1})_{ji}=s_{ij}/(s_{ii}-s_{jj})$$ for $$i\not =j$$. To compute $${U}$$ with high accuracy, we construct the second-order approximation $$\widetilde{U}^{\mathrm {T}}=I+\widetilde{Y}_{1}+\widetilde{Y}_{2}+\widetilde{Y}_{1}^{2}/2$$ for skew-symmetric matrices $$\widetilde{Y}_{1}$$ and $$\widetilde{Y}_{2}$$. To compute $$\widetilde{Y}_{2}$$, letting

\begin{aligned} T=\frac{1}{2}(\widetilde{Y}_{1}S_{1} - S_{1}\widetilde{Y}_{1}),\quad T_{0}={\mathrm {diag}}(t_{11},\ldots , t_{nn}),\quad T_{1}=T-T_{0}, \end{aligned}
(75)

we solve $$T_{1} + \widetilde{Y}_{2}S_{0} - S_{0}\widetilde{Y}_{2}=O$$ for a skew-symmetric matrix $$\widetilde{Y}_{2}$$. Note that $$\widetilde{Y}_{2}$$ is computed in the same manner as $$\widetilde{Y}_{1}$$ in (74). Thus, we obtain $$\widetilde{U}^{\mathrm {T}}$$, whose second-order accuracy can be proven rigorously. Since the Davies–Modi algorithm is based on matrix multiplication, it requires $$\mathcal {O}(n^{3})$$ operations.
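The two correction steps (74)–(75) consist only of entrywise divisions and matrix multiplications. The following NumPy sketch of the second-order Davies–Modi correction is our illustrative implementation, assuming the diagonal entries of S are well separated:

```python
import numpy as np

def davies_modi(S):
    """Second-order Davies-Modi correction for a nearly diagonal symmetric S.
    Returns an approximation to U^T with U^T S U nearly diagonal.
    Illustrative sketch; assumes well-separated diagonal entries of S."""
    s = np.diag(S)
    gap = s[:, None] - s[None, :]          # gap[i, j] = s_ii - s_jj
    np.fill_diagonal(gap, 1.0)             # avoid division by zero on the diagonal

    def solve_diag_sylvester(M1):
        # skew-symmetric Y with M1 + Y S0 - S0 Y = O, i.e. Y_ij = m_ij / (s_ii - s_jj)
        Y = M1 / gap
        np.fill_diagonal(Y, 0.0)
        return Y

    S1 = S - np.diag(s)                    # off-diagonal part of S
    Y1 = solve_diag_sylvester(S1)          # first-order term, Eq. (74)
    T = 0.5 * (Y1 @ S1 - S1 @ Y1)          # Eq. (75)
    T1 = T - np.diag(np.diag(T))
    Y2 = solve_diag_sylvester(T1)
    return np.eye(len(S)) + Y1 + Y2 + 0.5 * (Y1 @ Y1)   # U^T, second order
```

Applying $$\widetilde{U}^{\mathrm {T}} S \widetilde{U}$$ reduces the off-diagonal part of S from $$\mathcal {O}(\Vert Y\Vert )$$ to $$\mathcal {O}(\Vert Y\Vert ^{3})$$.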

In the following, we discuss the refinement of $$\widehat{X}$$. Suppose $$\widehat{X}$$ is computed using ordinary floating-point arithmetic. The main problem is that the Davies–Modi algorithm, being applied to $$S=\widehat{X}^{\mathrm {T}}A\widehat{X}$$ to compute the eigenvalue decomposition of S, assumes $$\widehat{X}$$ to be an orthogonal matrix. Generally, however, this is not the case because $$\widehat{X}$$ suffers from rounding errors. Therefore, even if the computation of $$\widehat{X}^{\mathrm {T}}A\widehat{X}$$ is performed in exact arithmetic, A and S are not similar unless $$\widehat{X}$$ is orthogonal. The eigenvalues of A are then slightly perturbed, which may cause a significant change in the eigenvectors associated with clustered eigenvalues. To overcome this problem, we must refine the orthogonality of $$\widehat{X}$$ as a preconditioning step in higher-precision arithmetic. Recall that we intend to restrict the use of higher-precision arithmetic to matrix multiplications for better computational efficiency. Hence, we may use the Newton–Schulz iteration [18, Section 8.3] such as

\begin{aligned} Z =\frac{1}{2}\widehat{X}(3I-\widehat{X}^{\mathrm {T}}\widehat{X}), \end{aligned}
(76)

under which all the singular values of the iterates converge quadratically to 1. After this reorthogonalization, for an eigenvector matrix U of $$T = Z^{\mathrm {T}}AZ$$, the columns of $$X':=ZU$$ are expected to be sufficiently accurate approximations of the eigenvectors of A. Of course, in practice, we need a method to compute an approximate eigenvector matrix $$\widetilde{U}$$ of T. If T is nearly diagonal, it is effective to apply a diagonal shift to T. Similarly, the Jacobi and Davies–Modi algorithms would be able to compute $$\widetilde{U}$$ accurately using higher-precision arithmetic. As a result, we can obtain a sufficiently accurate eigenvector matrix $$X':=Z\widetilde{U}$$.
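A single Newton–Schulz step (76) costs two matrix multiplications. The following NumPy sketch (illustrative) shows the quadratic improvement of orthogonality that motivates its use as preconditioning:

```python
import numpy as np

def newton_schulz_step(X):
    """One Newton-Schulz step (76): Z = X(3I - X^T X)/2 drives the singular
    values of X toward 1, assuming they all lie in (0, sqrt(3))."""
    n = X.shape[0]
    return 0.5 * X @ (3.0 * np.eye(n) - X.T @ X)

rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = Q + 1e-4 * rng.standard_normal((n, n))   # nearly orthogonal start
e0 = np.linalg.norm(np.eye(n) - X.T @ X, 2)  # orthogonality error before
Z = newton_schulz_step(X)
e1 = np.linalg.norm(np.eye(n) - Z.T @ Z, 2)  # orthogonality error after
```

Writing $$E = I - X^{\mathrm {T}}X$$, a short calculation gives $$I - Z^{\mathrm {T}}Z = \frac{3}{4}E^{2} + \frac{1}{4}E^{3}$$, so the orthogonality error is squared in each step.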

It is worth noting that our approach can reproduce the first-order approximation step in the Davies–Modi algorithm as follows. Since we see

\begin{aligned} \widehat{X}^{\mathrm {T}}A\widehat{X} = S = {U}D_{S}{U}^{\mathrm {T}}=e^{-Y}D_{S}e^{Y}=D_{S}+D_{S}{Y}-{Y}D_{S}+\mathcal {O}(\Vert Y\Vert ^{2}) \end{aligned}
(77)

from (71), we obtain $$S=\widetilde{D}_{S}+\widetilde{D}_{S}\widetilde{Y}_{1}-\widetilde{Y}_{1}\widetilde{D}_{S}$$ as the linearization process in the same manner as (13). It is easy to see that $$\widetilde{D}_{S}={\mathrm {diag}}(s_{11},\ldots ,s_{nn})$$ and $$\widetilde{Y}_{1}$$ is equal to the solution in (74). Hence, if $$\widehat{X}$$ in Algorithm 1 is an orthogonal matrix and the diagonal elements of S are sufficiently separated, then $$\widetilde{E}$$ is equal to the skew-symmetric matrix $$\widetilde{Y}_{1}$$ in view of $$R=O$$. In other words, from the viewpoint of the Davies–Modi algorithm, the second-order approximation step is removed and the skew-symmetry of $$\widetilde{E}$$ is not assumed in Algorithm 1. Instead, condition (9) is integrated to improve orthogonality. If we assume $$\widetilde{E}=\widetilde{E}^{\mathrm {T}}$$ in (9), we have

\begin{aligned} X'=\widehat{X}(I+\widetilde{E})=\widehat{X}\left( I+\frac{1}{2}(I-\widehat{X}^{\mathrm {T}}\widehat{X}) \right) , \end{aligned}
(78)

which is equivalent to the Newton–Schulz iteration (76). In other words, if all eigenvalues are regarded as clustered, $$\widehat{X}$$ is refined by the Newton–Schulz iteration. For orthogonality, we require only the relation (9), whereas an iteration function for the polar decomposition, such as that of the Newton–Schulz iteration, must be an odd function.

In summary, by carefully combining the Newton–Schulz iteration and the Davies–Modi algorithm and removing unnecessary conditions from both, we arrive at Algorithm 1, which can be considered an ideal method for refining orthogonality and diagonality simultaneously.