1 Introduction

Let A be a real symmetric \(n \times n\) matrix. Since the standard symmetric eigenvalue problem \(Ax = \lambda x\), where \(\lambda \in {\mathbb {R}}\) is an eigenvalue of A and \(x \in {\mathbb {R}}^{n}\) is an eigenvector of A associated with \(\lambda \), is ubiquitous in scientific computing, it is important to develop reliable numerical algorithms that compute eigenvalues and eigenvectors accurately. Excellent overviews of the symmetric eigenvalue problem can be found in [20, 23].

We are concerned with the eigenvalue decomposition of A such that

$$\begin{aligned} A = {X}{D}{X}^{\mathrm {T}}, \end{aligned}$$
(1)

where \({X}\) is an \(n \times n\) orthogonal matrix whose ith column is an eigenvector \(x_{(i)}\) of A (we call \({X}\) an eigenvector matrix) and \({D} = (d_{ij})\) is an \(n \times n\) diagonal matrix whose diagonal elements are the corresponding eigenvalues \(\lambda _{i} \in {\mathbb {R}}\), i.e., \(d_{ii} = \lambda _{i}\) for \(i = 1, \ldots , n\). Throughout the paper, we assume that

$$\begin{aligned} \lambda _{1} \le \lambda _{2} \le \cdots \le \lambda _{n}, \end{aligned}$$

and the columns of \({X}\) are ordered correspondingly.

Here we collect the notation used in this paper. Let I and O denote the identity matrix and the zero matrix of appropriate size, respectively. Unless otherwise specified, \(\Vert \cdot \Vert \) means \(\Vert \cdot \Vert _{2}\), which denotes the Euclidean norm for vectors and the spectral norm for matrices. Where necessary for legibility, we distinguish between approximate quantities and computed results, e.g., for some quantity \(\alpha \) we write \(\widetilde{\alpha }\) for an approximation of \(\alpha \) and \(\widehat{\alpha }\) for a computed result for \(\alpha \).

The accuracy of an approximate eigenvector depends on the gap between the corresponding eigenvalue and its nearest neighbor eigenvalue (cf., e.g., [20, Theorem 11.7.1]). For simplicity, suppose all eigenvalues of A are simple. Let \(\widehat{X} \in {\mathbb {R}}^{n \times n}\) be an approximation of \({X}\). Let \(z_{(i)}:=\widehat{x}_{(i)}/\Vert \widehat{x}_{(i)}\Vert \) for \(i = 1, 2, \ldots , n\), where \(\widehat{x}_{(i)}\) is the ith column of \(\widehat{X}\). Moreover, for each i, suppose that the Ritz value \(\mu _{i}:=z_{(i)}^{\mathrm {T}}Az_{(i)}\) is closer to \(\lambda _{i}\) than to any other eigenvalue. Let \( gap (\mu _{i})\) denote the smallest difference between \(\mu _{i}\) and any other eigenvalue, i.e., \( gap (\mu _{i}) := \min _{j \ne i}|\mu _{i}-\lambda _{j}|\). Then, it holds for all i that

$$\begin{aligned} |\sin \theta (x_{(i)},z_{(i)})| \le \frac{\Vert Az_{(i)}-\mu _{i}z_{(i)}\Vert }{ gap (\mu _{i})}, \quad \theta (x_{(i)},z_{(i)}):=\arccos (|x_{(i)}^{\mathrm {T}}z_{(i)}|) . \end{aligned}$$

Suppose \(\widehat{X}\) is obtained by some backward stable algorithm with the relative rounding error unit \({\mathbf {u}}\) in floating-point arithmetic. For example, \({\mathbf {u}}= 2^{-53}\) for IEEE 754 binary64. Then, there exists \(\varDelta _{A}^{(i)}\) such that

$$\begin{aligned} (A + \varDelta _{A}^{(i)})z_{(i)} = \mu _{i}z_{(i)}, \quad \Vert \varDelta _{A}^{(i)}\Vert = {\mathcal {O}}(\Vert A\Vert {\mathbf {u}}), \end{aligned}$$

which implies \(\Vert Az_{(i)}-\mu _{i}z_{(i)}\Vert = {\mathcal {O}}(\Vert A\Vert {\mathbf {u}})\), and hence, for all i,

$$\begin{aligned} |\sin \theta (x_{(i)},z_{(i)})| \le \alpha _{i}, \quad \alpha _{i} = {\mathcal {O}}\left( \frac{\Vert A\Vert {\mathbf {u}}}{ gap (\mu _{i})}\right) . \end{aligned}$$
(2)

The smaller the eigenvalue gap, the worse the accuracy of a computed eigenvector. Therefore, refinement algorithms for eigenvectors are useful for obtaining highly accurate results. For example, highly accurate computations of a few or all eigenvectors are crucial for large-scale electronic structure calculations in material physics [24, 25], in which specific interior eigenvalues with associated eigenvectors need to be computed. For related work on refinement algorithms for symmetric eigenvalue decomposition, see the previous paper [17].
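To make the bound (2) concrete, the following MATLAB fragment checks it for a random symmetric test matrix. It is only an illustration: the computed spectrum is used as a stand-in for the exact eigenvalues in \( gap (\mu _{i})\), and the variable names are ours.

```matlab
n = 100;
A = randn(n); A = (A + A')/2;                   % random symmetric test matrix
[Z, L] = eig(A); lam = diag(L);                 % backward stable solver in binary64
bound = zeros(n, 1);
for i = 1:n
    z  = Z(:, i) / norm(Z(:, i));               % normalized approximate eigenvector
    mu = z' * A * z;                            % Ritz value mu_i
    g  = min(abs(mu - lam([1:i-1, i+1:n])));    % gap(mu_i), computed spectrum as a proxy
    bound(i) = norm(A*z - mu*z) / g;            % right-hand side of (2), up to a constant
end
```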

In [17], we proposed a refinement algorithm for the eigenvalue decomposition of A, which works not for an individual eigenvector but for all eigenvectors. Since the algorithm is based on Newton’s method, it converges quadratically, provided that an initial guess is sufficiently accurate. In practice, although the algorithm refines computed eigenvectors corresponding to sufficiently separated simple eigenvalues, it cannot refine computed eigenvectors corresponding to “nearly” multiple eigenvalues. This is because it is difficult for standard numerical algorithms in floating-point arithmetic to provide sufficiently accurate initial approximate eigenvectors corresponding to nearly multiple eigenvalues as shown in (2). The purpose of this paper is to remedy this problem, i.e., we aim to develop a refinement algorithm for the eigenvalue decomposition of a symmetric matrix with clustered eigenvalues.

We briefly explain the idea of our proposed algorithm. We focus on the so-called \(\sin \theta \) theorem by Davis–Kahan [5, Section 2] as follows. For an index set \({\mathcal {J}}\) with \(|{\mathcal {J}}|=\ell < n\), let \(X_{{\mathcal {J}}} \in {\mathbb {R}}^{n\times \ell }\) denote the eigenvector matrix comprising \(x_{(j)}\) for all \(j \in {\mathcal {J}}\). For \(1\le k \le \ell \), let \(\mu _{k}\) denote the Ritz values for the subspace spanned by some given vectors with \(\mu _{1}\le \cdots \le \mu _{\ell }\), and let \(z_{k}\) be the corresponding normalized Ritz vectors. Assume that the eigenvalues \(\lambda _{i}\) for all \(i \not \in {\mathcal {J}}\) are entirely outside of \([\mu _{1},\mu _{\ell }]\). Let \( Gap \) denote the smallest difference between the Ritz values \(\mu _{k}\) for all k, \(1\le k \le \ell \), and the eigenvalues \(\lambda _{i}\) for all \(i \not \in {\mathcal {J}}\), i.e., \( Gap := \min \{|\mu _{k} - \lambda _{i}|~:~1 \le k \le \ell , \ i \not \in {\mathcal {J}}\}\). Moreover, let \(Z_{{\mathcal {J}}}:=[z_{1},\ldots , z_{\ell }] \in {\mathbb {R}}^{n\times \ell }\). Then, we obtain

$$\begin{aligned}&|\sin \varTheta (X_{{\mathcal {J}}},Z_{{\mathcal {J}}})| \le \frac{\Vert AZ_{{\mathcal {J}}}-Z_{{\mathcal {J}}}(Z_{{\mathcal {J}}}^{\mathrm {T}}AZ_{{\mathcal {J}}})\Vert }{ Gap }, \\&\quad \varTheta (X_{{\mathcal {J}}},Z_{{\mathcal {J}}}):=\arccos (\Vert X_{{\mathcal {J}}}^{\mathrm {T}}Z_{{\mathcal {J}}}\Vert ). \end{aligned}$$

This indicates that the subspace spanned by eigenvectors associated with the clustered eigenvalues is not very sensitive to perturbations, provided that the gap between the clustered eigenvalues and the others is sufficiently large. That means backward stable algorithms can provide a sufficiently accurate initial guess of the “subspace” corresponding to the clustered eigenvalues. To extract eigenvectors from the subspace correctly, relatively large gaps between the clustered eigenvalues are necessary, as can be seen from (2). Thus, we first apply the algorithm (Algorithm 1: \(\mathsf {RefSyEv}\)) in the previous paper [17] to the initial approximate eigenvector matrix to improve the subspace corresponding to the clustered eigenvalues. Then, we divide the entire problem into subproblems, each of which corresponds to a cluster of eigenvalues. Finally, we expand the eigenvalue gaps in each subproblem by using a diagonal shift and compute eigenvectors of each subproblem, which can be used for refining the approximate eigenvectors corresponding to clustered eigenvalues in the entire problem.

One might notice that the above procedure is similar to the classical shift-invert technique to transform eigenvalue distributions. In addition, the MRRR algorithm [6] also employs a shift strategy to increase relative gaps between clustered eigenvalues for computing the associated eigenvectors. In other words, it is well known that the diagonal shift is useful for solving eigenvalue problems accurately. Our contribution is to show its effectiveness on the basis of appropriate error analysis with the adaptive use of higher precision arithmetic, which leads to the derivation of the proposed algorithm.

In the same spirit as the previous paper [17], our proposed algorithm primarily comprises matrix multiplication, which accounts for the majority of the computational cost. Therefore, we can utilize higher precision matrix multiplication efficiently. For example, XBLAS [13] and other efficient algorithms [16, 19, 22] based on so-called error-free transformations for accurate matrix multiplication are available for practical implementation.
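As an illustration of the error-free transformations mentioned above, the following MATLAB function is Knuth's TwoSum, the scalar building block from which accurate summation and matrix multiplication routines such as [16, 19, 22] are assembled; it is a sketch of the building block only, not of those routines, and the function name is ours.

```matlab
function [s, e] = two_sum(a, b)
% Error-free transformation of a sum (Knuth): s = fl(a + b) and
% s + e = a + b holds exactly in binary64, barring overflow.
s = a + b;
t = s - a;
e = (a - (s - t)) + (b - t);
end
```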

The remainder of the paper is organized as follows. In Sect. 2, we recall the refinement algorithm (Algorithm 1) proposed in the previous paper [17] together with its convergence theory. For practical use, we present a rounding error analysis of Algorithm 1 in finite precision arithmetic in Sect. 3, which is useful for setting working precision and shows achievable accuracy of approximate eigenvectors obtained by using Algorithm 1. In Sect. 4, we show the behavior of Algorithm 1 for clustered eigenvalues, which explains the effect of nearly multiple eigenvalues on computed results and leads to the derivation of the proposed algorithm. On the basis of Algorithm 1, we propose a refinement algorithm (Algorithm 2: \(\mathsf {RefSyEvCL}\)) that can also be applied to matrices with clustered eigenvalues in Sect. 5. In Sect. 6, we present some numerical results showing the behavior and performance of the proposed algorithm together with an application to a quantum materials simulation as a real-world problem.

For simplicity, we basically handle only real matrices. As mentioned in the previous paper [17], the discussions in this paper can also be extended to generalized symmetric (Hermitian) definite eigenvalue problems.

2 Basic algorithm and its convergence theory

In this section, we introduce the refinement algorithm proposed in the previous paper [17], which is the basis of the algorithm proposed in this paper.

Let \(A = A^{\mathrm {T}} \in {\mathbb {R}}^{n \times n}\). The eigenvalues of A are denoted by \(\lambda _{i} \in {\mathbb {R}}\), \(i = 1, \ldots , n\). Then \(\Vert A\Vert = \max _{1 \le i \le n}|\lambda _{i}| = \max (|\lambda _{1}|,|\lambda _{n}|)\). Let \({X} \in {\mathbb {R}}^{n \times n}\) denote an orthogonal eigenvector matrix comprising normalized eigenvectors of A, and let \(\widehat{X}\) denote an approximation of \({X}\) with \(\widehat{X}\) being nonsingular. In addition, define \(E \in {\mathbb {R}}^{n \times n}\) such that

$$\begin{aligned} {X}=\widehat{X}(I+E). \end{aligned}$$

In the previous paper, we presented the following algorithm for the eigenvalue decomposition of A, which is designed to be applied iteratively. For later use in Sect. 5, the algorithm also allows the case where an input \(\widehat{X}\) is rectangular, i.e., \(\widehat{X} \in {\mathbb {R}}^{n \times \ell }\), \(\ell < n\).

Algorithm 1 (\(\mathsf {RefSyEv}\)) Refinement algorithm for the eigenvalue decomposition of a real symmetric matrix proposed in [17]: given A and \(\widehat{X}\), it computes \(R \leftarrow I - \widehat{X}^{\mathrm {T}}\widehat{X}\), \(S \leftarrow \widehat{X}^{\mathrm {T}}A\widehat{X}\), the approximate eigenvalues \(\widetilde{\lambda }_{i} \leftarrow s_{ii}/(1 - r_{ii})\) (line 4), the criterion \(\omega \) (line 6), the correction matrix \(\widetilde{E}\) (line 7), and returns \(X' \leftarrow \widehat{X}(I + \widetilde{E})\).
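A minimal MATLAB sketch of one \(\mathsf {RefSyEv}\) step, written from the quantities referred to in the text, may help fix ideas. The function name refsyev is ours, the particular formula for \(\omega \) below is an assumption of this sketch (the precise definition is given in [17]), and in practice the matrix products are evaluated in higher precision arithmetic (see Sect. 3).

```matlab
function [Xp, lam, omega, E] = refsyev(A, X)
% One refinement step for A (n x n symmetric) and X (n x m, m <= n).
m = size(X, 2);
R = eye(m) - X' * X;                     % departure from orthogonality
S = X' * A * X;                          % departure from diagonality
lam = diag(S) ./ (1 - diag(R));          % approximate eigenvalues (line 4)
omega = 2 * (norm(S - diag(lam)) + norm(A) * norm(R));   % assumed form of the criterion (line 6)
E = 0 * R;                               % correction, same class/precision as R
for i = 1:m
    for j = 1:m
        if abs(lam(i) - lam(j)) > omega
            E(i, j) = (S(i, j) + lam(j) * R(i, j)) / (lam(j) - lam(i));
        else
            E(i, j) = R(i, j) / 2;       % clustered pair (or i = j)
        end
    end
end
Xp = X + X * E;                          % X' = X(I + E)
end
```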

In [17, Theorem 1], we presented the following theorem that states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given \(\widehat{X}\) is sufficiently close to X.

Theorem 1

(Ogita–Aishima [17]) Let A be a real symmetric \(n \times n\) matrix with simple eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix \(X \in {\mathbb {R}}^{n\times n}\). For a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in real arithmetic, and \(X'\) is the quantity calculated in Algorithm 1. Define E and \(E'\) such that \(X=\widehat{X}(I+E)\) and \(X=X'(I+E')\), respectively. If

$$\begin{aligned} \Vert E\Vert < \min \left( \frac{\min _{i \not = j}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}$$
(3)

then we have

$$\begin{aligned} \Vert E'\Vert< & {} \frac{5}{7}\Vert E\Vert , \end{aligned}$$
(4)
$$\begin{aligned} \limsup _{\Vert E\Vert \rightarrow 0}\frac{\Vert E'\Vert }{\Vert E\Vert ^{2}}\le & {} \frac{6n\Vert A\Vert }{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|} . \end{aligned}$$
(5)

In the following, we review the discussion in [17, §3.2] for exactly multiple eigenvalues. If \(\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}\) correspond to multiple eigenvalues \(\lambda _{i}=\lambda _{j}\), we compute \(\widetilde{e}_{ij}=\widetilde{e}_{ji}={r}_{ij}/2\) for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \).

To investigate the above exceptional process, define the index sets \({\mathcal {M}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {M}}}\), for multiple eigenvalues \(\{ {\lambda }_{i} \}_{i \in {\mathcal {M}}_{k}}\) satisfying the following conditions:

$$\begin{aligned} \left\{ \begin{array}{l} \text {(a)} \ {\mathcal {M}}_{k} \subseteq \{1,2,\ldots ,n\} \ \text {with} \ n_{k} := |{\mathcal {M}}_{k}| \ge 2 \\ \text {(b)} \ {\lambda }_{i} = {\lambda }_{j}, \ \forall i,j \in {\mathcal {M}}_{k} \\ \text {(c)} \ {\lambda }_{i} \not = {\lambda }_{j}, \ \forall i \in {\mathcal {M}}_{k}, \ \forall j \in \{1,2,\ldots ,n\}\,\backslash \, {\mathcal {M}}_{k} \end{array}\right. . \end{aligned}$$
(6)

Note that the eigenvectors corresponding to multiple eigenvalues are not unique. Hence, using the above index sets, let Y be an eigenvector matrix defined such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) corresponding to \(\{\lambda _{i}\}_{i \in {\mathcal {M}}_{k}}\) are symmetric and positive definite. Since Y is then unique, we can define F such that \(Y=\widehat{X}(I+F)\). Define \(R:=I-\widehat{X}^{\mathrm {T}}\widehat{X}\) and \(S:=\widehat{X}^{\mathrm {T}}A\widehat{X}\). Then, using the orthogonality \(Y^{\mathrm {T}}Y = I\) and the diagonality \(Y^{\mathrm {T}}AY = D\), we have

$$\begin{aligned} F+F^{\mathrm {T}}= & {} R+\varDelta _{1}, \quad \Vert \varDelta _{1}\Vert \le \chi (\epsilon ){\epsilon }^{2}, \end{aligned}$$
(7)
$$\begin{aligned} D-DF-F^{\mathrm {T}}D= & {} S+\varDelta _{2}, \quad \Vert \varDelta _{2} \Vert \le \chi ({\epsilon })\Vert A\Vert {\epsilon }^{2}, \end{aligned}$$
(8)

where \(\epsilon := \Vert F\Vert \) and

$$\begin{aligned} \chi (\epsilon ):=\frac{3-2\epsilon }{(1-\epsilon )^2} . \end{aligned}$$

The above equations can be obtained in the same manner as in our previous paper [17, Eqs. (7) and (11)] by replacing E with F in the equations.

In a similar way to Newton’s method (cf., e.g., [3, p. 236]), dropping the second-order terms in (7) and (8) yields Algorithm 1, and the following convergence theorem is provided in [17, Theorem 2].

Theorem 2

(Ogita–Aishima [17]) Let A be a real symmetric \(n \times n\) matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \({\mathcal {M}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {M}}}\), satisfying (6). Let \({\mathcal {V}}\) be the set of \(n \times n\) orthogonal eigenvector matrices of A. For a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in real arithmetic, and \(X'\) and \(\omega \) are the quantities calculated in Algorithm 1. Let \(Y, Y' \in {\mathcal {V}}\) be defined such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) and \((X')^{-1}Y'\) corresponding to \(\{\lambda _{i}\}_{i \in {\mathcal {M}}_{k}}\) are symmetric and positive definite. Define F and \(F'\) such that \(Y=\widehat{X}(I+F)\) and \(Y'=X'(I+F')\), respectively. Furthermore, suppose that

$$\begin{aligned} \Vert F\Vert < \frac{1}{3}\min \left( \frac{\min _{\lambda _{i} \not = \lambda _{j}}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) . \end{aligned}$$

Then, we obtain

$$\begin{aligned} \Vert F'\Vert< & {} \frac{5}{7}\Vert F\Vert , \\ \limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}}\le & {} 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}\right) . \end{aligned}$$

On the basis of the above convergence theorems, let us consider the iterative refinement using Algorithm 1:

$$\begin{aligned} X^{(0)} \leftarrow \widehat{X} \in {\mathbb {R}}^{n \times n}, \quad X^{(\nu + 1)} \leftarrow \mathsf {RefSyEv}(A, X^{(\nu )}) \quad \text {for} \ \nu = 0, 1, \ldots \end{aligned}$$

Then, \(X^{(\nu + 1)} = X^{(\nu )}(I + \widetilde{E}^{(\nu )})\) for \(\nu = 0, 1, \ldots \), where \(\widetilde{E}^{(\nu )} = (\widetilde{e}_{ij}^{(\nu )})\) are the quantities calculated in line 7 of Algorithm 1. In practice, it is likely that ordinary precision floating-point arithmetic, such as IEEE 754 binary32 or binary64, is used for calculating an approximation \(\widehat{X}\) to an eigenvector matrix \({X}\) of a given symmetric matrix A by some backward stable algorithm. It is natural to use such \(\widehat{X}\) as an initial guess \(X^{(0)}\) in Algorithm 1. However, if A has nearly multiple eigenvalues, it is difficult to obtain a sufficiently accurate \(X^{(0)}\) in ordinary precision floating-point arithmetic such that Algorithm 1 works well. To overcome this problem, we develop a practical algorithm for clustered eigenvalues, which is proposed as Algorithm 2 in Sect. 5.

3 Rounding error analysis for basic algorithm

If Algorithm 1 is performed in finite precision arithmetic with the relative rounding error unit \({\mathbf {u}}_{h}\), the accuracy of a refined eigenvector matrix \(X'\) is restricted by \({\mathbf {u}}_{h}\). Since \(\widehat{X}\) is improved quadratically when using real arithmetic, \({\mathbf {u}}_{h}\) must correspond to \(\Vert E\Vert ^{2}\) to preserve the convergence property of Algorithm 1. We explain the details in the following. For simplicity, we consider the real case. The extension to the complex case is obvious.

Let \({\mathbb {F}}_{h}\) be a set of floating-point numbers with the relative rounding error unit \({\mathbf {u}}_{h}\). We define the rounding operator \( fl _{h}\) such that \( fl _{h}: {\mathbb {R}}\rightarrow {\mathbb {F}}_{h}\) and assume the use of the following standard floating-point arithmetic model [11]. For \(a, b \in {\mathbb {F}}_{h}\) and \(\circ \in \{ +, -, \times , / \}\), it holds that

$$\begin{aligned} fl _{h}(a \circ b) = (a \circ b)(1 + \delta _{h}), \quad |\delta _{h}| \le {\mathbf {u}}_{h} . \end{aligned}$$
(9)

For example, (9) is satisfied by IEEE 754 floating-point arithmetic, barring overflow and underflow.

Suppose all elements of A and \(\widehat{X}\) are exactly representable in \({\mathbb {F}}_{h}\), i.e., \(A, \widehat{X} \in {\mathbb {F}}_{h}^{n \times n}\), and \(\Vert \widehat{x}_{(i)}\Vert \approx 1\) for all i. Let \(\widehat{R}\), \(\widehat{S}\), and \(\widehat{E}\) denote the computed results of R, S, and \(\widetilde{E}\) in Algorithm 1, respectively. Define \(\varDelta _{R}\), \(\varDelta _{S}\), and \(\varDelta _{E}\) such that

$$\begin{aligned} \widehat{R} = R + \varDelta _{R}, \quad \widehat{S} = S + \varDelta _{S}, \quad \widehat{E} = \widetilde{E} + \varDelta _{E} . \end{aligned}$$

From a standard rounding error analysis as in [11], we obtain

$$\begin{aligned} (\varDelta _{R})_{ij} = {\mathcal {O}}({\mathbf {u}}_{h}), \quad (\varDelta _{S})_{ij} = {\mathcal {O}}(\Vert A\Vert {\mathbf {u}}_{h}) \quad \text {for} \ 1 \le i, j \le n . \end{aligned}$$

For the computed results \(\widehat{\lambda }_{i}\) of \(\widetilde{\lambda }_{i}\), \(i = 1, 2, \ldots , n\), in Algorithm 1,

$$\begin{aligned} \widehat{\lambda }_{i}= & {} \frac{\widehat{s}_{ii}}{(1 - \widehat{r}_{ii})(1 + \delta _{1})}(1 + \delta _{2}), \quad |\delta _{k}| \le {\mathbf {u}}_{h}, \ k = 1, 2 \\= & {} \frac{\widehat{s}_{ii}}{1 - \widehat{r}_{ii}}(1 + \phi ) = \frac{s_{ii} + (\varDelta _{S})_{ii}}{1 - r_{ii} - (\varDelta _{R})_{ii}}(1 + \phi ), \quad |\phi | = {\mathcal {O}}({\mathbf {u}}_{h}) \\= & {} \widetilde{\lambda }_{i} + \varepsilon _{i}, \quad |\varepsilon _{i}| = {\mathcal {O}}(\Vert A\Vert {\mathbf {u}}_{h}) . \end{aligned}$$

For all (i, j) satisfying \(|\widehat{\lambda }_{i} - \widehat{\lambda }_{j}| > \widehat{\omega }\), where \(\widehat{\omega }\) is an approximation of \(\omega \) computed in floating-point arithmetic in Algorithm 1,

$$\begin{aligned} \widehat{e}_{ij}= & {} \frac{(\widehat{s}_{ij} + \widehat{\lambda }_{j}\widehat{r}_{ij}(1 + \delta _{3}))(1 + \delta _{4})}{(\widehat{\lambda }_{j} - \widehat{\lambda }_{i})(1 + \delta _{5})}(1 + \delta _{6}), \quad |\delta _{k}| \le {\mathbf {u}}_{h}, \ k = 3, 4, 5, 6 \\= & {} \frac{\widehat{s}_{ij} + \widehat{\lambda }_{j}\widehat{r}_{ij}}{\widehat{\lambda }_{j} - \widehat{\lambda }_{i}} + \tau _{ij}, \quad |\tau _{ij}| = {\mathcal {O}}(\beta _{ij}{\mathbf {u}}_{h}), \quad \beta _{ij} := \frac{\Vert A\Vert }{|\lambda _{i} - \lambda _{j}|} \\= & {} \frac{s_{ij} + (\varDelta _{S})_{ij} + (\widetilde{\lambda }_{j} + \varepsilon _{j})(r_{ij} + (\varDelta _{R})_{ij})}{\widetilde{\lambda }_{j} - \widetilde{\lambda }_{i} + (\varepsilon _{j} - \varepsilon _{i})} + \tau _{ij} \\= & {} \widetilde{e}_{ij} + \gamma _{ij}^{(1)}, \quad |\gamma _{ij}^{(1)}| = {\mathcal {O}}(\beta _{ij}{\mathbf {u}}_{h}) . \end{aligned}$$

Then

$$\begin{aligned} |(\varDelta _{E})_{ij}| = |\widehat{e}_{ij} - \widetilde{e}_{ij}| \le |\gamma _{ij}^{(1)}| = {\mathcal {O}}(\beta _{ij}{\mathbf {u}}_{h}) . \end{aligned}$$

For the other (i, j), we have

$$\begin{aligned} \widehat{e}_{ij}= & {} \frac{\widehat{r}_{ij}}{2}(1 + \delta _{7}) = \frac{r_{ij} + (\varDelta _{R})_{ij}}{2}(1 + \delta _{7}), \quad |\delta _{7}| \le {\mathbf {u}}_{h} \\= & {} \widetilde{e}_{ij} + \gamma _{ij}^{(2)}, \quad |\gamma _{ij}^{(2)}| = {\mathcal {O}}({\mathbf {u}}_{h}) . \end{aligned}$$

Then

$$\begin{aligned} |(\varDelta _{E})_{ij}| = |\widehat{e}_{ij} - \widetilde{e}_{ij}| \le |\gamma _{ij}^{(2)}| = {\mathcal {O}}({\mathbf {u}}_{h}) . \end{aligned}$$

In summary, we obtain

$$\begin{aligned} \Vert \varDelta _{E}\Vert \le \sqrt{\sum _{1 \le i,j \le n}|(\varDelta _{E})_{ij}|^{2}} = {\mathcal {O}}(\beta {\mathbf {u}}_{h}), \quad \beta := \frac{\Vert A\Vert }{\min _{\lambda _{i} \ne \lambda _{j}}|\lambda _{i} - \lambda _{j}|}, \end{aligned}$$
(10)

where \(\beta \) is the reciprocal of the minimum gap between the eigenvalues normalized by \(\Vert A\Vert \). For the computed result \(\widehat{X}'\) of \(X' = \widehat{X}(I + \widetilde{E})\) in Algorithm 1,

$$\begin{aligned} \widehat{X}'= & {} \widehat{X} + \widehat{X}\widehat{E} + \widehat{\varDelta }, \quad \Vert \widehat{\varDelta }\Vert = {\mathcal {O}}({\mathbf {u}}_{h})\\= & {} \widehat{X}(I + \widetilde{E}) + \widehat{X}(\widehat{E} - \widetilde{E}) + \widehat{\varDelta } \\= & {} X' + \widehat{X}\varDelta _{E} + \widehat{\varDelta }, \end{aligned}$$

and, using (10),

$$\begin{aligned} \Vert \widehat{X}' - X'\Vert \le \Vert \widehat{X}\Vert \Vert \varDelta _{E}\Vert + \Vert \widehat{\varDelta }\Vert = {\mathcal {O}}(\beta {\mathbf {u}}_{h}) . \end{aligned}$$
(11)

Thus, if a given \(\widehat{X}\) is sufficiently close to X in such a way that the assumption (3) holds, combining (5) and (11) yields

$$\begin{aligned} \Vert \widehat{X}' - X\Vert \le \Vert \widehat{X}' - X'\Vert + \Vert X' - X\Vert = {\mathcal {O}}(\beta \cdot \max ({\mathbf {u}}_{h}, \Vert E\Vert ^{2})) . \end{aligned}$$
(12)

If A has nearly multiple eigenvalues and (3) does not hold, then the convergence of Algorithm 1 to an eigenvector matrix of A is guaranteed neither in real arithmetic nor in finite precision arithmetic regardless of the value of \({\mathbf {u}}_{h}\). We will deal with such an ill-conditioned case in Sect. 5.

Remark 1

As can be seen from (12), with a fixed \({\mathbf {u}}_{h}\), iterative use of Algorithm 1 eventually computes an approximate eigenvector matrix that is accurate to \({\mathcal {O}}(\beta {\mathbf {u}}_{h})\), provided that the assumption (3) in Theorem 1 holds in each iteration. This will be confirmed numerically in Sect. 6. \(\square \)

Let us consider the most likely scenario where \(\widehat{X}\) is computed by some backward stable algorithm in ordinary precision floating-point arithmetic with the relative rounding error unit \({\mathbf {u}}\). From (2), we have

$$\begin{aligned} \max _{1 \le i \le n}|\sin \theta (x_{(i)},\widehat{x}_{(i)})| \le \alpha , \quad \alpha = {\mathcal {O}}(\beta {\mathbf {u}}) \end{aligned}$$

under the assumption that \(\beta \approx \Vert A\Vert /\min _{1 \le i \le n} gap (\mu _{i})\). Thus,

$$\begin{aligned} \Vert E\Vert \approx \Vert \widehat{X} - X\Vert = {\mathcal {O}}(\beta {\mathbf {u}}) . \end{aligned}$$

From (12), we obtain

$$\begin{aligned} \Vert \widehat{X}' - X\Vert = {\mathcal {O}}(\beta \cdot \max ({\mathbf {u}}_{h}, \beta ^{2}{\mathbf {u}}^{2})) . \end{aligned}$$
(13)

Therefore, \({\mathbf {u}}_{h}\) should be less than \(\beta ^{2}{\mathbf {u}}^{2}\) in order to preserve convergence speed for the first iteration by Algorithm 1.

Suppose that \(\Vert \widehat{X} - X\Vert = c\beta {\mathbf {u}}\) and \(\Vert \widehat{X}' - X\Vert = c'\beta ^{3}{\mathbf {u}}^{2}\) where c and \(c'\) are some constants. If \(c''\beta ^{2}{\mathbf {u}}< 1\) for \(c'' := c'/c\), then an approximation of X is improved in the sense that \(\Vert \widehat{X}' - X\Vert < \Vert \widehat{X} - X\Vert \). In other words, if \(\beta \) is too large such that \(c''\beta ^{2}{\mathbf {u}}\ge 1\), Algorithm 1 may not work well.

In general, define \(E^{(\nu )} \in {\mathbb {R}}^{n \times n}\) such that \(X = \widehat{X}^{(\nu )}(I + E^{(\nu )})\) for \(\nu = 0, 1, \ldots \), where \(\widehat{X}^{(0)}\) is an initial guess and \(\widehat{X}^{(\nu )}\) is a result of the \(\nu \)th iteration of Algorithm 1 with working precision \({\mathbf {u}}_{h}^{(\nu )}\) for \(\nu = 1, 2, \ldots \). To preserve the convergence speed, we need to set \({\mathbf {u}}_{h}^{(\nu )}\) satisfying \({\mathbf {u}}_{h}^{(\nu )} < \Vert E^{(\nu - 1)}\Vert ^{2}\) as can be seen from (12). Although we do not know \(\Vert E^{(\nu - 1)}\Vert \), we can estimate \(\Vert E^{(\nu - 1)}\Vert \) by \(\Vert \widetilde{E}^{(\nu - 1)}\Vert \) where \(\widetilde{E}^{(\nu - 1)}\) is computed at the (\(\nu - 1\))-st iteration of Algorithm 1.
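The following fragment sketches this adaptive choice of working precision, assuming the Advanpix toolbox used in Sect. 6 and reusing the refsyev sketch from Sect. 2; the digit count, the safety margin, and the number of iterations are illustrative assumptions only.

```matlab
n = 100;
A = randn(n); A = (A + A')/2;                 % test matrix
[X, ~] = eig(A);                              % initial guess X^(0) in binary64
normE = 1e-8;                                 % rough a priori estimate of ||E^(0)||
for nu = 1:4
    digs = max(34, 2*ceil(-log10(normE)) + 16);   % aim at u_h^(nu) < ||E^(nu-1)||^2, with margin
    mp.Digits(digs);                          % working precision of the nu-th refinement step
    [X, ~, ~, E] = refsyev(mp(A), mp(X));     % one step of Algorithm 1
    normE = max(norm(double(E)), realmin);    % ||Etilde^(nu)|| estimates ||E^(nu)||
end
```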

4 Effect of nearly multiple eigenvalues in basic algorithm

In general, a given matrix A in floating-point format does not have exactly multiple eigenvalues. It is necessary to discuss the behavior of Algorithm 1 for A with some nearly multiple eigenvalues \(\lambda _{i}\approx \lambda _{j}\) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \) in line 7. We basically discuss the behavior in real arithmetic. The effect of the rounding error is briefly explained in Remark 2 at the end of this section.

For simplicity, we assume \(\widetilde{\lambda }_{1} \le \widetilde{\lambda }_{2} \le \cdots \le \widetilde{\lambda }_{n}\). In the following analysis, define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) with

$$\begin{aligned} {\lambda }_{1}^{(\omega )}={\lambda }_{1},\quad \lambda _{i}^{(\omega )}= \left\{ \begin{array}{ll} {\lambda }_{i-1}^{(\omega )} &{} \ \ \text {if} \ \widetilde{\lambda }_{i} - \widetilde{\lambda }_{i-1} \le \omega \\ {\lambda }_{i} &{} \ \ \text {otherwise} \end{array}\right. \ \text {for} \ 2 \le i \le n, \end{aligned}$$
(14)

which means that the clustered eigenvalues of \(A_{\omega }\) are all multiple in each cluster. Then, \(A_{\omega }\) is a perturbed matrix such that

$$\begin{aligned} A_{\omega } = A + \varDelta _{\omega },\quad \Vert \varDelta _{\omega }\Vert = \Vert D - D_{\omega }\Vert = \max _{1\le i \le n} |\lambda _{i}-\lambda _{i}^{(\omega )}|. \end{aligned}$$

Throughout this section, we assume that

$$\begin{aligned} |\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| \le \omega \quad \text {for } (i,j) \ \text {such that } \lambda _{i}^{(\omega )}=\lambda _{j}^{(\omega )} . \end{aligned}$$
(15)

Importantly, although each individual eigenvector associated with the nearly multiple eigenvalues is very sensitive to perturbations, the subspace spanned by such eigenvectors is not sensitive. Thus, \(\widehat{X}\) computed by a backward stable algorithm is sufficiently close to an eigenvector matrix of \(A_{\omega }\). Below, we show that Algorithm 1 computes \(\widetilde{E}\) such that \(\widehat{X}(I+\widetilde{E})\) approximates an exact eigenvector matrix \(Y_{\omega }\) of \(A_{\omega }\), defined in the same manner as Y in Sect. 2, and we define \(F_{\omega }\) such that \(Y_{\omega }=\widehat{X}(I+F_{\omega })\). Note that \(Y_{\omega }\) is an eigenvector matrix of \(A_{\omega }\), which is close to A and has exactly multiple eigenvalues.

Recall that the submatrices of \(\widehat{X}^{-1}Y_{\omega }\) corresponding to the multiple eigenvalues of \(A_{\omega }\) are symmetric and positive definite. Then, we see that Algorithm 1 computes an approximation of \(Y_{\omega }\) as follows. Define R and \(S_{\omega }\) as

$$\begin{aligned} R := I-\widehat{X}^{\mathrm {T}}\widehat{X}, \quad S_{\omega } := \widehat{X}^{\mathrm {T}}A_{\omega }\widehat{X}, \end{aligned}$$

corresponding to \(A_{\omega }\). We see that \(S_{\omega }\) can be regarded as a perturbation of \(S := \widehat{X}^{\mathrm {T}}A\widehat{X}\). Note that, in Algorithm 1, \(\widetilde{E}\) is computed from R and S. Here, we introduce an ideal matrix \(\widetilde{E}_{\omega }\) computed from R and \(S_{\omega }\), where \(\widetilde{E}_{\omega }\) converges quadratically to \(F_{\omega }\). In the following, we estimate \(\widetilde{E}_{\omega }-\widetilde{E}\) caused by the above perturbation. To this end, we estimate each element of \(S_{\omega }-S\) as in the following lemma.

Lemma 1

Let A be a real symmetric \(n \times n\) matrix with eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) as in (14), and

$$\begin{aligned} \delta _{\omega }:=\max _{1\le i \le n}|\lambda _{i}-\lambda _{i}^{(\omega )}|. \end{aligned}$$
(16)

In addition, define \(S_{\omega }:=\widehat{X}^{\mathrm {T}}A_{\omega }\widehat{X}\). Then, we have

$$\begin{aligned} |s_{ij}-s^{(\omega )}_{ij}| \le \delta _{\omega }(1+2\epsilon _{\omega }+\chi (\epsilon _{\omega })\epsilon _{\omega }^{2}) \quad \text {if } \lambda _{i}^{(\omega )}=\lambda _{j}^{(\omega )}, \end{aligned}$$
(17)
$$\begin{aligned} |s_{ij}-s^{(\omega )}_{ij}| \le \epsilon _{\omega }\delta _{\omega }(2+\chi (\epsilon _{\omega })\epsilon _{\omega }) \quad \text {otherwise}, \end{aligned}$$
(18)

for all (i, j), where

$$\begin{aligned} \epsilon _{\omega }:=\Vert F_{\omega }\Vert , \quad \chi (\epsilon _{\omega }):= \frac{(3 - 2\epsilon _{\omega })}{(1 - \epsilon _{\omega })^{2}} . \end{aligned}$$

Proof

Define Q such that \(X=Y_{\omega }Q\). Then, Q is a block diagonal matrix. More precisely, for \(Q=(q_{ij})\), we have

$$\begin{aligned} q_{ij}=0\ \text {for }(i,j)\text { such that } \lambda _{i}^{(\omega )}\not =\lambda _{j}^{(\omega )}. \end{aligned}$$

It is easy to see that

$$\begin{aligned} S-S_{\omega }=\widehat{X}^{\mathrm {T}}(A-A_{\omega })\widehat{X}=\widehat{X}^{\mathrm {T}}Y_{\omega }Q(D-D_{\omega })Q^{\mathrm {T}}Y_{\omega }^{\mathrm {T}}\widehat{X}. \end{aligned}$$

Let \(D_{Q}:=Q(D-D_{\omega })Q^{\mathrm {T}}\). In a similar way to (8), we have

$$\begin{aligned} D_{Q} - D_{Q}F_{\omega } - F_{\omega }^{\mathrm {T}}D_{Q} = (S-S_{\omega }) + \varDelta (\delta _{\omega }), \end{aligned}$$

where

$$\begin{aligned} \Vert D_{Q}\Vert =\delta _{\omega },\quad \Vert \varDelta (\delta _{\omega })\Vert \le \chi (\epsilon _{\omega }) \epsilon _{\omega }^{2}\delta _{\omega }. \end{aligned}$$

Then (17) follows. Moreover, noting \(D_{Q}\) is a block diagonal matrix, we obtain (18). \(\square \)

For the perturbation analysis of \(\widetilde{E}\), the next lemma is crucial.

Lemma 2

Let A be a real symmetric \(n \times n\) matrix with eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), and a corresponding orthogonal eigenvector matrix X. In Algorithm 1, for a given nonsingular \(\widehat{X} \in {\mathbb {R}}^{n\times n}\), define \(A_{\omega } := XD_{\omega } X^{\mathrm {T}}\) where \(D_{\omega }=\mathrm {diag}(\lambda _{i}^{(\omega )})\) as in (14). Assume that (15) is satisfied. Define \(R = (r_{ij})\) and \(S_{\omega } = (s^{(\omega )}_{ij})\) such that \(R:=I-\widehat{X}^{\mathrm {T}}\widehat{X}\) and \(S_{\omega }:=\widehat{X}^{\mathrm {T}}A_{\omega }\widehat{X}\). Suppose positive numbers \(\omega _{1}\) and \(\omega _{2}\) satisfy

$$\begin{aligned} |s_{ij}-s^{(\omega )}_{ij}| \le {\left\{ \begin{array}{ll} {\omega }_{1} &{} \text {if } \lambda _{i}^{(\omega )}=\lambda _{j}^{(\omega )} \\ {\omega }_{2} &{} \text {otherwise} \end{array}\right. } \quad \text {for all } (i,j) . \end{aligned}$$

We assume that, for all (i, j) in line 7 of Algorithm 1, the formulas for \(\widetilde{e}^{(\omega )}_{ij}\) are the same as those for \(\widetilde{e}_{ij}\), i.e.,

$$\begin{aligned} \widetilde{e}^{(\omega )}_{ij} = \left\{ \begin{array}{ll} \dfrac{s^{(\omega )}_{ij} + \widetilde{\lambda }^{(\omega )}_{j}r_{ij}}{\widetilde{\lambda }^{(\omega )}_{j} - \widetilde{\lambda }^{(\omega )}_{i}} &{} \ \text {if} \ |\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| > \omega \\ r_{ij}/2 &{} \ \text {otherwise} \end{array}\right. \quad \text {for all } (i,j), \end{aligned}$$

where \(\widetilde{\lambda }^{(\omega )}_{i}=s^{(\omega )}_{ii}/(1-r_{ii})\) for \(i = 1, 2, \ldots , n\), as in line 4. Moreover, let

$$\begin{aligned} c := \max _{1\le i \le n} \frac{1}{1-{r}_{ii}}. \end{aligned}$$
(19)

Then, for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \), we have

$$\begin{aligned} \widetilde{e}^{(\omega )}_{ij}=\widetilde{e}_{ij}. \end{aligned}$$
(20)

Moreover, for (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|> \omega \), we have

$$\begin{aligned} |\widetilde{e}^{(\omega )}_{ij}-\widetilde{e}_{ij}| \le \frac{2c\,\omega _{1}}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|-2c\,\omega _{1}} \frac{|s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}|}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|} + \frac{\omega _{2}+c\,\omega _{1}|{r}_{ij}|}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|}. \end{aligned}$$
(21)

Proof

For (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \omega \), since we see

$$\begin{aligned} \widetilde{e}^{(\omega )}_{ij}=\frac{r_{ij}}{2}=\widetilde{e}_{ij}, \end{aligned}$$

we have (20). Next, for \(\widetilde{\lambda }^{(\omega )}_{i}\), \(i=1,\ldots ,n\), we have

$$\begin{aligned} \widetilde{\lambda }^{(\omega )}_{i}=\frac{s^{(\omega )}_{ii}}{1-{r}_{ii}}, \quad |\widetilde{\lambda }^{(\omega )}_{i}-\widetilde{\lambda }_{i}|\le \frac{\omega _{1}}{1-{r}_{ii}} \quad \text {for } i=1,\ldots ,n . \end{aligned}$$
(22)

Thus, from (19), we have

$$\begin{aligned} |\widetilde{\lambda }^{(\omega )}_{i}-\widetilde{\lambda }_{i}|\le c\,\omega _{1} \quad \text {for } i=1,\ldots ,n. \end{aligned}$$
(23)

For (i, j) such that \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| > \omega \), since we see

$$\begin{aligned} \widetilde{e}^{(\omega )}_{ij}=\frac{s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}}{\widetilde{\lambda }^{(\omega )}_{j}-\widetilde{\lambda }^{(\omega )}_{i}}, \end{aligned}$$

we evaluate the errors based on the following inequalities:

$$\begin{aligned} |\widetilde{e}^{(\omega )}_{ij}-\widetilde{e}_{ij}| \le \left| \widetilde{e}^{(\omega )}_{ij}-\frac{s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}} \right| + \left| \widetilde{e}_{ij}-\frac{s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}} \right| . \end{aligned}$$

In the right-hand side, using (22) and (23), we see

$$\begin{aligned} \left| \widetilde{e}^{(\omega )}_{ij}-\frac{s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}} \right|\le & {} \frac{2c\,\omega _{1}}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|-2c\,\omega _{1}} \frac{|s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}|}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|}, \\ \left| \widetilde{e}_{ij}-\frac{s^{(\omega )}_{ij}+\widetilde{\lambda }^{(\omega )}_{j}{r}_{ij}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}} \right|\le & {} \frac{\omega _{2}+c\,\omega _{1}|{r}_{ij}|}{|\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}|}. \end{aligned}$$

Therefore, we obtain (21). \(\square \)

In the following, we estimate \(\omega _{1}\) and \(\omega _{2}\) in Lemma 2. In the right-hand sides of (17) and (18) in Lemma 1, we see

$$\begin{aligned} \left\{ \begin{array}{l} \delta _{\omega }(1+2 \epsilon _{\omega }+\chi (\epsilon _{\omega })\epsilon _{\omega }^{2}) \rightarrow \delta _{\omega } \\ \epsilon _{\omega }\delta _{\omega }(2 +\chi (\epsilon _{\omega }){\epsilon _{\omega }}) \rightarrow 2\epsilon _{\omega }\delta _{\omega } \end{array}\right. \quad \text {as } \epsilon _{\omega } \rightarrow 0 . \end{aligned}$$

If D, F, and S in (8) are replaced with \(D_{\omega }\), \(F_{\omega }\), and \(S_{\omega }\), respectively, we see \(s^{(\omega )}_{ij}={\mathcal {O}}(\Vert A_{\omega }\Vert \epsilon _{\omega })\ (i\not =j)\) as \(\epsilon _{\omega } \rightarrow 0\). In addition, \(r_{ij}={\mathcal {O}}(\epsilon _{\omega })\ (i\not =j)\) as \(\epsilon _{\omega } \rightarrow 0\) from (7). Hence, letting \(\omega _{1}=\delta _{\omega }\) and \(\omega _{2}=2\epsilon _{\omega }\delta _{\omega }\) in Lemma 2, we obtain

$$\begin{aligned} \Vert \widetilde{E}_{\omega }-\widetilde{E}\Vert ={\mathcal {O}}\left( \frac{\epsilon _{\omega }\delta _{\omega }}{\min _{|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|> \omega }|\lambda _{i}-\lambda _{j}|}\right) \quad \text {as } \epsilon _{\omega } \rightarrow 0 . \end{aligned}$$

Since we suppose \(\delta _{\omega }={\mathcal {O}}(\Vert A\Vert \epsilon _{\omega })\) in the situation where \(\widehat{X}\) is computed by a backward stable algorithm, we have

$$\begin{aligned} \Vert \widetilde{E}_{\omega }-\widetilde{E}\Vert ={\mathcal {O}}\left( \frac{\Vert A\Vert \epsilon _{\omega }^{2}}{\min _{|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|> \omega }|\lambda _{i}-\lambda _{j}|}\right) \quad \text {as } \epsilon _{\omega } \rightarrow 0 . \end{aligned}$$

Therefore, \(\widetilde{E}\) is sufficiently close to \(\widetilde{E}_{\omega }\) and \(F_{\omega }\) under the above mild assumptions. Although \(Y_{\omega }\) can be very far from any eigenvector matrix of A, the subspace spanned by the columns of \(Y_{\omega }\) corresponding to the clustered eigenvalues adequately approximates the one spanned by the exact eigenvectors whenever \(\Vert A_{\omega }-A\Vert \) is sufficiently small. In the following, we derive an algorithm for clustered eigenvalues using this important feature.

Remark 2

In this section, we proved that \(\widetilde{E}\) is sufficiently close to \(F_{\omega }\) under the mild assumptions. In Sect. 3, the effect of rounding errors on \(\widetilde{E}\) is evaluated as in (10), i.e., \(\varDelta _{E}:=\widehat{E}-\widetilde{E}\) is sufficiently small, where \(\widehat{E}\) is computed in finite precision arithmetic. The rounding error analysis is independent of the perturbation analysis with respect to \(F_{\omega }\) in this section. Thus, the triangle inequality \(\Vert \widehat{E}-F_{\omega }\Vert \le \Vert \widehat{E}-\widetilde{E}\Vert +\Vert \widetilde{E}-F_{\omega }\Vert \) combined with the individual estimates of \(\Vert \widehat{E}-\widetilde{E}\Vert \) and \(\Vert \widetilde{E}-F_{\omega }\Vert \) shows that the computed \(\widehat{E}\) is sufficiently close to \(F_{\omega }\) corresponding to \(A_{\omega }\). \(\square \)

5 Proposed algorithm for nearly multiple eigenvalues

On the basis of the basic algorithm (Algorithm 1), we propose a practical version of an algorithm for improving the accuracy of computed eigenvectors of symmetric matrices that can also deal with nearly multiple eigenvalues.

Recall that, in Algorithm 1, we choose \(\widetilde{e}_{ij}\) for all (i, j) as

$$\begin{aligned} \widetilde{e}_{ij} = \frac{s_{ij} + \widetilde{\lambda }_{j}r_{ij}}{\widetilde{\lambda }_{j} - \widetilde{\lambda }_{i}} \quad \text {if } |\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| > \omega , \end{aligned}$$
(24)
$$\begin{aligned} \widetilde{e}_{ij} = \frac{r_{ij}}{2} \quad \text {otherwise}, \end{aligned}$$
(25)

where \(\omega \) is defined in line 6 of the algorithm.

5.1 Observation

First, we show the drawback of Algorithm 1 concerning clustered eigenvalues. For this purpose, we take

$$\begin{aligned} A = \begin{bmatrix} 1+\varepsilon&1&1+\varepsilon \\ 1&1&-1 \\ 1+\varepsilon&-1&1 + \varepsilon \\ \end{bmatrix}, \quad \left\{ \begin{array}{l} \lambda _{1} = -1 \\ \lambda _{2} = 2 \\ \lambda _{3} = 2 + 2\varepsilon \\ \end{array}\right. \quad \hbox {for any }\varepsilon > 0 \end{aligned}$$
(26)

as an example, where \(\lambda _{2}\) and \(\lambda _{3}\) are nearly double eigenvalues for small \(\varepsilon \). We set \(\varepsilon = 2^{-50} \approx 10^{-15}\) and adopt the MATLAB built-in function \(\mathsf {eig}\) in IEEE 754 binary64 arithmetic to obtain \(X^{(0)} := \widehat{X}\). Then, we apply Algorithm 1 iteratively to A and \(X^{(\nu )}\) beginning from \(\nu = 0\). To check the accuracy of \(X^{(\nu )}\) with respect to orthogonality and diagonality, we display \(R^{(\nu )} := I - (X^{(\nu )})^{\mathrm {T}}X^{(\nu )}\) and \(S^{(\nu )} := (X^{(\nu )})^{\mathrm {T}}AX^{(\nu )}\).
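For reference, the setup of this experiment can be written as the following MATLAB fragment; the variable names are ours, and the refinement itself then proceeds by applying Algorithm 1 iteratively as described above.

```matlab
e = 2^(-50);                          % epsilon in (26)
A = [1+e  1  1+e;
     1    1  -1 ;
     1+e -1  1+e];
[X0, ~] = eig(A);                     % initial guess X^(0) in binary64
R0 = eye(3) - X0' * X0;               % orthogonality of X^(0)
S0 = X0' * A * X0;                    % diagonality of X^(0)
```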

For \(X^{(0)}\) obtained by \(\mathsf {eig}\), we obtain the following results.

$$\begin{aligned} R^{(0)}&\approx \begin{bmatrix} \texttt {-2.7e-16}&\texttt {-1.3e-16}&\texttt {-6.8e-17} \\ \texttt {-1.3e-16}&\texttt { 1.4e-16}&\texttt {-5.0e-17} \\ \texttt {-6.8e-17}&\texttt {-5.0e-17}&\texttt {-2.2e-16} \\ \end{bmatrix},&S^{(0)}&\approx \begin{bmatrix} \texttt {-1.0e+00}&\texttt {-1.3e-16}&\texttt {-6.8e-17} \\ \texttt {-1.3e-16}&\texttt { 2.0e+00}&\texttt { 1.7e-17} \\ \texttt {-6.8e-17}&\texttt { 1.7e-17}&\texttt { 2.0e+00} \\ \end{bmatrix} \end{aligned}$$

The following shows the results of two iterations of Algorithm 1 in real arithmetic.

$$\begin{aligned} R^{(1)}&\approx \begin{bmatrix} \texttt { 5.4e-32}&\texttt { 9.1e-33}&\texttt { 2.6e-32} \\ \texttt { 9.1e-33}&\texttt { 1.2e-33}&\texttt { 4.4e-33} \\ \texttt { 2.6e-32}&\texttt { 4.4e-33}&\texttt { 4.0e-32} \\ \end{bmatrix},&S^{(1)}&\approx \left[ \begin{array}{c|cc} \texttt {-1.0e+00} &{} \texttt { 9.1e-33} &{} \texttt { 2.6e-32} \\ \hline \texttt { 9.1e-33} &{} \texttt { 2.0e+00} &{} \texttt {-8.3e-17} \\ \texttt { 2.6e-32} &{} \texttt {-8.3e-17} &{} \texttt { 2.0e+00} \\ \end{array}\right] \\ R^{(2)}&\approx \left[ \begin{array}{c|cc} \texttt { 2.2e-63} &{} \texttt { 1.2e-33} &{} \texttt {-4.3e-34} \\ \hline \texttt { 1.2e-33} &{} \texttt {-2.2e-03} &{} \texttt { 9.1e-34} \\ \texttt {-4.3e-34} &{} \texttt { 9.1e-34} &{} \texttt {-2.2e-03} \\ \end{array}\right] ,&S^{(2)}&\approx \left[ \begin{array}{c|cc} \texttt {-1.0e+00} &{} \texttt { 1.2e-33} &{} \texttt {-4.3e-34} \\ \hline \texttt { 1.2e-33} &{} \texttt { 2.0e+00} &{} \texttt { 1.8e-19} \\ \texttt {-4.3e-34} &{} \texttt { 1.8e-19} &{} \texttt { 2.0e+00} \\ \end{array}\right] \end{aligned}$$

In the first iteration, \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| \approx 1.77 \cdot 10^{-15}\) and \(\omega \approx 2.17 \cdot 10^{-15}\), so that \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| < \omega \) and Algorithm 1 regards \(\widetilde{\lambda }_{2}\) and \(\widetilde{\lambda }_{3}\) as clustered eigenvalues. Then, the diagonality corresponding to \(\lambda _{2}\) and \(\lambda _{3}\) is not improved due to the choice (24), while the orthogonality of \(X^{(1)}\) is refined due to the choice (25). In the second iteration, \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| \approx 1.77 \cdot 10^{-15}\) and \(\omega \approx 1.66 \cdot 10^{-16}\), so that \(|\widetilde{\lambda }_{2} - \widetilde{\lambda }_{3}| > \omega \) and Algorithm 1 regards \(\widetilde{\lambda }_{2}\) and \(\widetilde{\lambda }_{3}\) as separated eigenvalues. However, \(\Vert E\Vert \approx 4.69\cdot 10^{-2} > 1/100\), i.e., the assumption (3) in Theorem 1 is not satisfied. As a result, the orthogonality of \(X^{(2)}\) corresponding to \(\lambda _{2}\) and \(\lambda _{3}\) is badly broken, and the refinement of the diagonality stagnates with respect to the nearly double eigenvalues \(\lambda _{2}\) and \(\lambda _{3}\).

In the following, we overcome such a problem for general symmetric matrices.

5.2 Outline of the proposed algorithm

As mentioned in Sect. 1, the \(\sin \theta \) theorem by Davis–Kahan suggests that backward stable algorithms can provide a sufficiently accurate initial guess of a subspace spanned by eigenvectors associated with clustered eigenvalues for each cluster. We explain how to refine approximate eigenvectors by extracting them from the subspace correctly.

Suppose that Algorithm 1 is applied to \(A = A^{\mathrm {T}} \in {\mathbb {R}}^{n \times n}\) and its approximate eigenvector matrix \(\widehat{X} \in {\mathbb {R}}^{n \times n}\). Then, we obtain \(X'\), \(\widetilde{\lambda }\), and \(\omega \) where \(X' \in {\mathbb {R}}^{n \times n}\) is a refined approximate eigenvector matrix, \(\widetilde{\lambda }_{i}\), \(i = 1, 2, \ldots , n\), are approximate eigenvalues, and \(\omega \in {\mathbb {R}}\) is the criterion that determines whether \(\widetilde{\lambda }_{i}\) are clustered. Using \(\widetilde{\lambda }\) and \(\omega \), we can easily obtain the index sets \({\mathcal {J}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\), for the clusters \(\{\widetilde{\lambda }_{i}\}_{i \in {\mathcal {J}}_{k}}\) of the approximate eigenvalues satisfying all the following conditions (see also Fig. 1).

$$\begin{aligned} \left\{ \begin{array}{l} \text {(a)} \ {\mathcal {J}}_{k} \subseteq \{1,2,\ldots ,n\} \ \text {with} \ n_{k} := |{\mathcal {J}}_{k}| \ge 2 \\ \text {(b)} \ {\displaystyle \min _{j \in {\mathcal {J}}_{k}\backslash \{i\}}|\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| \le \omega }, \ \forall i \in {\mathcal {J}}_{k} \\ \text {(c)} \ |\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| > \omega , \ \forall i \in {\mathcal {J}}_{k}, \ \forall j \in \{1,2,\ldots ,n\}\,\backslash \, {\mathcal {J}}_{k} \end{array}\right. . \end{aligned}$$
(27)
Fig. 1

Relationship between \(\widetilde{\lambda }_{i}\) and \({\mathcal {J}}_{k}\) (short vertical lines denote \(\widetilde{\lambda }_{i}\))
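Given the outputs \(\widetilde{\lambda }\) (sorted in ascending order) and \(\omega \) of Algorithm 1, index sets satisfying (27) can be found by one scan over consecutive gaps; the following MATLAB sketch assumes the vector lam holds \(\widetilde{\lambda }_{1} \le \cdots \le \widetilde{\lambda }_{n}\) and omega holds \(\omega \).

```matlab
J = {};                                   % cell array collecting the index sets J_k
first = 1;                                % start of the current group
for i = 2:numel(lam) + 1
    if i > numel(lam) || lam(i) - lam(i-1) > omega
        if i - first >= 2                 % condition (a): at least two members
            J{end+1} = first:i-1;         % one cluster J_k; (b) and (c) hold by construction
        end
        first = i;
    end
end
```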

Now the problem is how to refine \(X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\), which denotes the matrix comprising approximate eigenvectors corresponding to the clustered approximate eigenvalues \(\{\widetilde{\lambda }_{i}\}_{i \in {\mathcal {J}}_{k}}\).

From the observation about the numerical results in the previous section, we develop the following procedure for the refinement.

1. Find clusters of approximate eigenvalues of A and obtain the index sets \({\mathcal {J}}_{k}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\), for those clusters.

2. Define \(V_{k} := X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\), where \(n_{k} := |{\mathcal {J}}_{k}|\).

3. Compute \(T_{k} = V_{k}^{\mathrm {T}}(A - \mu _{k}I)V_{k}\), where \(\mu _{k} := (\min _{i \in {\mathcal {J}}_{k}}\widetilde{\lambda }_{i} + \max _{i \in {\mathcal {J}}_{k}}\widetilde{\lambda }_{i})/2\).

4. Perform the following procedure for each \(T_{k} \in {\mathbb {R}}^{n_{k} \times n_{k}}\).

   (i) Compute an eigenvector matrix \(W_{k}\) of \(T_{k}\).

   (ii) Update \(X'(:,{\mathcal {J}}_{k}) \in {\mathbb {R}}^{n \times n_{k}}\) by \(V_{k}W_{k}\).

This procedure is interpreted as follows. We first apply an approximate similarity transformation to A using a refined eigenvector matrix \(X'\), such as \(S' := (X')^{\mathrm {T}}AX'\). Then, we divide the problem for \(S' \in {\mathbb {R}}^{n \times n}\) into subproblems for \(S'_{k} \in {\mathbb {R}}^{n_{k} \times n_{k}}\), \(k = 1, 2, \ldots , n_{{\mathcal {J}}}\), corresponding to the clusters. We then apply a diagonal shift to \(S'_{k}\), such as \(T_{k} := S'_{k} - \mu _{k}I\), to increase the relative separation of the clustered eigenvalues around \(\mu _{k}\). Rather than forming \(S'\) and its subblocks explicitly to obtain \(T_{k}\), we perform steps 2 and 3 in view of computational efficiency and accuracy. Finally, we update the columns of \(X'\) corresponding to \({\mathcal {J}}_{k}\) using an eigenvector matrix \(W_{k}\) of \(T_{k}\) by \(V_{k}W_{k}\).
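In MATLAB-like notation, steps 2–4 for a single cluster read as follows; here Jk denotes the index vector \({\mathcal {J}}_{k}\), Xp the refined matrix \(X'\), and lam the approximate eigenvalues \(\widetilde{\lambda }_{i}\) (the names are ours), and the matrix products are assumed to be carried out with sufficient accuracy (cf. Sect. 3).

```matlab
Vk  = Xp(:, Jk);                                  % step 2: cluster block of X'
muk = (min(lam(Jk)) + max(lam(Jk))) / 2;          % diagonal shift
Tk  = Vk' * (A - muk * eye(size(A,1))) * Vk;      % step 3
[Wk, ~] = eig(double(Tk));                        % step 4(i): T_k rounded to binary64, then eig
Xp(:, Jk) = Vk * Wk;                              % step 4(ii): update the cluster columns
```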

5.3 Proposed algorithm

Here, we present a practical version of a refinement algorithm for eigenvalue decomposition of a real symmetric matrix A, which can also be applied to the case where A has clustered eigenvalues.

Algorithm 2 (\(\mathsf {RefSyEvCL}\)) Refinement algorithm for the eigenvalue decomposition of a real symmetric matrix with clustered eigenvalues: apply Algorithm 1 to A and \(\widehat{X}\) (line 2); determine the cluster index sets \({\mathcal {J}}_{k}\) satisfying (27) from \(\widetilde{\lambda }\) and \(\omega \); for each cluster, form \(V_{k}\) and the shifted matrix \(A_{k} \leftarrow A - \mu _{k}I\) (line 8), compute an eigenvector matrix \(W_{k}\) of \(T_{k} = {\mathsf {f}}{\mathsf {l}}(V_{k}^{\mathrm {T}}A_{k}V_{k})\) by \(\mathsf {eig}\), and update \(V_{k} \leftarrow V_{k}W_{k}\); then iteratively apply Algorithm 1 to \(A_{k}\) and \(V_{k}\) (lines 13–17) and finally set \(X'(:,{\mathcal {J}}_{k}) \leftarrow V_{k}\).

In Algorithm 2, the function \({\mathsf {f}}{\mathsf {l}}(C)\) rounds an input matrix \(C \in {\mathbb {R}}^{n \times n}\) to a matrix \(T \in {\mathbb {F}}^{n \times n}\), where \({\mathbb {F}}\) is a set of floating-point numbers in ordinary precision, such as the IEEE 754 binary64 format. Here, “round-to-nearest” rounding is not required; however, some faithful rounding, such as chopping, is desirable. Moreover, the function \(\mathsf {eig}(T)\) is similar to the MATLAB function, which computes all approximate eigenvectors of an input matrix \(T \in {\mathbb {F}}^{n \times n}\) in working precision arithmetic. It is expected to adopt some backward stable algorithm, as implemented in the LAPACK routine xSYEV [2]. In lines 13–17 of Algorithm 2, we aim to obtain sufficiently accurate approximate eigenvectors \(X'(:,{\mathcal {J}}_{k})\) of A, where the columns of \(X'(:,{\mathcal {J}}_{k})\) correspond to \({\mathcal {J}}_{k}\). For this purpose, we iteratively apply Algorithm 1 (\(\mathsf {RefSyEv}\)) to \(A - \mu _{k}I\) and \(V_{k}^{(\nu )}\) until \(V_{k}^{(\nu )}\) for some \(\nu \) becomes as accurate as the other eigenvectors associated with well-separated eigenvalues. Note that the spectral norms \(\Vert \widetilde{E}\Vert _{2}\) and \(\Vert \widetilde{E}_{k}\Vert _{2}\) can be replaced by the Frobenius norms \(\Vert \widetilde{E}\Vert _{\mathrm {F}}\) and \(\Vert \widetilde{E}_{k}\Vert _{\mathrm {F}}\).
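Continuing the fragment after the outline in Sect. 5.2, the iteration in lines 13–17 can be sketched as follows, where E denotes the correction \(\widetilde{E}\) returned by the initial application of Algorithm 1 (line 2); the stopping test, which compares \(\Vert \widetilde{E}_{k}\Vert \) with \(\Vert \widetilde{E}\Vert \), and the iteration cap are assumptions of this sketch.

```matlab
tolE = norm(double(E));                            % size of the overall correction Etilde
for it = 1:5                                       % illustrative iteration cap
    [Vk, ~, ~, Ek] = refsyev(A - muk * eye(size(A,1)), Vk);
    if norm(double(Ek)) <= tolE, break; end        % V_k is now as accurate as the rest
end
Xp(:, Jk) = Vk;
```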

For the example (26), we apply Algorithm 2 (\(\mathsf {RefSyEvCL}\)) to A and the same initial guess \(X^{(0)}\) as before. The results of two iterations are as follows.

$$\begin{aligned} R^{(1)}&\approx \begin{bmatrix} \texttt { 5.4e-32}&\texttt {-1.0e-32}&\texttt { 2.5e-32} \\ \texttt {-1.0e-32}&\texttt { 1.4e-33}&\texttt { 2.9e-49} \\ \texttt { 2.5e-32}&\texttt { 2.9e-49}&\texttt { 1.4e-33} \\ \end{bmatrix},&S^{(1)}&\approx \begin{bmatrix} \texttt {-1.0e+00}&\texttt {-1.0e-32}&\texttt { 2.5e-32} \\ \texttt {-1.0e-32}&\texttt { 2.0e+00}&\texttt {-1.3e-48} \\ \texttt { 2.5e-32}&\texttt {-1.3e-48}&\texttt { 2.0e+00} \\ \end{bmatrix} \\ R^{(2)}&\approx \begin{bmatrix} \texttt { 2.2e-63}&\texttt {-5.5e-64}&\texttt { 1.4e-63} \\ \texttt {-5.5e-64}&\texttt { 1.1e-64}&\texttt {-2.6e-64} \\ \texttt { 1.4e-63}&\texttt {-2.6e-64}&\texttt { 6.5e-64} \\ \end{bmatrix},&S^{(2)}&\approx \begin{bmatrix} \texttt {-1.0e+00}&\texttt {-5.5e-64}&\texttt { 1.4e-63} \\ \texttt {-5.5e-64}&\texttt { 2.0e+00}&\texttt {-2.6e-64} \\ \texttt { 1.4e-63}&\texttt {-2.6e-64}&\texttt { 2.0e+00} \\ \end{bmatrix} \end{aligned}$$

Thus, Algorithm 2 works well for this example, i.e., the approximate eigenvectors corresponding to the nearly double eigenvalues \(\lambda _{2}\) and \(\lambda _{3}\) are improved in terms of both orthogonality and diagonality.

Remark 3

For a generalized symmetric definite eigenvalue problem \(Ax = \lambda Bx\) where A and B are real symmetric with B being positive definite, we can modify the algorithms as follows.

  • In Algorithm 1 called at line 2 in Algorithm 2, replace \(R \leftarrow I - \widehat{X}^{\mathrm {T}}\widehat{X}\) with \(R \leftarrow I - \widehat{X}^{\mathrm {T}}B\widehat{X}\).

  • Replace \(A_{k} \leftarrow A - \mu _{k}I\) with \(A_{k} \leftarrow A - \mu _{k}B\) in line 8 of Algorithm 2.

Note that B does not appear in Algorithm 1 called at line 14 in Algorithm 2. \(\square \)

6 Numerical results

We present numerical results to demonstrate the effectiveness of the proposed algorithm (Algorithm 2: RefSyEvCL). All numerical experiments discussed in this section were conducted using MATLAB R2016b on our workstation with two CPUs (3.0 GHz Intel Xeon E5-2687W v4 (12 cores)) and 1 TB of main memory, unless otherwise specified. Let \({\mathbf {u}}\) denote the relative rounding error unit (\({\mathbf {u}}= 2^{-24}\) for IEEE binary32 and \({\mathbf {u}}= 2^{-53}\) for binary64). To realize multiple-precision arithmetic, we adopt the Advanpix Multiprecision Computing Toolbox version 4.2.3 [1], which utilizes well-known, fast, and reliable multiple-precision arithmetic libraries including GMP and MPFR. We also use multiple-precision arithmetic with sufficiently long precision to simulate real arithmetic. In all cases, we use the MATLAB function norm for computing the spectral norms \(\Vert R\Vert \) and \(\Vert S - \widetilde{D}\Vert \) in Algorithm 1 in binary64 arithmetic, and we approximate \(\Vert A\Vert \) by \(\max (|\widetilde{\lambda }_{1}|,|\widetilde{\lambda }_{n}|)\). We ran the numerical experiments for several dozen seeds of the random number generator, and all results are similar to those provided in this section. Therefore, to ensure reproducibility, we adopt the default seed as a typical example using the MATLAB command rng('default').

6.1 Convergence property

Here, we confirm the convergence property of the proposed algorithm for various eigenvalue distributions.

6.1.1 Various eigenvalue distributions

In the same way as the previous paper [17], we again generate real symmetric and positive definite matrices using the MATLAB function randsvd from Higham’s test matrices [11] by the following MATLAB command.

A = gallery('randsvd', n, -cnd, mode);

The eigenvalue distribution and condition number of A can be controlled by the input arguments \(\texttt {mode} \in \{1,2,3,4,5\}\) and \(\texttt {cnd} =: \alpha \ge 1\), as follows:

1. one large: \(\lambda _{1} \approx 1\), \(\lambda _{i} \approx \alpha ^{-1}\), \(i = 2,\ldots ,n\)

2. one small: \(\lambda _{n} \approx \alpha ^{-1}\), \(\lambda _{i} \approx 1\), \(i = 1,\ldots ,n-1\)

3. geometrically distributed: \(\lambda _{i} \approx \alpha ^{-(i - 1)/(n - 1)}\), \(i = 1,\ldots ,n\)

4. arithmetically distributed: \(\lambda _{i} \approx 1 - (1 - \alpha ^{-1})(i - 1)/(n - 1)\), \(i = 1,\ldots ,n\)

5. random with uniformly distributed logarithm: \(\lambda _{i} \approx \alpha ^{-r(i)}\), \(i = 1,\ldots ,n\), where r(i) are pseudo-random values drawn from the standard uniform distribution on (0, 1).

Here, \(\kappa (A) \approx \texttt {cnd}\) for \(\texttt {cnd} < {\mathbf {u}}^{-1} \approx 10^{16}\). As shown in [17], for \(\texttt {mode} \in \{1,2\}\), there is a cluster of nearly multiple eigenvalues, so that Algorithm 1 (RefSyEv) does not work effectively.

As in [17], we set \(n = 10\) and \(\texttt {cnd} = 10^{8}\) to generate moderately ill-conditioned problems in binary64 and consider the computed results obtained using multiple-precision arithmetic with sufficiently long precision as the exact eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). We compute \(X^{(0)}\) as an initial approximate eigenvector matrix using the MATLAB function eig in binary64 arithmetic.

In the previous paper [17], we observed the quadratic convergence of Algorithm 1 in the case of \(\texttt {mode} \in \{3,4,5\}\), while Algorithm 1 failed to improve the accuracy of the initial approximate eigenvectors in the case of \(\texttt {mode} \in \{1, 2\}\), since the test matrices for \(\texttt {mode} \in \{1, 2\}\) have nearly multiple eigenvalues. To confirm the behavior of Algorithm 2, we apply Algorithm 2 to the same examples. The results are shown in Fig. 2, which provides \(\max _{1 \le i \le n}|\widehat{\lambda }_{i} - \lambda _{i}|/|\lambda _{i}|\) as the maximum relative error of the computed eigenvalues \(\widehat{\lambda }_{i}\), \(\Vert \mathrm {offdiag}(\widehat{X}^{\mathrm {T}}A\widehat{X})\Vert /\Vert A\Vert \) as the diagonality of \(\widehat{X}^{\mathrm {T}}A\widehat{X}\), \(\Vert I - \widehat{X}^{\mathrm {T}}\widehat{X}\Vert \) as the orthogonality of a computed eigenvector matrix \(\widehat{X}\), and \(\Vert \widehat{E}\Vert \) where \(\widehat{E}\) is a computed result of \(\widetilde{E}\) in Algorithm 1. Here, \(\mathrm {offdiag}(\cdot )\) denotes the off-diagonal part of a given matrix. The horizontal axis shows the number of iterations \(\nu \) of Algorithm 2. As can be seen from the results, Algorithm 2 works very well even in the case of \(\texttt {mode} \in \{1, 2\}\).

Fig. 2

Results of iterative refinement by Algorithm 2 (\(\mathsf {RefSyEvCL}\)) in real arithmetic for symmetric and positive definite matrices generated by \(\mathsf {randsvd}\) with \(n = 10\) and \(\kappa (A) \approx 10^{8}\)

6.1.2 Clustered eigenvalues

As an example of clustered eigenvalues, we show the results for the Wilkinson matrix [23], which is symmetric and tridiagonal with pairs of nearly equal eigenvalues. The Wilkinson matrix \(W_{n} = (w_{ij}) \in {\mathbb {R}}^{n \times n}\) has diagonal entries \(w_{ii} := \frac{|n - 2i + 1|}{2}\), \(i = 1, 2, \ldots , n\), and super- and subdiagonal entries all equal to one. We apply Algorithm 2 to the Wilkinson matrix with \(n = 21\). The results are displayed in Fig. 3. As can be seen, Algorithm 2 works well.
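The test matrix can be generated directly from this definition, e.g.:

```matlab
n = 21;
W = diag(abs(n - 2*(1:n) + 1) / 2) ...
  + diag(ones(n-1, 1), 1) + diag(ones(n-1, 1), -1);   % diagonal 10, 9, ..., 1, 0, 1, ..., 9, 10
```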

Fig. 3  Results of iterative refinement by Algorithm 2 (\(\mathsf {RefSyEvCL}\)) in real arithmetic for the Wilkinson matrix with \(n = 21\)

Next, we show the convergence behavior of Algorithm 2 with limited computational precision for larger matrices with various \(\beta \), where \(\beta \) denotes the reciprocal of the minimum gap between the eigenvalues normalized by \(\Vert A\Vert \), as defined in (10). If \(\beta \) is too large, e.g., \(\beta ^{2}{\mathbf {u}}\ge 1\), then, as mentioned at the end of Sect. 3, we cannot expect Algorithm 1 to improve approximate eigenvectors. We generate test matrices as follows. Set \(k \in {\mathbb {N}}\) with \(k \le n - 2\). Let \(A = QDQ^{\mathrm {T}}\), where Q is an orthogonal matrix and D is a diagonal matrix with

$$\begin{aligned} d_{ii} = {\left\{ \begin{array}{ll} 1 - (i - 1)\beta ^{-1} &{} \ \text {for} \ i = 1, 2, \ldots , k, \\ -1 + \dfrac{n - i}{2(n - k - 1)} &{} \ \text {for} \ i = k + 1, \ldots , n \end{array}\right. } . \end{aligned}$$

Then, k eigenvalues are clustered near 1 with gap \(\beta ^{-1}\), and the remaining \(n - k\) eigenvalues are equally spaced in \([-1,-\frac{1}{2}]\). We compute \(A \approx \widetilde{Q}\widetilde{D}\widetilde{Q}^{\mathrm {T}}\) in IEEE 754 binary64 arithmetic, where \(\widetilde{Q}\) is a pseudo-random approximately orthogonal matrix and \(\widetilde{D}\) is a floating-point approximation of D. We fix \(n = 100\) and \(k = 10\) and vary \(\beta \) between \(10^{2}\) and \(10^{14}\). To make the results more illustrative, we provide a less accurate initial guess \(X^{(0)}\) using binary32 arithmetic. In the algorithm, we adopt binary128 (so-called quadruple precision) for the high-precision arithmetic, so the relative accuracy of the computed results is limited to \({\mathbf {u}}_{h} = 2^{-113} \approx 10^{-34}\). For binary128 arithmetic, we use the multiple-precision toolbox, in which binary128 is supported as a special case via the command mp.Digits(34). The results are shown in Fig. 4. As can be seen, Algorithm 2 refines the computed eigenvalues until their relative accuracy reaches approximately \({\mathbf {u}}_{h}\). Both the orthogonality and diagonality of the computed eigenvectors are improved to approximately \(\beta {\mathbf {u}}_{h}\). This result is consistent with Remark 1. For \(\beta \in \{10^{8}, 10^{14}\}\), Algorithm 1 cannot work because \(X^{(0)}\) is insufficiently accurate and assumption (3) is not satisfied. We confirm that this problem is resolved by Algorithm 2.
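The setup of this experiment can be sketched in MATLAB roughly as follows; the use of orth(randn(n)) for \(\widetilde{Q}\) and the placeholder call to Algorithm 2 are our own, and the mp.Digits interface is that of the multiple-precision toolbox mentioned above.

```matlab
n = 100;  k = 10;  beta = 1e8;                      % one of the tested values of beta
d = zeros(n, 1);
d(1:k) = 1 - (0:k-1)'/beta;                         % k eigenvalues clustered near 1, gap 1/beta
i = (k+1:n)';
d(i) = -1 + (n - i)/(2*(n - k - 1));                % equally spaced in [-1, -1/2]
Q = orth(randn(n));                                 % pseudo-random (approximately) orthogonal matrix
A = Q*diag(d)*Q';  A = (A + A')/2;
[X0, ~] = eig(single(A));  X0 = double(X0);         % less accurate initial guess in binary32
mp.Digits(34);                                      % binary128-equivalent precision in the mp toolbox
% X = RefSyEvCL(mp(A), X0, ...);                    % iterative refinement (Algorithm 2), placeholder only
```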

Fig. 4  Results of iterative refinement by Algorithm 2 (\(\mathsf {RefSyEvCL}\)) in IEEE 754 binary128 arithmetic (\({\mathbf {u}}_{h} = 2^{-113} \approx 10^{-34}\)) for symmetric matrices with \(n = 100\) and various \(\beta \)

6.2 Computational speed

To evaluate the computational speed of the proposed algorithm (Algorithm 2), we first compare the computing time of Algorithm 2 with that of an approach based on multiple-precision arithmetic (MP-approach). Note that the timings should be regarded as reference values, because the computing time of Algorithm 2 strongly depends on the implementation of accurate matrix multiplication; we adopt an efficient method proposed by Ozaki et al. [19] that utilizes fast matrix multiplication routines such as xGEMM in BLAS. To simulate multiple-precision numbers and arithmetic in Algorithm 2, we represent \(\widehat{X} = \widehat{X}_{1} + \widehat{X}_{2} + \cdots + \widehat{X}_{m}\) with \(\widehat{X}_{k}\), \(k = 1, 2, \dots , m\), being floating-point matrices in working precision, analogously to the “double-double” (\(m = 2\)) and “quad-double” (\(m = 4\)) precision formats [10], and use the concept of error-free transformations [16].
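To give an impression of the error-free transformations involved, the following is a minimal MATLAB sketch of Knuth's TwoSum, which splits a floating-point sum into the rounded result and its exact rounding error; the function name two_sum is ours, and the actual implementation based on [16, 19] is more elaborate.

```matlab
function [s, e] = two_sum(a, b)
% Error-free transformation of a sum: a + b = s + e holds exactly,
% where s = fl(a + b) and e captures the rounding error.
% Works elementwise, so a and b may be matrices of the same size.
s = a + b;
z = s - a;
e = (a - (s - z)) + (b - z);
end
```

Applying such transformations elementwise allows a matrix to be carried as an unevaluated sum \(\widehat{X}_{1} + \widehat{X}_{2}\) of working-precision matrices.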

In the multiple-precision toolbox [1], symmetric eigenvalue problems are solved by a carefully tuned, parallelized implementation of the MRRR algorithm [6] applied after Householder reduction.

For the timing comparison, we generate a pseudo-random real symmetric \(n \times n\) matrix with clustered eigenvalues in a similar way to Sect. 6.1.2. To construct several eigenvalue clusters, we change the diagonal elements of D to

$$\begin{aligned} d_{ii} = {\left\{ \begin{array}{ll} 1 - \left\lfloor \dfrac{i - 1}{k}\right\rfloor c^{-1} - (i - 1)\beta ^{-1} &{} \ \text {for} \ i = 1, 2, \ldots , ck, \\ -1 + \dfrac{n - i}{2(n - ck - 1)} &{} \ \text {for} \ i = ck + 1, ck + 2, \ldots , n \end{array}\right. } . \end{aligned}$$

Then, there are c clusters near \(1/c, 2/c, \ldots , 1\), each containing k eigenvalues with gap \(\beta ^{-1}\), and the remaining \(n - ck\) eigenvalues are equally spaced in \([-1,-\frac{1}{2}]\). We set \(n = 1000\), \(c = 5\), \(k = 100\), and \(\beta = 10^{12}\), i.e., the generated \(1000 \times 1000\) matrix has five clusters, each of 100 eigenvalues with gap \(10^{-12}\). Table 1 shows the measured computing times of Algorithm 2 and the MP-approach, together with \(\Vert \widehat{E}\Vert \) and \(n_{{\mathcal {J}}}\), where \(n_{{\mathcal {J}}}\) is the number of eigenvalue clusters identified by Algorithm 2. At \(\nu = 2\), Algorithm 2 successfully identifies the five eigenvalue clusters, i.e., \(n_{{\mathcal {J}}} = 5\), corresponding to \(c = 5\).
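The only change relative to the sketch in Sect. 6.1.2 is the diagonal of D; for the parameters used here it reads:

```matlab
n = 1000;  c = 5;  k = 100;  beta = 1e12;
i1 = (1:c*k)';  i2 = (c*k+1:n)';
d = zeros(n, 1);
d(i1) = 1 - floor((i1 - 1)/k)/c - (i1 - 1)/beta;    % c clusters of k eigenvalues, gap 1/beta each
d(i2) = -1 + (n - i2)/(2*(n - c*k - 1));            % remaining eigenvalues in [-1, -1/2]
```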

To compare timings on a lower-performance computer, we also conducted numerical experiments using MATLAB R2017b on our laptop PC with a 2.5 GHz Intel Core i7-7660U (2 cores) CPU and 16 GB of main memory. In a similar way to the previous example, we set \(n = 500\), \(c = 5\), \(k = 10\), and \(\beta = 10^{12}\), i.e., the generated \(500 \times 500\) matrix has five clusters, each of 10 eigenvalues with gap \(10^{-12}\). As can be seen from Table 2, the results are similar to those in Table 1.

Table 1 Results for a pseudo-random real symmetric matrix with clustered eigenvalues on our workstation: \(n = 1000\), 5 clusters of 100 eigenvalues each, and gap \(\beta ^{-1} = 10^{-12}\) within each cluster
Table 2 Results for a pseudo-random real symmetric matrix with clustered eigenvalues on our laptop PC: \(n = 500\), 5 clusters of 10 eigenvalues each, and gap \(\beta ^{-1} = 10^{-12}\) within each cluster

Next, we address larger-scale problems. The test matrices are generated using the MATLAB function randn with \(n \in \{2000, 5000, 10{,}000\}\), as B = randn(n) and A = B + B'. We aim to compute all the eigenvectors of a given real symmetric \(n \times n\) matrix A with the maximum accuracy allowed by the binary64 format. To make the results more illustrative, we provide a less accurate initial guess \(X^{(0)}\) using eig in binary32 and then refine \(X^{(\nu )}\) by Algorithm 2. For efficiency, we use binary64 arithmetic for \(\nu = 1, 2\) and accurate matrix multiplication based on error-free transformations [19] for \(\nu = 3\); a sketch of this schedule is given below. As numerical results, we provide \(\Vert \widehat{E}\Vert \), \(n_{{\mathcal {J}}}\), and the measured computing time. The results are shown in Table 3. As can be seen, Algorithm 2 improves the accuracy of the computed results up to the limit of binary64 (\({\mathbf {u}}= 2^{-53} \approx 10^{-16}\)). For \(n \in \{5000, 10{,}000\}\), Algorithm 2 requires much more computing time in total than eig in binary32, as \(n_{{\mathcal {J}}}\) increases. This is because the problems generally become more ill-conditioned for larger n; in fact, the minimum normalized gap between eigenvalues is \(\beta ^{-1} = 2.71 \cdot 10^{-5}\) for \(n = 2000\), \(\beta ^{-1} = 2.31 \cdot 10^{-6}\) for \(n = 5000\), and \(\beta ^{-1} = 2.26 \cdot 10^{-6}\) for \(n = 10{,}000\). Thus, it is likely that binary32 arithmetic cannot provide a sufficiently accurate initial guess \(X^{(0)}\) for a large-scale random matrix.
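A rough MATLAB sketch of this schedule follows; refine_sym_eig is a hypothetical stand-in for one sweep of Algorithm 2 (\(\mathsf {RefSyEvCL}\)) and not an actual function, and the precision switch is indicated only schematically.

```matlab
n = 2000;
B = randn(n);  A = B + B';                  % pseudo-random symmetric test matrix
[X, ~] = eig(single(A));  X = double(X);    % less accurate initial guess in binary32
for nu = 1:3
    if nu <= 2
        X = refine_sym_eig(A, X, 'binary64');        % working-precision sweeps (hypothetical helper)
    else
        X = refine_sym_eig(A, X, 'accurate-matmul'); % sweep with error-free-transformation-based matmul [19]
    end
end
```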

Table 3 Results of iterative refinement by Algorithm 2 (\(\mathsf {RefSyEvCL}\)) for pseudo-random symmetric matrices on a workstation

6.3 Application to a real-world problem

Finally, we apply the proposed algorithm to a quantum materials simulation that aims to understand electronic structures in material physics. Such problems reduce to generalized eigenvalue problems, where eigenvalues and eigenvectors correspond to electronic energies and wave functions, respectively. To understand the properties of materials correctly, it is crucial to determine the order of the eigenvalues [12] and to obtain accurate eigenvectors [24, 25].

We deal with a generalized eigenvalue problem \(Ax = \lambda Bx\) arising from a vibrating carbon nanotube within a supercell with s, p, d atomic orbitals [4]. The matrices A and B are taken from the ELSES matrix library [7] as VCNT22500, where A and B are real symmetric \(n \times n\) matrices with B positive definite and \(n = 22{,}500\). Our goal is to compute accurate eigenvectors and to separate all the eigenvalues of the problem in order to determine their order. To this end, we use a numerical verification method [14] based on the Gershgorin circle theorem (cf., e.g., [9, Theorem 7.2.2] and [23, pp. 71ff]), which can rigorously check whether all the eigenvalues are separated and provide an interval containing each eigenvalue.

Let \(\varLambda (B^{-1}A)\) be the set of eigenvalues of \(B^{-1}A\); all of them are real by the assumptions on A and B. Let \(\widehat{X} \in {\mathbb {R}}^{n \times n}\) be a nonsingular approximate eigenvector matrix of \(B^{-1}A\). Then, \(C := \widehat{X}^{-1}B^{-1}A\widehat{X}\) is expected to be nearly diagonal. Although it is in general not possible to calculate \(C = (c_{ij})\) exactly in finite-precision arithmetic, we can efficiently obtain an enclosure of C. Note that we explicitly compute neither an enclosure of \(B^{-1}\) nor one of \(\widehat{X}^{-1}\). Instead, we compute an approximate solution \(\widehat{C}\) of the linear systems \((B\widehat{X})C = A\widehat{X}\) and then verify the accuracy of \(\widehat{C}\) using Yamamoto’s method [26] with matrix-based interval arithmetic [18, 21] to obtain an enclosure of C. Suppose \(\widehat{D} = \mathrm {diag}(\widehat{\lambda }_{i})\) is a midpoint matrix and \(G = (g_{ij})\) is a radius matrix with \(g_{ij} \ge 0\) satisfying

$$\begin{aligned} c_{ij} \in {\left\{ \begin{array}{ll} {[}\widehat{\lambda }_{i} - g_{ii}, \widehat{\lambda }_{i} + g_{ii}] &{} \text {if } i = j \\ {[}-g_{ij}, g_{ij}] &{} \text {otherwise} \end{array}\right. } \quad \text {for all } (i,j) . \end{aligned}$$

Then, the Gershgorin circle theorem implies

$$\begin{aligned} \varLambda (B^{-1}A) \subseteq \bigcup _{i = 1}^{n}[\widehat{\lambda }_{i} - e_{i}, \widehat{\lambda }_{i} + e_{i}], \quad e_{i} := \sum _{j = 1}^{n}g_{ij} . \end{aligned}$$

It can also be shown that if all the intervals \([\widehat{\lambda }_{i} - e_{i}, \widehat{\lambda }_{i} + e_{i}]\) are isolated, then all the eigenvalues are separated, i.e., each interval contains precisely one eigenvalue of \(B^{-1}A\) [23, pp. 71ff].
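The following MATLAB sketch indicates how this separation check can be carried out once the midpoints \(\widehat{\lambda }_{i}\) (stored in a column vector lambda_hat) and the radius matrix G are available; the variable names are ours, and a fully rigorous implementation would additionally require outward rounding or interval arithmetic for the sums and comparisons, which we omit here.

```matlab
% Chat = (B*Xhat) \ (A*Xhat);               % approximate C; its verification yields lambda_hat and G
e  = sum(G, 2);                             % e_i = sum_j g_ij (Gershgorin-type radii)
lo = lambda_hat - e;                        % left endpoints of the enclosing intervals
hi = lambda_hat + e;                        % right endpoints
[lo, idx] = sort(lo);  hi = hi(idx);        % sort intervals by left endpoint
separated = all(hi(1:end-1) < lo(2:end));   % true if all intervals are pairwise disjoint
```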

We first computed an approximate eigenvector matrix \(\widehat{X}\) of \(B^{-1}A\) using the MATLAB function \(\mathsf {eig}(A,B)\) in binary64 arithmetic as an initial guess; \(\widehat{X}\) was obtained in 235.17 s. With this \(\widehat{X}\), we had \(\max _{1 \le i \le n}e_{i} = 2.75 \times 10^{-7}\), and 10 eigenvalues forming 5 clusters could not be separated owing to relatively small eigenvalue gaps. We next applied Algorithm 2 to A, B, and \(\widehat{X}\) in higher-precision arithmetic in a similar way to Sect. 6.2 and obtained a refined approximate eigenvector matrix \(\widehat{X}'\) in 597.52 s. Finally, we obtained \(\max _{1 \le i \le n}e_{i} = 1.58 \times 10^{-14}\) and confirmed that all the eigenvalues are successfully separated.