In this section, we prove the quadratic convergence of Algorithm 1 under the assumption that the approximate solutions are modestly close to the exact solutions. Our analysis is divided into two parts. First, we prove quadratic convergence under the assumption that A has no multiple eigenvalues. Next, we consider a general analysis for any A.
Recall that the error of the approximate solution is expressed as \(\Vert \widehat{X} - X\Vert =\Vert \widehat{X}E\Vert \) in view of \(X = \widehat{X}(I + E)\). The refined approximate solution is \(X' := \widehat{X}(I + \widetilde{E})\). It then follows that the error of the refined solution is expressed as follows:
$$\begin{aligned} \Vert \widehat{X}(I + \widetilde{E}) - X\Vert = \Vert \widehat{X}(\widetilde{E} - E)\Vert . \end{aligned}$$
In addition, recall that \(\widetilde{E}\) is the solution of the following equations:
$$\begin{aligned}&\widetilde{E}+\widetilde{E}^{\mathrm {T}}=R, \end{aligned}$$
(21)
$$\begin{aligned}&\widetilde{D}-\widetilde{D}\widetilde{E}-\widetilde{E}^{\mathrm {T}}\widetilde{D}=S, \end{aligned}$$
(22)
where
$$\begin{aligned} R&:=I-\widehat{X}^{\mathrm {T}}\widehat{X}, \end{aligned}$$
(23)
$$\begin{aligned} S&:=\widehat{X}^{\mathrm {T}}A\widehat{X}. \end{aligned}$$
(24)
However, if \(\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}\) such that \(|\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}| \le \delta \), where \(\delta \) is defined as in (19), then (22) is not used in the computation of \(\widetilde{e}_{ij}\) and \(\widetilde{e}_{ji}\). In this case, we choose \(\widetilde{e}_{ij}=\widetilde{e}_{ji}=r_{ij}/2\) from (21). Such an exceptional case is considered later in the subsection on multiple eigenvalues.
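To make the elementwise solution process concrete, the following is a minimal NumPy sketch of one refinement step as reconstructed from (21)–(24), the threshold \(\delta \) in (30), and the exceptional rule above; the function name refine_step and the closed-form solution of the \(2 \times 2\) systems are our reading of Algorithm 1, not the authors' code.

```python
import numpy as np

def refine_step(A, X_hat):
    # One step of the refinement, reconstructed from (21)-(24) and (30):
    # solve (21)-(22) elementwise for E~ and return X' = X_hat (I + E~).
    n = A.shape[0]
    I = np.eye(n)
    R = I - X_hat.T @ X_hat                # (23)
    S = X_hat.T @ A @ X_hat                # (24)
    lam = np.diag(S) / (1.0 - np.diag(R))  # lambda~_i = s_ii / (1 - 2 e~_ii) with e~_ii = r_ii / 2
    delta = 2.0 * (np.linalg.norm(S - np.diag(lam), 2)
                   + np.linalg.norm(A, 2) * np.linalg.norm(R, 2))  # (30)
    E = np.empty_like(R)
    for i in range(n):
        for j in range(n):
            if abs(lam[i] - lam[j]) > delta:  # generic case: solve the 2x2 system from (21)-(22)
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])
            else:                             # exceptional case (including i == j): use (21) only
                E[i, j] = R[i, j] / 2.0
    return X_hat @ (I + E), lam
```

Here the branch with \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| > \delta \) implements the closed form \(\widetilde{e}_{ij}=(s_{ij}+\widetilde{\lambda }_{j}r_{ij})/(\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i})\), obtained by eliminating \(\widetilde{e}_{ji}\) from the off-diagonal equations of (21) and (22).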
Briefly, our goal is to prove quadratic convergence
$$\begin{aligned} \Vert \widehat{X}(I+\widetilde{E})-X\Vert =\mathcal {O}(\Vert \widehat{X}-X\Vert ^2), \end{aligned}$$
which corresponds to
$$\begin{aligned} \Vert \widehat{X}(\widetilde{E}-E)\Vert =\mathcal {O}(\Vert \widehat{X}E\Vert ^2), \end{aligned}$$
as \(\widehat{X}\rightarrow X\). Since \(\Vert \widehat{X}\Vert \) and \(\Vert \widehat{X}^{-1}\Vert \) remain bounded as \(\widehat{X}\rightarrow X\), it suffices to prove that
$$\begin{aligned} \Vert \widetilde{E}-E\Vert =\mathcal {O}(\Vert E\Vert ^2) \end{aligned}$$
(25)
as \(\Vert E\Vert \rightarrow 0\).
To investigate the relationship between E and \(\widetilde{E}\), let \(\epsilon \) be defined as in (3) and
$$\begin{aligned} \chi (\epsilon ):=\frac{3-2\epsilon }{(1-\epsilon )^2}. \end{aligned}$$
(26)
Then, we see that
$$\begin{aligned} E+E^{\mathrm {T}}=R+\varDelta _{1}, \quad \Vert \varDelta _{1}\Vert \le \chi (\epsilon )\epsilon ^{2} \end{aligned}$$
(27)
from (7) and (8). In addition, we have
$$\begin{aligned} D-DE-E^{\mathrm {T}}D=S+\varDelta _{2}, \quad \Vert \varDelta _{2} \Vert \le \chi (\epsilon )\Vert A\Vert \epsilon ^{2} \end{aligned}$$
(28)
from (11) and (12).
Simple eigenvalues
We focus on the situation where the eigenvalues of A are all simple and a given \(\widehat{X}\) is sufficiently close to an orthogonal eigenvector matrix X. First, we derive a sufficient condition under which (17) is chosen for all (i, j), \(i \ne j\), in Algorithm 1.
Lemma 1
Let A be a real symmetric \(n \times n\) matrix with simple eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\) and a corresponding orthogonal eigenvector matrix \(X \in \mathbb {R}^{n \times n}\). For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and \(\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})\), R, S, and \(\delta \) are the quantities calculated in Algorithm 1. Define E such that \(X=\widehat{X}(I+E)\). If
$$\begin{aligned} \epsilon := \Vert E\Vert < \min \left( \frac{\min _{i \not = j}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}$$
(29)
then we obtain
$$\begin{aligned} \min _{i \ne j}|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}| > \delta (= 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert )). \end{aligned}$$
(30)
Proof
First, it is easy to see that
$$\begin{aligned} (E-\widetilde{E}) + ({E}-\widetilde{E})^{\mathrm {T}} = \varDelta _{1}, \quad \Vert \varDelta _{1}\Vert \le \chi (\epsilon )\epsilon ^2 \end{aligned}$$
(31)
from (21), (23), and (27). Hence, we obtain
$$\begin{aligned} |\widetilde{e}_{ii}-{e}_{ii}|\le \frac{\chi (\epsilon )}{2}\epsilon ^2 \quad \text {for} \; \ i=1,\ldots , n \end{aligned}$$
(32)
from the diagonal elements in (31). From (22) and (28), \(\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})\) and \(D = \mathrm {diag}(\lambda _{i})\) are determined as \(\widetilde{\lambda }_{i}=s_{ii}/(1-2\widetilde{e}_{ii})\), \(\lambda _{i}=(s_{ii} + \varDelta _{2}(i,i))/(1-2e_{ii})\). Thus, we have
$$\begin{aligned} \widetilde{\lambda }_{i}-{\lambda }_{i}= & {} \frac{s_{ii}(1-2e_{ii}) -(s_{ii}+\varDelta _{2}(i,i))(1-2\widetilde{e}_{ii})}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\nonumber \\= & {} -\frac{(1-2\widetilde{e}_{ii})\varDelta _{2}(i,i) + 2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\nonumber \\= & {} -\frac{\varDelta _{2}(i,i)}{1-2e_{ii}} - \frac{2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}. \end{aligned}$$
(33)
For the first term on the right-hand side, we see
$$\begin{aligned} \left| \frac{\varDelta _{2}(i,i)}{1-2e_{ii}} \right| \le \frac{\chi (\epsilon )}{1-2e_{ii}}\Vert A\Vert \epsilon ^2 \end{aligned}$$
from (28). Moreover, for the second term,
$$\begin{aligned} \left| \frac{2(e_{ii}-\widetilde{e}_{ii})s_{ii}}{(1-2e_{ii})(1-2\widetilde{e}_{ii})} \right| \le \frac{(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\Vert A\Vert \epsilon ^{2} \end{aligned}$$
from (24), (28), and (32). In addition, we see
$$\begin{aligned}&\frac{\chi (\epsilon )}{1-2e_{ii}} + \frac{(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1-2e_{ii})(1-2\widetilde{e}_{ii})} \nonumber \\&\quad =\frac{(1 - 2\widetilde{e}_{ii}) + (1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})}{(1-2e_{ii})(1-2\widetilde{e}_{ii})}\chi (\epsilon )\nonumber \\&\quad \le \frac{2(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1 - 2\epsilon )(1 - 2\epsilon - \chi (\epsilon )\epsilon ^{2})} =: \eta (\epsilon ) . \end{aligned}$$
(34)
Combining this with (33), we obtain
$$\begin{aligned} |\widetilde{\lambda }_{i}-{\lambda }_{i}| \le \eta (\epsilon )\Vert A\Vert \epsilon ^2 \quad \text {for} \; \ i=1, \ldots , n. \end{aligned}$$
(35)
Hence, noting the definition of \(\delta \) as in (30), we derive
$$\begin{aligned} \delta\le & {} 2(\Vert S-D\Vert +\Vert A\Vert \Vert R\Vert +\Vert D-\widetilde{D}\Vert ) \nonumber \\\le & {} 2(2(2\Vert A\Vert \epsilon +\chi (\epsilon )\Vert A\Vert \epsilon ^{2}) +\eta (\epsilon )\Vert A\Vert \epsilon ^{2})\nonumber \\\le & {} 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon ) \epsilon )\Vert A\Vert \epsilon \nonumber \\< & {} 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}, \end{aligned}$$
(36)
where the second inequality is due to (27), (28), and (35), and the last inequality is due to (29). In addition, from (26), \(\epsilon < 1/100\) in (29), and (34), we see
$$\begin{aligned} \chi (\epsilon )=\frac{3-2\epsilon }{(1-\epsilon )^2}< 3.05, \quad \eta (\epsilon ) = \frac{2(1 + 2\epsilon + \chi (\epsilon )\epsilon ^{2})\chi (\epsilon )}{(1 - 2\epsilon )(1 - 2\epsilon - \chi (\epsilon )\epsilon ^{2})} < 7. \end{aligned}$$
(37)
Thus, we find that, for all (i, j), \(i \ne j\),
$$\begin{aligned} |\widetilde{\lambda }_{i} - \widetilde{\lambda }_{j}|\ge & {} |{\lambda }_{i}-{\lambda }_{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2\\> & {} \min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|-2\eta (\epsilon )\cdot \frac{\min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|}{10n} \cdot \frac{1}{100}\\> & {} \left( 10-\frac{14}{100} \right) \cdot \frac{\min _{p \not = q}|{\lambda }_{p}-{\lambda }_{q}|}{10n}\\> & {} 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon ) \cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}\\> & {} \delta \ (= 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert )), \end{aligned}$$
where the first inequality is due to (35), the second inequality is due to (29), the third inequality is due to the second inequality in (37), the fourth inequality is due to (37) and \(\epsilon < 1/100\) as in (29), and the last inequality is due to (36), respectively. Thus, we obtain (30). \(\square \)
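As a quick check outside the proof, both \(\chi \) and \(\eta \) are increasing on \([0, 1/100)\), so evaluating them at \(\epsilon = 1/100\) confirms the numerical bounds in (37); a minimal sketch:

```python
# Verify the constants in (37); chi from (26) and eta from (34) are
# increasing on [0, 1/100), so the values at eps = 1/100 bound both.
eps = 1 / 100
chi = (3 - 2 * eps) / (1 - eps) ** 2
eta = 2 * (1 + 2 * eps + chi * eps**2) * chi \
    / ((1 - 2 * eps) * (1 - 2 * eps - chi * eps**2))
print(chi, eta)   # about 3.0405 < 3.05 and about 6.46 < 7
```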
The assumption (29) is crucial for the first iteration in the iterative process (20). In the following, monotone convergence of \(\Vert E^{(\nu )}\Vert \) is proven under the assumption (29) for a given initial guess \(\widehat{X} = X^{(0)}\) and \(E = E^{(0)}\), so that \(\Vert E^{(\nu + 1)}\Vert < \Vert E^{(\nu )}\Vert \) for \(\nu = 0, 1, \ldots \) . Thus, in the iterative refinement using Algorithm 1, Lemma 1 ensures that the condition \(|\widetilde{\lambda }_{i}^{(\nu )} - \widetilde{\lambda }_{j}^{(\nu )}| > \delta ^{(\nu )}\) in (30) is satisfied for all (i, j), \(i \ne j\), at every iterate \(X^{(\nu )}\) in the iterative process. In addition, recall that our aim is to prove the quadratic convergence in the asymptotic regime. To this end, we derive a key lemma that shows (25).
Lemma 2
Let A be a real symmetric \(n \times n\) matrix with simple eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\) and a corresponding orthogonal eigenvector matrix \(X \in \mathbb {R}^{n \times n}\). For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and \(\widetilde{E}\) is the quantity calculated in Algorithm 1. Define E such that \(X=\widehat{X}(I+E)\). Under the assumption (29) in Lemma 1, we have
$$\begin{aligned}&\Vert \widetilde{E}-E\Vert < \frac{7}{10}\Vert E\Vert , \end{aligned}$$
(38)
$$\begin{aligned}&\limsup _{\Vert E\Vert \rightarrow 0} \frac{\Vert \widetilde{E}-E \Vert }{\Vert E \Vert ^{2}}\le \frac{6n\Vert A\Vert }{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|}. \end{aligned}$$
(39)
Proof
Let \(\epsilon \), \(\chi (\cdot )\), and \(\eta (\cdot )\) be defined as in Lemma 1, (26), and (34), respectively. Note that the diagonal elements of \(\widetilde{E}-{E}\) are estimated as in (32). In the following, we estimate the off-diagonal elements of \(\widetilde{E}-{E}\). To this end, define
$$\begin{aligned} \widetilde{\varDelta }_{2} := \widetilde{D}-\widetilde{D}E-E^{\mathrm {T}}\widetilde{D}-S. \end{aligned}$$
(40)
Noting (28), (35), and (40), we see that the off-diagonal elements of \(\varDelta _{2}-\widetilde{\varDelta }_{2}\) are bounded in magnitude by \(2\eta (\epsilon )\Vert A\Vert \epsilon ^3\). Combining this with the bound on \(\varDelta _{2}\) in (28), we obtain
$$\begin{aligned} |\widetilde{\varDelta }_{2}(i,j)|\le & {} (\chi (\epsilon )+2\eta (\epsilon )\epsilon )\Vert A\Vert \epsilon ^2 \end{aligned}$$
(41)
for \(i\not =j\), where \(\widetilde{\varDelta }_{2}(i,j)\) denotes the (i, j) element of \(\widetilde{\varDelta }_{2}\). In addition, from (22), it follows that
$$\begin{aligned} \widetilde{D}(E-\widetilde{E})+(E-\widetilde{E})^{\mathrm {T}}\widetilde{D} = -\widetilde{\varDelta }_{2}. \end{aligned}$$
(42)
From (31), (41) and (42), we have
$$\begin{aligned} (e_{ij}-\widetilde{e}_{ij})+(e_{ji}-\widetilde{e}_{ji})= & {} \epsilon _{1}, \quad |\epsilon _{1}|\le \chi (\epsilon )\epsilon ^{2}, \end{aligned}$$
(43)
$$\begin{aligned} {\widetilde{\lambda }_{i}}(e_{ij}-\widetilde{e}_{ij})+{\widetilde{\lambda }_{j}} (e_{ji}-\widetilde{e}_{ji})= & {} \epsilon _{2}, \quad |\epsilon _{2}|\le (\chi (\epsilon )+2\eta (\epsilon )\epsilon )\Vert A\Vert \epsilon ^{2}. \end{aligned}$$
(44)
It then follows that
$$\begin{aligned} e_{ij}-\widetilde{e}_{ij}=\frac{\epsilon _{2}-\widetilde{\lambda }_{j}\epsilon _{1}}{\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}},\quad e_{ji}-\widetilde{e}_{ji}=\frac{\epsilon _{2}-\widetilde{\lambda }_{i}\epsilon _{1}}{\widetilde{\lambda }_{j}-\widetilde{\lambda }_{i}}. \end{aligned}$$
Therefore, using (35), we obtain
$$\begin{aligned} |\widetilde{e}_{ij}-e_{ij}|\le & {} \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|}\nonumber \\\le & {} \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2}. \end{aligned}$$
(45)
Note that \(\Vert \widetilde{E}-E\Vert ^{2} \le \Vert \widetilde{E}-E\Vert _{\mathrm {F}}^{2} = \sum _{i,j}|\widetilde{e}_{ij}-e_{ij}|^2 \) and
$$\begin{aligned} \frac{\chi (\epsilon )}{2}\epsilon ^2 \le \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\Vert A\Vert \epsilon ^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2} \quad (i\not = j) \end{aligned}$$
in (32) and (45). Therefore, we obtain
$$\begin{aligned} \Vert \widetilde{E}-E\Vert \le \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})n\Vert A\Vert \epsilon ^2}{\min _{i\not =j}| \lambda _{i}-\lambda _{j}|-2\eta (\epsilon )\Vert A\Vert \epsilon ^2}. \end{aligned}$$
Combining this with \(\chi (0)=3\) proves (39). Moreover, we have
$$\begin{aligned} \Vert \widetilde{E}-E\Vert< \frac{(2\chi (\epsilon )+2\eta (\epsilon )\epsilon + \chi (\epsilon )\eta (\epsilon )\epsilon ^{2})\epsilon }{\frac{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|}{n\Vert A\Vert \epsilon }-\frac{2\eta (\epsilon ) \epsilon }{n}}< \frac{6.4\epsilon }{10-\frac{14}{100n}} < \frac{7}{10}\epsilon \end{aligned}$$
(46)
from (29) and (37). \(\square \)
Using the above lemmas, we obtain a main theorem that states the quadratic convergence of Algorithm 1 if all eigenvalues are simple and a given \(\widehat{X}\) is sufficiently close to X.
Theorem 1
Let A be a real symmetric \(n \times n\) matrix with simple eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\) and a corresponding orthogonal eigenvector matrix \(X \in \mathbb {R}^{n\times n}\). For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and \(X'\) is the quantity calculated in Algorithm 1. Define E and \(E'\) such that \(X=\widehat{X}(I+E)\) and \(X=X'(I+E')\), respectively. Under the assumption (29) in Lemma 1, we have
$$\begin{aligned}&\Vert E'\Vert < \frac{5}{7}\Vert E\Vert , \end{aligned}$$
(47)
$$\begin{aligned}&\limsup _{\Vert E\Vert \rightarrow 0}\frac{\Vert E'\Vert }{\Vert E\Vert ^{2}} \le \frac{6n\Vert A\Vert }{\min _{i\not =j}|\lambda _{i}-\lambda _{j}|} . \end{aligned}$$
(48)
Proof
Noting \(X'(I+E')=\widehat{X}(I+E) \ (=X)\) and \(X' = \widehat{X}(I + \widetilde{E})\), where \(\widetilde{E}\) is the quantity calculated in Algorithm 1, we have
$$\begin{aligned} X'E'=\widehat{X}(E-\widetilde{E}) . \end{aligned}$$
Therefore, we obtain
$$\begin{aligned} E'=(I+\widetilde{E})^{-1}(E - \widetilde{E}). \end{aligned}$$
(49)
Noting (46) and
$$\begin{aligned} \Vert \widetilde{E}\Vert \le \Vert \widetilde{E}-E\Vert +\Vert E\Vert \le \frac{17}{10} \Vert E\Vert < \frac{1}{50} \end{aligned}$$
from (29) and (38), we have
$$\begin{aligned} \Vert E'\Vert \le \frac{\Vert \widetilde{E} - E\Vert }{1 - \Vert \widetilde{E}\Vert } < \frac{\frac{7}{10}\Vert E\Vert }{1-\frac{1}{50}}=\frac{5}{7}\Vert E\Vert . \end{aligned}$$
(50)
Finally, using (49) and (39), we obtain (48). \(\square \)
Our analysis indicates that Algorithm 1 may not be convergent for very large n. However, in practice, n is much smaller than \(1/\epsilon \) for \(\epsilon :=\Vert E\Vert \) when the initial guess \(\widehat{X}\) is computed by some backward stable algorithm, e.g., in IEEE 754 binary64 arithmetic, unless A has nearly multiple eigenvalues. In such a situation, the iterative refinement works well.
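As an illustration of this practical regime, the following hedged snippet reuses the hypothetical refine_step sketch from above on a synthetic matrix with well-separated eigenvalues. Note that with all computations in binary64, the error saturates near the unit roundoff after one or two steps; the refinement presumes that R and S are evaluated accurately (e.g., with higher-precision dot products) for genuine quadratic convergence to remain observable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T   # symmetric, eigenvalue gaps equal to 1
_, X = np.linalg.eigh(A)                       # backward-stable initial guess
X = X + 1e-6 * rng.standard_normal((n, n))     # inflate the error so contraction is visible
for nu in range(3):
    X, lam = refine_step(A, X)
    print(nu, np.linalg.norm(np.eye(n) - X.T @ X, 2))   # ~ ||E + E^T||, a proxy for ||E||
```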
Remark 3
For any \(\widetilde{\delta } \ge \delta \), we can replace \(\delta \) by \(\widetilde{\delta }\) in Algorithm 1. For example, such cases arise when the Frobenius norm is used for calculating \(\delta \) instead of the spectral norm as mentioned in Remark 1. In such cases, the quadratic convergence of the algorithm can also be proven in a similar way as in this subsection by replacing the assumption (29) by
$$\begin{aligned} \epsilon := \Vert E\Vert < \min \left( \frac{1}{\rho }\cdot \frac{\min _{i \not = j}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}$$
(51)
where \(\rho := \widetilde{\delta }/\delta \ge 1\). More specifically, in the convergence analysis, (36) is replaced with
$$\begin{aligned} \widetilde{\delta }= & {} \rho \cdot 2(\Vert S-\widetilde{D}\Vert +\Vert A\Vert \Vert R\Vert ) \le \rho \cdot 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\Vert A\Vert \epsilon \\< & {} 2 (4 + 2\chi (\epsilon )\epsilon + \eta (\epsilon )\epsilon )\cdot \frac{\min _{p \not = q}|\lambda _{p} - \lambda _{q}|}{10n}, \end{aligned}$$
where the last inequality is due to the assumption (51). Therefore, \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|>\widetilde{\delta }\) also holds for \(i\not =j\) in the same manner as in the proof of Lemma 1. As a result, (38) and (39) are also established even if \(\delta \) is replaced by \(\widetilde{\delta }\). \(\square \)
Multiple eigenvalues
Multiple eigenvalues require some care. If \(\widetilde{\lambda }_{i}\approx \widetilde{\lambda }_{j}\) corresponding to multiple eigenvalues \(\lambda _{i}=\lambda _{j}\), we might not be able to solve the linear system given by (21) and (22). Therefore, we use equation (21) only, i.e., \(\widetilde{e}_{ij}=\widetilde{e}_{ji}={r}_{ij}/2\) if \(|\widetilde{\lambda }_{i}-\widetilde{\lambda }_{j}|\le \delta \).
To investigate the above exceptional process, let us consider a simple case as follows. Suppose \(\lambda _{i}\), \(i \in \mathcal {M} := \{1, 2, \ldots , p\}\) are multiple, i.e., \(\lambda _{1}=\cdots =\lambda _{p}<\lambda _{p+1}<\cdots < \lambda _{n}\). Then, the eigenvectors corresponding to \(\lambda _{i}\), \(1\le i \le p\) are not unique. Suppose \(X = [x_{(1)},\ldots ,x_{(n)}] \in \mathbb {R}^{n\times n}\) is an orthogonal eigenvector matrix of A, where \(x_{(i)}\) are the normalized eigenvectors corresponding to \(\lambda _{i}\) for \(i = 1, \ldots , n\). Define \(X_{\mathcal {M}} := [x_{(1)},\ldots ,x_{(p)}] \in \mathbb {R}^{n\times p}\) and \(X_{\mathcal {S}} := [x_{(p+1)},\ldots ,x_{(n)}] \in \mathbb {R}^{n\times (n - p)}\). Then, the columns of \(X_{\mathcal {M}}Q\) are also the eigenvectors of A for any orthogonal matrix \(Q \in \mathbb {R}^{p\times p}\). Thus, let \(\mathcal {V}\) be the set of \(n \times n\) orthogonal eigenvector matrices of A and \(\mathcal {E}:= \{ \widehat{X}^{-1}X-I : X \in \mathcal {V}\}\) for a given nonsingular \(\widehat{X}\).
The key idea of the proof of quadratic convergence below is to define an orthogonal eigenvector matrix \(Y \in \mathcal {V}\) as follows. For any \(X_{\alpha } \in \mathcal {V}\), whose first p columns we denote by \(X_{\mathcal {M}}\), splitting \(\widehat{X}^{-1}X_{\mathcal {M}}\) into the first p rows \(V_{\alpha } \in \mathbb {R}^{p \times p}\) and the remaining \((n - p)\) rows \(W_{\alpha } \in \mathbb {R}^{(n - p) \times p}\), we have
$$\begin{aligned} \widehat{X}^{-1}X_{\mathcal {M}} = \left[ \begin{array}{c} V_{\alpha } \\ W_{\alpha } \\ \end{array}\right] = \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] Q_{\alpha } \end{aligned}$$
(52)
in view of the polar decomposition \(V_{\alpha } = CQ_{\alpha }\), where \(C=\sqrt{V_{\alpha }V_{\alpha }^{\mathrm {T}}} \in \mathbb {R}^{p \times p}\) is symmetric and positive semidefinite and \(Q_{\alpha } \in \mathbb {R}^{p \times p}\) is orthogonal. Note that, although \(X_{\mathcal {M}}Q\) for any orthogonal matrix \(Q \in \mathbb {R}^{p \times p}\) is also an eigenvector matrix, the symmetric and positive semidefinite matrix C is independent of Q. In other words, we have
$$\begin{aligned} \widehat{X}^{-1}(X_{\mathcal {M}}Q) = (\widehat{X}^{-1}X_{\mathcal {M}})Q = \left[ \begin{array}{c} V_{\alpha }Q \\ W_{\alpha }Q \\ \end{array}\right] = \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] Q_{\alpha }Q, \end{aligned}$$
(53)
where the last equality implies the polar decomposition of \(V_{\alpha }{Q}\). In addition, if \(V_{\alpha }\) in (52) is nonsingular, the orthogonal matrix \(Q_{\alpha }\) is uniquely determined by the polar decomposition for a fixed \(X_{\alpha } \in \mathcal {V}\). To investigate the features of \(V_{\alpha }\), we suppose that \(\widehat{X}\) is an exact eigenvector matrix. Then, noting \(\widehat{X}^{-1}=\widehat{X}^\mathrm{T}\), we see that \(V_{\alpha }\) is an orthogonal matrix, so that \(C=I\) and \(V_{\alpha }=Q_{\alpha }\) in the polar decomposition of \(V_{\alpha }\) in (52). Thus, for any fixed \(\widehat{X}\) in some neighborhood of \(\mathcal {V}\), the above polar decomposition \(V_{\alpha } = CQ_{\alpha }\) is unique, where the nonsingular matrix \(V_{\alpha }\) depends on \(X_{\alpha } \in \mathcal {V}\). In the following, we consider any \(\widehat{X}\) in such a neighborhood of \(\mathcal {V}\).
Recall that, for any orthogonal matrix Q, the last equality in (53) is due to the polar decomposition \(V_{\alpha }{Q}= C(Q_{\alpha }Q)\). Hence, we have an eigenvector matrix
$$\begin{aligned} (X_{\mathcal {M}}Q)(Q_{\alpha }Q)^\mathrm{T} = X_{\mathcal {M}}Q_{\alpha }^\mathrm{T} = \widehat{X} \left[ \begin{array}{c} C \\ W_{\alpha }Q_{\alpha }^{\mathrm {T}} \\ \end{array}\right] , \end{aligned}$$
which is independent of Q. Thus, we define \(Y := [X_{\mathcal {M}}Q_{\alpha }^{\mathrm {T}}, X_{\mathcal {S}}]\), which is independent of the choice of \(X_{\alpha } \in \mathcal {V}\) and depends only on \(\widehat{X}\). Then, the corresponding error term \(F = (f_{ij})\) is uniquely determined as
$$\begin{aligned} F&:= \widehat{X}^{-1}Y-I = [\widehat{X}^{-1}X_{\mathcal {M}}Q_{\alpha }^{\mathrm {T}}, * \ ] - I \nonumber \\&= \left[ \begin{array}{cc} C - I &{} * \\ * &{} * \\ \end{array}\right] , \end{aligned}$$
(54)
which implies \(f_{ij}=f_{ji}\) corresponding to the multiple eigenvalues \(\lambda _{i}=\lambda _{j}\). Therefore,
$$\begin{aligned} f_{ij}=f_{ji}=\frac{r_{ij}+\varDelta _{1}(i,j)}{2} \end{aligned}$$
(55)
from (27), where \(\varDelta _{1}(i,j)\) denotes the (i, j) element of \(\varDelta _{1}\). Now, let us consider the situation where \(\widehat{X}\) is an exact eigenvector matrix. In (52), noting \(\widehat{X}^{-1}=\widehat{X}^{\mathrm {T}}\), we have \(W_{\alpha }=O\) and \(C=I\) in the polar decomposition of \(V_{\alpha }\). Combining these facts with (54), we see \(F=O\) for the exact eigenvector matrix \(\widehat{X}\).
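To make the construction (52)–(54) concrete, here is a hedged sketch, assuming SciPy's polar routine, that maps any representative \(X_{\alpha } \in \mathcal {V}\) to the canonical Y; the helper name canonical_Y is ours.

```python
import numpy as np
from scipy.linalg import polar

def canonical_Y(X_hat, X_alpha, p):
    # Construction (52)-(54): replace the first p columns X_M of X_alpha by
    # X_M Q_alpha^T, where V_alpha = C Q_alpha is the left polar decomposition
    # of the leading p x p block of X_hat^{-1} X_M (C symmetric PSD).
    T = np.linalg.solve(X_hat, X_alpha[:, :p])  # X_hat^{-1} X_M
    Q_alpha, C = polar(T[:p, :], side='left')   # returns (unitary, PSD) with V_alpha = C @ Q_alpha
    Y = X_alpha.copy()
    Y[:, :p] = X_alpha[:, :p] @ Q_alpha.T       # X_M Q_alpha^T, independent of the representative
    return Y
```

The leading \(p \times p\) block of \(\widehat{X}^{-1}Y\) is then C, whose deviation \(C - I\) appears as the (1, 1) block of F in (54).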
Our aim is to prove \(\Vert F\Vert \rightarrow 0\) in the iterative refinement for \(\widehat{X}\approx Y \in \mathcal {V}\), where Y depends on \(\widehat{X}\). To this end, for the refined \(X'\) as a result of Algorithm 1, we also define an eigenvector matrix \(Y'\in \mathcal {V}\) and \(F':=(X')^{-1}Y'-I\) such that the submatrices of \((X')^{-1}Y'\) corresponding to the multiple eigenvalues are symmetric and positive definite. Note that the eigenvector matrix Y is changed to \(Y'\) corresponding to \(X'\) after the refinement.
On the basis of the above observations, we consider general eigenvalue distributions. First of all, define the index sets \(\mathcal {M}_{k}\), \(k = 1, 2, \ldots , M\) for multiple eigenvalues \(\{ {\lambda }_{i} \}_{i \in \mathcal {M}_{k}}\) satisfying the following conditions:
$$\begin{aligned} \left\{ \begin{array}{l} \text {(a)} \ \mathcal {M}_{k} \subseteq \{1,2,\ldots ,n\} \ \text {with} \ n_{k} := |\mathcal {M}_{k}| \ge 2 \\ \text {(b)} \ {\lambda }_{i} = {\lambda }_{j}, \ \forall i,j \in \mathcal {M}_{k} \\ \text {(c)} \ {\lambda }_{i} \not = {\lambda }_{j}, \ \forall i \in \mathcal {M}_{k}, \ \forall j \in \{1,2,\ldots ,n\}\,\backslash \, \mathcal {M}_{k} \end{array}\right. . \end{aligned}$$
(56)
Using the above definitions, we obtain the following key lemma to prove quadratic convergence.
Lemma 3
Let A be a real symmetric \(n \times n\) matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \(\mathcal {M}_{k}\), \(k = 1, 2, \ldots , M\), satisfying (56). Let \(\mathcal {V}\) be the set of \(n \times n\) orthogonal eigenvector matrices of A. For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), define \(\mathcal {E}\) as
$$\begin{aligned} \mathcal {E}:= \{ \widehat{X}^{-1}X-I : X\in \mathcal {V}\}. \end{aligned}$$
(57)
In addition, define \(Y\in \mathcal {V}\) such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) corresponding to \(\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}\) are symmetric and positive semidefinite. Moreover, define \(F \in \mathcal {E}\) such that \(Y=\widehat{X}(I+F)\). Then, for any \(E_{\alpha } \in \mathcal {E}\),
$$\begin{aligned} \Vert F\Vert \le 3\Vert E_{\alpha }\Vert . \end{aligned}$$
(58)
Furthermore, if there exists some Y such that \(\Vert F\Vert <1\), then \(Y\in \mathcal {V}\) is uniquely determined.
Proof
For any \(E_{\alpha } = (e_{ij}^{(\alpha )}) \in \mathcal {E}\), let \(E_{\mathrm {diag}}\) denote the block diagonal part of \(E_{\alpha }\), where the \(n_{k} \times n_{k}\) blocks of \(E_{\mathrm {diag}}\) correspond to \(n_{k}\) multiple eigenvalues \(\{ {\lambda }_{i} \}_{i \in \mathcal {M}_{k}}\), i.e.,
$$\begin{aligned} E_{\mathrm {diag}}(i,j):= \left\{ \begin{array}{ll} e_{ij}^{(\alpha )} &{} \ \text {if} \ \lambda _{i}=\lambda _{j} \\ 0 &{} \ \text {otherwise} \end{array}\right. \end{aligned}$$
for all \(1\le i,j \le n\), where \(\lambda _{1} \le \cdots \le \lambda _{n}\). Here, we consider the polar decomposition
$$\begin{aligned} I+E_{\mathrm {diag}}=:HU_{\alpha }, \end{aligned}$$
(59)
where H is a symmetric and positive semidefinite matrix and \(U_{\alpha }\) is an orthogonal matrix. Note that, similarly to C in (52), H is unique and independent of the choice of \(X_{\alpha } \in \mathcal {V}\) that satisfies \(X_{\alpha }=\widehat{X}(I+E_{\alpha })\), whereas \(U_{\alpha }\) is not always uniquely determined. Then, we have
$$\begin{aligned} Y=X_{\alpha }U_{\alpha }^{\mathrm {T}} \end{aligned}$$
(60)
from the definition of Y and
$$\begin{aligned} F= & {} \widehat{X}^{-1}Y-I\nonumber \\= & {} \widehat{X}^{-1}X_{\alpha }U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }+I)U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }-E_{\mathrm {diag}}+HU_{\alpha })U_{\alpha }^{\mathrm {T}}-I\nonumber \\= & {} (E_{\alpha }-E_{\mathrm {diag}})U_{\alpha }^{\mathrm {T}}+H-I, \end{aligned}$$
(61)
where the first, second, third, and fourth equalities are consequences of the definition of F, (60), (57), and (59), respectively. Here, we see that
$$\begin{aligned} \Vert H-I\Vert \le \Vert E_{\mathrm {diag}}\Vert \end{aligned}$$
(62)
because the eigenvalues of H are the singular values of \(HU_{\alpha }=I+E_{\mathrm {diag}}\) in (59), which lie in the interval \([1-\Vert E_{\mathrm {diag}}\Vert , 1+\Vert E_{\mathrm {diag}}\Vert ]\) by Weyl's inequality for singular values. In addition, note that
$$\begin{aligned} \Vert E_{\mathrm {diag}}\Vert \le \Vert E_{\alpha }\Vert . \end{aligned}$$
(63)
Therefore, we obtain
$$\begin{aligned} \Vert F\Vert =\Vert (E_{\alpha }-E_{\mathrm {diag}})U_{\alpha }^{\mathrm {T}}+(H-I)\Vert \le 3\Vert E_{\alpha }\Vert \end{aligned}$$
from (61), (62), and (63), giving us (58).
Finally, we prove that Y is unique if \(\Vert F\Vert <1\). In the above discussion, if \(X_{\alpha }\) is replaced with some \(Y \in \mathcal {V}\), then \(E_{\alpha }=F\), and thus \(\Vert E_{\mathrm {diag}}\Vert \le \Vert E_{\alpha }\Vert = \Vert F\Vert <1\) in (59). Therefore, \(U_{\alpha }=I\) in (59) due to the uniqueness of the polar decomposition of the nonsingular matrix \(I+E_\mathrm{diag}\), which implies the uniqueness of Y from (60). \(\square \)
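A hedged numeric illustration of (58), reusing the hypothetical canonical_Y helper above on a matrix with a triple eigenvalue; the seeds and spectrum are arbitrary test choices.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(1)
n, p = 8, 3
X = ortho_group.rvs(n, random_state=1)                   # exact orthogonal eigenvector matrix
A = X @ np.diag([1., 1., 1., 2., 3., 4., 5., 6.]) @ X.T  # triple eigenvalue lambda = 1
X_hat = X + 1e-3 * rng.standard_normal((n, n))           # nonsingular approximation
for k in range(5):
    Qp = ortho_group.rvs(p, random_state=k)              # random representative X_alpha in V
    X_alpha = X.copy()
    X_alpha[:, :p] = X[:, :p] @ Qp
    E_alpha = np.linalg.solve(X_hat, X_alpha) - np.eye(n)   # member of the set E in (57)
    F = np.linalg.solve(X_hat, canonical_Y(X_hat, X_alpha, p)) - np.eye(n)
    print(np.linalg.norm(F, 2) <= 3 * np.linalg.norm(E_alpha, 2))   # (58): always True
```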
Moreover, we have the next lemma, corresponding to Lemmas 1 and 2.
Lemma 4
Let A be a real symmetric \(n \times n\) matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \(\mathcal {M}_{k}\), \(k = 1, 2, \ldots , M\), satisfying (56). For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and \(\widetilde{D} = \mathrm {diag}(\widetilde{\lambda }_{i})\), \(\widetilde{E}\), and \(\delta \) are the quantities calculated in Algorithm 1. Let F be defined as in Lemma 3. Assume that
$$\begin{aligned} \epsilon _{F} := \Vert F\Vert < \frac{1}{3}\cdot \min \left( \frac{\min _{\lambda _{i}\not = \lambda _{j}}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) . \end{aligned}$$
(64)
Then, we obtain
$$\begin{aligned} \Vert F-\widetilde{E}\Vert \le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})n\Vert A\Vert \epsilon _{F}^2}{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2}, \end{aligned}$$
(65)
where \(\chi (\cdot )\) and \(\eta (\cdot )\) are defined as in (26) and (34), respectively.
Proof
First, we see that, for \(i\not =j\) corresponding to \(\lambda _{i}\not =\lambda _{j}\),
$$\begin{aligned}&|\widetilde{e}_{ij}-f_{ij}|\le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\Vert A\Vert \epsilon _{F}^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2}, \end{aligned}$$
similar to the proof of (45) in Lemma 2. Concerning the multiple eigenvalues \(\lambda _{i}=\lambda _{j}\), noting \(| \widetilde{\lambda }_{i} - \widetilde{\lambda }_{j} | \le \delta \) and (55), we have
$$\begin{aligned} |\widetilde{e}_{ij}-f_{ij}|\le \frac{\chi (\epsilon _{F})}{2}\epsilon _{F}^2\ \text {for }(i,j)\text { corresponding to }\lambda _{i}=\lambda _{j}. \end{aligned}$$
Note that, for (i, j) corresponding to \(\lambda _{i}\not =\lambda _{j}\),
$$\begin{aligned} \frac{\chi (\epsilon _{F})}{2}\epsilon _{F}^2 \le \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\Vert A\Vert \epsilon _{F}^2}{|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2} \end{aligned}$$
in the above two inequalities for the elements of \(F-\widetilde{E}\). Therefore, we have
$$\begin{aligned} \Vert F-\widetilde{E}\Vert\le & {} \Vert F-\widetilde{E}\Vert _{\mathrm {F}} = \sqrt{\sum _{1\le i,j \le n}|f_{ij}-\widetilde{e}_{ij}|^2}\\\le & {} \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})n\Vert A\Vert \epsilon _{F}^2}{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|-2\eta (\epsilon _{F})\Vert A\Vert \epsilon _{F}^2} \end{aligned}$$
similar to the proof in Lemma 2. \(\square \)
On the basis of Lemmas 3 and 4 and Theorem 1, we see the quadratic convergence for a real symmetric matrix A that has multiple eigenvalues.
Theorem 2
Let A be a real symmetric \(n \times n\) matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \(\mathcal {M}_{k}\), \(k = 1, 2, \ldots , M\), satisfying (56). Let \(\mathcal {V}\) be the set of \(n \times n\) orthogonal eigenvector matrices of A. For a given nonsingular \(\widehat{X} \in \mathbb {R}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and \(X'\) and \(\delta \) are the quantities calculated in Algorithm 1. Let \(Y, Y' \in \mathcal {V}\) be defined such that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) and \((X')^{-1}Y'\) corresponding to \(\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}\) are symmetric and positive definite. Define F and \(F'\) such that \(Y=\widehat{X}(I+F)\) and \(Y'=X'(I+F')\), respectively. Furthermore, suppose that (64) in Lemma 4 is satisfied for \(\epsilon _{F} := \Vert F\Vert \). Then, we obtain
$$\begin{aligned}&\Vert F'\Vert < \frac{5}{7}\Vert F\Vert , \end{aligned}$$
(66)
$$\begin{aligned}&\limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}} \le 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}| \lambda _{i}-\lambda _{j}|}\right) . \end{aligned}$$
(67)
Proof
Let \(\widetilde{E}\) and \(\widetilde{\lambda }_{i}\), \(i = 1, 2, \ldots , n\), be the quantities calculated in Algorithm 1. First, note that (65) in Lemma 4 is established. Next, we define \(G:=(X')^{-1}Y-I\). Then, we have
$$\begin{aligned} G=(I+\widetilde{E})^{-1}(F-\widetilde{E}) \end{aligned}$$
(68)
similar to (49). Moreover, similar to (46), we have
$$\begin{aligned} \Vert \widetilde{E}-F \Vert< & {} \frac{(2\chi (\epsilon _{F})+2\eta (\epsilon _{F})\epsilon _{F} + \chi (\epsilon _{F})\eta (\epsilon _{F}){\epsilon _{F}}^{2})\epsilon _{F}}{\frac{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}{n\Vert A\Vert \epsilon _{F}}-\frac{2\eta (\epsilon _{F}) \epsilon _{F}}{n}}\\< & {} \frac{6.4\epsilon _{F}}{3(10-\frac{14}{100n})} < \frac{1}{3}\cdot \frac{7}{10} \epsilon _{F} \end{aligned}$$
from (65) and (64). Therefore, we see
$$\begin{aligned} \Vert G\Vert <\frac{1}{3}\cdot \frac{5}{7}\epsilon _{F} \end{aligned}$$
in a similar manner as the proof of (50). Using (58), we have
$$\begin{aligned} \Vert F'\Vert \le 3\Vert G\Vert . \end{aligned}$$
(69)
Therefore, we obtain (66). Since we see
$$\begin{aligned} \limsup _{\epsilon _{F} \rightarrow 0}\frac{\Vert G\Vert }{\Vert F\Vert ^{2}} \le \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|} \end{aligned}$$
from (68) and (65), we obtain (67) from (69). \(\square \)
In the iterative refinement, Theorem 2 shows that the error term \(\Vert F\Vert \) converges quadratically to zero. Note that \(\widehat{X}\) also converges to some fixed eigenvector matrix X, because Theorem 2 and (65) imply \(\Vert \widetilde{E}\Vert /\Vert F\Vert \rightarrow 1\) as \(\Vert F\Vert \rightarrow 0\) in \(X':=\widehat{X}(I+\widetilde{E})\).
Complex case
For a Hermitian matrix \(A\in \mathbb {C}^{n \times n}\), we must note that, for any unitary diagonal matrix U, XU is also an eigenvector matrix, i.e., there is a continuum of normalized eigenvector matrices in contrast to the real case. Related to this, note that \(R:=I-\widehat{X}^{\mathrm {H}}\widehat{X}\) and \(S:=\widehat{X}^{\mathrm {H}}A\widehat{X}\), and (14) is replaced with \(\widetilde{E}+\widetilde{E}^{\mathrm {H}}=R\) in the complex case; thus, the diagonal elements \(\widetilde{e}_{ii}\) for \(i=1,\ldots ,n\) are not uniquely determined in \(\mathbb {C}\). Now, select \(\widetilde{e}_{ii}=r_{ii}/2\in \mathbb {R}\) for \(i=1,\ldots ,n\). Then, we can prove quadratic convergence using the polar decomposition in the same way as in the discussion of multiple eigenvalues in the real case. More precisely, we define a normalized eigenvector matrix Y as follows. First, we focus on the situation where all eigenvalues are simple. For a given nonsingular \(\widehat{X}\), let Y be defined such that all diagonal elements of \(\widehat{X}^{-1}Y\) are positive real numbers. In addition, let \(F:=\widehat{X}^{-1}Y-I\). Then, we see the quadratic convergence of F in the same way as in Theorem 2.
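In code, the changes from the real case are mechanical; a hedged Hermitian variant of the earlier hypothetical refine_step sketch:

```python
import numpy as np

def refine_step_hermitian(A, X_hat):
    # Complex analogue of refine_step: transposes become conjugate transposes,
    # and the diagonal choice e~_ii = r_ii / 2 is real since R is Hermitian.
    n = A.shape[0]
    I = np.eye(n)
    R = I - X_hat.conj().T @ X_hat
    S = X_hat.conj().T @ A @ X_hat
    lam = (np.diag(S) / (1.0 - np.diag(R))).real  # Hermitian R, S have real diagonals
    delta = 2.0 * (np.linalg.norm(S - np.diag(lam), 2)
                   + np.linalg.norm(A, 2) * np.linalg.norm(R, 2))
    E = np.empty_like(R)
    for i in range(n):
        for j in range(n):
            if abs(lam[i] - lam[j]) > delta:
                E[i, j] = (S[i, j] + lam[j] * R[i, j]) / (lam[j] - lam[i])
            else:
                E[i, j] = R[i, j] / 2.0
    return X_hat @ (I + E), lam
```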
Corollary 1
Let \(A\in \mathbb {C}^{n\times n}\) be a Hermitian matrix whose eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\), are all simple. Let \(\mathcal {V}\) be the set of \(n \times n\) unitary eigenvector matrices of A. For a given nonsingular \(\widehat{X} \in \mathbb {C}^{n\times n}\), suppose that Algorithm 1 is applied to A and \(\widehat{X}\) in exact arithmetic, and a nonsingular \(X'\) is obtained. Define \(Y,Y' \in \mathcal {V}\) such that all the diagonal elements of \(\widehat{X}^{-1}Y\) and \((X')^{-1}Y'\) are positive real numbers. Furthermore, define F and \(F'\) such that \(Y=\widehat{X}(I+F)\) and \(Y'=X'(I+F')\), respectively. If
$$\begin{aligned} \Vert F\Vert < \frac{1}{3}\min \left( \frac{\min _{\lambda _{i}\not = \lambda _{j}}|\lambda _{i}-\lambda _{j}|}{10n\Vert A\Vert }, \frac{1}{100}\right) , \end{aligned}$$
(70)
then we obtain
$$\begin{aligned} \Vert F'\Vert< & {} \frac{5}{7}\Vert F\Vert , \\ \limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}}\le & {} 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}\right) . \end{aligned}$$
For a general Hermitian matrix having multiple eigenvalues, we define Y in the same manner as in Theorem 2, resulting in the following corollary.
Corollary 2
Let \(A\in \mathbb {C}^{n\times n}\) be a Hermitian matrix with the eigenvalues \(\lambda _{i}\), \(i = 1, 2, \ldots , n\). Suppose A has multiple eigenvalues with index sets \(\mathcal {M}_{k}\), \(k = 1, 2, \ldots , M\), satisfying (56). For a given nonsingular \(\widehat{X} \in \mathbb {C}^{n\times n}\), let Y, \(Y'\), F, and \(F'\) be defined as in Corollary 1. Suppose that, for all k, the \(n_{k}\times n_{k}\) submatrices of \(\widehat{X}^{-1}Y\) and \((X')^{-1}Y'\) corresponding to \(\{\lambda _{i}\}_{i \in \mathcal {M}_{k}}\) are Hermitian and positive definite. Furthermore, suppose that (70) is satisfied. Then, we obtain
$$\begin{aligned}&\Vert F'\Vert < \frac{5}{7}\Vert F\Vert , \\&\limsup _{\Vert F\Vert \rightarrow 0}\frac{\Vert F'\Vert }{\Vert F\Vert ^{2}} \le 3\left( \frac{6n\Vert A\Vert }{\min _{\lambda _{i}\not =\lambda _{j}}|\lambda _{i}-\lambda _{j}|}\right) . \end{aligned}$$
Note that \(\widehat{X}\) in Corollaries 1 and 2 is convergent to some fixed eigenvector matrix X of A in the same manner as in the real case.