1 Introduction

Our aim in this paper is to explain the origin of the problems that have been noticed [1] when computing Gauss-Radau quadrature upper bounds of the A-norm of the error in the Conjugate Gradient (CG) algorithm for solving linear systems \(Ax=b\) with a symmetric positive definite matrix of order N.

The connection between CG and Gauss quadrature has been known since the seminal paper of Hestenes and Stiefel [2] in 1952. This link has been exploited by Gene H. Golub and his collaborators to bound or estimate the A-norm of the error in CG during the iterations; see [1, 3,4,5,6,7,8,9,10,11,12,13].

Using a fixed node \(\mu \) smaller than the smallest eigenvalue of A and the Gauss-Radau quadrature rule, an upper bound for the A-norm of the error can be easily computed. Note that it is useful to have an upper bound on the error norm to stop the CG iterations. In theory, the closer \(\mu \) is to the smallest eigenvalue, the closer the bound is to the error norm.

Concerning the approximation properties of the upper bound, we observed in many examples that in earlier iterations the bound approximates the A-norm of the error quite well, and that the quality of the approximation improves with increasing iterations. However, in later CG iterations, the bound suddenly becomes worse: it is delayed, almost independent of \(\mu \), and no longer represents a good approximation to the A-norm of the error. Such behavior of the upper bound can be observed even in exact arithmetic. Therefore, the problem of the loss of accuracy of the upper bound in later iterations is not directly linked to rounding errors and has to be explained.

The Gauss quadrature bounds of the error norm were obtained by using the connection of CG to the Lanczos algorithm and modifications of the tridiagonal matrix which is generated by this algorithm and implicitly by CG. This is why we start in Section 2 with the Lanczos algorithm. In Section 3 we discuss the relation with CG and how the Gauss-Radau upper bound is computed. A model problem showing the problems arising with the Gauss-Radau bound in “exact” arithmetic is constructed in Section 4. In Sections 5 to 7 we give an analysis that explains that the problems start when the distance of the smallest Ritz value to the smallest eigenvalue becomes smaller than the distance of \(\mu \) to the smallest eigenvalue. We also explain why the Gauss-Radau upper bound becomes almost independent of \(\mu \). In Section 8 we present an algorithm for improving the upper bounds in previous CG iterations such that the relative accuracy of the upper bounds is guaranteed to be smaller than a prescribed tolerance. Conclusions are given in Section 9.

2 The Lanczos algorithm

Given a starting vector v and a symmetric matrix \(A\in \mathbb {R}^{N\times N}\), one can consider a sequence of nested Krylov subspaces

$$\begin{aligned} \mathcal {K}_{k}(A,v)\equiv \textrm{span}\{v,Av,\dots ,A^{k-1}v\},\qquad k=1,2,\dots \end{aligned}$$

The dimension of these subspaces can increase up to an index n called the grade of v with respect to A, at which the maximal dimension is attained, and \(\mathcal {K}_{n}(A,v)\) is invariant under multiplication with A.

Algorithm 1: Lanczos algorithm.
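
Since the algorithm listing is not reproduced here, the following minimal MATLAB sketch of the standard Lanczos recurrence indicates the quantities computed in Algorithm 1 (the function name is illustrative; breakdown checks and reorthogonalization are omitted).

function [V, alpha, beta] = lanczos_sketch(A, v, k)
% Minimal sketch of the Lanczos recurrence (cf. Algorithm 1): builds an
% orthonormal basis V = [v_1, ..., v_{k+1}] of K_{k+1}(A, v) and the
% coefficients alpha_1, ..., alpha_k and beta_1, ..., beta_k.
N = length(v);
V = zeros(N, k+1);
alpha = zeros(k, 1);
beta = zeros(k, 1);
V(:, 1) = v / norm(v);
for j = 1:k
    w = A * V(:, j);
    if j > 1
        w = w - beta(j-1) * V(:, j-1);
    end
    alpha(j) = V(:, j)' * w;
    w = w - alpha(j) * V(:, j);
    beta(j) = norm(w);              % assumed nonzero, i.e., k < n (the grade of v)
    V(:, j+1) = w / beta(j);
end
end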

Assuming that \(k<n\), the Lanczos algorithm (Algorithm 1) computes an orthonormal basis \(v_{1},\dots ,v_{k+1}\) of the Krylov subspace \(\mathcal {K}_{k+1}(A,v)\). The basis vectors \(v_{j}\) satisfy the matrix relation

$$\begin{aligned} AV_{k}=V_{k}T_{k}+\beta _{k}v_{k+1}e_{k}^{T} \end{aligned}$$

where \(e_{k}\) is the last column of the identity matrix of order k, \(V_{k}=[v_{1}\cdots v_{k}]\) and \(T_{k}\) is the \(k\times k\) symmetric tridiagonal matrix of the recurrence coefficients computed in Algorithm 1:

$$\begin{aligned} T_{k}=\left[ \begin{array}{cccc} \alpha _{1} &{} \beta _{1}\\ \beta _{1} &{} \ddots &{} \ddots \\ &{} \ddots &{} \ddots &{} \beta _{k-1}\\ &{} &{} \beta _{k-1} &{} \alpha _{k} \end{array}\right] . \end{aligned}$$

The coefficients \(\beta _{j}\) being positive, \(T_{k}\) is a so-called Jacobi matrix. If A is positive definite, then \(T_{k}\) is positive definite as well. In the following we will assume for simplicity that the eigenvalues of A are simple and sorted such that

$$\begin{aligned} \lambda _{1}<\lambda _{2}<\cdots <\lambda _{N}. \end{aligned}$$

2.1 Approximation of the eigenvalues

The eigenvalues of \(T_{k}\) (Ritz values) are usually used as approximations to the eigenvalues of A. The quality of the approximation can be measured using \(\beta _{k}\) and the last components of the normalized eigenvectors of \(T_{k}\). In more detail, consider the spectral decomposition of \(T_{k}\) in the form

$$\begin{aligned} T_{k}=S_{k}\Theta _{k}S_{k}^{T},\quad \Theta _{k}=\textrm{diag}\left( \theta _{1}^{(k)},\dots ,\theta _{k}^{(k)}\right) ,\quad S_{k}^{T}S_{k}=S_{k}S_{k}^{T}=I_{k}, \end{aligned}$$

where \(I_k\) is the identity matrix of order k, and assume that the Ritz values are sorted such that

$$\begin{aligned} \theta _{1}^{(k)}<\theta _{2}^{(k)}<\cdots <\theta _{k}^{(k)}. \end{aligned}$$

Denote by \(s_{i,j}^{(k)}\) the entries and by \(s_{:,j}^{(k)}\) the columns of \(S_{k}\). Then it holds that

$$\begin{aligned} \min _{i=1,\dots ,N}|\lambda _{i}-\theta _{j}^{(k)}|\le \left\| A\left( V_{k}s_{:,j}^{(k)}\right) -\theta _{j}^{(k)}\left( V_{k}s_{:,j}^{(k)}\right) \right\| =\beta _{k}|s_{k,j}^{(k)}|, \end{aligned}$$
(1)

\(j=1,\dots ,k\). Since the Ritz values \(\theta _{j}^{(k)}\) can be seen as Rayleigh quotients, one can improve the bound (1) using the gap theorem; see [14, p. 244] or [15, p. 206]. In particular, let \(\lambda _{\ell }\) be an eigenvalue of A closest to \(\theta _{j}^{(k)}\). Then

$$\begin{aligned} |\lambda _{\ell }-\theta _{j}^{(k)}|\le \frac{\left( \beta _{k}s_{k,j}^{(k)}\right) ^{2}}{\textrm{gap}_{j}^{(k)}},\qquad \textrm{gap}_{j}^{(k)}=\min _{i\ne \ell }|\lambda _{i}-\theta _{j}^{(k)}|. \end{aligned}$$

In the following we will be interested in the situation when the smallest Ritz value \(\theta _{1}^{(k)}\) closely approximates the smallest eigenvalue of A. If \(\lambda _{1}\) is the eigenvalue of A closest to \(\theta _{1}^{(k)}>\lambda _{1}\), then using the gap theorem and [14, Corollary 11.7.1 on p. 246],

$$\begin{aligned} \frac{\left( \beta _{k}s_{k,1}^{(k)}\right) ^{2}}{\lambda _{n}-\lambda _{1}}\le \theta _{1}^{(k)}-\lambda _{1}\le \frac{\left( \beta _{k}s_{k,1}^{(k)}\right) ^{2}}{\lambda _{2}-\theta _{1}^{(k)}}, \end{aligned}$$
(2)

giving the bounds

$$\begin{aligned} \lambda _{2}-\theta _{1}^{(k)}\le \frac{\left( \beta _{k}s_{k,1}^{(k)}\right) ^{2}}{\theta _{1}^{(k)}-\lambda _{1}}\le \lambda _{n}-\lambda _{1}. \end{aligned}$$
(3)
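
For illustration, the equality in (1) can be checked directly from the Lanczos relation; the following sketch reuses A, V, alpha, and beta from the Lanczos sketch above and an index \(k<n\) (variable names are illustrative).

% Sketch: direct check of the equality in (1) for the smallest Ritz pair.
Tk = diag(alpha(1:k)) + diag(beta(1:k-1), 1) + diag(beta(1:k-1), -1);
[S, Theta] = eig(Tk);
[~, i1] = min(diag(Theta));                  % index of theta_1^{(k)}
y = V(:, 1:k) * S(:, i1);                    % Ritz vector V_k s_{:,1}^{(k)}
res = norm(A*y - Theta(i1, i1)*y);           % equals beta_k * |s_{k,1}^{(k)}|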

It is known (see, for instance, [16]) that the squares of the last components of the eigenvectors are given by

$$\begin{aligned} \left( s_{k,j}^{(k)}\right) ^{2}=\left|\frac{\chi _{1,k-1}(\theta _{j}^{(k)})}{\chi _{1,k}^{'}(\theta _{j}^{(k)})}\right|, \end{aligned}$$

where \(\chi _{1,\ell }\) is the characteristic polynomial of \(T_{\ell }\) and \(\chi _{1,\ell }^{'}\) denotes its derivative, i.e.,

$$\begin{aligned} \left( s_{k,j}^{(k)}\right) ^{2}=\frac{\theta _{j}^{(k)}-\theta _{1}^{(k-1)}}{\theta _{j}^{(k)}-\theta _{1}^{(k)}}\cdots \frac{\theta _{j}^{(k)}-\theta _{j-1}^{(k-1)}}{\theta _{j}^{(k)}-\theta _{j-1}^{(k)}}\frac{\theta _{j}^{(k-1)}-\theta _{j}^{(k)}}{\theta _{j+1}^{(k)}-\theta _{j}^{(k)}}\cdots \frac{\theta _{k-1}^{(k-1)}-\theta _{j}^{(k)}}{\theta _{k}^{(k)}-\theta _{j}^{(k)}}. \end{aligned}$$

The right-hand side is positive due to the interlacing property of the Ritz values for symmetric tridiagonal matrices. In particular,

$$\begin{aligned} \left( s_{k,1}^{(k)}\right) ^{2}=\frac{\theta _{1}^{(k-1)}-\theta _{1}^{(k)}}{\theta _{2}^{(k)}-\theta _{1}^{(k)}}\cdots \frac{\theta _{k-1}^{(k-1)}-\theta _{1}^{(k)}}{\theta _{k}^{(k)}-\theta _{1}^{(k)}}. \end{aligned}$$
(4)

When the smallest Ritz value \(\theta _{1}^{(k)}\) converges to \(\lambda _{1}\), this last component squared converges to zero; see also (3).

2.2 Modification of the tridiagonal matrix

Given \(\mu < \theta _{1}^{(k)}\), let us consider the problem of finding the coefficient \(\alpha _{k+1}^{(\mu )}\) such that the modified matrix

$$\begin{aligned} T_{k+1}^{(\mu )}=\left[ \begin{array}{ccccc} \alpha _{1} &{} \beta _{1}\\ \beta _{1} &{} \ddots &{} \ddots \\ &{} \ddots &{} \ddots &{} \beta _{k-1}\\ &{} &{} \beta _{k-1} &{} \alpha _{k} &{} \beta _{k}\\ &{} &{} &{} \beta _{k} &{} \alpha _{k+1}^{(\mu )} \end{array}\right] \end{aligned}$$
(5)

has the prescribed value \(\mu \) as an eigenvalue. The connection of this problem to the Gauss-Radau quadrature rule will be explained in Section 3.

In [17, pp. 331-334] it has been shown that at iteration \(k+1\)

$$\begin{aligned} \alpha _{k+1}^{(\mu )}\,=\,\mu +\zeta _{k}^{(\mu )} \end{aligned}$$

where \(\zeta _{k}^{(\mu )}\) is the last component of the vector y, the solution of the linear system

$$\begin{aligned} (T_{k}-\mu I)y=\beta _{k}^{2}e_{k}. \end{aligned}$$
(6)

From [10, Section 3.4], the modified coefficients \(\alpha _{k+1}^{(\mu )}\) can be computed recursively using

$$\begin{aligned} \alpha _{j+1}^{(\mu )}=\mu +\frac{\beta _{j}^{2}}{\alpha _{j}-\alpha _{j}^{(\mu )}},\qquad \alpha _{1}^{(\mu )}=\mu ,\qquad j=1,\dots ,k. \end{aligned}$$
(7)
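
For illustration, the recursion (7) translates into a short loop; the following MATLAB sketch assumes the arrays alpha and beta from the Lanczos sketch above and a chosen value mu with \(\mu <\theta _{1}^{(k)}\) (variable names are illustrative).

% Sketch: modified coefficients alpha_j^{(mu)} via the recursion (7).
alpha_mu = zeros(k+1, 1);
alpha_mu(1) = mu;                                        % alpha_1^{(mu)} = mu
for j = 1:k
    alpha_mu(j+1) = mu + beta(j)^2 / (alpha(j) - alpha_mu(j));
end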

Using the spectral factorization of \(T_{k}\), we can now prove the following lemma.

Lemma 1

Let \(\mu <\theta _{1}^{(k)}\). Then it holds that

$$\begin{aligned} \alpha _{k+1}^{(\mu )}\,=\,\mu +\sum _{i=1}^{k}\eta _{i,k}^{(\mu )},\qquad \eta _{i,k}^{(\mu )}\equiv \frac{\left( \beta _{k}s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }. \end{aligned}$$
(8)

If \(\mu<\lambda <\theta _{1}^{(k)}\), then \(\alpha _{k+1}^{(\mu )}<\alpha _{k+1}^{(\lambda )}\). Consequently, if \(\mu <\theta _{1}^{(k+1)},\) then \(\alpha _{k+1}^{(\mu )}<\alpha _{k+1}\).

Proof

Since \(\mu <\theta _{1}^{(k)}\) the matrix \(T_{k}-\mu I\) in (6) is positive definite and, therefore, nonsingular. Hence,

$$\begin{aligned} \zeta _{k}^{(\mu )}=e_{k}^{T}y=\beta _{k}^{2}e_{k}^{T}(T_{k}-\mu I)^{-1}e_{k}=\sum _{i=1}^{k}\frac{\left( \beta _{k}s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu } \end{aligned}$$
(9)

so that (8) holds. From (8) it is obvious that if \(\mu<\lambda <\theta _{1}^{(k)}\), then \(\alpha _{k+1}^{(\mu )}<\alpha _{k+1}^{(\lambda )}\).

Finally, taking \(\lambda =\theta _{1}^{(k+1)}<\theta _{1}^{(k)}\) (because of the interlacing of the Ritz values) we obtain \(\alpha _{k+1}^{(\lambda )}=\alpha _{k+1}\) by construction. \(\square \)
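
As a quick numerical check of Lemma 1, the expression (8) can be compared with the recursion (7); the sketch below reuses alpha, beta, mu, and alpha_mu from the sketches above (names are illustrative).

% Sketch: compare (8) with the recursion (7) for the leading k x k Jacobi matrix.
Tk = diag(alpha(1:k)) + diag(beta(1:k-1), 1) + diag(beta(1:k-1), -1);
[S, Theta] = eig(Tk);                                  % spectral factorization of T_k
theta = diag(Theta);                                   % Ritz values theta_i^{(k)}
eta = (beta(k) * S(k,:)').^2 ./ (theta - mu);          % eta_{i,k}^{(mu)} in (8)
err = abs(mu + sum(eta) - alpha_mu(k+1));              % agrees up to roundoff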

3 CG and error norm estimation

When solving a linear system \(Ax=b\) with a symmetric and positive definite matrix A, the CG method (Algorithm 2) is the method of choice. In exact arithmetic, the CG iterates \(x_{k}\) minimize the A-norm of the error over the manifold \(x_{0}+\mathcal {K}_{k}(A,r_{0})\),

$$\begin{aligned} \Vert x-x_{k}\Vert _{A}=\min _{y\in x_{0}+\mathcal {K}_{k}(A,r_{0})}\Vert x-y\Vert _{A}, \end{aligned}$$

and the residual vectors \(r_{k}=b-Ax_{k}\) are proportional to the Lanczos vectors \(v_{j}\),

$$\begin{aligned} v_{j+1}=(-1)^{j}\frac{r_{j}}{\Vert r_{j}\Vert }\,,\qquad j=0,\dots ,k. \end{aligned}$$

Algorithm 2: Conjugate gradient algorithm.
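
The following minimal MATLAB sketch of CG fixes the indexing used below: gamma(j) and delta(j) store \(\gamma _{j-1}\) and \(\delta _{j}\), and rnorm(j) stores \(\Vert r_{j-1}\Vert \) (the function name and the absence of a stopping criterion are for illustration only).

function [x, gamma, delta, rnorm] = cg_sketch(A, b, x0, maxit)
% Minimal sketch of CG (cf. Algorithm 2), MATLAB 1-based indexing.
x = x0;
r = b - A*x;
p = r;
gamma = zeros(maxit, 1);
delta = zeros(maxit, 1);
rnorm = zeros(maxit+1, 1);
rnorm(1) = norm(r);
for j = 1:maxit
    Ap = A*p;
    rr = r'*r;
    gamma(j) = rr / (p'*Ap);        % gamma_{j-1} = (r_{j-1},r_{j-1})/(p_{j-1},A p_{j-1})
    x = x + gamma(j)*p;
    r = r - gamma(j)*Ap;
    delta(j) = (r'*r) / rr;         % delta_j = (r_j,r_j)/(r_{j-1},r_{j-1})
    p = r + delta(j)*p;
    rnorm(j+1) = norm(r);
end
end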

Thanks to this close relationship between the CG and Lanczos algorithms, it can be shown (see, for instance, [16]) that the recurrence coefficients computed in both algorithms are connected via \(\alpha _{1}=\gamma _{0}^{-1}\) and

$$\begin{aligned} \beta _{j}=\frac{\sqrt{\delta _{j}}}{\gamma _{j-1}},\quad \alpha _{j+1}=\frac{1}{\gamma _{j}}+\frac{\delta _{j}}{\gamma _{j-1}},\quad j=1,\dots ,k-1. \end{aligned}$$
(10)

Writing (10) in matrix form, we find that CG implicitly computes the \(LDL^{T}\) factorization \(T_{k}=L_{k}D_{k}L_{k}^{T}\), where

$$\begin{aligned} L_{k}=\left[ \begin{array}{cccc} 1\\ \sqrt{\delta _{1}} &{} \ddots \\ &{} \ddots &{} \ddots \\ &{} &{} \sqrt{\delta _{k-1}} &{} 1 \end{array}\right] ,\quad D_{k}=\left[ \begin{array}{cccc} \gamma _{0}^{-1}\\ &{} \ddots \\ &{} &{} \ddots \\ &{} &{} &{} \gamma _{k-1}^{-1} \end{array}\right] . \end{aligned}$$
(11)

Hence the matrix \(T_{k}\) is known implicitly in CG.
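
For illustration, a minimal MATLAB sketch assembling \(T_{k}\) from the CG coefficients via (10) and (11); it assumes the arrays gamma and delta from the CG sketch above, and in exact arithmetic the result coincides with the matrix generated by the Lanczos algorithm.

% Sketch: assemble T_k from the CG coefficients via (10)-(11);
% gamma(j) = gamma_{j-1} and delta(j) = delta_j as in the CG sketch.
alpha = zeros(k, 1);
beta  = zeros(k-1, 1);
alpha(1) = 1 / gamma(1);                             % alpha_1 = gamma_0^{-1}
for j = 1:k-1
    beta(j)    = sqrt(delta(j)) / gamma(j);          % beta_j = sqrt(delta_j)/gamma_{j-1}
    alpha(j+1) = 1/gamma(j+1) + delta(j)/gamma(j);   % alpha_{j+1} = 1/gamma_j + delta_j/gamma_{j-1}
end
Tk = diag(alpha) + diag(beta, 1) + diag(beta, -1);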

3.1 Modification of the factorization of \(T_{k+1}\)

Similarly to Section 2.2, we can ask how to modify the Cholesky factorization of \(T_{k+1}\), which is available in CG, such that the resulting matrix \(T_{k+1}^{(\mu )}\), given implicitly in factorized form, has the prescribed eigenvalue \(\mu \). In more detail, we look for a coefficient \(\gamma _{k}^{(\mu )}\) such that

$$\begin{aligned} T_{k+1}^{(\mu )}=L_{k+1}\left[ \begin{array}{cc} D_{k}\\ &{} \left( \gamma _{k}^{(\mu )}\right) ^{-1} \end{array}\right] L_{k+1}^{T}. \end{aligned}$$

This problem was solved in [10] leading to an updating formula for computing the modified coefficients

$$\begin{aligned} \gamma _{j+1}^{(\mu )}=\frac{\gamma _{j}^{(\mu )}-\gamma _{j}}{\mu (\gamma _{j}^{(\mu )}-\gamma _{j})+\delta _{j+1}},\ j=0,\dots ,k-1,\qquad \gamma _{0}^{(\mu )}=\frac{1}{\mu }. \end{aligned}$$
(12)

Moreover, \(\gamma _{k}^{(\mu )}\) can be obtained directly from the modified coefficient \(\alpha _{k+1}^{(\mu )}\),

$$\begin{aligned} \gamma _{k}^{(\mu )}=\frac{1}{\alpha _{k+1}^{(\mu )}-\frac{\delta _{k}}{\gamma _{k-1}}}, \end{aligned}$$
(13)

and vice-versa, see [10, p. 173 and 181].
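
A minimal sketch of the update (12), with the indexing used above (gamma(j) = \(\gamma _{j-1}\), delta(j) = \(\delta _{j}\)); the array name gamma_mu is illustrative.

% Sketch: modified coefficients gamma_j^{(mu)} via the update (12).
gamma_mu = zeros(k+1, 1);
gamma_mu(1) = 1/mu;                                % gamma_0^{(mu)} = 1/mu
for j = 1:k
    d = gamma_mu(j) - gamma(j);                    % gamma_{j-1}^{(mu)} - gamma_{j-1}
    gamma_mu(j+1) = d / (mu*d + delta(j));         % gamma_j^{(mu)}
end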

3.2 Quadrature-based bounds in CG

We now briefly summarize the idea of deriving the quadrature-based bounds used in this paper. For a more detailed description, see, e.g., [5,6,7, 10,11,12,13].

Let \(A=Q\Lambda Q^T\) be the spectral decomposition of A, with \(Q=[q_1,\dots ,q_N]\) orthonormal and \(\Lambda =\textrm{diag}(\lambda _1,\dots ,\lambda _N)\). As we said above, for simplicity of notation, we assume that the eigenvalues of A are distinct and ordered as \(\lambda _1< \lambda _2< \dots < \lambda _N\). Let us define the weights \(\omega _{i}\) by

$$\begin{aligned} \omega _{i}\equiv \frac{(r_{0},q_{i})^{2}}{\Vert r_0 \Vert ^2}\qquad \text {so that}\qquad \sum _{i=1}^{N}\omega _{i}=1\,, \end{aligned}$$

and the (nondecreasing) stepwise constant distribution function \(\omega (\lambda )\) with a finite number of points of increase \(\lambda _{1},\lambda _{2},\dots ,\lambda _{N}\),

$$\begin{aligned} \omega (\lambda )\equiv \;\left\{ \; \begin{array}{ccl} 0 &{} \text {for} &{} \lambda<\lambda _{1}\,,\\[1mm] \sum _{j=1}^{i}\omega _{j} &{} \text {for} &{} \lambda _{i}\le \lambda <\lambda _{i+1}\,,\quad 1\le i\le N-1\,.\\[1mm] 1 &{} \text {for} &{} \lambda _{N}\le \lambda \,. \end{array}\right. \, \end{aligned}$$

Having the distribution function \(\omega (\lambda )\) and an interval \(\langle \eta ,\xi \rangle \) such that \(\eta<\lambda _{1}<\lambda _{2}<\dots<\lambda _{N}<\xi \), for any continuous function f, one can define the Riemann-Stieltjes integral (see, for instance, [18])

$$\begin{aligned} \int _{\eta }^{\xi }f(\lambda )\, d\omega (\lambda ) = \sum _{i=1}^{N}\omega _{i}f(\lambda _{i}). \end{aligned}$$

For \(f(\lambda )=\lambda ^{-1}\), we obtain the integral representation of \(\Vert x-x_0\Vert _A^2\),

$$\begin{aligned} \int _{\eta }^{\xi }\lambda ^{-1}\, d\omega (\lambda )= & {} \Vert r_0\Vert ^{-2}\Vert x-x_0\Vert _A^2. \end{aligned}$$
(14)

Using the optimality of CG it can be shown that CG implicitly determines nodes and weights of the k-point Gauss quadrature approximation to the Riemann-Stieltjes integral (14). The nodes are given by the eigenvalues of \(T_k\), and the weights by the squared first components of the normalized eigenvectors of \(T_k\). The corresponding Gauss quadrature rule can be written in the form

$$\begin{aligned} \int _{\eta }^{\xi }\lambda ^{-1}\, d\omega (\lambda ) = (T_k^{-1})_{1,1} + \frac{\Vert x-x_k\Vert _A^2}{\Vert r_0\Vert ^{2}}, \end{aligned}$$
(15)

where \((T_k^{-1})_{1,1}\) represents the Gauss quadrature approximation, and the remainder is nothing but the scaled and squared A-norm of the kth error, i.e., the quantity of our interest.

To approximate the integral (14), one can also apply a modified quadrature rule. In this paper we consider the Gauss-Radau quadrature rule consisting in prescribing a node \(0<\mu \le \lambda _1\) and choosing the other nodes and weights to maximize the degree of exactness of the quadrature rule. We can write the corresponding Gauss-Radau quadrature rule in the form

$$\begin{aligned} \int _{\eta }^{\xi }\lambda ^{-1}\, d\omega (\lambda ) = (({T}_k^{(\mu )})^{-1})_{1,1} + {\mathcal {R}}_k^{(\mu )}, \end{aligned}$$

where the remainder \({\mathcal {R}}_k^{(\mu )}\) is negative, and \({T}_k^{(\mu )}\) is given by (5).

The idea of deriving (basic) quadrature-based bounds in CG is to consider the Gauss quadrature rule (15) at iteration k, and a (possibly modified) quadrature rule at iteration \(k+1\),

$$\begin{aligned} \frac{\Vert x-x_{0}\Vert _{A}^{2}}{\Vert r_{0} \Vert ^{2}}=\left( \widehat{T}_{k+1}^{-1}\right) _{1,1}+\widehat{\mathcal {R}}_{k+1}, \end{aligned}$$
(16)

where \(\widehat{T}_{k+1}=T_{k+1}\) when using the Gauss rule and \(\widehat{T}_{k+1}={T}_{k+1}^{(\mu )} \) in the case of using the Gauss-Radau rule. From the equations (15) and (16) we get

$$\begin{aligned} \Vert x-x_{k}\Vert _{A}^{2} = \left[ \Vert r_{0} \Vert ^{2} \left( \left( \widehat{T}_{k+1}^{-1}\right) _{1,1}-\left( T_{k}^{-1}\right) _{1,1}\right) \right] + \widehat{\mathcal {R}}_{k+1}. \end{aligned}$$
(17)

The term in square brackets represents either a lower bound on \(\Vert x-x_{k}\Vert _{A}^{2}\) if \(\widehat{T}_{k+1}=T_{k+1}\) (because of the positive remainder), or an upper bound if \(\widehat{T}_{k+1} = {T}_{k+1}^{(\mu )}\) (because of the negative remainder). In both cases, the term in square brackets can easily be evaluated using the available CG related quantities. In particular, the lower bound is given by \(\gamma _k \Vert r_k \Vert ^2\), and the upper bound by \(\gamma _{k}^{(\mu )}\Vert r_{k}\Vert ^{2}\), where \(\gamma _{k}^{(\mu )}\) can be updated using (12).

To summarize results of [5, 7, 12], and [1, 10, 11] related to the Gauss and Gauss-Radau quadrature bounds for the A-norm of the error in CG, it has been shown that

$$\begin{aligned} \gamma _{k}\Vert r_{k}\Vert ^{2}\le \Vert x-x_{k}\Vert _{A}^{2}<\gamma _{k}^{(\mu )}\Vert r_{k}\Vert ^{2}<\left( \frac{\Vert r_{k}\Vert ^{2}}{\mu \Vert p_{k}\Vert ^{2}}\right) \Vert r_{k}\Vert ^{2} \end{aligned}$$
(18)

for \(k<n-1\), and \(\mu \) such that \(0<\mu \le \lambda _{1}\). Note that in the special case \(k=n-1\) it holds that \(\Vert x-x_{n-1}\Vert _{A}^{2}=\gamma _{n-1}\Vert r_{n-1}\Vert ^{2}\). If the initial residual \(r_{0}\) has a nontrivial component in the eigenvector corresponding to \(\lambda _{1}\), then \(\lambda _{1}\) is an eigenvalue of \(T_{n}\). If in addition \(\mu \) is chosen such that \(\mu =\lambda _{1}\), then \(\gamma _{n-1}=\gamma _{n-1}^{(\mu )}\) and the second inequality in (18) changes to equality. The last inequality is strict also for \(k=n-1\).

The rightmost bound in (18), that will be called the simple upper bound in the following, was derived in [1]. The norm \(\Vert p_k\Vert \) is not available in CG, but the ratio

$$\begin{aligned} \phi _{k}=\frac{\left\| r_{k}\right\| ^{2}}{\left\| p_{k}\right\| ^{2}} \end{aligned}$$

can be computed efficiently using

$$\begin{aligned} \phi _{j+1}^{-1}=1+\phi _{j}^{-1}\delta _{j+1},\qquad \phi _{0}=1. \end{aligned}$$
(19)
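
The bounds in (18) can be accumulated alongside CG with a few scalar updates; the following MATLAB sketch reuses the arrays gamma, delta, and rnorm from the CG sketch above and a chosen underestimate mu of \(\lambda _{1}\) (variable names are illustrative).

% Sketch: basic quadrature bounds (18) accumulated alongside CG.
phi_inv = 1;                       % phi_0^{-1} = 1, updated by (19)
gmu     = 1/mu;                    % gamma_0^{(mu)}, updated by (12)
bnd = zeros(maxit, 3);             % [lower, Gauss-Radau upper, simple upper]
for j = 1:maxit                    % bounds for CG iteration k = j-1
    bnd(j,1) = gamma(j) * rnorm(j)^2;        % gamma_k ||r_k||^2
    bnd(j,2) = gmu * rnorm(j)^2;             % Gauss-Radau upper bound (21)
    bnd(j,3) = rnorm(j)^2 / (mu * phi_inv);  % simple upper bound in (18)
    d = gmu - gamma(j);                      % update (12): gamma_k^{(mu)} -> gamma_{k+1}^{(mu)}
    gmu = d / (mu*d + delta(j));
    phi_inv = 1 + phi_inv * delta(j);        % update (19): phi_k^{-1} -> phi_{k+1}^{-1}
end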

Note that at an iteration \(\ell \le k\) we can obtain a more accurate bound using

$$\begin{aligned} \Vert x-x_{\ell }\Vert _{A}^{2}=\sum _{j=\ell }^{k-1}\gamma _{j}\left\| r_{j}\right\| ^{2}+\Vert x-x_{k}\Vert _{A}^{2}, \end{aligned}$$
(20)

by applying the basic bounds (18) to the last term in (20); see [1] for details on the construction of more accurate bounds. In practice, however, one runs the CG algorithm, and estimates the error in a backward way, i.e., \(k-\ell \) iterations back. The adaptive choice of the delay \(k-\ell \) when using the Gauss quadrature lower bound was discussed recently in [19].
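
For illustration, given a basic upper bound at iteration k (denoted upper_k below, e.g., the Gauss-Radau bound (21)), the more accurate bound at an earlier iteration \(\ell \) follows from (20) by accumulating the terms \(\gamma _{j}\Vert r_{j}\Vert ^{2}\); the sketch reuses the CG arrays introduced above (names are illustrative).

% Sketch: more accurate upper bound at iteration ell via (20),
% evaluated k - ell iterations later (backward estimation).
upper_ell = upper_k;                                    % basic bound on ||x - x_k||_A^2
for j = ell:k-1
    upper_ell = upper_ell + gamma(j+1) * rnorm(j+1)^2;  % add gamma_j ||r_j||^2
end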

In the following we will concentrate on the analysis of the behavior of the basic Gauss-Radau upper bound

$$\begin{aligned} \gamma _{k}^{(\mu )}\Vert r_{k}\Vert ^{2} \end{aligned}$$
(21)

depending on the choice of \(\mu \). As already mentioned, we observed in many examples that in earlier iterations the bound approximates the squared A-norm of the error quite well, but in later iterations it becomes worse: it is delayed and almost independent of \(\mu \). We observed that this phenomenon is related to the convergence of the smallest Ritz value to the smallest eigenvalue \(\lambda _1\). In particular, the bound gets worse when the smallest Ritz value approximates \(\lambda _1\) better than \(\mu \). This often happens during finite precision computations when the convergence of CG is delayed because of rounding errors and there are clusters of Ritz values approximating individual eigenvalues of A. Usually, such clusters arise around the largest eigenvalues. At some iteration, every eigenvalue of A may be approximated by a Ritz value while the A-norm of the error still has not reached the required level of accuracy, and the process then continues and places more and more Ritz values in the clusters. In this situation, it can happen that \(\lambda _1\) is tightly (that is, to a high relative accuracy) approximated by a Ritz value while the CG process still continues. Note that if A has well separated eigenvalues and we run the experiment in exact arithmetic, then \(\lambda _1\) is usually tightly approximated by a Ritz value only in the last iterations. The above observation is key for constructing a motivating example in which the studied phenomenon can readily be observed also in exact arithmetic, and which will motivate our analysis.

4 The model problem and a numerical experiment

In the construction of the motivating example we use results presented in [16, 20,21,22,23]. Based on the work by Chris Paige [24], Anne Greenbaum [20] proved that the results of finite precision CG computations can be interpreted (up to some small inaccuracies) as the results of the exact CG algorithm applied to a larger system with the system matrix having many eigenvalues distributed throughout “tiny” intervals around the eigenvalues of the original matrix. The experiments show that “tiny” means of a size comparable to \(\textbf{u}\Vert A\Vert \), where \(\textbf{u}\) is the roundoff unit. This result was used in [21] to predict the behavior of finite precision CG. Inspired by [20,21,22] we will construct a linear system \(Ax=b\) with properties similar to the one suggested by Greenbaum [20]. However, we want to emphasize and visualize some phenomena concerning the behavior of the basic Gauss-Radau upper bound (21). Therefore, we choose the size of the intervals around the original eigenvalues larger than \(\textbf{u}\Vert A\Vert \).

We start with the test problem \(\Lambda y=w\) from [23], where \(w=m^{-1/2}(1,\dots ,1)^{T}\) and \(\Lambda =\textrm{diag}(\hat{\lambda }_{1},\dots ,\hat{\lambda }_{m})\),

$$\begin{aligned} \hat{\lambda }_{i}=\hat{\lambda }_{1}+\frac{i-1}{m-1}(\hat{\lambda }_{m}-\hat{\lambda }_{1})\rho ^{m-i},\quad i=2,\ldots ,m. \end{aligned}$$
(22)

The diagonal matrix \(\Lambda \) and the vector w determine the stepwise distribution function \(\omega (\lambda )\) with points of increase \(\hat{\lambda }_{i}\) and the individual jumps (weights) \(\omega _{j}=m^{-1}\),

$$\begin{aligned} \omega (\lambda )\equiv \;\left\{ \;\begin{array}{ccl} 0 &{} \text {for} &{} \lambda<\hat{\lambda }_{1}\,,\\[1mm] \sum _{j=1}^{i}\omega _{j} &{} \text {for} &{} \hat{\lambda }_{i}\le \lambda <\hat{\lambda }_{i+1}\,,\quad 1\le i\le m-1\,,\\[1mm] 1 &{} \text {for} &{} \hat{\lambda }_{m}\le \lambda \,. \end{array}\right. \, \end{aligned}$$
(23)

We construct a blurred distribution function \(\widetilde{\omega }(\lambda )\) having clusters of points of increase around the original eigenvalues \(\hat{\lambda }_{i}\). We consider each cluster to have the same radius \(\delta \), and let the number \(c_i\) of points in the ith cluster grow linearly from 1 to p,

$$\begin{aligned} c_{i}=\textrm{round}\left( \frac{p-1}{m-1}i+\frac{m-p}{m-1}\right) ,\quad i=1,\dots ,m. \end{aligned}$$

The blurred eigenvalues

$$\begin{aligned} \widetilde{\lambda }_{j}^{(i)},\quad j=1,\dots ,c_{i}, \end{aligned}$$

are uniformly distributed in \([\hat{\lambda }_{i}-\delta ,\hat{\lambda }_{i}+\delta ]\), with the corresponding weights given by

$$\begin{aligned} \widetilde{\omega }_{j}^{(i)}=\frac{\omega _{i}}{c_{i}}\quad j=1,\dots ,c_{i}, \end{aligned}$$

i.e., the weights that correspond to the ith cluster are equal, and their sum is \(\omega _{i}\). Having defined the blurred distribution function \(\widetilde{\omega }(\lambda )\) we can construct the corresponding Jacobi matrix \(T\in {\mathbb R}^{N\times N}\) in a numerically stable way using the Gragg and Harrod rkpw algorithm [25]. Note that the mapping from the nodes and weights of the computed quadrature to the recurrence coefficients is generally well-conditioned [26, p. 59]. To construct the above-mentioned Jacobi matrix T we used Matlab’s vpa arithmetic with 128 digits. Finally, we define the double precision data A and b that will be used for experimenting as

$$\begin{aligned} A=\textrm{double}(T),\quad b=e_{1}, \end{aligned}$$
(24)

where \(e_{1}\in {\mathbb R}^{N}\) is the first column of the identity matrix. We decided to use double precision input data since we can easily compare results of our computations performed in Matlab’s vpa arithmetic with results obtained using double precision arithmetic for the same input data.

In our experiment we choose \(m=12\), \(\hat{\lambda }_{1}=10^{-6}\), \(\hat{\lambda }_{m}=1\), \(\rho =0.8\), \(\delta =10^{-10}\), and \(p=4\), resulting in \(N=30\). Let us run the “exact” CGQ algorithm of [10] on the model problem (24) constructed above, where exact arithmetic is simulated using Matlab's variable precision with digits=128. Let \(\lambda _{1}\) be the exact smallest eigenvalue of A. We use four different values of \(\mu \) for the computation of the Gauss-Radau upper bound (21): \(\mu _{3}=(1-10^{-3})\lambda _{1}\), \(\mu _{8}=(1-10^{-8})\lambda _{1}\), \(\mu _{16}\), which denotes the double precision number closest to \(\lambda _{1}\) such that \(\mu _{16}\le \lambda _{1}\), and \(\mu _{50}=(1-10^{-50})\lambda _{1}\), which essentially coincides with the exact value. Note that \(\gamma _{k}^{(\mu )}\) is updated using (12).
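
For completeness, a sketch of the construction of the nodes and weights of the blurred distribution function with the parameter values just given; the Jacobi matrix T would then be computed from these nodes and weights (e.g., by the rkpw algorithm [25], not reproduced here), and variable names are illustrative.

% Sketch: nodes and weights of the blurred distribution function of Section 4.
m = 12; lh1 = 1e-6; lhm = 1; rho = 0.8; rad = 1e-10; p = 4;   % rad is the radius delta
lh = zeros(m, 1);  lh(1) = lh1;
for i = 2:m
    lh(i) = lh1 + (i-1)/(m-1) * (lhm - lh1) * rho^(m-i);      % eq. (22)
end
nodes = []; weights = [];
for i = 1:m
    c = round((p-1)/(m-1)*i + (m-p)/(m-1));                   % cluster size c_i
    if c == 1
        cl = lh(i);
    else
        cl = linspace(lh(i)-rad, lh(i)+rad, c)';              % blurred eigenvalues
    end
    nodes   = [nodes; cl];
    weights = [weights; ones(c,1) / (m*c)];                   % weights in cluster sum to 1/m
end
N = numel(nodes);                                             % N = 30 for these parameters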

Figure 1 shows the A-norm of the error \(\Vert x-x_{k-1}\Vert _{A}\) (solid curve), the upper bounds for the considered values of \(\mu _{i}\), \(i=3,8,16,50\) (dotted solid curves), and the rightmost bound in (18) (the simple upper bound) for \(\mu _{50}\) (dashed curve). The dots represent the values \(\theta _{1}^{(k)}-\lambda _{1}\), i.e., the distances of the smallest Ritz values \(\theta _{1}^{(k)}\) to \(\lambda _{1}\). The horizontal dotted lines correspond to the values of \(\lambda _{1}-\mu _{i}\), \(i=3,8,16\).

Fig. 1: \(\Vert x-x_{k-1}\Vert _{A}\), upper bounds and the distance of \(\theta _{1}^{(k)}\) to \(\lambda _{1}\), digits=128

The Gauss-Radau upper bounds in Fig. 1 first overestimate, and then closely approximate \(\Vert x-x_{k-1}\Vert _{A}\) (starting from iteration 5). However, at some point, the Gauss-Radau upper bounds start to differ significantly from \(\Vert x-x_{k-1}\Vert _{A}\) and represent worse approximations, except for \(\mu _{50}\). We observe that for a given \(\mu _{i}\), \(i=3,8,16\), the upper bounds are delayed when the distance of \(\theta _{1}^{(k)}\) to \(\lambda _{1}\) becomes smaller than the distance of \(\mu _{i}\) to \(\lambda _{1}\), i.e., when

$$\begin{aligned} \theta _{1}^{(k)}-\lambda _{1}<\lambda _{1}-\mu _{i}. \end{aligned}$$
(25)

If (25) holds, then the smallest Ritz value \(\theta _{1}^{(k)}\) is a better approximation to \(\lambda _{1}\) than \(\mu _{i}\). This moment is emphasized using vertical dashed lines that connect the value \(\theta _{1}^{(k)}-\lambda _{1}\) with \(\Vert x-x_{k-1}\Vert _{A}\) in the first iteration k such that (25) holds. Moreover, below a certain level, the upper bounds become almost independent of \(\mu _{i}\), \(i=3,8,16\), and visually coincide with the simple upper bound. The closer \(\mu \) is to \(\lambda _1\), the later this phenomenon occurs.

Depending on the validity of (25), we distinguish between phase 1 and phase 2 of convergence. If the inequality (25) does not hold, i.e., if \(\mu \) is a better approximation to \(\lambda _{1}\) than the smallest Ritz value, then we say we are in phase 1. If (25) holds, then the smallest Ritz value is closer to \(\lambda _{1}\) than \(\mu \) and we are in phase 2.

Obviously, the beginning of phase 2 depends on the choice of \(\mu \) and on the convergence of the smallest Ritz value to the smallest eigenvalue. Note that for \(\mu =\mu _{50}\) we are always in phase 1 before we stop the iterations.

In the given experiment as well as in many practical problems, the delay of the upper bounds is not large (just a few iterations), and the bounds can still provide useful information for stopping the algorithm. However, we have also encountered examples where the delay of the Gauss-Radau upper bound was about 200 iterations; see, e.g., [1, Fig. 10] or [19, Fig. 2] concerning the matrix s3dkt3m2. Hence, we believe that this phenomenon deserves attention and explanation.

5 Analysis

The upper bounds are computed from the modified tridiagonal matrices (5) discussed in Section 2.2, which differ only in the coefficient at position \((k+1,k+1)\). Therefore, the first step of the analysis is to understand how the choice of \(\mu \) and the validity of the condition (25) influence the value of the modified coefficient

$$\begin{aligned} \alpha _{k+1}^{(\mu )}= & {} \mu +\sum _{i=1}^{k}\eta _{i,k}^{(\mu )},\qquad \eta _{i,k}^{(\mu )}=\frac{\left( \beta _{k}s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }; \end{aligned}$$
(26)

see (8). We will compare its value to a modified coefficient for which phase 2 does not occur; see Fig. 1 for \(\mu _{50}\).

Based on that understanding we will then address further important questions. First, our aim is to explain the behavior of the basic Gauss-Radau upper bound (21) in phase 2, in particular, its closeness to the simple upper bound (18). Second, for practical reasons, without knowing \(\lambda _{1}\), we would like to be able to detect phase 2, i.e., the first iteration k for which the inequality (25) starts to hold. Finally, we address the problem of how to improve the accuracy of the basic Gauss-Radau upper bound (21) in phase 2.

We first analyze the relation between two modified coefficients \(\alpha _{k+1}^{(\mu )}\) and \(\alpha _{k+1}^{(\lambda )}\) where \(0<\mu<\lambda <\theta _{1}^{(k)}.\)

Lemma 2

Let \(0<\mu<\lambda <\theta _{1}^{(k)}\). Then

$$\begin{aligned} \frac{\eta _{i,k}^{(\lambda )}-\eta _{i,k}^{(\mu )}}{\eta _{i,k}^{(\mu )}}=\frac{\lambda -\mu }{\theta _{i}^{(k)}-\lambda } \end{aligned}$$
(27)

and

$$\begin{aligned} \alpha _{k+1}^{(\lambda )}-\alpha _{k+1}^{(\mu )}=\left( \frac{\lambda -\mu }{\theta _{1}^{(k)}-\mu }\right) \eta _{1,k}^{(\lambda )}+\left( \lambda -\mu \right) E_{k}^{(\lambda ,\mu )} \end{aligned}$$
(28)

where

$$\begin{aligned} E_{k}^{(\lambda ,\mu )}\equiv 1+\sum _{i=2}^{k}\frac{\eta _{i,k}^{(\lambda )}}{\theta _{i}^{(k)}-\mu } \end{aligned}$$
(29)

satisfies \(E_{k}^{(\lambda ,\mu )}=E_{k}^{(\mu ,\lambda )}.\)

Proof

From the definition of \(\eta _{i,k}^{(\mu )}\) and \(\eta _{i,k}^{(\lambda )}\), it follows immediately

$$\begin{aligned} \frac{\eta _{i,k}^{(\lambda )}}{\theta _{i}^{(k)}-\mu }=\frac{\eta _{i,k}^{(\mu )}}{\theta _{i}^{(k)}-\lambda }, \end{aligned}$$

which implies \(E_{k}^{(\lambda ,\mu )}=E_{k}^{(\mu ,\lambda )}\) and (27).

Note that \(0<\eta _{i,k}^{(\mu )}<\eta _{i,k}^{(\lambda )}\). Using (27), the difference of the coefficients \(\alpha \)’s is

$$\begin{aligned} \begin{aligned} \alpha _{k+1}^{(\lambda )}-\alpha _{k+1}^{(\mu )}&= \left( \lambda -\mu \right) +\sum _{i=1}^{k}\left( \eta _{i,k}^{(\lambda )}-\eta _{i,k}^{(\mu )}\right) \\&= {(\lambda -\mu )+(\lambda -\mu )\sum _{i=1}^{k}\frac{\eta _{i,k}^{(\mu )}}{\theta _{i}^{(k)}-\lambda }} \\&= {(\lambda -\mu )\frac{\eta _{1,k}^{(\lambda )}}{\theta _{1}^{(k)}-\mu } + (\lambda -\mu )\left( 1+\sum _{i=2}^{k}\frac{\eta _{i,k}^{(\mu )}}{\theta _{i}^{(k)}-\lambda }\right) } \end{aligned} \end{aligned}$$

which implies (28). \(\square \)

5.1 Assumptions

Let us describe in more detail the situation we are interested in. In the analysis that follows we will assume implicitly the following.

  1.

    \(\lambda _{1}\) is well separated from \(\lambda _{2}\) so that we can use the gap theorem mentioned in Section 2.1, in particular relation (3) bounding \(\eta _{1,k}^{(\lambda _1)}\).

  2.

    \(\mu \) is a tight underestimate to \(\lambda _{1}\) such that

    $$\begin{aligned} \lambda _{1}-\mu \ll \lambda _{2}-\lambda _{1}. \end{aligned}$$
    (30)
  3.

    The smallest Ritz value \(\theta _1^{(k)}\) converges to \(\lambda _1\) with increasing k so that there is an iteration index k from which

    $$\begin{aligned} \theta _1^{(k)}-\lambda _1 \ll \lambda _1-\mu . \end{aligned}$$

Let us briefly comment on these assumptions. The assumption that \(\lambda _{1}\) is well separated from \(\lambda _{2}\) is used later to prove that \(\eta _{1,k}^{(\lambda _1)}\) is bounded away from zero; see (33). If there is a cluster of eigenvalues around \(\lambda _1\), one can still observe the discussed phenomenon of loss of accuracy of the upper bound, but a theoretical analysis would be much more complicated. Note that the first assumption is also often satisfied for a system matrix \(\hat{A}\) that models finite precision CG behavior, if the original matrix A has well separated eigenvalues \(\lambda _1\) and \(\lambda _2\). Using results of Greenbaum [20] we know that \(\hat{A}\) can have many eigenvalues distributed throughout tiny intervals around the eigenvalues of A. We have constructed the model matrix \(\hat{A}\) in many numerical experiments, using the procedure suggested in [20]. We found out that the constructed \(\hat{A}\) usually has clusters of eigenvalues around the larger eigenvalues of A, while each of the smaller eigenvalues of A is usually approximated by just one eigenvalue of \(\hat{A}\). Therefore, the analysis presented below can be applied to the matrix \(\hat{A}\) that models the finite precision CG behavior.

If \(\mu \) is not a tight underestimate, then the Gauss-Radau upper bound is usually not a very good approximation of the A-norm of the error. Then the condition (25) can hold from the beginning and phase 1 need not happen.

Finally, in theory, the smallest Ritz value need not converge to \(\lambda _1\) until the last iteration [27]. In that case, however, there is no problem for the Gauss-Radau upper bound. In practical computations, we very often observe the convergence of \(\theta _1^{(k)}\) to \(\lambda _1\). In particular, for matrices \(\hat{A}\) with clustered eigenvalues that model finite precision behavior of CG, \(\theta _1^{(k)}\) usually approximates \(\lambda _1\) to a high relative accuracy well before the A-norm of the error reaches its ultimate level of accuracy.

5.2 The modified coefficient \(\alpha _{k+1}^{(\mu )}\)

Below we would like to compare \(\alpha _{k+1}^{(\lambda _{1})}\), for which phase 2 does not occur, with \(\alpha _{k+1}^{(\mu )}\), for which phase 2 occurs; see Fig. 1. Using (27) and (30), we are able to compare the individual \(\eta \)-terms. In particular, for \(i>1\) we get

$$\begin{aligned} \frac{\eta _{i,k}^{(\lambda _{1})}-\eta _{i,k}^{(\mu )}}{\eta _{i,k}^{(\mu )}}= {\frac{\lambda _{1}-\mu }{\theta _{i}^{(k)}-\lambda _{1}}} <\frac{\lambda _{1}-\mu }{\lambda _{2}-\lambda _{1}}\ll 1, \end{aligned}$$

where we have used \(\theta _i^{(k)} > \lambda _2\) for \(i>1\). Therefore,

$$\begin{aligned} \eta _{i,k}^{(\lambda _{1})}\approx \eta _{i,k}^{(\mu )}\quad \text {for}\quad i>1. \end{aligned}$$

Hence, \(\alpha _{k+1}^{(\mu )}\) can significantly differ from \(\alpha _{k+1}^{(\lambda _{1})}\) only in the first term of the sum in (26) for which

$$\begin{aligned} \frac{\eta _{1,k}^{(\lambda _{1})}-\eta _{1,k}^{(\mu )}}{\eta _{1,k}^{(\mu )}}=\frac{\lambda _{1}-\mu }{\theta _{1}^{(k)}-\lambda _{1}}. \end{aligned}$$
(31)

If \(\theta _{1}^{(k)}\) is a better approximation to \(\lambda _{1}\) than \(\mu \) in the sense of (25), then (31) shows that \(\eta _{1,k}^{(\lambda _{1})}\) can be much larger than \(\eta _{1,k}^{(\mu )}\). As a consequence, \(\alpha _{k+1}^{(\lambda _{1})}\) can differ significantly from \(\alpha _{k+1}^{(\mu )}\). On the other hand, if \(\mu \) is chosen such that

$$\begin{aligned} \lambda _{1}-\mu \ll \theta _{1}^{(k)}-\lambda _{1}, \end{aligned}$$

for all k we are interested in, then phase 2 will not occur, and

$$\begin{aligned} \alpha _{k+1}^{(\lambda _1)}-\alpha _{k+1}^{(\mu )} = \left( \lambda _1-\mu \right) +\sum _{i=1}^{k}\left( \eta _{i,k}^{(\lambda _1)}-\eta _{i,k}^{(\mu )}\right) \approx 0\,, \end{aligned}$$

since \(\mu \) is assumed to be a tight approximation to \(\lambda _1\) and \(\eta _{i,k}^{(\lambda _1)}\approx \eta _{i,k}^{(\mu )}\) for all i.

In the following we discuss phase 1 and phase 2 in more detail.

In phase 1,

$$\begin{aligned} \lambda _{1}-\mu <\theta _{1}^{(k)}-\lambda _{1}, \end{aligned}$$

and, therefore, all components \(\eta _{i,k}^{(\mu )}\) (including \(\eta _{1,k}^{(\mu )}\)) are not sensitive to small changes of \(\mu \); see (27). In other words, the coefficients \(\alpha _{k+1}^{(\mu )}\) are approximately the same for various choices of \(\mu \).

Let us denote

$$\begin{aligned} {h_k}=\frac{\lambda _{1}-\mu }{\theta _{1}^{(k)}-\lambda _{1}}<1. \end{aligned}$$

In fact, we can write \(\theta _{1}^{(k)}-\mu =\theta _{1}^{(k)}-\lambda _{1}+\lambda _{1}-\mu \) and use the Taylor expansion of \(1/(1+{h_k})\). It yields

$$\begin{aligned} \frac{1}{\theta _{1}^{(k)}-\mu }= & {} \frac{1}{\theta _{1}^{(k)}-\lambda _{1}}{\left( \frac{1}{h_k+1}\right) =\frac{1}{\theta _{1}^{(k)}-\lambda _{1}}\left[ 1-h_k+h_k^{2}-h_k^{3}+\cdots \right] }. \end{aligned}$$

Obviously, \(h_k\) is an increasing function of the iteration number k; the numerator is constant while the denominator decreases. The size of \(h_k\) also depends on how well \(\mu \) approximates \(\lambda _1\). If \(\mu \) is a tight approximation to \(\lambda _1\), then, at the beginning of the CG iterations, the denominator of \(h_k\) can be large compared to the numerator, \(h_k\) is small, and \(1/(\theta _{1}^{(k)}-\mu )\) is almost given by \(1/(\theta _{1}^{(k)}-\lambda _{1})\), independently of \(\mu \). We observed that the first term of the sum of the \(\eta _{i,k}^{(\mu )}\) is then usually the largest one.

Let us now discuss phase 2. First recall that for any \(0<\mu <\lambda _{1}\) it holds that

$$\begin{aligned} \alpha _{k+1}^{(\mu )}<\alpha _{k+1}^{(\lambda _{1})}\quad \text {and}\quad \eta _{1,k}^{(\mu )}<\eta _{1,k}^{(\lambda _{1})}. \end{aligned}$$
(32)

As before, suppose that \(\lambda _{1}\) is well separated from \(\lambda _{2}\) and that (30) holds. Phase 2 begins when \(\theta _{1}^{(k)}\) is a better approximation to \(\lambda _{1}\) than \(\mu \), i.e., when (25) holds. Since \(\theta _{1}^{(k)}\) is a tight approximation to \(\lambda _{1}\) in phase 2, (3) and (25) imply that

$$\begin{aligned} \eta _{1,k}^{(\lambda _{1})}\ge \lambda _{2}-\theta _{1}^{(k)} = \lambda _{2}-\lambda _{1} + \lambda _1-\theta _{1}^{(k)} > (\lambda _{2}-\lambda _{1}) - (\lambda _1-\mu ). \end{aligned}$$
(33)

Therefore, using (30), \(\eta _{1,k}^{(\lambda _{1})}\) is bounded away from zero. On the other hand, (3) also implies that

$$\begin{aligned} \eta _{1,k}^{(\mu )}=\frac{\theta _{1}^{(k)}-\lambda _{1}}{\theta _{1}^{(k)}-\mu }\eta _{1,k}^{(\lambda _{1})}\le \frac{\theta _{1}^{(k)}-\lambda _{1}}{\theta _{1}^{(k)}-\mu }\left( \lambda _{n}-\lambda _{1}\right) \end{aligned}$$

and as \(\theta _{1}^{(k)}\) converges to \(\lambda _{1}\), \(\eta _{1,k}^{(\mu )}\) goes to zero. Therefore,

$$\begin{aligned} \alpha _{k+1}^{(\mu )}\approx \mu +\sum _{i=2}^{k}\eta _{i,k}^{(\mu )}, \end{aligned}$$

and the sum on the right-hand side is almost independent of \(\mu \). Note that if we have two values \(0<\mu<\lambda <\lambda _{1}\) such that

$$\begin{aligned} \theta _{1}^{(k)}-\lambda _{1}<\lambda _{1}-\lambda \quad \text {and}\quad \lambda -\mu \ll \lambda _{2}-\lambda _{1}, \end{aligned}$$
(34)

then one can expect that

$$\begin{aligned} \alpha _{k+1}^{(\mu )}\approx \alpha _{k+1}^{(\lambda )} \end{aligned}$$
(35)

because \(\eta _{1,k}^{(\mu )}\) and \(\eta _{1,k}^{(\lambda )}\) converge to zero and \(\eta _{i,k}^{(\mu )}\approx \eta _{i,k}^{(\lambda )}\) for \(i>1\) due to

$$\begin{aligned} \frac{\eta _{i,k}^{(\lambda )}-\eta _{i,k}^{(\mu )}}{\eta _{i,k}^{(\mu )}}=\frac{\lambda -\mu }{\theta _{i}^{(k)}-\lambda }<\frac{\lambda -\mu }{\lambda _{2}-\lambda _{1}}\ll 1, \end{aligned}$$

where we have used (27) and the assumption (34). Therefore, \(\alpha _{k+1}^{(\mu )}\) is relatively insensitive to small changes of \(\mu \) and the same is true for the upper bound (21).

5.3 The coefficient \(\alpha _{k+1}\)

The coefficient \(\alpha _{k+1}\) can also be written as

$$\begin{aligned} \alpha _{k+1}=\alpha _{k+1}^{(\mu )}\quad \text {for}\quad \mu =\theta _{1}^{(k+1)}, \end{aligned}$$

and the results of Lemmas 1 and 2 are still valid, even though, in practice, \(\mu \) must be smaller than \(\lambda _{1}\). Using (28) we can express the difference between these coefficients; it holds that

$$\begin{aligned} \alpha _{k+1}-\alpha _{k+1}^{(\lambda _{1})}= & {} \eta _{1,k}^{(\lambda _{1})}\frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\theta _{1}^{(k+1)}}+\left( \theta _{1}^{(k+1)}-\lambda _{1}\right) E_{k}^{(\theta _{1}^{(k+1)},\lambda _{1})}. \end{aligned}$$
(36)

If the smallest Ritz value \(\theta _{1}^{(k+1)}\) is close to \(\lambda _{1}\), then the second term of the right-hand side in (36) will be negligible in comparison to the first one, since

$$\begin{aligned} E_{k}^{(\theta _{1}^{(k+1)},\lambda _{1})}=\mathcal {O}(1), \end{aligned}$$

see (29), and since \(\eta _{1,k}^{(\lambda _{1})}\) is bounded away from zero; see (33). Therefore, one can expect that

$$\begin{aligned} \alpha _{k+1}-\alpha _{k+1}^{(\lambda _{1})}\ \approx \ \eta _{1,k}^{(\lambda _{1})}\frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\theta _{1}^{(k+1)}}. \end{aligned}$$
(37)

The size of the term on the right-hand side is related to the speed of convergence of the smallest Ritz value \(\theta _{1}^{(k)}\) to \(\lambda _{1}\). Denoting

$$\begin{aligned} \frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\lambda _{1}}=\rho _{k}<1, \end{aligned}$$

we obtain

$$\begin{aligned} \frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\theta _{1}^{(k+1)}}=\frac{\frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\lambda _{1}}}{1-\frac{\theta _{1}^{(k+1)}-\lambda _{1}}{\theta _{1}^{(k)}-\lambda _{1}}}=\frac{\rho _{k}}{1-\rho _{k}}. \end{aligned}$$

For example, if the convergence of \(\theta _{1}^{(k)}\) to \(\lambda _{1}\) is superlinear, i.e., if \(\rho _{k}\rightarrow 0\), then \(\alpha _{k+1}\) and \(\alpha _{k+1}^{(\lambda _{1})}\) are close.

5.4 Numerical experiments

Let us now demonstrate numerically, using our model problem, the theoretical results described in the previous sections. To compute the following results, we again use Matlab's vpa arithmetic with 128 decimal digits.

Fig. 2: First term \(\eta _{1,k}^{(\mu )}\), maximum term \(\eta _{i,k}^{(\mu )}\), and the sum \(\zeta _{k}^{(\mu )}\) for \(\mu =\mu _{3}\)

We first consider \(\mu =\mu _{3}=(1-10^{-3})\lambda _{1}\) for which we have \(\lambda _{1}-\mu =10^{-9}\). The switch from phase 1 to phase 2 occurs at iteration 13. Figure 2 displays the first term \(\eta _{1,k}^{(\mu )}\) and the maximum term \(\eta _{i,k}^{(\mu )}\) as well as the sum \(\zeta _{k}^{(\mu )}\) defined by (9) (see Lemma 1), as functions of the iteration number k. In phase 1 the first term \(\eta _{1,k}^{(\mu )}\) is the largest one. As predicted, after the start of phase 2, the first term decreases quite fast.

Fig. 3: First term \(\eta _{1,k}^{(\mu )}\), maximum term \(\eta _{i,k}^{(\mu )}\), and the sum \(\zeta _{k}^{(\mu )}\) for \(\mu =\mu _{8}\)

Let us now use \(\mu =\mu _{8}=(1-10^{-8})\lambda _{1}\) for which we have \(\lambda _{1}-\mu =10^{-14}\). The switch from phase 1 to phase 2 occurs at iteration 15; see Fig. 3. The conclusions are the same as for \(\mu _{3}\).

Fig. 4: First term \(\eta _{1,k}^{(\mu )}\), maximum term \(\eta _{i,k}^{(\mu )}\), and the sum \(\zeta _{k}^{(\mu )}\) for \(\mu =\mu _{50}\)

The behavior of the first term is completely different for \(\mu =(1-10^{-50})\lambda _{1}\) which almost corresponds to using the exact smallest eigenvalue \(\lambda _{1}\).

The maximum term of the sum is then almost always the first one; see Fig. 4. Remember that, for this value of \(\mu \), we are always in phase 1.

Fig. 5: Comparison of the sums \(\zeta _{k}^{(\mu _{3})}\), \(\zeta _{k}^{(\mu _{8})}\), and \(\zeta _{k}^{(\mu _{50})}\)

Finally, in Fig. 5 we present a comparison of the sums \(\zeta _{k}^{(\mu )}\) for \(\mu _{3}\), \(\mu _{8}\), and \(\mu _{50}\). We observe that from the beginning up to iteration 12, all sums visually coincide. Starting from iteration 13 we enter phase 2 for \(\mu =\mu _{3}\) and the sum \(\zeta _{k}^{(\mu _{3})}\) starts to differ significantly from the other sums, in particular from the “reference” term \(\zeta _{k}^{(\mu _{50})}\). Similarly, for \(k=15\) we enter phase 2 for \(\mu =\mu _{8}\) and \(\zeta _{k}^{(\mu _{8})}\) and \(\zeta _{k}^{(\mu _{50})}\) start to differ. We can also observe that \(\zeta _{k}^{(\mu _{3})}\) and \(\zeta _{k}^{(\mu _{8})}\) significantly differ only in iterations 13, 14, and 15, i.e., when we are in phase 2 for \(\mu =\mu _{3}\) but in phase 1 for \(\mu =\mu _{8}\). In all other iterations, \(\zeta _{k}^{(\mu _{3})}\) and \(\zeta _{k}^{(\mu _{8})}\) visually coincide.

Fig. 6: \(\alpha _{k}^{(\mu _{3})}\), \(\alpha _{k}^{(\mu _{8})}\), \(\alpha _{k}^{(\lambda _{1})}\), and \(\alpha _{k}\)

In Fig. 6 we plot the coefficients \(\alpha _{k}^{(\mu _{3})}\), \(\alpha _{k}^{(\mu _{8})}\), \(\alpha _{k}^{(\lambda _{1})}\) and \(\alpha _{k}\), so that we can compare the observed behavior with the predicted one. Phase 2 starts for \(\mu _{3}\) at iteration 13, and for \(\mu _{8}\) at iteration 15; see also Fig. 1. For \(k\le 13\) we observe that

$$\begin{aligned} \alpha _{k}^{(\mu _{3})}\approx \alpha _{k}^{(\mu _{8})}\approx \alpha _{k}^{(\lambda _{1})} \end{aligned}$$

as explained in Section 5.2, while \(\alpha _k\) is larger. For \(k\ge 16\), the first terms \(\eta _{1,k-1}^{(\mu _{3})}\) and \(\eta _{1,k-1}^{(\mu _{8})}\) are close to zero, and, as explained in Section 5.2,

$$\begin{aligned} \alpha _{k}^{(\mu _{3})}\approx \alpha _{k}^{(\mu _{8})}. \end{aligned}$$

For \(k=14\) and \(k=15\), \(\alpha _{k}^{(\mu _{3})}\) and \(\alpha _{k}^{(\mu _{8})}\) can differ significantly because \(\alpha _{k}^{(\mu _{3})}\) is already in phase 2 while \(\alpha _{k}^{(\mu _{8})}\) is still in phase 1.

We can also observe that \(\alpha _{k}\) can be very close to \(\alpha _{k}^{(\lambda _{1})}\) when the smallest Ritz value \(\theta _{1}^{(k)}\) is a tight approximation to \(\lambda _{1}\), i.e., in later iterations. We know that the closeness of \(\alpha _{k}\) to \(\alpha _{k}^{(\lambda _{1})}\) depends on the speed of convergence of the smallest Ritz value to \(\lambda _{1}\); see (37) and the corresponding discussion.

6 The Gauss-Radau bound in phase 2

Our aim in this section is to investigate the relation between the basic Gauss-Radau upper bound (21) and the simple upper bound; see (18). Recall the notation

$$\begin{aligned} \phi _{k}=\frac{\left\| r_{k}\right\| ^{2}}{\left\| p_{k}\right\| ^{2}}; \end{aligned}$$

see (19). In particular, we would like to explain why the two bounds almost coincide in phase 2. Note that using (13) we obtain

$$\begin{aligned} \alpha _{k+1}^{(\mu )}=\left( \gamma _{k}^{(\mu )}\right) ^{-1}+\frac{\delta _{k}}{\gamma _{k-1}} \end{aligned}$$
(38)

and from (8) it follows

$$\begin{aligned} \alpha _{k+1}^{(\mu )}= & {} \mu +\beta _{k}^{2}e_{k}^{T}\left( T_{k}-\mu I\right) ^{-1}e_{k},\qquad \beta _{k}^{2}=\frac{1}{\gamma _{k-1}}\frac{\delta _{k}}{\gamma _{k-1}}. \end{aligned}$$

Therefore,

$$\begin{aligned} \left( \gamma _{k}^{(\mu )}\right) ^{-1}= & {} \mu +\beta _{k}^{2}\left( e_{k}^{T}\left( T_{k}-\mu I\right) ^{-1}e_{k}-\gamma _{k-1}\right) . \end{aligned}$$
(39)

In the following lemma we give another expression for \(e_{k}^{T}\left( T_{k}-\mu I\right) ^{-1}e_{k}\).

Lemma 3

Let \(0<\mu <\theta _{1}^{(k)}\). Then it holds that

$$\begin{aligned} e_{k}^{T}\left( T_{k}-\mu I\right) ^{-1}e_{k}=\gamma _{k-1}+\mu \frac{\gamma _{k-1}^{2}}{\phi _{k-1}}+\sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{\left( s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }. \end{aligned}$$
(40)

Proof

Since \(\left\| \mu T_{k}^{-1}\right\| <1\), we obtain using a Neumann series

$$\begin{aligned} \left( T_{k}-\mu I\right) ^{-1}=\left( I-\mu T_{k}^{-1}\right) ^{-1}T_{k}^{-1}= & {} \left( \sum _{j=0}^{\infty }\mu ^{j}T_{k}^{-j}\right) T_{k}^{-1} \end{aligned}$$

so that

$$\begin{aligned} e_{k}^{T}\left( T_{k}-\mu I\right) ^{-1}e_{k}=e_{k}^{T}T_{k}^{-1}e_{k}+\mu e_{k}^{T}T_{k}^{-2}e_{k}+\sum _{j=2}^{\infty }\mu ^{j}e_{k}^{T}T_{k}^{-(j+1)}e_{k}. \end{aligned}$$

We now express the terms on the right-hand side using the CG coefficients and the quantities from the spectral factorization of \(T_{k}\). Using \(T_{k}=L_{k}D_{k}L_{k}^{T}\) we obtain \(e_{k}^{T}T_{k}^{-1}e_{k}=\gamma _{k-1}\). After some algebraic manipulation (see, e.g., [28, p. 1369]) we get

$$\begin{aligned} T_{k}^{-1}e_{k}=\gamma _{k-1}\Vert r_{k-1}\Vert \left[ \begin{array}{c} \frac{(-1)^{k-1}}{\Vert r_{0}\Vert }\\ \vdots \\ \frac{1}{\Vert r_{k-1}\Vert } \end{array}\right] \end{aligned}$$

so that

$$\begin{aligned} e_{k}^{T}T_{k}^{-2}e_{k}=e_{k}^{T}T_{k}^{-1}T_{k}^{-1}e_{k}=\gamma _{k-1}^{2}\sum _{i=0}^{k-1}\frac{\Vert r_{k-1}\Vert ^{2}}{\Vert r_{i}\Vert ^{2}}=\gamma _{k-1}^{2}\frac{\Vert p_{k-1}\Vert ^{2}}{\Vert r_{k-1}\Vert ^{2}}=\frac{\gamma _{k-1}^{2}}{\phi _{k-1}}. \end{aligned}$$

Finally,

$$\begin{aligned} e_{k}^{T}\left( \sum _{j=2}^{\infty }\mu ^{j}T_{k}^{-(j+1)}\right) e_{k}= & {} e_{k}^{T}S_{k}\left( \sum _{j=2}^{\infty }\mu ^{j}\Theta _{k}^{-(j+1)}\right) S_{k}^{T}e_{k} \end{aligned}$$

where the diagonal entries of the diagonal matrix

$$\begin{aligned} \sum _{j=2}^{\infty }\mu ^{j}\Theta _{k}^{-(j+1)} \end{aligned}$$

have the form

$$\begin{aligned} \frac{1}{\theta _{i}^{(k)}}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\sum _{j=0}^{\infty }\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{j}= & {} \frac{1}{\theta _{i}^{(k)}}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{1}{1-\frac{\mu }{\theta _{i}^{(k)}}}=\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{1}{\theta _{i}^{(k)}-\mu }. \end{aligned}$$

Hence,

$$\begin{aligned} e_{k}^{T}\left( \sum _{j=2}^{\infty }\mu ^{j}T_{k}^{-(j+1)}\right) e_{k}=\sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{\left( s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }. \end{aligned}$$

\(\square \)

Based on the previous lemma we can now express the coefficient \(\gamma _{k}^{(\mu )}\).

Theorem 1

Let \(0<\mu <\theta _{1}^{(k)}\). Then it holds that

$$\begin{aligned} \left( \gamma _{k}^{(\mu )}\right) ^{-1}=\frac{\mu }{\phi _{k}}+\sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\eta _{i,k}^{(\mu )}. \end{aligned}$$
(41)

Proof

We start with (39). Using the previous lemma

$$\begin{aligned} \begin{aligned} \left( \gamma _{k}^{(\mu )}\right) ^{-1}&= \mu +\mu \beta _{k}^{2}\gamma _{k-1}^{2}\phi _{k-1}^{-1}+\beta _{k}^{2}e_{k}^{T}\left( \sum _{j=2}^{\infty }\mu ^{j}T_{k}^{-(j+1)}\right) e_{k}\\&= \mu \left( 1+\delta _{k}\phi _{k-1}^{-1}\right) +\beta _{k}^{2}\sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{\left( s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }\\&= \mu \phi _{k}^{-1}+\sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\frac{\left( \beta _{k}s_{k,i}^{(k)}\right) ^{2}}{\theta _{i}^{(k)}-\mu }, \end{aligned} \end{aligned}$$

where we have used relation (19). \(\square \)
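
As a sanity check, the identity (41) can be verified numerically; the sketch below assumes \(T_{k}\) assembled from the CG coefficients as in Section 3, the CG arrays gamma and delta, gamma_mu computed by (12), and a value mu with \(\mu <\theta _{1}^{(k)}\) (names are illustrative).

% Sketch: numerical check of (41).
betak = sqrt(delta(k)) / gamma(k);                     % beta_k = sqrt(delta_k)/gamma_{k-1}
[S, Theta] = eig(Tk);
theta = diag(Theta);                                   % Ritz values theta_i^{(k)}
eta = (betak * S(k,:)').^2 ./ (theta - mu);            % eta_{i,k}^{(mu)}
phi_inv = 1;
for j = 1:k
    phi_inv = 1 + phi_inv * delta(j);                  % phi_k^{-1} via (19)
end
lhs = 1 / gamma_mu(k+1);                               % (gamma_k^{(mu)})^{-1}
rhs = mu * phi_inv + sum((mu ./ theta).^2 .* eta);     % right-hand side of (41)
% lhs and rhs agree up to roundoff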

Obviously, using (41), the basic Gauss-Radau upper bound (21) and the simple upper bound in (18) are close to each other if and only if

$$\begin{aligned} \sum _{i=1}^{k}\left( \frac{\mu }{\theta _{i}^{(k)}}\right) ^{2}\eta _{i,k}^{(\mu )}\,\ll \,\frac{\mu }{\phi _{k}}, \end{aligned}$$
(42)

which can also be written as

$$\begin{aligned} \left( \frac{\mu }{\theta _{1}^{(k)}}\right) ^{2}\frac{\eta _{1,k}^{(\mu )}}{\mu }+\sum _{i=2}^{k}\left( \frac{\beta _{k}s_{k,i}^{(k)}}{\theta _{i}^{(k)}}\right) ^{2}\frac{\mu }{\theta _{i}^{(k)}-\mu }\,\ll \,\phi _{k}^{-1}. \end{aligned}$$
(43)

Under the assumptions formulated in Section 5.1, in particular that \(\lambda _{1}\) is well separated from \(\lambda _{2}\), and that \(\mu \) is a tight underestimate to \(\lambda _{1}\) in the sense of (30), the sum of terms on the left-hand side of (43) can be replaced by its tight upper bound

$$\begin{aligned} \left( \frac{\lambda _{1}}{\theta _{1}^{(k)}}\right) ^{2}\frac{\eta _{1,k}^{(\mu )}}{\mu }+\sum _{i=2}^{k}\left( \frac{\beta _{k}s_{k,i}^{(k)}}{\theta _{i}^{(k)}}\right) ^{2}\frac{\lambda _{1}}{\theta _{i}^{(k)}-\lambda _{1}} \end{aligned}$$
(44)

which simplifies the explanation of the dependence of the sum in (43) on \(\mu \).

The second term in (44) is independent of \(\mu \) and its size depends only on the behavior of the underlying Lanczos process. Here

$$\begin{aligned} \left( \frac{\beta _{k}s_{k,i}^{(k)}}{\theta _{i}^{(k)}}\right) ^{2}=\frac{\left\| A\left( V_{k}s_{:,i}^{(k)}\right) -\theta _{i}^{(k)}\left( V_{k}s_{:,i}^{(k)}\right) \right\| ^{2}}{\left( \theta _{i}^{(k)}\right) ^{2}} \end{aligned}$$
(45)

can be seen as the relative accuracy to which the ith Ritz value approximates an eigenvalue, and the size of the term

$$\begin{aligned} \frac{\lambda _{1}}{\theta _{i}^{(k)}-\lambda _{1}},\qquad i\ge 2, \end{aligned}$$
(46)

depends on the position of \(\theta _{i}^{(k)}\) relative to the smallest eigenvalue. In particular, one can expect the term (46) to be of size \(\mathcal {O}(1)\) if \(\theta _{i}^{(k)}\) approximates one of the smallest eigenvalues, and to be small if \(\theta _{i}^{(k)}\) approximates one of the largest eigenvalues.

Using the previous simplifications and assuming phase 2, the basic Gauss-Radau upper bound (21) and the rightmost upper bound in (18) are close to each other if and only if

$$\begin{aligned} \frac{\eta _{1,k}^{(\mu )}}{\mu }+\sum _{i=2}^{k}\left( \frac{\beta _{k}s_{k,i}^{(k)}}{\theta _{i}^{(k)}}\right) ^{2}\frac{\lambda _{1}}{\theta _{i}^{(k)}-\lambda _{1}}\,\ll \,\phi _{k}^{-1}. \end{aligned}$$
(47)

From Section 5.2 we know that \(\eta _{1,k}^{(\mu )}\) goes to zero in phase 2. Hence, if

$$\begin{aligned} \eta _{1,k}^{(\mu )}<\mu , \end{aligned}$$
(48)

which will happen for k sufficiently large, then the first term in (47) is smaller than the term on the right-hand side.

As already mentioned, the sum of positive terms in (47) depends only on approximation properties of the underlying Lanczos process, which are not easy to predict in general. Inspired by our model problem described in Section 4, we can only give an intuitive explanation of why the sum could be small in phase 2.

Phase 2 occurs in later CG iterations and it is related to the convergence of the smallest Ritz value to the smallest eigenvalue. If the smallest eigenvalue is well approximated by the smallest Ritz value (to a high relative accuracy), then one can expect that many eigenvalues of A are relatively well approximated by Ritz values. If the eigenvalue \(\lambda _{j}\) of A is well separated from the other eigenvalues and if it is well approximated by a Ritz value, then the corresponding term (45), measuring the relative accuracy to which \(\lambda _{j}\) is approximated, is going to be small.

In particular, in our model problem, the smallest eigenvalues are well separated from each other, and in phase 2 they are well approximated by Ritz values. Therefore, the corresponding terms (45) are small. Hence, the Ritz values that have not yet converged in phase 2 are going to approximate eigenvalues in clusters that do not correspond to the smallest eigenvalues, i.e., for which the terms (46) are small; see also Figs. 2 and 3. In our model problem, the sum of positive terms in (47) is small in phase 2 because either (45) or (46) is small. Therefore, one can expect that the validity of (47) will mainly depend on the size of the first term in (47); see Fig. 7.

Fig. 7
figure 7

The first and second terms in (44), the left-hand side of (43), and \(\phi _k^{-1}\)

The size of the sum of positive terms in (47) obviously depends on the clustering and the distribution of the eigenvalues, and we cannot guarantee in general that it will be small in phase 2. For example, it need not be small if the smallest eigenvalues of A are clustered.

7 Detection of phase 2

For our model problem it is not hard to detect phase 2 from the coefficients that are available during the computation. We first observe (see Fig. 7) that the coefficients

$$\begin{aligned} \gamma _{k}^{(\mu )}\quad \text {and}\quad \frac{\phi _{k}}{\mu }, \end{aligned}$$
(49)

and the corresponding bounds (21) and (18) visually coincide from the beginning up to some iteration \(\ell _{1}\). From iteration \(\ell _{1}+1\), the Gauss-Radau upper bound (21) starts to be a much better approximation to the squared A-norm of the error than the simple upper bound (18). When phase 2 occurs, the Gauss-Radau upper bound (21) loses its accuracy and, starting from iteration \(\ell _{2}\) (approximately when (48) holds), it again visually coincides with the simple upper bound (18). We observe that phase 2 occurs at iterations k at which the two coefficients (49) differ significantly, i.e., for \(\ell _{1}<k<\ell _{2}\). To measure the agreement between the coefficients (49), we can use the easily computable relative distance

$$\begin{aligned} \frac{\frac{\phi _{k}}{\mu }-\gamma _{k}^{(\mu )}}{\gamma _{k}^{(\mu )}}=\phi _{k}\left[ \left( \frac{\mu }{\theta _{1}^{(k)}}\right) ^{2} \frac{\eta _{1,k}^{(\mu )}}{\mu }+\sum _{i=2}^{k}\left( \frac{\beta _{k}s_{k,i}^{(k)}}{\theta _{i}^{(k)}}\right) ^{2}\frac{\mu }{\theta _{i}^{(k)} -\mu }\right] . \end{aligned}$$
(50)

We will consider this relative distance to be small if it is smaller than 0.5.
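As a small illustration, the relative distance (50) can be evaluated directly from the two coefficients (49), which are available during the iterations. The sketch below is a minimal version under that assumption; the variable names are ours, and the 0.5 threshold follows the convention just stated.

```python
def relative_distance(phi_k, gamma_k_mu, mu):
    """Left-hand side of (50): relative distance between phi_k / mu and gamma_k^{(mu)}."""
    return (phi_k / mu - gamma_k_mu) / gamma_k_mu

def coefficients_differ(phi_k, gamma_k_mu, mu, threshold=0.5):
    # the coefficients (49) are considered to differ significantly once the
    # relative distance exceeds 0.5
    return relative_distance(phi_k, gamma_k_mu, mu) > threshold
```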

Fig. 8
figure 8

The behavior of the relative distance in (50) for various values of \(\mu \)

The behavior of the term in (50) for various values of \(\mu \) is shown in Fig. 8. The index \(\ell _{1}=12\) is the same for all considered values of \(\mu \). For \(\mu _{3}\) we get \(\ell _{2}=15\) (red circle), for \(\mu _{8}\) we get \(\ell _{2}=18\) (magenta circle), for \(\mu _{16}\) we get \(\ell _{2}=25\) (blue circle), and, finally, for \(\mu _{50}\) there is no index \(\ell _{2}\).

As explained in the previous section, in more complicated cases we cannot guarantee, in general, that the relative distance (50) behaves as in our model problem. For example, in many practical problems we observe a staircase behavior of the A-norm of the error, in which a few iterations of stagnation are followed by a few iterations of rapid convergence. In such cases, the quantity (50) can oscillate several times, making it impossible to use for detecting phase 2. Therefore, in general, we are not able to detect the beginning of phase 2 reliably using (50). Nevertheless, in particular cases, the formulas (41) and (50) can be helpful.

8 Upper bounds with a guaranteed accuracy

In some applications it might be of interest to obtain upper bounds on the A-norm of the error that are sufficiently accurate. From the previous sections we know that the basic Gauss-Radau upper bound at iteration k can be delayed and, therefore, can significantly overestimate the quantity of interest. Nevertheless, going back in the convergence history, we can easily find an iteration index \(\ell \le k\) such that, for all \(0\le i \le \ell \), a sufficiently accurate upper bound can be found. To find such \(\ell \), we will use the ideas described in [12] and [19].

For integers \(k\ge j \ge \ell \ge 0\), let us denote

$$\begin{aligned} \Delta _{j}=\gamma _{j}\left\| r_{j}\right\| ^{2},\quad \Delta _{\ell :k}=\sum _{j=\ell }^{k}\Delta _{j},\quad \text {and}\quad \Delta _{j:j-1}=0. \end{aligned}$$

Denoting \(\varepsilon _{j} \equiv \Vert x - x_j \Vert _A^2\), the relation (20) takes the form

$$\begin{aligned} \varepsilon _{\ell }=\Delta _{\ell :k-1}+\varepsilon _{k}. \end{aligned}$$
(51)

A more accurate bound at iteration \(\ell \) is obtained by replacing the last term in (51) by a basic lower or upper bound on \(\varepsilon _{k}\). In particular, the improved Gauss-Radau upper bound at iteration \(\ell \) can be defined as

$$\begin{aligned} \Omega {}_{\ell :k}^{(\mu )}\,=\,\Delta _{\ell :k-1}+\gamma _{k}^{(\mu )}\left\| r_{k}\right\| ^{2}, \end{aligned}$$
(52)

and the improved Gauss lower bound is given by \(\Delta _{\ell :k}\).
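A minimal sketch of both improved bounds, assuming that the CG coefficients \(\gamma _{j}\), the squared residual norms \(\left\| r_{j}\right\| ^{2}\), and the Gauss-Radau coefficient \(\gamma _{k}^{(\mu )}\) have been stored during the run (the variable names are ours):

```python
import numpy as np

def improved_bounds(gammas, res_norms2, gamma_k_mu, ell, k):
    """Improved Gauss lower bound Delta_{ell:k} and Gauss-Radau upper bound (52).

    gammas[j]     : CG coefficient gamma_j,           j = 0, ..., k
    res_norms2[j] : squared residual norm ||r_j||^2,  j = 0, ..., k
    gamma_k_mu    : Gauss-Radau coefficient gamma_k^{(mu)} at iteration k
    """
    Delta = np.asarray(gammas) * np.asarray(res_norms2)      # Delta_j = gamma_j ||r_j||^2
    lower = Delta[ell:k + 1].sum()                           # Delta_{ell:k}
    upper = Delta[ell:k].sum() + gamma_k_mu * res_norms2[k]  # Omega_{ell:k}^{(mu)}, cf. (52)
    return lower, upper
```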

To guarantee the relative accuracy of the improved Gauss-Radau upper bound, we would like to find the largest iteration index \(\ell \le k\) in the convergence history such that

$$\begin{aligned} \frac{\Omega {}_{\ell :k}^{(\mu )}-\varepsilon _{\ell }}{\varepsilon _{\ell }}\le \tau \end{aligned}$$
(53)

where \(\tau \) is a prescribed tolerance, say, \(\tau =0.25\). Since

$$\begin{aligned} \frac{{{\Omega {}_{\ell :k}^{(\mu )}}}-\varepsilon _{\ell }}{\varepsilon _{\ell }}<\frac{\Omega {}_{\ell :k}^{(\mu )}-\Delta _{\ell :k}}{\Delta _{\ell :k}}=\frac{\left\| r_{k}\right\| ^{2}\left( \gamma _{k}^{(\mu )}-\gamma _{k}\right) }{\Delta _{\ell :k}}, \end{aligned}$$

we can require \(\ell \le k\) to be the largest integer such that

$$\begin{aligned} \frac{\left\| r_{k}\right\| ^{2}\left( \gamma _{k}^{(\mu )}-\gamma _{k}\right) }{\Delta _{\ell :k}}\le \tau . \end{aligned}$$
(54)

If (54) holds, then (53) holds as well. The adaptive strategy just described, which determines \(\ell \) such that the upper bound at iteration \(\ell \) is sufficiently accurate, is summarized in Algorithm 3.

figure c

CG with the improved Gauss-Radau upper bound.
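The following sketch is written in the spirit of Algorithm 3 but does not reproduce it; it simply steps back through the stored values \(\Delta _{j}\) to find the largest \(\ell \le k\) satisfying (54). The stored quantities and their names are assumptions of this illustration.

```python
def largest_accurate_ell(Delta, res_norm2_k, gamma_k, gamma_k_mu, k, tau=0.25):
    """Largest ell <= k for which (54) holds; returns 0 if no such ell exists.

    Delta[j] = gamma_j * ||r_j||^2 for j = 0, ..., k, stored during the CG run.
    """
    numerator = res_norm2_k * (gamma_k_mu - gamma_k)
    ell = k
    # Delta_{ell:k} grows as ell decreases, so the left-hand side of (54) shrinks;
    # step back through the convergence history until the tolerance tau is met.
    while ell > 0 and numerator / sum(Delta[ell:k + 1]) > tau:
        ell -= 1
    return ell
```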

Note that

$$\begin{aligned} \frac{\Omega {}_{\ell :k}^{(\mu )}-\varepsilon _{\ell }}{\varepsilon _{\ell }}+\frac{\varepsilon _{\ell }-\Delta _{\ell :k}}{\varepsilon _{\ell }}<\frac{\Omega {}_{\ell :k}^{(\mu )}-\Delta _{\ell :k}}{\Delta _{\ell :k}}, \end{aligned}$$

i.e., if (54) holds, then \(\tau \) also represents an upper bound on the sum of the relative errors of the improved lower and upper bounds. In other words, if \(\ell \) is such that (54) is satisfied, then both the improved Gauss-Radau upper bound and the improved Gauss lower bound are sufficiently accurate. For a heuristic strategy focused on improving the accuracy of the Gauss lower bound, see [19].

In the previous sections we have seen that the basic Gauss-Radau upper bound is delayed, in particular in phase 2. The delay of the basic Gauss-Radau upper bound can be defined as the smallest nonnegative integer j such that

$$\begin{aligned} \gamma _{\ell +j+1}^{(\mu )}\left\| r_{\ell +j+1}\right\| ^{2}<\varepsilon _{\ell }. \end{aligned}$$
(55)

Having sufficiently accurate lower and upper bounds (e.g., if (54) is satisfied), we can approximately determine the delay of the basic Gauss-Radau upper bound as the smallest j satisfying (55) with \(\varepsilon _{\ell }\) replaced by its tight lower bound \(\Delta _{\ell :k}\).
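Under the same bookkeeping assumptions as above, this estimate of the delay can be sketched as follows; here gammas_mu[j] stands for \(\gamma _{j}^{(\mu )}\) and res_norms2[j] for \(\left\| r_{j}\right\| ^{2}\), and the names are again ours.

```python
def estimated_delay(gammas_mu, res_norms2, Delta, ell, k):
    """Smallest j satisfying (55) with epsilon_ell replaced by Delta_{ell:k}.

    Assumes enough iterations beyond ell are stored for the loop to terminate.
    """
    lower = sum(Delta[ell:k + 1])      # tight lower bound on epsilon_ell
    j = 0
    while gammas_mu[ell + j + 1] * res_norms2[ell + j + 1] >= lower:
        j += 1
    return j
```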

9 Conclusions

In this paper we discussed and analyzed the behavior of the Gauss-Radau upper bound on the A-norm of the error in CG. In particular, we concentrated on the phenomenon observed during computations showing that, in later CG iterations, the upper bound loses its accuracy, becomes almost independent of \(\mu \), and visually coincides with the simple upper bound. We explained that this phenomenon is closely related to the convergence of the smallest Ritz value to the smallest eigenvalue of A. It occurs when the smallest Ritz value is a better approximation to the smallest eigenvalue than the prescribed underestimate \(\mu \). We developed formulas that can be helpful in understanding this behavior. Note that the loss of accuracy of the Gauss-Radau upper bound is not directly linked to rounding errors in the computation of the bound, but rather to the finite precision behavior of the underlying Lanczos process. In more detail, the phenomenon can occur, already in exact arithmetic, when solving linear systems with clustered eigenvalues. However, the results of finite precision CG computations can be seen (up to some small inaccuracies) as the results of the exact CG algorithm applied to a larger system whose system matrix has clustered eigenvalues. Therefore, one can expect that the discussed phenomenon can occur in practical computations not only when A has clustered eigenvalues, but also whenever orthogonality is lost in the CG algorithm.