Convergence Theory for Preconditioned Eigenvalue Solvers in a Nutshell

Abstract

Preconditioned iterative methods for numerical solution of large matrix eigenvalue problems are increasingly gaining importance in various application areas, ranging from materials science to data mining. Some of them, e.g., those using multilevel preconditioning for elliptic differential operators or graph Laplacian eigenvalue problems, exhibit almost optimal complexity in practice; i.e., their computational costs to calculate a fixed number of eigenvalues and eigenvectors grow linearly with the matrix problem size. Theoretical justification of their optimality requires convergence rate bounds that do not deteriorate as the problem size increases. Such bounds were pioneered by E. D’yakonov over three decades ago, but to date only a handful have been derived, mostly for symmetric eigenvalue problems, and only a few of the known bounds are sharp. One of them is proved in doi:10.1016/S0024-3795(01)00461-X for the simplest preconditioned eigensolver with a fixed step size. The original proof has been greatly simplified and shortened in doi:10.1137/080727567 by using a gradient flow integration approach. In the present work, we give an even more succinct proof, using novel ideas based on Karush–Kuhn–Tucker theory and nonlinear programming.
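
For context, the iteration analyzed here is a single-vector preconditioned eigensolver with a fixed step size. The following is a minimal numpy sketch of such an iteration; the function name, the unit fixed step, and the toy diagonal test problem with an exact (Jacobi) preconditioner are illustrative assumptions for this page, not the authors' exact formulation.

```python
import numpy as np

def preconditioned_fixed_step(A, T, x0, iters=100):
    """Sketch of a fixed-step preconditioned eigensolver (PINVIT-type).

    A : symmetric positive definite matrix (n x n)
    T : preconditioner approximating inv(A), passed here as a dense matrix
    x0: initial guess
    Returns the final Rayleigh quotient and iterate.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        mu = x @ A @ x                 # Rayleigh quotient (x has unit norm)
        r = A @ x - mu * x             # eigenvalue residual
        x = x - T @ r                  # fixed-step preconditioned update
        x = x / np.linalg.norm(x)
    return x @ A @ x, x

# toy usage: diagonal A with eigenvalues 1..100 and a Jacobi preconditioner
n = 200
A = np.diag(np.linspace(1.0, 100.0, n))
T = np.diag(1.0 / np.diag(A))
mu, _ = preconditioned_fixed_step(A, T, np.random.default_rng(0).standard_normal(n))
print(mu)  # approaches the smallest eigenvalue, 1.0
```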

References

  1. Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, eds., Templates for the solution of algebraic eigenvalue problems: A practical guide, SIAM, Philadelphia, 2000.

  2. F. Bottin, S. Leroux, A. Knyazev, G. Zerah, Large-scale ab initio calculations based on three levels of parallelization, Computational Materials Science, 42 (2008), 2, pp. 329–336. doi:10.1016/j.commatsci.2007.07.019

  3. H. Bouwmeester, A. Dougherty, A. V. Knyazev, Nonsymmetric Preconditioning for Conjugate Gradient and Steepest Descent Methods, Procedia Computer Science, 51 (2015), pp. 276–285. doi:10.1016/j.procs.2015.05.241. A preliminary version is available at http://math.ucdenver.edu/~aknyazev/research/papers/old/k.pdf

  4. S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 30 (1998), 1–7, pp. 107–117. doi:10.1016/S0169-7552(98)00110-X

  5. E. G. D’yakonov, Optimization in Solving Elliptic Problems, CRC Press, Boca Raton, Florida, 1996. ISBN: 978-0849328725

  6. R. Fletcher, Practical Methods of Optimization, John Wiley & Sons, Second Edition, 1987.

  7. A. V. Knyazev, Computation of eigenvalues and eigenvectors for mesh problems: algorithms and error estimates, (In Russian), Dept. Num. Math., USSR Ac. Sci., Moscow, 1986. http://math.ucdenver.edu/~aknyazev/research/papers/old/k.pdf

  8. A. V. Knyazev, Convergence rate estimates for iterative methods for a mesh symmetric eigenvalue problem, Russian J. Numer. Anal. Math. Modelling, 2 (1987), pp. 371–396. doi:10.1515/rnam.1987.2.5.371

  9. A. V. Knyazev, Preconditioned eigensolvers—an oxymoron?, Electronic Transactions on Numerical Analysis, 7 (1998), pp. 104–123. http://etna.mcs.kent.edu/vol.7.1998/pp104-123.dir/pp104-123.pdf

  10. A. V. Knyazev, Modern Preconditioned Eigensolvers for Spectral Image Segmentation and Graph Bisection, Workshop on Clustering Large Data Sets, Third IEEE International Conference on Data Mining (ICDM 2003), 2003. http://math.ucdenver.edu/~aknyazev/research/conf/ICDM03

  11. A. V. Knyazev and K. Neymeyr, A geometric theory for preconditioned inverse iteration. III: A short and sharp convergence estimate for generalized eigenvalue problems, Linear Algebra Appl., 358 (2003), pp. 95–114. doi:10.1016/S0024-3795(01)00461-X

  12. A. V. Knyazev and K. Neymeyr, Efficient solution of symmetric eigenvalue problems using multigrid preconditioners in the locally optimal block conjugate gradient method, Electronic Transactions on Numerical Analysis, 15 (2003), pp. 38–55. http://etna.mcs.kent.edu/vol.15.2003/pp38-55.dir/pp38-55.pdf

  13. A. V. Knyazev and K. Neymeyr, Gradient flow approach to geometric convergence analysis of preconditioned eigensolvers, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 621–628. doi:10.1137/080727567

  14. D. Kressner, M. Steinlechner, and A. Uschmajew, Low-rank tensor methods with subspace correction for symmetric eigenvalue problems, SIAM J. Sci. Comput., 36 (2014), 5, pp. A2346–A2368. http://sma.epfl.ch/~anchpcommon/publications/EVAMEN.pdf

  15. D. Kressner, M. M. Pandur, M. Shao, An indefinite variant of LOBPCG for definite matrix pencils, Numerical Algorithms, 66 (2014), 4, pp. 681–703. doi:10.1007/s11075-013-9754-3

  16. K. Neymeyr, A geometric convergence theory for the preconditioned steepest descent iteration, SIAM J. Numer. Anal., 50 (2012), pp. 3188–3207. doi:10.1137/11084488X

  17. K. Neymeyr, E. Ovtchinnikov, and M. Zhou, Convergence analysis of gradient iterations for the symmetric eigenvalue problem, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 443–456. doi:10.1137/100784928

  18. J. Nocedal and S.J. Wright, Numerical Optimization, Springer, 2006.

  19. E. E. Ovtchinnikov, Sharp convergence estimates for the preconditioned steepest descent method for Hermitian eigenvalue problems, SIAM J. Numer. Anal., 43 (2006), 6, pp. 2668–2689. doi:10.1137/040620643

  20. D. B. Szyld and F. Xue, Preconditioned eigensolvers for large-scale nonlinear Hermitian eigenproblems with variational characterizations. I. Conjugate gradient methods, Research Report 14-08-26, Department of Mathematics, Temple University, August 2014. Revised April 2015. To appear in Mathematics of Computation. https://www.math.temple.edu/~szyld/reports/NLPCG.report.rev

  21. D. B. Szyld, E. Vecharynski and F. Xue, Preconditioned eigensolvers for large-scale nonlinear Hermitian eigenproblems with variational characterizations. II. Interior eigenvalues, Research Report 15-04-10, Department of Mathematics, Temple University, April 2015. To appear in SIAM Journal on Scientific Computing. arXiv:1504.02811

  22. E. Vecharynski, Y. Saad, and M. Sosonkina, Graph partitioning using matrix values for preconditioning symmetric positive definite systems, SIAM J. Sci. Comput., 36 (2014), 1, pp. A63–A87. doi:10.1137/120898760

  23. E. Vecharynski, C. Yang, and J. E. Pask, A projected preconditioned conjugate gradient algorithm for computing a large invariant subspace of a Hermitian matrix, Journal of Computational Physics, 290 (2015), pp. 73–89. doi:10.1016/j.jcp.2015.02.030

  24. S. Yamada, T. Imamura, T. Kano, and M. Machida, High-performance computing for exact numerical approaches to quantum many-body problems on the earth simulator, In Proceedings of the 2006 ACM/IEEE conference on Supercomputing (SC ’06). ACM, New York, NY, USA, article 47, 2006. doi:10.1145/1188455.1188504

Author information

Corresponding author

Correspondence to Ming Zhou.

Additional information

Dedicated to the memory of Evgenii G. D’yakonov, Moscow, Russia, 1935–2006.

Communicated by Nicholas Higham. A preliminary version is posted at http://arxiv.org.

Appendix

I. An alternative estimate for the left-hand side of (4.8), which is sharp with respect to all variables, can be derived as follows: With \(\delta :=\beta /\alpha \) and \(\varepsilon :=\mu _l/\mu _k\), a new representation of \(\gamma ^2\) is given by

$$\begin{aligned} \gamma ^2 =\frac{(\delta -\varepsilon )^2(1+\alpha ^2)}{(1-\varepsilon )^2(1+\alpha ^2\delta ^2)}. \end{aligned}$$

This results in the quadratic equation

$$\begin{aligned} \big [(1+\alpha ^2)-\alpha ^2\gamma ^2(1-\varepsilon )^2\big ]\,\delta ^2 -2\varepsilon (1+\alpha ^2)\,\delta +\varepsilon ^2(1+\alpha ^2)-\gamma ^2(1-\varepsilon )^2=0 \end{aligned}$$

for \(\delta \), with the roots

$$\begin{aligned} \delta _{\pm }=\frac{\varepsilon (1+\alpha ^2)\pm \gamma (1-\varepsilon ) \sqrt{(1+\alpha ^2)(1+\alpha ^2\varepsilon ^2)-\alpha ^2\gamma ^2(1-\varepsilon )^2}}{(1+\alpha ^2)-\alpha ^2\gamma ^2(1-\varepsilon )^2}. \end{aligned}$$

Since \(\delta ^2=\dfrac{\beta ^2}{\alpha ^2}=\dfrac{\mu _k-\mu (y)}{\mu (y)-\mu _l}\; \dfrac{\mu (x)-\mu _l}{\mu _k-\mu (x)}\), a strictly sharp bound for the estimate in (4.8) is given by \(\max \{\delta _{+}^2,\delta _{-}^2\}=\delta _{+}^2\). We note that in the limit case \(\mu (x)\rightarrow \mu _k\) it holds that \(\alpha \rightarrow 0\), and \(\delta _{+}^2\) turns into \((\varepsilon +\gamma (1-\varepsilon ))^2\). This coincides with the known bound in (4.8).
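
As an illustrative numerical check (not part of the proof), the following Python snippet samples admissible parameters \(0<\varepsilon <1\), \(0\le \gamma <1\), \(\alpha >0\), verifies that \(\delta _{\pm }\) satisfy the above representation of \(\gamma ^2\), and confirms that \(\delta _{+}\) approaches \(\varepsilon +\gamma (1-\varepsilon )\) as \(\alpha \rightarrow 0\).

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    alpha = rng.uniform(0.01, 5.0)     # alpha > 0
    eps = rng.uniform(0.01, 0.99)      # eps = mu_l / mu_k, 0 < eps < 1
    gamma = rng.uniform(0.0, 0.99)     # 0 <= gamma < 1

    root = np.sqrt((1 + alpha**2) * (1 + alpha**2 * eps**2)
                   - alpha**2 * gamma**2 * (1 - eps)**2)
    denom = (1 + alpha**2) - alpha**2 * gamma**2 * (1 - eps)**2

    for sign in (+1.0, -1.0):
        d = (eps * (1 + alpha**2) + sign * gamma * (1 - eps) * root) / denom
        # plug delta_{+/-} back into the representation of gamma^2
        g2 = (d - eps)**2 * (1 + alpha**2) / ((1 - eps)**2 * (1 + alpha**2 * d**2))
        assert abs(g2 - gamma**2) < 1e-9

    # limit case mu(x) -> mu_k, i.e. alpha -> 0: delta_+ -> eps + gamma*(1 - eps)
    a0 = 1e-8
    root0 = np.sqrt((1 + a0**2) * (1 + a0**2 * eps**2) - a0**2 * gamma**2 * (1 - eps)**2)
    denom0 = (1 + a0**2) - a0**2 * gamma**2 * (1 - eps)**2
    d_plus = (eps * (1 + a0**2) + gamma * (1 - eps) * root0) / denom0
    assert abs(d_plus - (eps + gamma * (1 - eps))) < 1e-6
print("all checks passed")
```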

II. The bound in (4.8) contains a convex combination of 1 and \(\mu _l/\mu _k\). Interestingly, this bound can also be derived by using a convex function as follows: Without loss of generality, we assume that x has a positive \(x_k\) coordinate. Then, \(\angle (x,x_k)\) is an acute angle. Since \(B>0\), the angles \(\angle (Bx,x)\) and \(\angle (Bx,x_k)\) are also acute. The equality (4.1) together with \(\gamma <1\) further shows that \(\angle (Bx,y)<\angle (Bx,x)<\pi /2\). Since \(\angle (Bx,x_k)\) and \(\angle (Bx,y)\) are acute angles, the vectors \(x_k\) and y are located in a half-plane whose boundary line is orthogonal to Bx. A simple case distinction shows that \(\angle (y,x_k)\) is either equal to \(|\angle (Bx,x_k)-\angle (Bx,y)|\) or equal to \(\angle (Bx,x_k)+\angle (Bx,y)\). Further, we use the equalities

$$\begin{aligned} \tan ^2\angle (y,x_k)=\frac{\mu _k-\mu (y)}{\mu (y)-\mu _l},\;\; \tan ^2\angle (x,x_k)=\frac{\mu _k-\mu (x)}{\mu (x)-\mu _l},\;\; \frac{\tan ^2\angle (Bx,x_k)}{\tan ^2\angle (x,x_k)}=\frac{\mu _l^2}{\mu _k^2}, \end{aligned}$$

which can be derived in a similar way to Sect. 3. The last equality proves \(\angle (Bx,x_k)<\angle (x,x_k)\), since the tangent is an increasing function for acute angles, and \(\mu _l<\mu _k\). This leads to \(\angle (x,x_k)=\angle (Bx,x_k)+\angle (Bx,x)\), since \(x, Bx, x_k\) are all in the same quadrant. In summary, it holds that

$$\begin{aligned} \angle (y,x_k)\le \angle (Bx,x_k)+\angle (Bx,y)<\angle (Bx,x_k)+\angle (Bx,x) =\angle (x,x_k)<\pi /2, \end{aligned}$$
(4.9)

i.e., \(\angle (y,x_k)\) is also an acute angle. Using these acute angles, we write (4.8) equivalently as

$$\begin{aligned} \tan \angle (y,x_k)\le \gamma \tan \angle (x,x_k) +(1-\gamma )\tan \angle (Bx,x_k). \end{aligned}$$
(4.10)

In order to prove (4.10), we use (4.1) again, together with \(\varphi :=\angle (Bx,x)\), \(\vartheta :=\angle (Bx,x_k)\), and the first inequality in (4.9). It holds that

$$\begin{aligned} \tan \angle (y,x_k)\le \tan [\vartheta +\arcsin \big (\gamma \sin (\varphi )\big )]=:f(\gamma ). \end{aligned}$$

Because

$$\begin{aligned} f'(\gamma )= \frac{\big (1+f(\gamma )^2\big )\sin (\varphi )}{\sqrt{1-\big (\gamma \sin (\varphi )\big )^2}}\ge 0 \quad \text{ for }\quad \gamma \in [0,1], \end{aligned}$$

\(f(\gamma )\) is a monotonically increasing function in [0, 1]. Moreover, the numerator of \(f'(\gamma )\) is monotonically increasing and its denominator is monotonically decreasing in \(\gamma \in [0,1]\); since both are positive, \(f'(\gamma )\) itself is monotonically increasing. Thus, \(f(\gamma )\) is a convex function in [0, 1], and

$$\begin{aligned} \tan \angle (y,x_k)\le f(\gamma )\le (1-\gamma )f(0)+\gamma f(1) =(1-\gamma )\tan (\vartheta )+\gamma \tan (\vartheta +\varphi ), \end{aligned}$$

which proves (4.10) and hence (4.8).
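
The monotonicity and convexity of \(f(\gamma )=\tan [\vartheta +\arcsin (\gamma \sin \varphi )]\) used above are easy to confirm numerically. The following snippet, an illustrative check only, samples acute angles with \(\vartheta +\varphi <\pi /2\), compares the closed-form derivative \(f'(\gamma )\) with a central finite difference, and verifies the chord bound \(f(\gamma )\le (1-\gamma )f(0)+\gamma f(1)\).

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    # acute angles theta = angle(Bx, x_k), phi = angle(Bx, x) with theta + phi < pi/2
    theta = rng.uniform(0.0, np.pi / 2 - 1e-3)
    phi = rng.uniform(0.0, np.pi / 2 - theta)

    def f(g):
        return np.tan(theta + np.arcsin(g * np.sin(phi)))

    # closed-form derivative vs. central finite difference at an interior point
    g0 = rng.uniform(0.05, 0.95)
    h = 1e-6
    closed_form = (1 + f(g0)**2) * np.sin(phi) / np.sqrt(1 - (g0 * np.sin(phi))**2)
    finite_diff = (f(g0 + h) - f(g0 - h)) / (2 * h)
    assert abs(closed_form - finite_diff) <= 1e-4 * max(1.0, abs(closed_form))

    # convexity: the graph of f on [0, 1] lies below the chord through f(0) and f(1)
    g = np.linspace(0.0, 1.0, 101)
    chord = (1.0 - g) * f(0.0) + g * f(1.0)   # = (1-g) tan(theta) + g tan(theta+phi)
    assert np.all(f(g) <= chord + 1e-9)
print("all checks passed")
```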

Cite this article

Argentati, M.E., Knyazev, A.V., Neymeyr, K. et al. Convergence Theory for Preconditioned Eigenvalue Solvers in a Nutshell. Found Comput Math 17, 713–727 (2017). https://doi.org/10.1007/s10208-015-9297-1
