
Trace-Penalty Minimization for Large-Scale Eigenspace Computation


Abstract

In a block algorithm for computing relatively high-dimensional eigenspaces of large sparse symmetric matrices, the Rayleigh-Ritz (RR) procedure often constitutes a major bottleneck. Although dense eigenvalue calculations for subproblems in RR steps can be parallelized to a certain level, their parallel scalability, which is limited by some inherent sequential steps, is lower than that of dense matrix-matrix multiplications. The primary motivation of this paper is to develop a methodology that reduces the use of the RR procedure in exchange for matrix-matrix multiplications. We propose an unconstrained trace-penalty minimization model and establish its equivalence to the eigenvalue problem. With a suitably chosen penalty parameter, this model possesses far fewer undesirable full-rank stationary points than the classic trace minimization model. More importantly, it enables us to deploy algorithms that make heavy use of dense matrix-matrix multiplications. Although the proposed algorithm does not necessarily reduce the total number of arithmetic operations, it leverages highly optimized operations on modern high-performance computers to achieve better parallel scalability. Numerical results based on a preliminary implementation, parallelized using OpenMP, show that our approach is promising.
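
To make the approach concrete, the objective in question is \(f_\mu (X)=\frac{1}{2}\mathrm {tr}(X^{\mathrm {T}}AX)+\frac{\mu }{4}\Vert X^{\mathrm {T}}X-I\Vert _F^2\) over \(X\in \mathbb {R}^{n\times k}\) (see (7.9) in the appendix for its subspace-restricted version). The sketch below is our own minimal illustration of a gradient iteration on this objective, not the authors' EigPen implementation: the function names are ours, a fixed step size stands in for the adaptive rules a practical solver would use, and the single Rayleigh-Ritz step at the end (omitted here) would recover individual eigenpairs.

```python
import numpy as np

def grad_f(A, X, mu):
    """Gradient of f_mu(X) = 0.5*tr(X'AX) + (mu/4)*||X'X - I||_F^2,
    namely A X + mu * X (X'X - I)."""
    return A @ X + mu * (X @ (X.T @ X - np.eye(X.shape[1])))

def trace_penalty_gd(A, k, mu, steps=1000, alpha=1e-2, seed=0):
    """Fixed-step gradient descent on the trace-penalty model (our sketch).
    Each iteration needs one product with A plus dense k-column products;
    no Rayleigh-Ritz solve occurs inside the loop."""
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(steps):
        X -= alpha * grad_f(A, X, mu)
    return X  # spans an approximate eigenspace; one final RR step yields eigenpairs
```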


Notes

  1. Downloadable from http://code.google.com/p/blopex.

  2. Downloadable from http://www.cs.wm.edu/~andreas/software.

  3. Downloadable from http://www.cise.ufl.edu/research/sparse/matrices.

  4. More information at http://www.nersc.gov/users/computational-systems/hopper/.

  5. http://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m.


Acknowledgments

The computational results were obtained at the National Energy Research Scientific Computing Center (NERSC), which is supported by the Director, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under contract number DE-AC02-05CH11231. Z. Wen would like to thank Prof. Michael Ulbrich for hosting his visit at Technische Universität München. X. Liu would like to thank Prof. Yuhong Dai for discussing nonlinear programming techniques for eigenvalue computation. C. Yang would like to thank Dr. Eugene Vecharynski for helping test EigPen, especially the preconditioned version. The authors are grateful to Prof. Chi-Wang Shu, the associate editor and the anonymous referees for their detailed and valuable comments and suggestions.

Author information

Correspondence to Zaiwen Wen.

Additional information

Z. Wen: Research supported in part by NSFC Grants 11322109, 91330202 and 11421101, and by the National Basic Research Project under the grant 2015CB856000.

C. Yang: Support for this work was provided through the Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (and Basic Energy Sciences) under award number DE-SC0008666.

X. Liu: Research supported in part by NSFC Grants 11331012, 11471325 and 11461161005, China 863 Program 2013AA122902 and the National Center for Mathematics and Interdisciplinary Sciences, CAS.

Y. Zhang: Research supported in part by NSF Grant DMS-0811188, ONR Grant N00014-08-1-1101, and NSF Grant DMS-1115950.

Appendix: Proofs of Technical Results

1.1 Proof of Theorem 2.1

It can be easily seen that condition (2.3) is necessary for the existence of a rank-k stationary point. On the other hand, suppose that \(\mu \) satisfies (2.3). It suffices to consider the representation \(X=UW\), where U consists of any k eigenvectors of A and \(W \in \mathbb {R}^{k\times k}\). Hence, we obtain

$$\begin{aligned} 2f_\mu (X) = \mathrm {tr}(DWW^{\mathrm {T}}) + \frac{\mu }{2}\Vert W^{\mathrm {T}}W-I\Vert ^2_F, \end{aligned}$$

where \(D = \mathrm{Diag}(d) \in \mathbb {R}^{k\times k}\) is a diagonal matrix with k eigenvalues of A on the diagonal corresponding to eigenvectors in U. A short calculation shows that

$$\begin{aligned} 2f_\mu (X)= & {} \frac{\mu }{2}\Vert WW^{\mathrm {T}}+(D/\mu -I)\Vert ^2_F + \mathrm {tr}(D) - \frac{1}{2\mu }\mathrm {tr}(D^2)\\\ge & {} \frac{\mu }{2}\Vert (D/\mu -I)_+\Vert _F^2 + \mathrm {tr}(D) - \frac{1}{2\mu }\mathrm {tr}(D^2)\\= & {} \sum _{i=1}^k\left( \frac{\mu }{2}\left( \frac{d_i}{\mu }-1\right) _+^2 + d_i - \frac{d_i^2}{2\mu }\right) \equiv \sum _{i=1}^k\theta (d_i), \end{aligned}$$

where \((t)_+ = \max (0,t)\) and

$$\begin{aligned} \theta (t) = \frac{\mu }{2}\left( \frac{t}{\mu }-1\right) _+^2 + t - \frac{t^2}{2\mu } = \left\{ \begin{array}{cc} t - t^2/(2\mu ), &{} t < \mu , \\ \mu /2, &{} t \ge \mu . \end{array}\right. \end{aligned}$$

Note that \(\theta (t)\) is monotonically nondecreasing, since \(\theta '(t) = 1 - t/\mu > 0\) on \((-\infty ,\mu )\) and \(\theta '(t) = 0\) for \(t \ge \mu \). Substituting the expression for \(\hat{X}\) defined in (2.4) into \(f_\mu (\hat{X})\), we obtain

$$\begin{aligned} 2f_\mu (\hat{X}) = \mathrm {tr}(\Lambda _k) - \frac{1}{2\mu }\mathrm {tr}(\Lambda _k^2) = \sum _{i=1}^k\theta ({\uplambda }_i) \le 2f_\mu (X), \end{aligned}$$

which verifies that \(\hat{X}\) is a global minimizer. This completes the proof. \(\square \)
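
As a quick numerical sanity check of the theorem (our illustration, not part of the paper), one can take \(\hat{X}=Q_k(I-\Lambda _k/\mu )^{1/2}\), which we assume to be the form given by (2.4) up to an orthogonal factor, verify that the gradient \(\nabla f_\mu (\hat{X}) = A\hat{X} + \mu \hat{X}(\hat{X}^{\mathrm {T}}\hat{X}-I)\) vanishes, and confirm that random points never attain a smaller function value:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
lam, Q = np.linalg.eigh(A)                      # eigenvalues in ascending order
mu = max(0.0, lam[k - 1]) + 1.0                 # one choice with mu > max(0, lam_k), cf. (2.3)

def f_mu(X):
    E = X.T @ X - np.eye(X.shape[1])
    return 0.5 * np.trace(X.T @ A @ X) + 0.25 * mu * np.sum(E * E)

Xhat = Q[:, :k] * np.sqrt(1.0 - lam[:k] / mu)   # assumed form of (2.4), with V = I
grad = A @ Xhat + mu * Xhat @ (Xhat.T @ Xhat - np.eye(k))
print(np.max(np.abs(grad)) < 1e-10)             # True: Xhat is a stationary point
print(all(f_mu(Xhat) <= f_mu(rng.standard_normal((n, k)))
          for _ in range(500)))                  # True: never beaten by random points
```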

1.2 Proof of Theorem 2.2

We only prove the first statement by showing that for \(\mu \in (\max (0,{\uplambda }_k),{\uplambda }_n)\) any stationary point other than the global minimizers can only be saddle points. Without loss of generality, consider stationary points in the form of

$$\begin{aligned} \hat{X}= U[P(I-D/\mu )]^{1/2} = U[(I-D/\mu )P]^{1/2}, \end{aligned}$$
(7.1)

where \(AU=UD\), \(U^{\mathrm {T}}U=I\), \(D\) is diagonal, and \(P\in \mathbb {R}^{k\times k}\) is a diagonal projection matrix with diagonal entries

$$\begin{aligned} P_{ii} = \left\{ \begin{array}{ll} 0, &{}\quad \text {if } \mu \le D_{ii}, \\ 0 \text { or } 1, &{}\quad \text {otherwise}, \end{array}\right. \end{aligned}$$
(7.2)

Substituting (7.1) into the Hessian formula (2.7), we obtain

$$\begin{aligned} \nabla ^2f_\mu (\hat{X})(S) = AS - S(\mu (I-P)+DP) + \mu \hat{X}(S^{\mathrm {T}}\hat{X}+\hat{X}^{\mathrm {T}}S). \end{aligned}$$
(7.3)

We next show that there exist different matrices \(S \in \mathbb {R}^{n \times k}\) at which \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S))\) takes opposite signs, unless the stationary point \(\hat{X}\) is constructed from eigenvectors associated with a set of k smallest eigenvalues, which corresponds to the global minimum.

First assume that \(\hat{X}\) has full rank. Then \(\mu I \succ D\) and \(P=I\) in (7.1). Letting \(P=I\) in (7.3) yields

$$\begin{aligned} \nabla ^2f_\mu (\hat{X})(S) = AS - SD + \mu \hat{X}(S^{\mathrm {T}}\hat{X}+\hat{X}^{\mathrm {T}}S). \end{aligned}$$

For \(S=U\), we have \(S^{\mathrm {T}}\hat{X}=\hat{X}^{\mathrm {T}}S=(I-D/\mu )^{1/2}\) and

$$\begin{aligned} \mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)) = 0 + 2\,\mathrm {tr}(\mu I - D) > 0. \end{aligned}$$

On the other hand, if \(\hat{X}\) is not a global minimizer, without loss of generality we can assume that U contains \(q_j\) but not \(q_i\), where \({\uplambda }_i < {\uplambda }_j\). Let S contain all zero columns except a single nonzero column, equal to \(q_i\), placed at the position of \(q_j\) in U so that the only nonzero column of SD is \(q_i{\uplambda }_j\). For such an S, we have \(S^{\mathrm {T}}\hat{X}=0\) and

$$\begin{aligned} \mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)) = q_i^{\mathrm {T}}(Aq_i - q_i{\uplambda }_j) + \mu \, \mathrm {tr}(S^{\mathrm {T}}\hat{X}(S^{\mathrm {T}}\hat{X}+\hat{X}^{\mathrm {T}}S)) = ({\uplambda }_i-{\uplambda }_j) < 0. \end{aligned}$$

Hence, all full-rank stationary points are saddle points except the global minimizers.

We now consider the rank-deficient case, namely, at least one diagonal entry of P is zero, say \(P_{ii} = 0\) for some \(i \in [1,k]\). Let \(\bar{U}\) be the matrix remaining after deleting the i-th column from U. Since \(\hbox {rank}(\bar{U})=k-1\), there must exist at least one column of \(Q_k\), denoted by \(q_j\), that is not contained in \(\bar{U}\). Then \(q_j^{\mathrm {T}}\bar{U}=0\) and \(q_j^{\mathrm {T}}A q_j \le {\uplambda }_k\). Let S contain all zero columns except one nonzero column that is \(q_j\) at the i-th position, so that both \(SP=0\) and \(S^{\mathrm {T}}\hat{X}=0\). Consequently, in view of (7.3) we have

$$\begin{aligned} \mathrm {tr}\left( S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)\right) = q_j^{\mathrm {T}}A q_j - \mu + \mu \, \mathrm {tr}\left( S^{\mathrm {T}}\hat{X}(S^{\mathrm {T}}\hat{X}+\hat{X}^{\mathrm {T}}S)\right) \le ({\uplambda }_k - \mu ) + 0 < 0. \end{aligned}$$

On the other hand, let S contain all zero columns except that the i-th column is \(q_n\). For any integer \(l\in [1,k]\), if the column \(U_l=q_n\), then it can be shown that \(P_{ll}=0\) and hence \(q_n^{\mathrm {T}}\hat{X}_l = 0\). Otherwise \(U_l\ne q_n\), thus \(q_n^{\mathrm {T}}U_l=0\). In either case \(q_n^{\mathrm {T}}\hat{X}=0\). By our assumption, \(\mu < q_n^{\mathrm {T}}A q_n = {\uplambda }_n\). Hence, \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)) = {\uplambda }_n - \mu >0\). This completes the proof. \(\square \)
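
The saddle-point mechanism is easy to observe numerically (our illustration, under the same assumed form of the stationary points): build a full-rank stationary point that uses \(q_{k+1}\) in place of \(q_k\), then evaluate the quadratic form of (7.3) at the two test directions used in the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
lam, Q = np.linalg.eigh(A)
mu = 0.5 * (max(0.0, lam[k]) + lam[-1])        # inside (max(0, lam_k), lam_n), above lam_{k+1}

idx = list(range(k - 1)) + [k]                 # eigenvectors q_1..q_{k-1}, q_{k+1}: not optimal
U, d = Q[:, idx], lam[idx]
Xhat = U * np.sqrt(1.0 - d / mu)               # full-rank stationary point, P = I

def quad(S):
    """tr(S' Hess f_mu(Xhat)[S]) with P = I, cf. (7.3)."""
    H = A @ S - S @ np.diag(d) + mu * Xhat @ (S.T @ Xhat + Xhat.T @ S)
    return np.trace(S.T @ H)

S_neg = np.zeros((n, k)); S_neg[:, -1] = Q[:, k - 1]   # q_k placed where q_{k+1} sits
print(quad(U) > 0, quad(S_neg) < 0)                    # True True: opposite curvatures
```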

1.3 Proof of Lemma 2.3

Consider any \(S \in Q_k^{\bot }\). In view of (2.7) and (2.4),

$$\begin{aligned} S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S) = S^{\mathrm {T}}AS + \mu S^{\mathrm {T}}S(\hat{X}^{\mathrm {T}}\hat{X}-I) = S^{\mathrm {T}}AS - S^{\mathrm {T}}S V\Lambda _{k}V^{\mathrm {T}}, \end{aligned}$$
(7.4)

where \(V \in \mathbb {R}^{k \times k}\) is orthogonal. Since the columns of S are contained in the eigenspace associated with \(\{{\uplambda }_{k+1}, \ldots , {\uplambda }_n\}\) and \(\mathrm {tr}(S^{\mathrm {T}}S)=1\), we obtain

$$\begin{aligned} {\uplambda }_{k+1} \le \mathrm {tr}(S^{\mathrm {T}}AS) \le {\uplambda }_n. \end{aligned}$$
(7.5)

On the other hand, we note that both \(\mathrm {tr}(S^{\mathrm {T}}S(V\Lambda _{k}V^{\mathrm {T}}-{\uplambda }_1 I))\) and \(\mathrm {tr}(S^{\mathrm {T}}S({\uplambda }_k I - V\Lambda _{k}V^{\mathrm {T}}))\) are nonnegative, since both are traces of products of symmetric positive semidefinite matrices. These two inequalities imply that

$$\begin{aligned} {\uplambda }_1 \le \mathrm {tr}(S^{\mathrm {T}}S V\Lambda _{k}V^{\mathrm {T}}) \le {\uplambda }_k, \end{aligned}$$
(7.6)

given the fact that \(\mathrm {tr}(S^{\mathrm {T}}S)=1\). From (7.4), (7.5) and (7.6) we deduce

$$\begin{aligned} {\uplambda }_{k+1} - {\uplambda }_{k} \le \mathrm {tr}\left( S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)\right) \le {\uplambda }_{n} - {\uplambda }_{1}, \end{aligned}$$
(7.7)

which proves that the left-hand side of (2.8) is no greater than the right-hand side of (2.8). Furthermore, the lower and upper bounds in (7.7) are attained at the \(n\times k\) rank-one matrices \(S = [0\, \ldots 0\, q_{k+1}]\) and \(S = [q_{n}\, 0 \ldots 0]\), respectively. Therefore, equality in (2.8) must hold, which completes the proof. \(\square \)
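
Both attained bounds can be reproduced directly (our check; it takes \(V=I\) in (7.4), corresponding to \(\hat{X}=Q_k(I-\Lambda _k/\mu )^{1/2}\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 60, 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
lam, Q = np.linalg.eigh(A)

def quad(S):
    """tr(S'AS - S'S Lam_k) for S with columns orthogonal to Q_k, cf. (7.4) with V = I."""
    return np.trace(S.T @ A @ S - (S.T @ S) @ np.diag(lam[:k]))

S_low = np.zeros((n, k)); S_low[:, -1] = Q[:, k]   # S = [0 ... 0 q_{k+1}]
S_up  = np.zeros((n, k)); S_up[:, 0]  = Q[:, -1]   # S = [q_n 0 ... 0]
print(np.isclose(quad(S_low), lam[k] - lam[k - 1]))    # lower bound lam_{k+1} - lam_k
print(np.isclose(quad(S_up),  lam[-1] - lam[0]))       # upper bound lam_n - lam_1
```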

1.4 Proof of Proposition 3.1

Suppose that \(X^{j+1}\) is rank deficient. Then there exists a nonzero vector u such that \(X^{j+1}u=0\). In view of (3.1), we have

$$\begin{aligned} X^{j} u - \alpha ^{j} \nabla f_\mu (X^{j})u =0. \end{aligned}$$
(7.8)

Hence, multiplying both sides of (7.8) by \((X^{j})^{\mathrm {T}}/\alpha ^{j}\) shows that (3.2) holds with \({\uplambda }= 1/\alpha ^{j}\). Since \(X^{j}\) has full rank, \((X^{j})^{\mathrm {T}}X^{j}\) is positive definite. The expression of the gradient in (2.5) implies that \((X^{j})^{\mathrm {T}}\nabla f_\mu (X^{j})\) is symmetric. Therefore, (3.2) is a generalized symmetric eigenvalue problem. The second part of the proposition follows directly from (7.8). \(\square \)
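
The symmetry claim is immediate to verify: with \(\nabla f_\mu (X) = AX + \mu X(X^{\mathrm {T}}X-I)\) (the gradient of the trace-penalty objective), \(X^{\mathrm {T}}\nabla f_\mu (X) = X^{\mathrm {T}}AX + \mu \big ((X^{\mathrm {T}}X)^2-X^{\mathrm {T}}X\big )\) is a sum of symmetric matrices. A short check (ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, mu = 60, 5, 2.0
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
X = rng.standard_normal((n, k))

M = X.T @ (A @ X + mu * X @ (X.T @ X - np.eye(k)))  # X' grad f_mu(X)
# = X'AX + mu*((X'X)^2 - X'X): each term is symmetric
print(np.allclose(M, M.T))                           # True
```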

1.5 Proof of Lemma 3.2

Since the columns of \(U \in \mathbb {R}^{n \times d}\) form a basis of \(\mathcal {S}\), the solution of (3.7) can be expressed as \(X = UW\) for some \(W \in \mathbb {R}^{d \times k}\). Substituting \(X=UW\) into (3.7) and noting that \(U^{\mathrm {T}}AU=\Sigma \) and \(U^{\mathrm {T}}U=I\), we reduce (3.7) to

$$\begin{aligned} \min _{W \in \mathbb {R}^{d \times k}} f_\mu (UW)= \frac{1}{2}\mathrm {tr}(W ^{\mathrm {T}}\Sigma W)+\frac{\mu }{4} \Vert W^{\mathrm {T}}W-I\Vert _F^2. \end{aligned}$$
(7.9)

Using the fact that \(\Sigma \) is a diagonal matrix, it can be verified (see Theorem 2.1) that \(W = \begin{pmatrix} D&0 \end{pmatrix}^{\mathrm {T}}\), with the diagonal matrix D defined as in (3.10), is indeed a solution of (7.9). Therefore, \(Y=UW = U_kD\). \(\square \)
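
Since (3.10) is not reproduced above, the small check below assumes, consistently with Theorem 2.1, that it sets \(D_{ii} = (1-\sigma _i/\mu )^{1/2}\) for the k smallest diagonal entries \(\sigma _i\) of \(\Sigma \) (with \(\mu > \sigma _k\)); under that assumption, \(W=\begin{pmatrix} D&0 \end{pmatrix}^{\mathrm {T}}\) is never beaten by random candidates:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k, mu = 12, 4, 3.0
sigma = np.sort(rng.uniform(-1.0, 2.0, d))       # diagonal of Sigma, ascending; sigma_k < mu

def f(W):
    """Objective (7.9): 0.5*tr(W' Sigma W) + (mu/4)*||W'W - I||_F^2."""
    E = W.T @ W - np.eye(k)
    return 0.5 * np.trace(W.T @ (sigma[:, None] * W)) + 0.25 * mu * np.sum(E * E)

W_star = np.zeros((d, k))
W_star[:k, :] = np.diag(np.sqrt(1.0 - sigma[:k] / mu))  # W = (D 0)^T, assumed form of (3.10)
print(all(f(W_star) <= f(rng.standard_normal((d, k))) for _ in range(500)))  # True
```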


About this article


Cite this article

Wen, Z., Yang, C., Liu, X. et al. Trace-Penalty Minimization for Large-Scale Eigenspace Computation. J Sci Comput 66, 1175–1203 (2016). https://doi.org/10.1007/s10915-015-0061-0
