## Abstract

In block algorithms for computing relatively high-dimensional eigenspaces of large sparse symmetric matrices, the Rayleigh-Ritz (RR) procedure often constitutes a major bottleneck. Although dense eigenvalue calculations for subproblems in RR steps can be parallelized to a certain degree, their parallel scalability, limited by some inherent sequential steps, is lower than that of dense matrix-matrix multiplications. The primary motivation of this paper is to develop a methodology that reduces the use of the RR procedure in exchange for matrix-matrix multiplications. We propose an unconstrained trace-penalty minimization model and establish its equivalence to the eigenvalue problem. With a suitably chosen penalty parameter, this model possesses far fewer undesirable full-rank stationary points than the classic trace-minimization model. More importantly, it enables us to deploy algorithms that make heavy use of dense matrix-matrix multiplications. Although the proposed algorithm does not necessarily reduce the total number of arithmetic operations, it leverages highly optimized operations on modern high-performance computers to achieve parallel scalability. Numerical results based on a preliminary implementation, parallelized using OpenMP, show that our approach is promising.
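As a rough illustration of why the approach is multiplication-rich: assuming the penalty model takes the form \(f_\mu (X)=\tfrac{1}{2}\mathrm {tr}(X^{\mathrm {T}}AX)+\tfrac{\mu }{4}\Vert X^{\mathrm {T}}X-I\Vert _F^2\) with gradient \(AX+\mu X(X^{\mathrm {T}}X-I)\) (our reading of the model, not the authors' code), a bare-bones gradient iteration on it involves nothing but dense matrix-matrix products:

```python
import numpy as np

def grad_f(A, X, mu):
    """Gradient of the assumed penalty model
    f_mu(X) = 0.5*tr(X^T A X) + (mu/4)*||X^T X - I||_F^2."""
    return A @ X + mu * (X @ (X.T @ X - np.eye(X.shape[1])))

def trace_penalty_descent(A, k, mu, steps=5000, alpha=0.02, seed=0):
    """Fixed-step gradient descent on f_mu; every flop-heavy line is a
    dense matrix-matrix multiply -- no Rayleigh-Ritz eigensolve needed."""
    n = A.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, k)) / np.sqrt(n)
    for _ in range(steps):
        X = X - alpha * grad_f(A, X, mu)
    return X
```

The paper's actual algorithm uses more sophisticated step-size rules (e.g. Barzilai-Borwein steps with a nonmonotone line search); the fixed step above is only for illustration.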


## Notes

Downloadable from http://code.google.com/p/blopex.

Downloadable from http://www.cs.wm.edu/~andreas/software.

Downloadable from http://www.cise.ufl.edu/research/sparse/matrices.

More information at http://www.nersc.gov/users/computational-systems/hopper/.


## Acknowledgments

The computational results were obtained at the National Energy Research Scientific Computing Center (NERSC), which is supported by the Director, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under contract number DE-AC02-05CH11232. Z. Wen would like to thank Prof. Michael Ulbrich for hosting his visit at Technische Universität München. X. Liu would like to thank Prof. Yuhong Dai for discussing nonlinear programming techniques for eigenvalue computation. C. Yang would like to thank Dr. Eugene Vencharynski for helping test EigPen, especially the preconditioned version. The authors are grateful to Prof. Chi-Wang Shu, the associate editor and the anonymous referees for their detailed and valuable comments and suggestions.


## Additional information

Z. Wen: Research supported in part by NSFC Grants 11322109, 91330202 and 11421101, and by the National Basic Research Project under the grant 2015CB856000.

C. Yang: Support for this work was provided through the Scientific Discovery through Advanced Computing (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (and Basic Energy Sciences) under award number DE-SC0008666.

X. Liu: Research supported in part by NSFC Grants 11331012, 11471325 and 11461161005, China 863 Program 2013AA122902 and the National Center for Mathematics and Interdisciplinary Sciences, CAS.

Y. Zhang: Research supported in part by NSF Grant DMS-0811188, ONR Grant N00014-08-1-1101, and NSF Grant DMS-1115950.

## Appendix: Proofs of Technical Results


### 1.1 Proof of Theorem 2.1

It can easily be seen that condition (2.3) is necessary for the existence of a rank-*k* stationary point. On the other hand, suppose that \(\mu \) satisfies (2.3). It suffices to consider the representation \(X=UW\), where *U* consists of any *k* eigenvectors of *A* and \(W \in \mathbb {R}^{k\times k}\). Hence, we obtain

where \(D = \mathrm{Diag}(d) \in \mathbb {R}^{k\times k}\) is a diagonal matrix with *k* eigenvalues of *A* on the diagonal corresponding to eigenvectors in *U*. A short calculation shows that

where \((t)_+ = \max (0,t)\) and

Note that \(\theta (t)\) is monotonically increasing since \(\theta '(t) = 1 - t/\mu > 0\) on \((-\infty ,\mu )\). Substituting the expression for \(\hat{X}\) defined in (2.4) into \(f_\mu (\hat{X})\), we obtain

which verifies that \(\hat{X}\) is a global minimizer. This completes the proof. \(\square \)

### 1.2 Proof of Theorem 2.2

We prove only the first statement, by showing that for \(\mu \in (\max (0,{\uplambda }_k),{\uplambda }_n)\) any stationary point other than a global minimizer can only be a saddle point. Without loss of generality, consider stationary points of the form

where \(AU=UD\), \(U^{\mathrm {T}}U=I\), *D* is diagonal, and \(P\in \mathbb {R}^{k\times k}\) is a diagonal projection matrix with diagonal entries

Substituting (7.1) into the Hessian formula (2.7), we obtain

We next show that there exist different matrices \(S \in \mathbb {R}^{n \times k}\) at which \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S))\) takes opposite signs, unless the stationary point \(\hat{X}\) is constructed from eigenvectors associated with a set of *k* smallest eigenvalues which corresponds to the global minimum.

First assume that \(\hat{X}\) has full rank. Then \(\mu I \succ D\) and \(P=I\) in (7.1). Letting \(P=I\) in (7.3) yields

For \(S=U\), we have \(S^{\mathrm {T}}\hat{X}=\hat{X}^{\mathrm {T}}S=(I-D/\mu )^{1/2}\) and

On the other hand, if \(\hat{X}\) is not a global minimizer, we can assume without loss of generality that *U* contains \(q_j\) but not \(q_i\), where \({\uplambda }_i < {\uplambda }_j\). Let *S* have all zero columns except a single nonzero column equal to \(q_i\), placed so that the only nonzero column of *SD* is \(q_i{\uplambda }_j\). For such an *S*, we have \(S^{\mathrm {T}}\hat{X}=0\) and

Hence, all full-rank stationary points are saddle points except the global minimizers.

We now consider the rank-deficient case, in which there exists at least one zero entry on the diagonal of *P*, say \(P_{ii} = 0\) for some \(i \in [1,k]\). Let \(\bar{U}\) be the matrix remaining after deleting the *i*-th column from *U*. Since \(\hbox {rank}(\bar{U})=k-1\), there must exist at least one column of \(Q_k\), denoted by \(q_j\), that is not contained in \(\bar{U}\). Then \(q_j^{\mathrm {T}}\bar{U}=0\) and \(q_j^{\mathrm {T}}A q_j \le {\uplambda }_k\). Let *S* have all zero columns except that the *i*-th column is \(q_j\), so that both \(SP=0\) and \(S^{\mathrm {T}}\hat{X}=0\). Consequently, in view of (7.3) we have

On the other hand, let *S* have all zero columns except that the *i*-th column is \(q_n\). For any integer \(l\in [1,k]\), if the column \(U_l=q_n\), then it can be shown that \(P_{ll}=0\) and \(q_n^{\mathrm {T}}\hat{X}_l = 0\). Otherwise \(U_l\ne q_n\), and thus \(q_n^{\mathrm {T}}U_l=0\), which implies \(q_n^{\mathrm {T}}\hat{X}=0\). By our assumption, \(\mu < q_n^{\mathrm {T}}A q_n = {\uplambda }_n\). Hence, \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)) = {\uplambda }_n - \mu >0\). This completes the proof. \(\square \)
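The sign change argued above can be checked numerically. Assuming \(f_\mu (X)=\tfrac{1}{2}\mathrm {tr}(X^{\mathrm {T}}AX)+\tfrac{\mu }{4}\Vert X^{\mathrm {T}}X-I\Vert _F^2\), so that the Hessian acts as \(\nabla ^2 f_\mu (X)[S]=AS+\mu \big (S(X^{\mathrm {T}}X-I)+X(S^{\mathrm {T}}X+X^{\mathrm {T}}S)\big )\) (our reconstruction; the displayed formulas (2.7) and (7.3) did not survive extraction), the quadratic form indeed takes both signs at a full-rank stationary point built from the "wrong" eigenvectors:

```python
import numpy as np

def hess_qf(A, X, mu, S):
    """tr(S^T grad^2 f_mu(X)[S]) for the assumed penalty model."""
    I = np.eye(X.shape[1])
    HS = A @ S + mu * (S @ (X.T @ X - I) + X @ (S.T @ X + X.T @ S))
    return np.trace(S.T @ HS)

n, k, mu = 5, 2, 4.0
A = np.diag([1., 2., 3., 4., 5.])
U = np.eye(n)[:, [0, 2]]          # q_1 and q_3: skips q_2, so not optimal
# Stationary point of (2.4)-type built on U: X = U (I - D/mu)^{1/2}
X = U @ np.diag(np.sqrt(1.0 - np.array([1., 3.]) / mu))

grad = A @ X + mu * (X @ (X.T @ X - np.eye(k)))
assert np.linalg.norm(grad) < 1e-10          # X is indeed stationary

S_pos = U                                    # direction of positive curvature
S_neg = np.zeros((n, k))
S_neg[1, 1] = 1.0                            # q_2 placed in the column of q_3
assert hess_qf(A, X, mu, S_pos) > 0          # = 8 here
assert hess_qf(A, X, mu, S_neg) < 0          # = lambda_2 - lambda_3 = -1 here
```

The negative direction is exactly the construction in the proof: with \(S^{\mathrm {T}}\hat{X}=0\), the quadratic form reduces to \({\uplambda }_i-{\uplambda }_j<0\).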

### 1.3 Proof of Lemma 2.3

Consider any \(S \in Q_k^{\bot }\). In view of (2.7) and (2.4),

where \(V \in \mathbb {R}^{k \times k}\) is orthogonal. Since the columns of *S* are contained in the eigenspace associated with \(\{{\uplambda }_{k+1}, \ldots , {\uplambda }_n\}\) and \(\mathrm {tr}(S^{\mathrm {T}}S)=1\), we obtain

On the other hand, we note that both \(\mathrm {tr}(S^{\mathrm {T}}S(V\Lambda _{k}V^{\mathrm {T}}-{\uplambda }_1 I))\) and \(\mathrm {tr}(S^{\mathrm {T}}S({\uplambda }_k I - V\Lambda _{k}V^{\mathrm {T}}))\) are nonnegative, since both are traces of products of symmetric positive semidefinite matrices. These two inequalities imply that

given the fact that \(\mathrm {tr}(S^{\mathrm {T}}S)=1\). From (7.4), (7.5) and (7.6) we deduce

which proves that the left-hand side of (2.8) is no greater than its right-hand side. Furthermore, the lower and upper bounds in (7.7) are attained at the \(n\times k\) rank-one matrices \(S = [0\, \ldots 0\, q_{k+1}]\) and \(S = [q_{n}\, 0 \ldots 0]\), respectively. Therefore, equality in (2.8) must hold, which completes the proof. \(\square \)

### 1.4 Proof of Proposition 3.1

Suppose that \(X^{j+1}\) is rank deficient. Then there exists a nonzero vector *u* such that \(X^{j+1}u=0\). In view of (3.1), we have

Hence, (3.2) holds with \({\uplambda }= {1}/{\alpha ^{ j}}\) after multiplying both sides of (7.8) by \((X^{ j})^{\mathrm {T}}/{\alpha ^{ j}}\). Since \(X^{ j}\) has full rank, \((X^{ j})^{\mathrm {T}}X^{ j}\) is positive definite. The expression for the gradient in (2.5) implies that \((X^{ j})^{\mathrm {T}}\nabla f_\mu (X^{ j})\) is symmetric. Therefore, (3.2) is a symmetric generalized eigenvalue problem. The second part of the proposition follows directly from (7.8). \(\square \)
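Assuming the gradient in (2.5) takes the form \(\nabla f_\mu (X) = AX + \mu X(X^{\mathrm {T}}X - I)\) (our reading of the model), the symmetry used above is immediate: \(X^{\mathrm {T}}\nabla f_\mu (X) = X^{\mathrm {T}}AX + \mu (X^{\mathrm {T}}X)(X^{\mathrm {T}}X - I)\), and both terms are symmetric. A quick numerical confirmation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, mu = 8, 3, 2.5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                      # symmetric test matrix
X = rng.standard_normal((n, k))        # full rank almost surely

G = A @ X + mu * (X @ (X.T @ X - np.eye(k)))   # assumed gradient form
B = X.T @ G                            # = X^T A X + mu (X^T X)(X^T X - I)
assert np.linalg.norm(B - B.T) < 1e-10 # symmetric, as claimed in the proof
```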

### 1.5 Proof of Lemma 3.2

Since \(U \in \mathbb {R}^{n \times d}\) is a basis of \(\mathcal {S}\), the solution of (3.7) can be expressed as \(X = UW\) for some \(W \in \mathbb {R}^{d \times k}\). Substituting \(X=UW\) into (3.7) and noting that \(U^{\mathrm {T}}AU=\Sigma \) and \(U^{\mathrm {T}}U=I\), we reduce (3.7) to

Using the fact that \(\Sigma \) is a diagonal matrix, it can be verified (see Theorem 2.1) that \(W = \begin{pmatrix} D&0 \end{pmatrix}^{\mathrm {T}}\), with the diagonal matrix *D* defined as in (3.10), is indeed a solution of (7.9). Therefore, \(Y=UW = U_kD\). \(\square \)
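For contrast, the dense Rayleigh-Ritz step that such closed-form subspace solutions are designed to economize can be sketched generically as follows (a generic illustration with names of our choosing, not the paper's code):

```python
import numpy as np

def rayleigh_ritz(A, U, k):
    """Generic RR step: project A onto span(U) (U orthonormal, n x d),
    solve the small dense eigenproblem, lift back the k smallest Ritz pairs."""
    H = U.T @ (A @ U)                  # d x d projected matrix
    w, V = np.linalg.eigh(H)           # the hard-to-parallelize dense solve
    return w[:k], U @ V[:, :k]         # Ritz values and Ritz vectors
```

For example, with \(A=\mathrm{diag}(1,\ldots ,5)\) and *U* the first three columns of the identity, the call returns the Ritz values \([1, 2]\) for \(k=2\).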


## About this article

### Cite this article

Wen, Z., Yang, C., Liu, X. *et al.* Trace-Penalty Minimization for Large-Scale Eigenspace Computation.
*J Sci Comput* **66**, 1175–1203 (2016). https://doi.org/10.1007/s10915-015-0061-0
