In a block algorithm for computing relatively high-dimensional eigenspaces of large sparse symmetric matrices, the Rayleigh-Ritz (RR) procedure often constitutes a major bottleneck. Although dense eigenvalue calculations for subproblems in RR steps can be parallelized to a certain level, their parallel scalability, which is limited by some inherent sequential steps, is lower than that of dense matrix-matrix multiplications. The primary motivation of this paper is to develop a methodology that reduces the use of the RR procedure in exchange for matrix-matrix multiplications. We propose an unconstrained trace-penalty minimization model and establish its equivalence to the eigenvalue problem. With a suitably chosen penalty parameter, this model possesses far fewer undesirable full-rank stationary points than the classic trace minimization model. More importantly, it enables us to deploy algorithms that make heavy use of dense matrix-matrix multiplications. Although the proposed algorithm does not necessarily reduce the total number of arithmetic operations, it leverages highly optimized operations on modern high-performance computers to achieve parallel scalability. Numerical results based on a preliminary implementation, parallelized using OpenMP, show that our approach is promising.

## Acknowledgments

The computational results were obtained at the National Energy Research Scientific Computing Center (NERSC), which is supported by the Director, Office of Advanced Scientific Computing Research of the U.S. Department of Energy under contract number DE-AC02-05CH11231. Z. Wen would like to thank Prof. Michael Ulbrich for hosting his visit at Technische Universität München. X. Liu would like to thank Prof. Yuhong Dai for discussing nonlinear programming techniques for eigenvalue computation. C. Yang would like to thank Dr. Eugene Vecharynski for helping test EigPen, especially the preconditioned version. The authors are grateful to Prof. Chi-Wang Shu, the associate editor, and the anonymous referees for their detailed and valuable comments and suggestions.

## Appendix: Proofs of Technical Results

### 1.1 Proof of Theorem 2.1

It can be easily seen that condition (2.3) is necessary for the existence of a rank-*k* stationary point. On the other hand, suppose that \(\mu \) satisfies (2.3). It suffices to consider the representation \(X=UW\), where *U* consists of any *k* eigenvectors of *A* and \(W \in \mathbb {R}^{k\times k}\). Hence, we obtain

where \(D = \mathrm{Diag}(d) \in \mathbb {R}^{k\times k}\) is a diagonal matrix with *k* eigenvalues of *A* on the diagonal corresponding to eigenvectors in *U*. A short calculation shows that

where \((t)_+ = \max (0,t)\) and

Note that \(\theta (t)\) is monotonically nondecreasing since \(\theta '(t) = 1 - t/\mu > 0\) in \((-\infty ,\mu )\). Substituting the formulation of \(\hat{X}\) defined in (2.4) into \(f_\mu (\hat{X})\), we obtain

which verifies that \(\hat{X}\) is a global minimizer. This completes the proof. \(\square \)
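The claim of Theorem 2.1 can be checked numerically on a small example. The sketch below is illustrative only and rests on two assumptions not restated in this appendix: that the objective has the form \(f_\mu (X) = \tfrac{1}{2}\mathrm{tr}(X^{\mathrm{T}}AX) + \tfrac{\mu }{4}\Vert X^{\mathrm{T}}X - I\Vert _F^2\), and that \(\theta (t) = t - t^2/(2\mu )\), which is consistent with \(\theta '(t) = 1 - t/\mu \). The test matrix, its eigenvalues \(1,\ldots ,8\), and the helper names are hypothetical.

```python
import numpy as np

def f_mu(A, X, mu):
    """Assumed trace-penalty objective: 0.5*tr(X'AX) + (mu/4)*||X'X - I||_F^2."""
    k = X.shape[1]
    R = X.T @ X - np.eye(k)
    return 0.5 * np.trace(X.T @ A @ X) + 0.25 * mu * np.sum(R * R)

def grad_f_mu(A, X, mu):
    """Gradient of the assumed objective: A X + mu * X (X'X - I)."""
    k = X.shape[1]
    return A @ X + mu * X @ (X.T @ X - np.eye(k))

def theta(t, mu):
    """Assumed form of theta, chosen so that theta'(t) = 1 - t/mu."""
    return t - t**2 / (2.0 * mu)

rng = np.random.default_rng(0)
n, k, mu = 8, 3, 5.0

# Symmetric test matrix with known eigenpairs: eigenvalues 1..8, so lambda_k = 3 < mu.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.arange(1.0, n + 1.0)
A = Q @ np.diag(lam) @ Q.T

# Candidate global minimizer: Q_k (I - Lambda_k / mu)^{1/2}, taking V = I.
X_hat = Q[:, :k] @ np.diag(np.sqrt(1.0 - lam[:k] / mu))

f_hat = f_mu(A, X_hat, mu)
f_pred = 0.5 * np.sum(theta(lam[:k], mu))          # predicted optimal value
grad_norm = np.linalg.norm(grad_f_mu(A, X_hat, mu))

# Random trial points should never beat the candidate minimizer.
f_random_min = min(f_mu(A, rng.standard_normal((n, k)), mu) for _ in range(200))
```

Under these assumptions the gradient vanishes at `X_hat` up to rounding, `f_hat` agrees with `f_pred`, and no random trial point attains a smaller objective value.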

### 1.2 Proof of Theorem 2.2

We only prove the first statement by showing that for \(\mu \in (\max (0,{\uplambda }_k),{\uplambda }_n)\) any stationary point other than the global minimizers can only be saddle points. Without loss of generality, consider stationary points in the form of

where \(AU=UD, U^{\mathrm {T}}U=I, D\) is diagonal, and \(P\in \mathbb {R}^{k\times k}\) is a diagonal, projection matrix with diagonal entries

Substituting (7.1) into the Hessian formula (2.7), we obtain

We next show that there exist different matrices \(S \in \mathbb {R}^{n \times k}\) at which \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S))\) takes opposite signs, unless the stationary point \(\hat{X}\) is constructed from eigenvectors associated with a set of *k* smallest eigenvalues which corresponds to the global minimum.

First assume that \(\hat{X}\) has full rank. Then \(\mu I \succ D\) and \(P=I\) in (7.1). Letting \(P=I\) in (7.3) yields

For \(S=U\), we have \(S^{\mathrm {T}}\hat{X}=\hat{X}^{\mathrm {T}}S=(I-D/\mu )^{1/2}\) and

On the other hand, if \(\hat{X}\) is not a global minimizer, without loss of generality we can assume that *U* contains \(q_j\) but not \(q_i\) where \({\uplambda }_i < {\uplambda }_j\). Let *S* contain all zero columns except a single nonzero column that is \(q_i\) at the position so that the only nonzero column of *SD* is \(q_i{\uplambda }_j\). For such an *S*, we have \(S^{\mathrm {T}}\hat{X}=0\) and

Hence, all full-rank stationary points are saddle points except the global minimizers.
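The two curvature directions used in the full-rank case can be reproduced numerically. The following sketch assumes the objective \(f_\mu (X) = \tfrac{1}{2}\mathrm{tr}(X^{\mathrm{T}}AX) + \tfrac{\mu }{4}\Vert X^{\mathrm{T}}X - I\Vert _F^2\), from which the Hessian action \(AS + \mu [S(X^{\mathrm{T}}X - I) + X(S^{\mathrm{T}}X + X^{\mathrm{T}}S)]\) follows by differentiating the gradient; the test matrix with eigenvalues \(1,\ldots ,6\) and the function name `hess_quad` are hypothetical choices for illustration.

```python
import numpy as np

def hess_quad(A, X, S, mu):
    """tr(S' H(S)) with H(S) = A S + mu * (S (X'X - I) + X (S'X + X'S))."""
    k = X.shape[1]
    HS = A @ S + mu * (S @ (X.T @ X - np.eye(k)) + X @ (S.T @ X + X.T @ S))
    return np.trace(S.T @ HS)

rng = np.random.default_rng(1)
n, k, mu = 6, 2, 4.5

# Symmetric test matrix with eigenvalues 1..6 and orthonormal eigenvectors Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.arange(1.0, n + 1.0)
A = Q @ np.diag(lam) @ Q.T

# Full-rank stationary point built from q_1 and q_3 (it skips q_2, so it is
# not a global minimizer). Here mu = 4.5 > 3, hence mu*I - D is positive definite.
idx = [0, 2]
U = Q[:, idx]
d = lam[idx]
X_hat = U @ np.diag(np.sqrt(1.0 - d / mu))

# Stationarity check: the gradient should vanish at X_hat.
grad = A @ X_hat + mu * X_hat @ (X_hat.T @ X_hat - np.eye(k))
grad_norm = np.linalg.norm(grad)

# Direction S = U: positive curvature.
curv_pos = hess_quad(A, X_hat, U, mu)

# Direction with q_2 placed in the column that holds q_3: negative curvature.
S = np.zeros((n, k))
S[:, 1] = Q[:, 1]
curv_neg = hess_quad(A, X_hat, S, mu)
```

For this example the curvature along `U` equals \(2\,\mathrm{tr}(\mu I - D) = 10\), while the second direction gives \({\uplambda }_2 - {\uplambda }_3 = -1\), so the stationary point is a saddle.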

We now consider the rank-deficient case, namely, there exists at least one zero entry in the diagonal of *P*, say \(P_{ii} = 0\) for some \(i \in [1,k]\). Let \(\bar{U}\) be the remaining matrix after deleting the *i*-th column from *U*. Since \(\hbox {rank}(\bar{U})=k-1\), there must exist at least one column, denoted by \(q_j\), of \(Q_k\) that is not contained in \(\bar{U}\). Then \(q_j^{\mathrm {T}}\bar{U}=0\) and \(q_j^{\mathrm {T}}A q_j \le {\uplambda }_k\) hold. Let *S* contain all zero columns except one nonzero column that is \(q_j\) at the *i*-th position so that both \(SP=0\) and \(S^{\mathrm {T}}\hat{X}=0\). Consequently, in view of (7.3) we have

On the other hand, let *S* contain all zero columns except that the *i*-th column is \(q_n\). For any integer \(l\in [1,k]\), if the column \(U_l=q_n\), then it can be shown that \(P_{ll}=0\) and \(q_n^{\mathrm {T}}\hat{X}_l = 0\). Otherwise, \(U_l\ne q_n\), so \(q_n^{\mathrm {T}}U_l=0\), which again gives \(q_n^{\mathrm {T}}\hat{X}_l=0\). Since this holds for every *l*, we have \(q_n^{\mathrm {T}}\hat{X}=0\). By our assumption, \(\mu < q_n^{\mathrm {T}}A q_n = {\uplambda }_n\). Hence, \(\mathrm {tr}(S^{\mathrm {T}}\nabla ^2f_\mu (\hat{X})(S)) = {\uplambda }_n - \mu >0\). This completes the proof. \(\square \)

### 1.3 Proof of Lemma 2.3

Consider any \(S \in Q_k^{\bot }\). In view of (2.7) and (2.4),

where \(V \in \mathbb {R}^{k \times k}\) is orthogonal. Since the columns of *S* are contained in the eigenspace associated with \(\{{\uplambda }_{k+1}, \ldots , {\uplambda }_n\}\) and \(\mathrm {tr}(S^{\mathrm {T}}S)=1\), we obtain

On the other hand, we note that both \(\mathrm {tr}(S^{\mathrm {T}}S(V\Lambda _{k}V^{\mathrm {T}}-{\uplambda }_1 I))\) and \(\mathrm {tr}(S^{\mathrm {T}}S({\uplambda }_k I - V\Lambda _{k}V^{\mathrm {T}}))\) are nonnegative, since both are traces for products of symmetric positive semidefinite matrices. These two inequalities imply that

given the fact that \(\mathrm {tr}(S^{\mathrm {T}}S)=1\). From (7.4), (7.5) and (7.6) we deduce

which proves that the left-hand side of (2.8) is no greater than its right-hand side. Furthermore, the lower and upper bounds in (7.7) are attained at the \(n\times k\) rank-one matrices \(S = [0\, \ldots 0\, q_{k+1}]\) and \(S = [q_{n}\, 0 \ldots 0]\), respectively. Therefore, the equality in (2.8) must hold, which completes the proof. \(\square \)

### 1.4 Proof of Proposition 3.1

Suppose that \(X^{j+1}\) is rank deficient. Then there exists a nonzero vector *u* such that \(X^{j+1}u=0\). In view of (3.1), we have

Hence, (3.2) holds under \({\uplambda }= {1}/{\alpha ^{ j}}\) after multiplying both sides of (7.8) by \((X^{ j})^{\mathrm {T}}/{\alpha ^{ j}}\). Due to the full rank of \(X^{ j}, (X^{ j})^{\mathrm {T}}(X^{ j})\) is positive definite. The expression of the gradient in (2.5) implies that \((X^{ j})^{\mathrm {T}}\nabla f_\mu (X^{ j})\) is symmetric. Therefore, (3.2) is a generalized symmetric eigenvalue problem. The second part of the proposition follows directly from (7.8). \(\square \)
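The mechanism behind Proposition 3.1 can be illustrated on a small example. The sketch below assumes that (3.1) is the plain gradient step \(X^{j+1} = X^{j} - \alpha ^{j}\nabla f_\mu (X^{j})\) and that \(\nabla f_\mu (X) = AX + \mu X(X^{\mathrm{T}}X - I)\); both are assumptions about formulas not restated here. The test matrix, the iterate `X`, and the helper `rank_after_step` are hypothetical. The generalized eigenvalues are computed through a Cholesky factor of \(X^{\mathrm{T}}X\), which is one of several possible ways to do it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, mu = 6, 2, 10.0

# Symmetric test matrix with eigenvalues 1..6 and orthonormal eigenvectors Q.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.arange(1.0, n + 1.0)
A = Q @ np.diag(lam) @ Q.T

# Full-rank iterate: orthonormal eigenvectors q_1, q_2, so the penalty term
# vanishes and the assumed gradient reduces to G = X diag(1, 2).
X = Q[:, :k]
G = A @ X + mu * X @ (X.T @ X - np.eye(k))

# Generalized symmetric eigenproblem (X'G) u = lambda (X'X) u,
# reduced to a standard one with the Cholesky factor of X'X.
M = X.T @ G
B = X.T @ X
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
C = Linv @ M @ Linv.T
C = 0.5 * (C + C.T)                  # symmetrize against rounding
gen_eigs = np.linalg.eigvalsh(C)     # ascending order

def rank_after_step(alpha):
    """Numerical rank of the assumed gradient step X - alpha * G."""
    return np.linalg.matrix_rank(X - alpha * G, tol=1e-10)

# Step size whose reciprocal is a generalized eigenvalue: rank can drop.
rank_bad = rank_after_step(1.0 / gen_eigs[1])

# Step size whose reciprocal is not a generalized eigenvalue: rank is kept.
rank_ok = rank_after_step(0.3)
```

In this example the generalized eigenvalues are 1 and 2, so the step size \(\alpha = 1/2\) annihilates one column and the new iterate has rank 1, whereas \(\alpha = 0.3\) keeps rank 2. Because at most *k* reciprocal values are excluded, a step size that avoids them preserves full rank.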

### 1.5 Proof of Lemma 3.2

Since \(U \in \mathbb {R}^{n \times d}\) is a basis of \(\mathcal {S}\), the solution of (3.7) can be expressed as \(X = UW\) for some \(W \in \mathbb {R}^{d \times k}\). Substituting \(X=UW\) into (3.7) and noting that \(U^{\mathrm {T}}AU=\Sigma \) and \(U^{\mathrm {T}}U=I\), we reduce (3.7) to

Using the fact that \(\Sigma \) is a diagonal matrix, it can be verified (see Theorem 2.1) that \(W = \begin{pmatrix} D&0 \end{pmatrix}^{\mathrm {T}}\), with the diagonal matrix *D* defined as in (3.10), is indeed a solution of (7.9). Therefore, \(Y=UW = U_kD\). \(\square \)

