Minimization principles and computation for the generalized linear response eigenvalue problem

Abstract

Minimization principles and Cauchy-like interlacing inequalities for the generalized linear response eigenvalue problem are presented. Based on these theoretical results, the best approximations through structure-preserving subspace projection and a locally optimal block conjugate gradient-like algorithm for simultaneously computing the first few smallest eigenvalues with the positive sign are proposed. Numerical results are presented to illustrate essential convergence behaviors of the proposed algorithm.

Introduction

In [2, 3, 19, 21], minimization principles and locally optimal 4-D conjugate gradient methods are established for the eigenvalue problem of the form:

$$\begin{aligned} \begin{bmatrix} 0&K \\ M&0 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} =\lambda \begin{bmatrix} y \\ x \end{bmatrix}, \end{aligned}$$
(1.1)

where \(K\) and \(M\) are \(n \times n\) real symmetric positive semi-definite matrices and one of them is definite. It is referred to as the linear response (LR) eigenvalue problem because it is equivalent to the eigenvalue problem

$$\begin{aligned} \begin{bmatrix} A&B \\ -B&-A \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix} \end{aligned}$$
(1.2)

via a similarity transformation with the orthogonal matrix

$$\begin{aligned} J=\frac{1}{\sqrt{2}}\begin{bmatrix} I_n&I_n \\ I_n&-I_n \\ \end{bmatrix}, \end{aligned}$$
(1.3)

where \(A\) and \(B\) are \(n\times n\) real symmetric matrices such that \(\begin{bmatrix} A&B \\ B&A \end{bmatrix}\) is symmetric positive definite (see Footnote 1). The eigenvalue problem (1.2) is the computational kernel in the response theory models for analyzing the response of a self-consistent-field state to an external perturbation in computational physics and chemistry, e.g., see [9, 14, 16, 20]. The eigenvalue problem (1.2) is also widely known as a random phase approximation eigenvalue problem, e.g., see [17, 18].

The generalized LR eigenvalue problem is of the form

$$\begin{aligned} \begin{bmatrix} A&B \\ -B&-A \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} \Sigma&\Delta \\ \Delta&\Sigma \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}, \end{aligned}$$
(1.4)

where \(A\) and \(B\) are as in (1.2), and \(\Sigma \) and \(\Delta \) are also \(n\times n\) with \(\Sigma \) being symmetric while \(\Delta \) is skew-symmetric (i.e., \(\Delta ^{{{\mathrm{T}}}}=-\Delta \)) such that \(\begin{bmatrix} \Sigma&\Delta \\ \Delta&\Sigma \end{bmatrix}\) is nonsingular. The generalized eigenvalue problem (1.4) arises from the study of transition properties and second and higher order response properties using a response function approach [7, 14, 15].

The generalized LR eigenvalue problem (1.4) can be transformed via the orthogonal matrix \(J\) to an equivalent eigenvalue problem that differs from (1.1) only in the right-hand side. In fact, it is easy to verify that

$$\begin{aligned} J^{{{\mathrm{T}}}}\begin{bmatrix} A&B \\ -B&-A \end{bmatrix}JJ^{{{\mathrm{T}}}} \begin{bmatrix} u \\ v \end{bmatrix} = \lambda J^{{{\mathrm{T}}}} \begin{bmatrix} \Sigma&\Delta \\ \Delta&\Sigma \end{bmatrix}J J^{{{\mathrm{T}}}} \begin{bmatrix} u \\ v \end{bmatrix} \end{aligned}$$

gives rise to

$$\begin{aligned} H z \equiv \begin{bmatrix} 0&K \\ M&0 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} =\lambda \begin{bmatrix} E_+&0 \\ 0&E_- \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} \equiv \lambda E z, \end{aligned}$$
(1.5)

where

$$\begin{aligned} K=A-B,\, M=A+B, \, E_{\pm } = \Sigma \pm \Delta , \, \,\, \hbox {and} \,\, \begin{bmatrix} y \\ x \end{bmatrix} = J^{{{\mathrm{T}}}} \begin{bmatrix} u \\ v \end{bmatrix}. \end{aligned}$$
(1.6)

Furthermore, the positive definiteness of \(\begin{bmatrix} A&B \\ B&A \end{bmatrix}\) and the nonsingularity of \(\begin{bmatrix} \Sigma&\Delta \\ \Delta&\Sigma \end{bmatrix}\) are equivalent to the conditions that both \(K\) and \(M\) are positive definite and that \(E_{\pm }\) are nonsingular (see Footnote 2). Hence the eigenvalue problems (1.4) and (1.5) are equivalent: both have the same eigenvalues with corresponding eigenvectors related by

$$\begin{aligned} \begin{bmatrix} u \\ v \end{bmatrix} = J \begin{bmatrix} y \\ x \end{bmatrix}. \end{aligned}$$
(1.7)

The imposed conditions on \(A\), \(B\), \(\Sigma \), and \(\Delta \) imply that both \(K\) and \(M\) are real symmetric positive definite, \(E_{\pm }\) are nonsingular, and \(E_+^{{{\mathrm{T}}}}=E_-\). In the rest of this article, the condition on \(K\) and \(M\) will be relaxed to require only that both are symmetric positive semi-definite and one of them is definite, unless explicitly stated otherwise.
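Before moving on, here is a small numerical check (our own illustration; the random test data and variable names are not from the paper) that the pencils (1.4) and (1.5) indeed have the same eigenvalues under the assignments (1.6):

```python
# Sketch: (1.4) and (1.5) share the same spectrum under (1.6). Random data
# satisfying the stated hypotheses; [Sig Del; Del Sig] is nonsingular with
# probability one for random inputs.
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)); B = (B + B.T) / 2            # symmetric
A = rng.standard_normal((n, n))
A = A @ A.T + (np.linalg.norm(B, 2) + 1) * np.eye(n)          # makes A +/- B > 0
Sig = rng.standard_normal((n, n)); Sig = (Sig + Sig.T) / 2    # symmetric
Del = rng.standard_normal((n, n)); Del = (Del - Del.T) / 2    # skew-symmetric

L14 = np.block([[A, B], [-B, -A]])                            # left side of (1.4)
R14 = np.block([[Sig, Del], [Del, Sig]])

K, M = A - B, A + B                                           # assignments (1.6)
Ep, Em = Sig + Del, Sig - Del
Z0 = np.zeros((n, n))
H = np.block([[Z0, K], [M, Z0]])                              # pencil (1.5)
E = np.block([[Ep, Z0], [Z0, Em]])

ev14 = np.sort(eig(L14, R14, right=False).real)
ev15 = np.sort(eig(H, E, right=False).real)
assert np.allclose(ev14, ev15)   # same 2n real eigenvalues, in +/- pairs
```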

Later, we will see that the \(2n\) eigenvalues of (1.5) are all real:

$$\begin{aligned} -\lambda _n\le \cdots \le -\lambda _1\le +\lambda _1\le \cdots \le +\lambda _n. \end{aligned}$$

Our main contributions in this paper are as follows.

  1.

    As an extension of Thouless’ minimization principle, we will prove

    $$\begin{aligned} \lambda _1=\inf _{x,y}\frac{x^{{{\mathrm{T}}}}Kx+y^{{{\mathrm{T}}}}My}{2|x^{{{\mathrm{T}}}}E_+y|}. \end{aligned}$$
    (1.8)

    In the case when \(E_{\pm }=I\) and both \(K\) and \(M\) are definite, (1.8) becomes Thouless’ minimization principle [2, 19, 21].

  2.

    We will prove a subspace version of the minimization principle (1.8):

    $$\begin{aligned} \sum _{i=1}^k\lambda _i=\frac{1}{2}\inf _{\begin{array}{c} U^{{{\mathrm{T}}}}E_+V=I_k \\ {U, V} \in {\mathbb R}^{n\times k} \end{array}}{{\mathrm{trace}}}(U^{{{\mathrm{T}}}}KU+V^{{{\mathrm{T}}}}MV). \end{aligned}$$
    (1.9)

    In the case when \(E_{\pm }=I\), (1.9) has already been proven in [2].

  3.

    Let \(U\) and \(V\) be \(n\times \ell \) (where \(\ell <n\)) such that \(W{\mathop {=}\limits ^{\mathrm{def}}} U^{{{\mathrm{T}}}}E_+V\) is nonsingular, and factorize \(W\) as \(W=W_1^{{{\mathrm{T}}}}W_2\), where \(W_i\) are \(\ell \times \ell \) (and thus necessarily nonsingular). Define

    $$\begin{aligned} H_{\hbox {SR}}=\begin{bmatrix} 0&W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1} \\ W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}&0 \end{bmatrix} \end{aligned}$$
    (1.10)

    and denote the eigenvalues of \(H_{\hbox {SR}}\) by \(-\mu _{\ell }\le \cdots \le -\mu _1\le +\mu _1\le \cdots \le +\mu _{\ell }\). We obtain Cauchy-like inequalities for \(\lambda _i\) and \(\mu _i\) (see Theorem 3.4). In addition, we also show that

    $$\begin{aligned} \sum _{i=1}^k\mu _i= \frac{1}{2}\inf _{\begin{array}{c} \widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k\\ {{\mathrm{span}}}(\widehat{U})\subseteq \mathcal{U},\,{{\mathrm{span}}}(\widehat{V})\subseteq \mathcal{V} \end{array}}{{\mathrm{trace}}}(\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}), \end{aligned}$$
    (1.11)

    where \(\mathcal{U}={{\mathrm{span}}}(U)\) and \(\mathcal{V}={{\mathrm{span}}}(V)\) are the column spaces of \(U\) and \(V\), respectively.

  4.

    Combining (1.10) and (1.11) with a variation of the classical conjugate gradient method, we establish a locally optimal block 4-D preconditioned conjugate gradient method to simultaneously compute the first few smallest eigenvalues with the positive sign of the generalized LR eigenvalue problem (1.5).

The rest of this paper is organized as follows. In Sect. 2, we review basic theoretical results about the eigenvalue problem (1.1) and then introduce the concept of a pair of deflating subspaces and its approximation properties. In Sect. 3, we prove minimization principles and Cauchy-like interlacing inequalities. In Sect. 4, we discuss how to obtain the best approximations from a pair of approximate deflating subspaces. In Sect. 5, we apply the newly established minimization principles to derive CG type algorithms for computing the first few \(\lambda _i\). In Sect. 6, we present numerical results to illustrate the convergence behaviors of the CG methods. Concluding remarks are given in Sect. 7.

Basic theory and pair of deflating subspaces

Basics

In this subsection, we discuss some basic theoretical results on the LR eigenvalue problem (1.5). Mehl et al. [11] investigated the canonical forms of the same eigenvalue problem (1.5) in a more general context, namely with no assumptions on \(K\) and \(M\) beyond symmetry and with \(E_{\pm }\) nonsingular. The results below in this section can essentially be derived from their more general setting, but in our context they can also be easily derived (see [1] for details). For this reason, we leave out the proofs of the theorems in this subsection.

Decompose \(E_{\pm }\) as

$$\begin{aligned} E_-^{{{\mathrm{T}}}}=E_+=CD^{{{\mathrm{T}}}}, \end{aligned}$$
(2.1)

where \(C,\, D\in {\mathbb R}^{n\times n}\) are nonsingular. How this factorization is done is not mathematically essential. For example, we can simply let one of \(C\) and \(D\) be \(I_n\).

With (2.1), the LR eigenvalue problem (1.5) is equivalent to

$$\begin{aligned} \mathcal {H}w\equiv \begin{bmatrix} 0&\mathcal {K} \\ \mathcal {M}&0 \end{bmatrix}w =\lambda w, \end{aligned}$$
(2.2)

where

$$\begin{aligned} \mathcal {K}=C^{-1}KC^{-{{\mathrm{T}}}}, \quad \mathcal {M}=D^{-1}MD^{-{{\mathrm{T}}}}, \quad \Gamma =\begin{bmatrix} D&\\&C \end{bmatrix}, \quad \hbox {and} \quad w=\Gamma ^{{{\mathrm{T}}}}z=\begin{bmatrix} D^{{{\mathrm{T}}}}y \\ C^{{{\mathrm{T}}}}x \end{bmatrix}. \end{aligned}$$
(2.3)

We now have two equivalent eigenvalue problems (1.5) and (2.2) in the sense that both have the same eigenvalues and their eigenvectors are related by the relation shown in (2.3).

The problem (2.2) takes the same form as (1.1), making it possible for us to simply adapt the results in [2, 3] for (2.2) and then translate them for the generalized LR eigenvalue problem (1.5). However, we should note that for practical considerations, the problem (2.2) should never be explicitly formed, to avoid destroying, e.g., the sparsity in \(K\) and \(M\) or other structural properties. Sometimes \(K\), \(M\), and \(E_{\pm }\) may not be explicitly available at all, existing only through matrix-vector multiplications; in such cases, explicitly forming (2.2) simply cannot be accomplished. For our purpose in this paper, the significance of transforming (1.5) into (2.2) lies only in theoretical developments and efficient algorithm derivations.

For the eigenvalue problem (2.2), we know that \(\mathcal {K},\,\mathcal {M}\succeq 0\) because \(K,\,M\succeq 0\), where \(X\succ 0\) (\(X\succeq 0\)) means \(X\) is real symmetric positive (semi-)definite. As argued in [2] for (1.1), the eigenvalues for (2.2) are real and come in \(\pm \lambda \) pairs. More precisely, denote the eigenvalues of \(\mathcal {K}\,\mathcal {M}\) by \(\lambda _i^2\) (\(1\le i\le n\)) in the ascending order:

$$\begin{aligned} 0\le \lambda _1^2\le \lambda _2^2\le \cdots \le \lambda _n^2, \end{aligned}$$
(2.4)

where all \(\lambda _i\ge 0\) and thus \(0\le \lambda _1\le \lambda _2\le \cdots \le \lambda _n\). The eigenvalues of \(\mathcal {M}\mathcal {K}\) are \(\lambda _i^2\) (\(1\le i\le n\)), too. The eigenvalues of \(H-\lambda E\) are then \(\pm \lambda _i\) for \(i=1,2,\ldots ,n\) with the ordering

$$\begin{aligned} -\lambda _n\le \cdots \le -\lambda _1\le +\lambda _1\le \cdots \le +\lambda _n. \end{aligned}$$
(2.5)

For convenience, we shall associate half of the zero eigenvalues with the positive sign and the other half with the negative sign, as argued in [2]. Doing so legitimizes the use of the phrase “the first \(k\) smallest eigenvalues with the positive sign of \(H-\lambda E\)” to refer to \(\lambda _i\) for \(1\le i\le k\) without ambiguity, even when \(\lambda _1=+0\). Throughout this paper, we will stick to using \(\pm \lambda _i\) for \(1\le i\le n\) in the order of (2.5) to denote the eigenvalues of \(H-\lambda E\).
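With the trivial factorization \(C=I_n\) and \(D=E_+^{{{\mathrm{T}}}}\) in (2.1), the \(\lambda _i\) can be computed directly from the eigenvalues of \(\mathcal {K}\mathcal {M}\) as in (2.4). A minimal sketch (ours; the function name is our choice, and dense NumPy arrays are assumed):

```python
# Sketch: lambda_1 <= ... <= lambda_n of H - lambda*E via the eigenvalues of
# curly-K * curly-M, cf. (2.4), with C = I_n and D = E_+^T in (2.1).
import numpy as np
from scipy.linalg import eigvals

def positive_eigenvalues(K, M, Ep):
    cK = K                                    # curly-K = C^{-1} K C^{-T} = K
    cM = np.linalg.solve(Ep.T, np.linalg.solve(Ep.T, M).T).T   # E_+^{-T} M E_+^{-1}
    lam2 = np.sort(eigvals(cK @ cM).real)     # real and nonnegative, cf. (2.4)
    return np.sqrt(np.maximum(lam2, 0.0))     # clip tiny negative rounding errors
```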

Set

$$\begin{aligned} \fancyscript{I}=\begin{bmatrix} 0&I_n \\ I_n&0 \end{bmatrix}, \quad \fancyscript{I}_{E}=\begin{bmatrix} 0&E_- \\ E_+&0 \end{bmatrix}=\Gamma \fancyscript{I} \Gamma ^{{{\mathrm{T}}}}, \end{aligned}$$
(2.6)

where \(\Gamma \) is given in (2.3). Both are symmetric but indefinite. The matrices \(\fancyscript{I}_{E}\) and \(\fancyscript{I}\) induce indefinite inner products on \({\mathbb R}^{2n}\):

$$\begin{aligned} \langle z_1,z_2\rangle _{\fancyscript{I}_{E}}\mathop {=}\limits ^{{\hbox {def}}}z_1^{{{\mathrm{T}}}}\fancyscript{I}_{E}z_2 \equiv \langle w_1,w_2\rangle _{\fancyscript{I}}\mathop {=}\limits ^{{\hbox {def}}}w_1^{{{\mathrm{T}}}}\fancyscript{I}w_2, \end{aligned}$$

where \(w_i=\Gamma ^{{{\mathrm{T}}}} z_i\). The following theorem tells us some orthogonality properties among the eigenvectors for \(H-\lambda E\).

Theorem 2.1

  1.

    Let \((\alpha ,z)\) be an eigenpair of \(H-\lambda E\), where \(z= \begin{bmatrix} y \\ x \end{bmatrix}\ne 0\) and \(x,\,y\in {\mathbb R}^n\). Then \(\alpha \langle z,z\rangle _{\fancyscript{I}_{E}}=2\alpha \,x^{{{\mathrm{T}}}}E_+y>0\) if \(\alpha \ne 0\). In particular, this implies \(\langle z,z\rangle _{\fancyscript{I}_{E}}=2x^{{{\mathrm{T}}}}E_+y\ne 0\) if \(\alpha \ne 0\).

  2.

    Let \((\alpha _i,z_i)\) \((i=1,2)\) be two eigenpairs of \(H-\lambda E\). Partition \(z_i=\begin{bmatrix} y_i \\ x_i \end{bmatrix}\ne 0\), where \(x_i,\,y_i\in {\mathbb R}^n\).

    a.

      If \(\alpha _1\ne \alpha _2\), then \(\langle z_1,z_2\rangle _{\fancyscript{I}_{E}}=y_1^{{{\mathrm{T}}}}E_-x_2+x_1^{{{\mathrm{T}}}}E_+y_2=0\).

    b.

      If \(\alpha _1\ne \pm \alpha _2\ne 0\), then \(y_1^{{{\mathrm{T}}}}E_-x_2=x_1^{{{\mathrm{T}}}}E_+y_2=0\).
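To illustrate part 1 of Theorem 2.1 (a sketch of the argument behind the omitted proof): writing \(Hz=\alpha Ez\) componentwise as \(Kx=\alpha E_+y\) and \(My=\alpha E_-x\) gives

$$\begin{aligned} 0 < x^{{{\mathrm{T}}}}Kx+y^{{{\mathrm{T}}}}My =\alpha \,(x^{{{\mathrm{T}}}}E_+y+y^{{{\mathrm{T}}}}E_-x) =2\alpha \,x^{{{\mathrm{T}}}}E_+y =\alpha \,\langle z,z\rangle _{\fancyscript{I}_{E}}, \end{aligned}$$

where the strict inequality holds because \(x^{{{\mathrm{T}}}}Kx+y^{{{\mathrm{T}}}}My=0\) together with \(\alpha \ne 0\) would force \(z=0\) (e.g., \(M\succ 0\) gives \(y=0\), and then \(My=\alpha E_-x=0\) gives \(x=0\)).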

For the sake of presentation, in what follows we either assume that \(M\) is definite or only provide proofs for definite \(M\) whenever one of \(K\) and \(M\) is required to be definite. Doing so loses no generality because the interchangeable roles played by \(K\) and \(M\) make it rather straightforward to create a version for the case when \(K\) is definite by simply swapping \(K\) with \(M\) and \(E_+\) with \(E_-\) in each of their appearances.

Theorem 2.2

Suppose that \(M\succ 0\), and define \(C\) and \(D\) by (2.1). Then the following statements are true:

  1.

    There exist nonsingular \(\mathcal {X},\,\mathcal {Y}\in {\mathbb R}^{n\times n}\) such that

    $$\begin{aligned} K=C\mathcal {Y}\Lambda ^2 \mathcal {Y}^{{{\mathrm{T}}}}C^{{{\mathrm{T}}}}, \quad M=D\mathcal {X}\mathcal {X}^{{{\mathrm{T}}}}D^{{{\mathrm{T}}}}, \end{aligned}$$
    (2.7)

    where \(\Lambda ={{\mathrm{diag}}}(\lambda _1,\lambda _2,\ldots ,\lambda _n)\) and \(\mathcal {X}= \mathcal {Y}^{-{{\mathrm{T}}}}\).

  2.

    If \(K\) is also definite, then all \(\lambda _i>0\) and \(H-\lambda E\) is diagonalizable:

    $$\begin{aligned} H Z =EZ \begin{bmatrix} \Lambda&\\&-\Lambda \end{bmatrix}, \quad \hbox {where} \quad Z = \Gamma ^{-{{\mathrm{T}}}}\begin{bmatrix} \mathcal {Y}\Lambda&\mathcal {Y}\Lambda \\ \mathcal {X}&-\mathcal {X}\end{bmatrix}. \end{aligned}$$
    (2.8)
  3.

    \(H-\lambda E\) is not diagonalizable if and only if \(\lambda _1=0\), which happens when and only when \(K\) is singular.

  4.

    The \(i\)th column of \(Z\) is the eigenvector of \(H-\lambda E\) corresponding to \(\lambda _i\), where \(1 \le i \le n\), and it is unique if

    a.

      \(\lambda _i\) is a simple eigenvalue of (2.2), or

    b.

      \(i=1\) and \(\lambda _1=+0<\lambda _2\). In this case, \(0\) is a double eigenvalue of \(H-\lambda E\) but there is only one eigenvector associated with it.

  5.

    If \(0=\lambda _1=\cdots =\lambda _{\ell }<\lambda _{\ell +1}\), then the Kronecker canonical form of \(H-\lambda E\) is

    $$\begin{aligned} \underbrace{\begin{bmatrix} 0&0 \\ 1&0 \end{bmatrix}\oplus \cdots \oplus \begin{bmatrix} 0&0 \\ 1&0 \end{bmatrix}}_{\ell }\oplus {{\mathrm{diag}}}(\lambda _{\ell +1},-\lambda _{\ell +1},\ldots , \lambda _n,-\lambda _n)-\lambda I_{2n}, \end{aligned}$$
    (2.9)

    where \(X_1\oplus \cdots \oplus X_k\) denotes a block-diagonal matrix with \(i\)th diagonal block \(X_i\). Thus \(H-\lambda E\) has \(0\) as an eigenvalue of algebraic multiplicity \(2\ell \) with only \(\ell \) linearly independent eigenvectors, which are the columns of \(\Gamma ^{-{{\mathrm{T}}}}\begin{bmatrix} 0 \\ \mathcal {X}_{(:,1:\ell )} \end{bmatrix}\).

Pair of deflating subspaces

Let \(\mathcal{U}, \mathcal{V}\subseteq {\mathbb R}^n\) be subspaces. We call \(\{\mathcal{U},\mathcal{V}\}\) a pair of deflating subspaces of \(H-\lambda E\) if

$$\begin{aligned} K\mathcal{U}\subseteq E_+\mathcal{V} \quad \hbox {and} \quad M\mathcal{V}\subseteq E_-\mathcal{U}. \end{aligned}$$
(2.10)

Let \(U\in {\mathbb R}^{n\times k}\) and \(V\in {\mathbb R}^{n\times \ell }\) be the basis matrices for the subspaces \(\mathcal{U}\) and \(\mathcal{V}\), respectively, where \(\dim (\mathcal{U})=k\) and \(\dim (\mathcal{V})=\ell \). Then (2.10) implies that there exist \(K_{\hbox {R}}\in {\mathbb R}^{\ell \times k}\) and \(M_{\hbox {R}}\in {\mathbb R}^{k\times \ell }\) such that

$$\begin{aligned} KU=E_+VK_{\hbox {R}}, \quad MV=E_-UM_{\hbox {R}}. \end{aligned}$$
(2.11)

Given \(U\) and \(V\), both \(K_{\hbox {R}}\) and \(M_{\hbox {R}}\) are uniquely determined by respective equations in (2.11), but there are numerous ways to express them. In fact for any left generalized inverses \(U^{\dashv }\) and \(V^{\dashv }\) of \(E_-U\) and \(E_+V\), respectively, i.e., \(U^{\dashv }E_-U=I_k\) and \(V^{\dashv }E_+V=I_{\ell }\),

$$\begin{aligned} K_{\hbox {R}}=V^{\dashv }KU, \quad M_{\hbox {R}}=U^{\dashv }MV. \end{aligned}$$
(2.12)

There are infinitely many left generalized inverses \(U^{\dashv }\) and \(V^{\dashv }\). For example,

$$\begin{aligned} U^{\dashv }&= (U^{{{\mathrm{T}}}}E_-U)^{-1}U^{{{\mathrm{T}}}}\quad \hbox {if} \ (U^{{{\mathrm{T}}}}E_-U)^{-1} \hbox {exists}, \\ V^{\dashv }&= (V^{{{\mathrm{T}}}}E_+V)^{-1}V^{{{\mathrm{T}}}}\quad \hbox {if} \ (V^{{{\mathrm{T}}}}E_+V)^{-1} \hbox {exists} \end{aligned}$$

or, if \(U^{{{\mathrm{T}}}}E_+V=(V^{{{\mathrm{T}}}}E_-U)^{{{\mathrm{T}}}}\) is nonsingular, then

$$\begin{aligned} U^{\dashv }= (V^{{{\mathrm{T}}}}E_-U)^{-1}V^{{{\mathrm{T}}}}, \quad V^{\dashv }= (U^{{{\mathrm{T}}}}E_+V)^{-1}U^{{{\mathrm{T}}}}. \end{aligned}$$
(2.13)

But still \(K_{\hbox {R}}\) and \(M_{\hbox {R}}\) are unique. The left generalized inverses in (2.13) will become important later in preserving symmetry in \(K\) and \(M\).

Define

$$\begin{aligned} H_{\hbox {R}}=\begin{bmatrix} 0&K_{\hbox {R}} \\ M_{\hbox {R}}&0 \end{bmatrix}. \end{aligned}$$
(2.14)

Then \(H_{\hbox {R}}\) is the restriction of \(H-\lambda E\) onto \(\mathcal{V}\oplus \mathcal{U}\) with respect to the basis matrix \(V\oplus U\):

$$\begin{aligned} H\begin{bmatrix} V&\\&U \end{bmatrix} =E\begin{bmatrix} V&\\&U \end{bmatrix} H_{\hbox {R}}. \end{aligned}$$
(2.15)

\(H_{\hbox {R}}\) in (2.14) inherits the block structure in \(H\): zero blocks remain zero blocks. But when \(K\) and \(M\) are symmetric, in general \(H_{\hbox {R}}\) may lose the symmetry property in its off-diagonal blocks \(K_{\hbox {R}}\) and \(M_{\hbox {R}}\), not to mention positive semi-definiteness in \(K\) and \(M\). We propose a modification to \(H_{\hbox {R}}\) to overcome this potential loss, when

$$\begin{aligned} W\mathop {=}\limits ^{{\hbox {def}}}U^{{{\mathrm{T}}}}E_+V \end{aligned}$$

is nonsingular. Factorize \(W=W_1^{{{\mathrm{T}}}}W_2\), where \(W_1\) and \(W_2\) are nonsingular, and define

$$\begin{aligned} H_{\hbox {SR}}=\begin{bmatrix} 0&W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1} \\ W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}&0 \end{bmatrix}. \end{aligned}$$
(2.16)

Note \(H_{\hbox {SR}}\) shares not only the block structure in \(H\) but also the symmetry and semi-definiteness in its off-diagonal blocks.

Theorem 2.3

Let \(H_{\hbox {SR}}\) be defined by (2.16). Then

$$\begin{aligned} H\begin{bmatrix} VW_2^{-1}&\\&UW_1^{-1} \end{bmatrix} =E\begin{bmatrix} VW_2^{-1}&\\&UW_1^{-1} \end{bmatrix} H_{\hbox {SR}}. \end{aligned}$$
(2.17)

Consequently, if \((\hat{\lambda }, \hat{z})\) is an eigenpair of \(H_{\hbox {SR}}\), then \((\hat{\lambda }, z)\) with \(z = \begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} VW_2^{-1}\hat{y} \\ UW_1^{-1}\hat{x} \end{bmatrix}\) is an eigenpair of the LR eigenproblem \(H-\lambda E\), where \(\hat{z}=\begin{bmatrix} \hat{y} \\ \hat{x} \end{bmatrix}\) is conformally partitioned.

Proof

The equations in (2.11) hold for some \(K_{\hbox {R}}\) and \(M_{\hbox {R}}\). Thus

$$\begin{aligned} U^{{{\mathrm{T}}}}KU&=(U^{{{\mathrm{T}}}}E_+V)K_{\hbox {R}}=W_1^{{{\mathrm{T}}}}W_2K_{\hbox {R}}, \\ V^{{{\mathrm{T}}}}MV&=(V^{{{\mathrm{T}}}}E_-U)M_{\hbox {R}}=W_2^{{{\mathrm{T}}}}W_1M_{\hbox {R}}, \end{aligned}$$

which gives

$$\begin{aligned} W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1}=W_2K_{\hbox {R}}W_1^{-1}, \quad W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}=W_1M_{\hbox {R}}W_2^{-1}. \end{aligned}$$
(2.18)

Now use (2.11) and (2.18) to get

$$\begin{aligned} K(UW_1^{-1})&=E_+VK_{\hbox {R}}W_1^{-1} \\&=E_+(VW_2^{-1})(W_2K_{\hbox {R}}W_1^{-1}) \\&=E_+(VW_2^{-1})(W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1}), \\ M(VW_2^{-1})&=E_-(UW_1^{-1})(W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}). \end{aligned}$$

They yield (2.17).

Multiply both sides of (2.17) from the right by \(\hat{z}\) and use \(H_{\hbox {SR}}\hat{z}=\hat{\lambda }\hat{z}\) to conclude the rest of the theorem. \(\square \)

Note that the well-definedness of \(H_{\hbox {SR}}\) as in (2.16) alone does not require \(\{{{\mathrm{span}}}(U),{{\mathrm{span}}}(V)\}\) to be a pair of deflating subspaces of \(H-\lambda E\); the nonsingularity of \(U^{{{\mathrm{T}}}}E_+V\) alone is sufficient. \(H_{\hbox {SR}}\) will play a particularly important role in the rest of this article.

Approximate pair of deflating subspaces

In practical computations, pairs of exact deflating subspaces are rarely known; usually only approximate ones are available. The question then arises: how to compute approximate eigenpairs of \(H-\lambda E\) given a pair of approximate deflating subspaces. Theorem 2.3 sheds light on how this can be done.

Let \(\{\mathcal{U},\mathcal{V}\}\) be a pair of approximate deflating subspaces. Pick basis matrices \(U\) and \(V\) of \(\mathcal{U}\) and \(\mathcal{V}\), respectively, and define \(H_{\hbox {SR}}\) according to (2.16). The following algorithm returns approximate eigenvalues and eigenvectors of \(H-\lambda E\) from the given approximate pair of deflating subspaces \(\{\mathcal{U},\mathcal{V}\}\):

Algorithm 2.1

  1.

    Construct \(H_{\hbox {SR}}\) as in (2.16) if \(U^{{{\mathrm{T}}}}E_+V\) is nonsingular;

  2.

    Compute the eigenpairs \(\left( \hat{\lambda }, \begin{bmatrix} \hat{y} \\ \hat{x} \end{bmatrix} \right) \) of \(H_{\hbox {SR}}\);

  3.

    The computed eigenvalues \(\hat{\lambda }\) approximate some eigenvalues of \(H-\lambda E\), and the associated approximate eigenvectors are \(\begin{bmatrix} VW_2^{-1}\hat{y} \\ UW_1^{-1}\hat{x} \end{bmatrix}\) according to Theorem 2.3.

Given two subspaces \(\mathcal{U}\) and \(\mathcal{V}\), there are many ways to construct \(H_{\hbox {SR}}\) because neither the factorization \(W=W_1^{{{\mathrm{T}}}}W_2\) nor the basis matrices \(U\) and \(V\) are unique. An argument similar to the one in [3] can be used to show that the approximations by Algorithm 2.1 are invariant with respect to how \(H_{\hbox {SR}}\) is constructed. See also [1].
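To make Algorithm 2.1 concrete, the following dense-matrix sketch (ours; the trivial factorization \(W_1=I_{\ell }\), \(W_2=W\) and the function name are our choices, and \(W\) is assumed nonsingular) carries out all three steps:

```python
# Sketch of Algorithm 2.1: structure-preserving projection of H - lambda*E
# onto {span(U), span(V)}, with the trivial factorization W1 = I, W2 = W.
import numpy as np
from scipy.linalg import eig

def project_pair(K, M, Ep, U, V):
    """Eigenvalues of H_SR with the positive sign (ascending) and the
    components of the approximate eigenvectors, cf. Theorem 2.3."""
    l = U.shape[1]
    W = U.T @ Ep @ V                      # assumed nonsingular (the common case)
    Winv = np.linalg.inv(W)
    H12 = U.T @ K @ U                     # W1^{-T} U^T K U W1^{-1} with W1 = I
    H21 = Winv.T @ (V.T @ M @ V) @ Winv   # W2^{-T} V^T M V W2^{-1} with W2 = W
    Hsr = np.block([[np.zeros((l, l)), H12],
                    [H21, np.zeros((l, l))]])     # H_SR of (2.16)
    mu, Zh = eig(Hsr)
    mu, Zh = mu.real, Zh.real             # the eigenvalues come in real +/- pairs
    pos = np.argsort(mu)[l:]              # the l eigenvalues with the positive sign
    yh, xh = Zh[:l, pos], Zh[l:, pos]
    return mu[pos], V @ (Winv @ yh), U @ xh   # mu_j and y-, x-components
```

Since the approximations are invariant with respect to how \(H_{\hbox {SR}}\) is constructed, other choices of the basis matrices or of \(W_1\) and \(W_2\) would return the same \(\mu _j\) and approximate eigenvectors, up to scaling.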

Minimization principles

Define the functional

$$\begin{aligned} \rho (x,y)\mathop {=}\limits ^{{\hbox {def}}}\frac{x^{{{\mathrm{T}}}}Kx+y^{{{\mathrm{T}}}}My}{2|x^{{{\mathrm{T}}}}E_+y|}, \end{aligned}$$
(3.1)

where \(y^{{{\mathrm{T}}}}E_-x\) can be used in place of \(x^{{{\mathrm{T}}}}E_+y\) due to the fact \((x^{{{\mathrm{T}}}}E_+y)^{{{\mathrm{T}}}}=y^{{{\mathrm{T}}}}E_-x\) for any \(x\) and \(y\). Relating (1.5) to (1.4) through the transformation (1.7), we find

$$\begin{aligned} \rho (x,y)\equiv \varrho (u,v)\mathop {=}\limits ^{{\hbox {def}}}\frac{\begin{bmatrix} u \\ v \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} A&B \\ B&A \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}}{\left| \begin{bmatrix} u \\ v \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} \Sigma&\Delta \\ -\Delta&-\Sigma \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}\right| }. \end{aligned}$$
(3.2)

Both \(\varrho (u,v)\) and \(\rho (x,y)\) were defined in [2] but only for the case \(E_{\pm }=I_n\) and, correspondingly, \(\Sigma =I_n\) and \(\Delta =0\). We will call them, without distinction, the Thouless functional (in different forms).

Several theoretical results for the case \(E=I_{2n}\) were established in [2]. In this section, we establish analogs of these results for the matrix pencil \(H-\lambda E\).

Minimization principles

Theorem 3.1 is actually a corollary of Theorem 3.2; it is presented here first for its simplicity.

Theorem 3.1

We have

$$\begin{aligned} \lambda _1=\inf _{x,y}\rho (x,y). \end{aligned}$$
(3.3)

Moreover, “\(\inf \)” can be replaced by “\(\min \)” if and only if both \(K,\,M\succ 0\). When both \(K,\,M\succ 0\), the optimal argument pair \((x,y)\) gives rise to an eigenvector \(z=\begin{bmatrix} y \\ x \end{bmatrix}\) of \(H-\lambda E\) associated with \(\lambda _1\).

Proof

It is easy to see that

$$\begin{aligned} \rho (x,y)=\frac{\tilde{x}^{{{\mathrm{T}}}}\mathcal {K}\tilde{x}+\tilde{y}^{{{\mathrm{T}}}}\mathcal {M}\tilde{y}}{2|\tilde{x}^{{{\mathrm{T}}}}\tilde{y}|}, \end{aligned}$$

where \(\tilde{x}=C^{{{\mathrm{T}}}}x\), \(\tilde{y}=D^{{{\mathrm{T}}}}y\), and \(\mathcal {K}\) and \(\mathcal {M}\) are as given in (2.3). The theorem is then a consequence of [2, Theorem 3.1]. \(\square \)

Owing to the equivalence of (1.5) and (1.4) through the transformation (1.7), we have for the original LR problem (1.4)

$$\begin{aligned} \lambda _1=\inf _{u,v}\varrho (u,v). \end{aligned}$$
(3.4)

For the case \(\Sigma =I_n\) and \(\Delta =0\) and when both \(K,\,M\succ 0\), this was established by Thouless [19].
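As a numerical illustration of Theorem 3.1 (ours, assuming dense \(K,\,M\succ 0\) and nonsingular \(E_+\) with \(E_-=E_+^{{{\mathrm{T}}}}\)): the functional \(\rho \) evaluated at the components of an eigenvector for \(\lambda _1\) attains the minimum, while randomly sampled arguments only give upper bounds.

```python
# Sketch: rho at the eigenvector for +lambda_1 equals lambda_1, and random
# (x, y) never produce a smaller value, cf. Theorem 3.1.
import numpy as np
from scipy.linalg import eig

def rho(K, M, Ep, x, y):                         # Thouless functional (3.1)
    return (x @ K @ x + y @ M @ y) / (2 * abs(x @ Ep @ y))

def illustrate_thm31(K, M, Ep, trials=1000, seed=0):
    n = K.shape[0]
    Z0 = np.zeros((n, n))
    H = np.block([[Z0, K], [M, Z0]])
    E = np.block([[Ep, Z0], [Z0, Ep.T]])         # E_- = E_+^T
    w, Zv = eig(H, E)
    w = w.real                                   # eigenvalues are real, cf. (2.5)
    j = np.where(w > 0, w, np.inf).argmin()      # index of +lambda_1
    lam1 = w[j]
    y1, x1 = Zv[:n, j].real, Zv[n:, j].real      # z = [y; x] as in (1.5)
    assert abs(rho(K, M, Ep, x1, y1) - lam1) <= 1e-8 * lam1
    rng = np.random.default_rng(seed)
    sampled = min(rho(K, M, Ep, rng.standard_normal(n), rng.standard_normal(n))
                  for _ in range(trials))
    assert sampled >= lam1 * (1 - 1e-10)         # never below lambda_1
    return lam1, sampled
```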

Theorem 3.2

We have

$$\begin{aligned} \sum _{i=1}^k\lambda _i=\frac{1}{2} \inf _{\begin{array}{c} {U^{{{\mathrm{T}}}}E_+V=I_k}\\ {U, V \in {\mathbb R}^{n\times k}} \end{array}}{{\mathrm{trace}}}(U^{{{\mathrm{T}}}}KU+V^{{{\mathrm{T}}}}MV). \end{aligned}$$
(3.5)

Moreover, “\(\inf \)” can be replaced by “\(\min \)” if and only if both \(K,\,M\succ 0\). When both \(K,\,M\succ 0\) and if also \(\lambda _k<\lambda _{k+1}\), then for any \(U\) and \(V\) that attain the minimum, \(\{{{\mathrm{span}}}(U),{{\mathrm{span}}}(V)\}\) is a pair of deflating subspaces of \(H-\lambda E\) and the corresponding \(H_{\hbox {SR}}\) has eigenvalues \(\pm \lambda _i\) for \(1\le i\le k\).

Proof

We notice that

$$\begin{aligned} U^{{{\mathrm{T}}}}KU+V^{{{\mathrm{T}}}}MV=\mathcal {U}^{{{\mathrm{T}}}}\mathcal {K}\mathcal {U}+\mathcal {V}^{{{\mathrm{T}}}}\mathcal {M}\mathcal {V}, \quad U^{{{\mathrm{T}}}}E_+V=\mathcal {U}^{{{\mathrm{T}}}}\mathcal {V}, \end{aligned}$$

where \(\mathcal {U}=C^{{{\mathrm{T}}}}U\) and \(\mathcal {V}=D^{{{\mathrm{T}}}}V\). Therefore, the theorem is a consequence of [2, Theorem 3.2]. \(\square \)

Exploiting the close relation (3.2) between the two different functionals \(\varrho (\cdot ,\cdot )\) and \(\rho (\cdot ,\cdot )\), we have by Theorem 3.2 the following theorem for the original LR eigenvalue problem (1.4).

Theorem 3.3

Suppose that \(A,\,B,\,\Sigma \in {\mathbb R}^{n\times n}\) are symmetric and \(\Delta \in {\mathbb R}^{n\times n}\) is skew-symmetric, and that \(A\pm B\succeq 0\) with one of them definite and \(\Sigma \pm \Delta \) nonsingular. Then we have

$$\begin{aligned} \sum _{i=1}^k\lambda _i=\frac{1}{2}\,\inf {{\mathrm{trace}}}\left( \begin{bmatrix} U \\ V \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} A&B \\ B&A \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix}\right) , \end{aligned}$$
(3.6)

where “\(\inf \)” is taken over all \(U,\,V\in {\mathbb R}^{n\times k}\) subject to

$$\begin{aligned} \begin{bmatrix} U \\ V \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} \Sigma&\Delta \\ -\Delta&-\Sigma \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix}=2I_k \quad \hbox {and} \quad \begin{bmatrix} U \\ V \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} \Sigma&\Delta \\ -\Delta&-\Sigma \end{bmatrix} \begin{bmatrix} V \\ U \end{bmatrix}=0. \end{aligned}$$
(3.7)

Moreover, “\(\inf \)” can be replaced by “\(\min \)” if and only if both \(A\pm B\succ 0\).

Proof

Assume the assignments in (1.6) for \(K\) and \(M\). We have

$$\begin{aligned} \begin{bmatrix} U \\ V \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} A&B \\ B&A \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix} =\begin{bmatrix} \widehat{V} \\ \widehat{U} \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} M&\\&K \end{bmatrix} \begin{bmatrix} \widehat{V} \\ \widehat{U} \end{bmatrix} =\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}, \end{aligned}$$

where

$$\begin{aligned} \begin{bmatrix} \widehat{V} \\ \widehat{U} \end{bmatrix}=J^{{{\mathrm{T}}}}\begin{bmatrix} U \\ V \end{bmatrix} =\frac{1}{\sqrt{2}}\begin{bmatrix} U+V \\ U-V \end{bmatrix}. \end{aligned}$$

Therefore

$$\begin{aligned}&\inf _{\widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k}{{\mathrm{trace}}}(\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}) \nonumber \\&\quad =\inf _{(U-V)^{{{\mathrm{T}}}}E_+(U+V)=2I_k} {{\mathrm{trace}}}\left( \begin{bmatrix} U \\ V \end{bmatrix}^{{{\mathrm{T}}}} \begin{bmatrix} A&B \\ B&A \end{bmatrix} \begin{bmatrix} U \\ V \end{bmatrix}\right) , \end{aligned}$$
(3.8)

where \(E_{\pm } = \Sigma \pm \Delta \). We claim

$$\begin{aligned} (U-V)^{{{\mathrm{T}}}}E_+(U+V)=2I_k\quad \Leftrightarrow \quad (3.7). \end{aligned}$$
(3.9)

This is because \((U-V)^{{{\mathrm{T}}}}E_+(U+V)=2I_k\) and its transpose version give

$$\begin{aligned} U^{{{\mathrm{T}}}}E_+U+U^{{{\mathrm{T}}}}E_+V-V^{{{\mathrm{T}}}}E_+U-V^{{{\mathrm{T}}}}E_+V=&2I_k,\end{aligned}$$
(3.10a)
$$\begin{aligned} U^{{{\mathrm{T}}}}E_-U+V^{{{\mathrm{T}}}}E_-U-U^{{{\mathrm{T}}}}E_-V-V^{{{\mathrm{T}}}}E_-V=&2I_k. \end{aligned}$$
(3.10b)

Add both equations in (3.10) and subtract one from the other to get

$$\begin{aligned} U^{{{\mathrm{T}}}}\Sigma U+U^{{{\mathrm{T}}}}\Delta V-V^{{{\mathrm{T}}}}\Delta U-V^{{{\mathrm{T}}}}\Sigma V=&2I_k,\\ U^{{{\mathrm{T}}}}\Delta U-V^{{{\mathrm{T}}}}\Sigma U+U^{{{\mathrm{T}}}}\Sigma V-V^{{{\mathrm{T}}}}\Delta V=&0. \end{aligned}$$

They are equivalent to (3.7). Equation (3.6) is now a consequence of Theorem 3.2, (3.8), and (3.9). \(\square \)

Cauchy-like interlacing inequalities

The following theorem can be regarded as an extension of Cauchy’s interlacing inequalities for the symmetric eigenvalue problem.

Theorem 3.4

Let \(U, V\in {\mathbb R}^{n\times k}\) such that \(W\mathop {=}\limits ^{{\hbox {def}}}U^{{{\mathrm{T}}}}E_+V\) is nonsingular. Factorize \(W=W_1^{{{\mathrm{T}}}}W_2\), where \(W_i\in {\mathbb R}^{k\times k}\) are nonsingular, and define \(H_{\hbox {SR}}\) by (2.16). Denote by \(\pm \mu _i\) for \(1\le i\le k\) the eigenvalues of \(H_{\hbox {SR}}\), where \(0\le \mu _1\le \cdots \le \mu _k\). Then

$$\begin{aligned} \lambda _i\le \mu _i\le \beta \,\lambda _{i+n-k}\quad \hbox {for} \ 1\le i\le k, \end{aligned}$$
(3.11)

where \(\beta = {\sqrt{\min \{\kappa (\mathcal {K}),\kappa (\mathcal {M})\}}}/{\cos \angle (C^{{{\mathrm{T}}}}\mathcal{U},D^{{{\mathrm{T}}}}\mathcal{V})}\), \(\kappa (X)\mathop {=}\limits ^{{\hbox {def}}}\Vert X\Vert _2\Vert X^{-1}\Vert _2\) is the spectral condition number of the matrix \(X\), \(\mathcal{U}={{\mathrm{span}}}(U)\) and \(\mathcal{V}={{\mathrm{span}}}(V)\), and \(\angle (C^{{{\mathrm{T}}}}\mathcal{U},D^{{{\mathrm{T}}}}\mathcal{V})\) is the angle between \(C^{{{\mathrm{T}}}}\mathcal{U}\) and \(D^{{{\mathrm{T}}}}\mathcal{V}\).

Furthermore, if \(\lambda _k<\lambda _{k+1}\) and \(\lambda _i=\mu _i\) for \(1\le i\le k\), then

  1.

    \(\mathcal{U}={{\mathrm{span}}}(C^{-{{\mathrm{T}}}}\mathcal {X}_{(:,1:k)})\) when \(M\succ 0\) (see Footnote 3), where \(\mathcal {X}\) is as in Theorem 2.2;

  2.

    \(\{\mathcal{U},\mathcal{V}\}\) is a pair of deflating subspaces of \(H-\lambda E\) corresponding to the eigenvalues \(\pm \lambda _i\) for \(1\le i\le k\) of (1.5) when both \(K,\,M\succ 0\).

Proof

Apply [2, Theorem 4.1] to the eigenvalue problem for \(\mathcal {H}\) in (2.2). \(\square \)

The inequalities in (3.11) mirror Cauchy's interlacing inequalities for the symmetric eigenvalue problem, but the upper bound on \(\mu _i\) in (3.11) is more complicated. The factor \([\cos \angle (C^{{{\mathrm{T}}}}\mathcal{U},D^{{{\mathrm{T}}}}\mathcal{V})]^{-1}\) in general cannot be removed, according to the example in [2, Remark 4.2] for the case \(C=D=I\).
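The lower bounds \(\lambda _i\le \mu _i\) are easy to observe numerically. A quick check (ours), reusing `positive_eigenvalues` and `project_pair` from the sketches in Sect. 2:

```python
# Sketch: lambda_i <= mu_i of (3.11) for random U, V. W = U^T E_+ V is
# nonsingular with probability one for random data.
import numpy as np

def illustrate_interlacing(K, M, Ep, k, seed=0):
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n, k))
    V = rng.standard_normal((n, k))
    mu = project_pair(K, M, Ep, U, V)[0]       # mu_1 <= ... <= mu_k of H_SR
    lam = positive_eigenvalues(K, M, Ep)       # lambda_1 <= ... <= lambda_n
    assert np.all(lam[:k] <= mu * (1 + 1e-10) + 1e-12)
    return lam[:k], mu
```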

Theorem 3.5

Under the assumptions of Theorem 3.4, if either \(E_-\mathcal{U}\subseteq M\mathcal{V}\) when \(M\succ 0\) or \(E_+\mathcal{V}\subseteq K\mathcal{U}\) when \(K\succ 0\), then

$$\begin{aligned} \lambda _i\le \mu _i\le \lambda _{i+n-k}\quad \hbox {for} \ 1\le i\le k. \end{aligned}$$
(3.12)

Proof

Note that

$$\begin{aligned} E_-\mathcal{U}\subseteq M\mathcal{V} \,\Leftrightarrow \, C^{{{\mathrm{T}}}}\mathcal{U}\subseteq \mathcal {M}D^{{{\mathrm{T}}}}\mathcal{V}, \quad E_+\mathcal{V}\subseteq K\mathcal{U} \,\Leftrightarrow \, D^{{{\mathrm{T}}}}\mathcal{V}\subseteq \mathcal {K}C^{{{\mathrm{T}}}}\mathcal{U} \end{aligned}$$

and then apply [2, Theorem 4.3] to the eigenvalue problem for \(\mathcal {H}\) in (2.2). \(\square \)

Best approximations by a pair of subspaces

Recall the default assumption that \(K, \, M \succeq 0\) and one of them is definite. Let \(\{\mathcal{U}, \mathcal{V}\}\) be a pair of approximate deflating subspaces of \(H-\lambda E\) and \(\dim (\mathcal{U})=\ell _1\) and \(\dim (\mathcal{V})=\ell _2\). Motivated by the minimization principles in Theorems 3.1 and 3.2, we would seek the best approximations to \(\lambda _j\) for \(1\le j\le k\) in the sense of

$$\begin{aligned} \frac{1}{2}\inf _{\begin{array}{c} \widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k\\ {{\mathrm{span}}}(\widehat{U})\subseteq \mathcal{U},{{\mathrm{span}}}(\widehat{V})\subseteq \mathcal{V} \end{array} } {{\mathrm{trace}}}(\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}) \end{aligned}$$
(4.1)

and their associated approximate eigenvectors. Necessarily \(k\le \min \{\ell _1,\ell _2\}\). To this end, we divide our investigation into two cases. Let \(U\in {\mathbb R}^{n\times \ell _1},\,V\in {\mathbb R}^{n\times \ell _2}\) be the basis matrices of \(\mathcal{U}\) and \(\mathcal{V}\), respectively, and set \(W=U^{{{\mathrm{T}}}}E_+V\). The two cases are

  1.

    \(W\) is nonsingular. Necessarily, \(\ell _1=\ell _2\). Set \(\ell =\ell _1\).

  2.

    \(W\) is singular or \(\ell _1\ne \ell _2\).

For the first case, i.e., \(W\) nonsingular, let us factorize \(W=W_1^{{{\mathrm{T}}}}W_2\), where \(W_i\in {\mathbb R}^{\ell \times \ell }\) are nonsingular (see Footnote 4). Note that any \(\widehat{U}\) and \(\widehat{V}\) such that \({{\mathrm{span}}}(\widehat{U})\subseteq \mathcal{U}\), \({{\mathrm{span}}}(\widehat{V})\subseteq \mathcal{V}\), and \(\widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k\) can be written as

$$\begin{aligned} \widehat{U}=UW_1^{-1}\,\widehat{X}, \quad \widehat{V}=VW_2^{-1}\,\widehat{Y}, \end{aligned}$$

where \(\widehat{X},\,\widehat{Y}\in {\mathbb R}^{\ell \times k}\) and \(\widehat{X}^{{{\mathrm{T}}}}\widehat{Y}=I_k\), and vice versa. Hence we have

$$\begin{aligned} \widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V} =\widehat{X}^{{{\mathrm{T}}}}W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1}\widehat{X}+\widehat{Y}^{{{\mathrm{T}}}}W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}\widehat{Y} \end{aligned}$$

and thus

$$\begin{aligned}&\inf _{\begin{array}{c} \widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k \\ {{\mathrm{span}}}(\widehat{U})\subseteq \mathcal{U},{{\mathrm{span}}}(\widehat{V})\subseteq \mathcal{V} \end{array} } {{\mathrm{trace}}}(\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}) \nonumber \\&\quad =\inf _{\widehat{X}^{{{\mathrm{T}}}}\widehat{Y}=I_k} {{\mathrm{trace}}}(\widehat{X}^{{{\mathrm{T}}}}W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1}\widehat{X}+\widehat{Y}^{{{\mathrm{T}}}}W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}\widehat{Y}). \end{aligned}$$
(4.2)

By Theorem 3.2, we know that the right-hand side of (4.2) is the sum of the \(k\) smallest eigenvalues with the positive sign of \(H_{\hbox {SR}}\) defined earlier in Sect. 2.2:

$$\begin{aligned} H_{\hbox {SR}}=\begin{bmatrix} 0&W_1^{-{{\mathrm{T}}}}U^{{{\mathrm{T}}}}KUW_1^{-1} \\ W_2^{-{{\mathrm{T}}}}V^{{{\mathrm{T}}}}MVW_2^{-1}&0 \end{bmatrix}\in {\mathbb R}^{2\ell \times 2\ell }. \end{aligned}$$
(2.16)

In summary, the best approximations to the first \(k\) eigenvalues with the positive sign of \(H-\lambda E\) within the pair of approximate deflating subspaces are those of \(H_{\hbox {SR}}\). Algorithmically, denote by \(\mu _j\) (\(j=1,\ldots ,\ell \)) the eigenvalues with the positive sign of \(H_{\hbox {SR}}\) in the ascending order, i.e., \(0\le \mu _1\le \cdots \le \mu _{\ell }\), and by \(\hat{z}_j\) the associated eigenvectors:

$$\begin{aligned} H_{\hbox {SR}}\hat{z}_j=\mu _j\hat{z}_j, \quad \hat{z}_j=\begin{bmatrix} \hat{y}_j \\ \hat{x}_j \end{bmatrix}. \end{aligned}$$
(4.3)

It can be verified that

$$\begin{aligned} \rho (UW_1^{-1}\hat{x}_j,VW_2^{-1}\hat{y}_j)=\mu _j\quad \hbox {for}\ j=1,\ldots ,\ell . \end{aligned}$$

Naturally, according to Algorithm 2.1, we take \(\lambda _j\approx \mu _j\) and the corresponding approximate eigenvectors of \(H-\lambda E\) as

$$\begin{aligned} \tilde{z}_j\equiv \begin{bmatrix} \tilde{y}_j \\ \tilde{x}_j \end{bmatrix}=\begin{bmatrix} VW_2^{-1}\hat{y}_j \\ UW_1^{-1}\hat{x}_j \end{bmatrix}\quad \hbox {for}\ j=1,\ldots ,\ell . \end{aligned}$$
(4.4)

In practice, not all of the approximate eigenpairs \((\mu _j,\tilde{z}_j)\) are accurate to the same level; usually the first few pairs are more accurate than the next few.

For the ease of reference, we summarize the above findings into the following theorem.

Theorem 4.1

Let \(\{\mathcal{U}, \mathcal{V}\}\) be a pair of approximate deflating subspaces of \(H-\lambda E\) with \(\dim (\mathcal{U})=\dim (\mathcal{V})=\ell \), and let \(U,\,V\in {\mathbb R}^{n\times \ell }\) be the basis matrices of \(\mathcal{U}\) and \(\mathcal{V}\), respectively. If \(W\mathop {=}\limits ^{{\hbox {def}}}U^{{{\mathrm{T}}}}E_+V\) is nonsingular, then the best approximations to \(\lambda _j\) for \(1\le j\le k\) in the sense of (4.1) are the eigenvalues \(\mu _j\) of \(H_{\hbox {SR}}\) defined in (2.16) with the corresponding approximate eigenvectors given by (4.4), and

$$\begin{aligned} \sum _{j=1}^k\mu _j=\frac{1}{2}\inf _{\begin{array}{c} \widehat{U}^{{{\mathrm{T}}}}E_+\widehat{V}=I_k \\ {{\mathrm{span}}}(\widehat{U})\subseteq \mathcal{U},{{\mathrm{span}}}(\widehat{V})\subseteq \mathcal{V} \end{array} } {{\mathrm{trace}}}(\widehat{U}^{{{\mathrm{T}}}}K\widehat{U}+\widehat{V}^{{{\mathrm{T}}}}M\widehat{V}). \end{aligned}$$

The next theorem turns the eigenvalue problem of \(H_{\hbox {SR}}\) into a generalized eigenvalue problem of the same kind as \(H-\lambda E\). We omit its proof because of its simplicity.

Theorem 4.2

Let \(U,\,V\in {\mathbb R}^{n\times k}\) be such that \(W\mathop {=}\limits ^{{\hbox {def}}}U^{{{\mathrm{T}}}}E_+V\) is nonsingular, and define \(H_{\hbox {SR}}\) as in (2.16). Then the eigenvalues of \(H_{\hbox {SR}}\) are the same as those of the matrix pencil

$$\begin{aligned} \begin{bmatrix} U&\\&V \end{bmatrix}^{{{\mathrm{T}}}}(H-\lambda E)\begin{bmatrix} V&\\&U \end{bmatrix} =\begin{bmatrix}&U^{{{\mathrm{T}}}}KU \\ V^{{{\mathrm{T}}}}MV&\end{bmatrix}-\lambda \begin{bmatrix} U^{{{\mathrm{T}}}}E_+V&\\&V^{{{\mathrm{T}}}}E_-U \end{bmatrix}, \end{aligned}$$
(4.5)

and the eigenvectors \(\hat{z}\) of \(H_{\hbox {SR}}\) and those \(\check{z}\) of the pencil are related by \(\hat{z}=(W_2\oplus W_1)\check{z}\).

Remark 4.1

The best approximation technique so far is based on the minimization principles in Theorems 3.1 and 3.2. Naturally one may wonder if a similar technique could be devised using the minimization principles in Theorem 3.3 for the original LR eigenvalue problem (1.4). But that seems hard if we seek to project each of the matrices \(A\), \(B\), \(\Delta \), and \(\Sigma \) separately. Alternatively, we may resort to Theorem 4.2 by recasting the projection in (4.5) back to the original LR eigenvalue problem (1.4). The resulting scheme turns out to be the projection idea in [14], where Olsen, Jensen, and Jørgensen were simply aiming at producing a much smaller projected problem of the same kind in the form of (1.4). Note that Theorem 3.3 was not yet known in 1988, and thus it was not possible in [14] to investigate any issue regarding the best possible approximations in the sense of the theorem. What we are doing here is to not only produce a much smaller projected problem of the same kind in form as (1.5) but also ensure that the projected problem gives the best possible approximations to the desired eigenvalues. Even though we seek to achieve multiple goals with our projection scheme, the end result is not essentially different from the one in [14]. That is remarkable.

\(\Diamond \)

It turns out that the second case (namely, \(W\) is singular or \(\ell _1\ne \ell _2\)) is much more complicated, but the conclusion is similar in that the optimization problem (4.1) can still be solved through a smaller eigenvalue problem for a projection matrix \(\widehat{H}_{\hbox {SR}}\) to be defined in Appendix A, where Theorem 8.1, similar to Theorem 4.1, will be presented.

Locally optimal 4-D CG algorithms

4-D search

Line search is a common approach in the process of optimizing a function value. For our case, we are interested in solving

$$\begin{aligned} \inf _{x,y }\rho (x,y) =\inf _{x,y }\frac{x^{{{\mathrm{T}}}}Kx+y^{{{\mathrm{T}}}}My}{2|x^{{{\mathrm{T}}}}E_+y|} \end{aligned}$$
(5.1)

in order to compute \(\lambda _1\) and its associated eigenvector of \(H-\lambda E\).

Given a search direction \(\begin{bmatrix} q \\ p \end{bmatrix}\) from the current position \(\begin{bmatrix} y \\ x \end{bmatrix}\), the basic idea of the standard line search is to look for the best possible scalar argument \(t\) on the line

$$\begin{aligned} \left\{ \begin{bmatrix} y \\ x \end{bmatrix}+t\begin{bmatrix} q \\ p \end{bmatrix}\,:\,t\in {\mathbb R}\right\} \end{aligned}$$
(5.2)

to minimize \(\rho \):

$$\begin{aligned} \min _t \rho ( x+ tp, y+ tq). \end{aligned}$$
(5.3)

For the steepest descent method, the search directions \(p\) and \(q\) are the gradients [1] of \(\rho (x,y)\) with respect to \(x\) and \(y\):

$$\begin{aligned} \nabla _x \rho = \frac{1}{x^{{{\mathrm{T}}}}E_+y}\left[ Kx-\rho (x,y)\,E_+y\right] ,\, \nabla _y \rho = \frac{1}{x^{{{\mathrm{T}}}}E_+y}\left[ My-\rho (x,y)\,E_-x\right] . \end{aligned}$$
(5.4)

Note that there is a close relation between these two gradients and the residual:

$$\begin{aligned} H z - \rho (x,y) Ez = x^{{{\mathrm{T}}}}E_+y\,\begin{bmatrix} \nabla _x\rho \\ \nabla _y\rho \end{bmatrix}. \end{aligned}$$
(5.5)

Namely the block vector obtained by stacking \(\nabla _x\rho \) over \(\nabla _y\rho \) is parallel to the residual.
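In code, the gradients (5.4) and the identity (5.5) read as follows (our sketch; dense arrays, \(E_-=E_+^{{{\mathrm{T}}}}\), and \(x^{{{\mathrm{T}}}}E_+y\ne 0\) are assumed):

```python
# Sketch: the gradients (5.4) of the Thouless functional. Stacking gx over
# gy and scaling by w = x^T E_+ y reproduces the residual H z - rho(x,y) E z
# with z = [y; x], which is the identity (5.5).
import numpy as np

def gradients(K, M, Ep, x, y):
    w = x @ Ep @ y                              # x^T E_+ y, assumed nonzero
    r = (x @ K @ x + y @ M @ y) / (2 * abs(w))  # rho(x, y), cf. (3.1)
    gx = (K @ x - r * (Ep @ y)) / w             # nabla_x rho
    gy = (M @ y - r * (Ep.T @ x)) / w           # nabla_y rho
    return r, gx, gy
```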

The idea of the dual-channel line search in [4] for the case \(E=I_{2n}\) can be readily extended to solve the minimization problem

$$\begin{aligned} \min _{s,t} \rho ( x+ sp, y+ tq). \end{aligned}$$
(5.6)

It goes as follows: solve (5.6) iteratively by freezing one of \(s\) and \(t\) and minimizing the functional \(\rho \) over the other, in an alternating manner. Choices of \(p\) and \(q\) in (5.6) include the gradients \(\nabla _x\rho \) and \(\nabla _y\rho \) as well.

However, we did not pursue these ideas for the reasons discussed in [3]. Instead, we look for four scalars \(\alpha \), \(\beta \), \(s\), and \(t\) to minimize \(\rho (\alpha x+sp,\beta y+tq)\). This no longer performs a line or dual-channel search, but a 4-dimensional subspace search:

$$\begin{aligned} \inf _{\alpha , \beta , s,t}\rho (\alpha x+sp,\beta y+tq)=\min _{u\in {{\mathrm{span}}}(U),\,\,v\in {{\mathrm{span}}}(V)}\rho (u,v), \end{aligned}$$
(5.7)

within the 4-dimensional subspace

$$\begin{aligned} \left\{ \begin{bmatrix} \beta y+t q \\ \alpha x+s p \end{bmatrix} \quad \hbox {for all scalars}\,\, \alpha , \beta , s, \hbox {and}\,\, t\right\} , \end{aligned}$$
(5.8)

where \(U=[x,p]\in {\mathbb R}^{n\times 2}\) and \(V=[y,q]\in {\mathbb R}^{n\times 2}\). The right-hand side of (5.7) can be solved by the methods given in Sect. 4 if \(U^{{{\mathrm{T}}}}E_+V\) is nonsingular (the common case) or in Appendix A otherwise.

Algorithms

The minimization principle (3.3)/(3.4) and the one in Theorem 3.2 make it tempting to apply steepest descent (SD) or nonlinear CG algorithms [13] to solve the LR eigenvalue problem. For the case \(\Sigma =I_n\) and \(\Delta =0\) (which corresponds to \(E=I_{2n}\)), such applications were attempted in [10, 12] to solve the LR eigenvalue problem (1.2). Conceivably, when only one eigenvalue and its associated eigenvector are requested, it matters little, if at all, whether CG is applied to \(\rho (x,y)\) based on (3.3) for the eigenvalue problem (1.5) or to \(\varrho (u,v)\) based on (3.4) for the original eigenvalue problem (1.4). But it is a very different story if more than one eigenpair is requested, in which case block algorithms are better options. As in [3], which is for the case \(E=I_{2n}\), we will present locally optimal 4-D CG algorithms for the current case, based on the minimization principle (3.5) and the Cauchy-like interlacing inequalities in Theorem 3.4. This is Algorithm 5.1 below, collectively called the Locally Optimal Block Preconditioned 4-D CG Algorithm (LOBP4DCG), where \(k=1\) or \(k>1\) corresponds to a no-block or block version, and

$$\begin{aligned} \Phi =\begin{bmatrix} 0&I_n \\ I_n&0 \end{bmatrix} \end{aligned}$$
(5.9)

or some nontrivial choice corresponds to an unpreconditioned or a preconditioned version, respectively.

Algorithm 5.1

The locally optimal 4-D CG algorithms:

(The pseudocode of Algorithm 5.1 appears as a figure in the original article.)
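Since the pseudocode is given as a figure, the following NumPy sketch (ours, pieced together from the description in this section and the locally optimal block CG template of [3]; the block layout, the QR-based orthonormalization, and all names are our assumptions, not the authors' exact algorithm) outlines one plausible realization. It reuses `gradients` from Sect. 5.1 and `project_pair` from Sect. 2.

```python
# Sketch of a LOBP4DCG iteration: per step, the pair of search spaces is
# spanned by the current approximations, the preconditioned gradients, and
# the previous search directions ("locally optimal"); the best
# approximations are then extracted by the projection of Sect. 4.
import numpy as np
from scipy.linalg import qr

def lobp4dcg(K, M, Ep, X, Y, prec, maxit=100, tol=1e-8):
    k = X.shape[1]
    Xd = Yd = None                             # previous search directions
    for _ in range(maxit):
        out = [gradients(K, M, Ep, X[:, j], Y[:, j]) for j in range(k)]
        lam = np.array([o[0] for o in out])
        Gx = np.column_stack([o[1] for o in out])
        Gy = np.column_stack([o[2] for o in out])
        if max(np.linalg.norm(np.r_[Gx[:, j], Gy[:, j]]) for j in range(k)) < tol:
            break                              # gradients ~ residuals, cf. (5.5)
        Q_, P_ = prec(Gx, Gy)                  # preconditioned directions, (5.10)
        Ub = [X, P_] if Xd is None else [X, P_, Xd]
        Vb = [Y, Q_] if Yd is None else [Y, Q_, Yd]
        U = qr(np.column_stack(Ub), mode='economic')[0]   # basis of curly-U
        V = qr(np.column_stack(Vb), mode='economic')[0]   # basis of curly-V
        mu, Yt, Xt = project_pair(K, M, Ep, U, V)         # Sect. 4 projection
        Xd, Yd = Xt[:, :k] - X, Yt[:, :k] - Y
        X, Y = Xt[:, :k], Yt[:, :k]            # k smallest with the positive sign
    return lam, X, Y
```

For the unpreconditioned version, `prec = lambda Gx, Gy: (Gy, Gx)` realizes \(\Phi \) of (5.9); a dense stand-in for (5.10) is `lambda Gx, Gy: (np.linalg.solve(M, Gy), np.linalg.solve(K, Gx))`.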

Most comments we made for [3, Algorithm 4.1] apply here as well (see also [1]). But we will briefly discuss the choice of a preconditioner \(\Phi \). Taking \(\Phi \) as in (5.9) means no preconditioning. In general, a generic preconditioner for computing the eigenvalues of \(H-\lambda E\) near a prescribed point \(\mu \) is

$$\begin{aligned} \Phi = (H-\mu E)^{-1}. \end{aligned}$$

When \(\mu \) is closer to the desired eigenvalues than to any others, the preconditioned directions should have “larger” components in the desired eigenvectors than the ones obtained without preconditioning. Since we are particularly interested in the smallest eigenvalues with the positive sign, \(\mu =0\) is often an obvious choice. Then

$$\begin{aligned} \Phi \begin{bmatrix} \nabla _x\rho \\ \nabla _y\rho \end{bmatrix} = \begin{bmatrix} 0&M^{-1} \\ K^{-1}&0 \end{bmatrix} \begin{bmatrix} \nabla _x\rho \\ \nabla _y\rho \end{bmatrix} =\begin{bmatrix} M^{-1}\nabla _y\rho \\ K^{-1}\nabla _x\rho \end{bmatrix} \equiv \begin{bmatrix} q \\ p \end{bmatrix}. \end{aligned}$$
(5.10)

In this case, both \(p\) and \(q\) can be computed by using the conjugate gradient method [6, 8].
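As a sketch of this computation (ours; SciPy provides no incomplete Cholesky, so `spilu` serves below as a stand-in for the `cholinc`-based preconditioners of Sect. 6, and the cap of 20 CG iterations mirrors the setting there):

```python
# Sketch: apply Phi with mu = 0 by solving K p = nabla_x rho and
# M q = nabla_y rho with preconditioned linear CG, cf. (5.10).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_prec(K, M, drop_tol=1e-3, maxiter=20):
    iluK = spla.spilu(sp.csc_matrix(K), drop_tol=drop_tol)
    iluM = spla.spilu(sp.csc_matrix(M), drop_tol=drop_tol)
    PK = spla.LinearOperator(K.shape, iluK.solve)   # preconditioner for K
    PM = spla.LinearOperator(M.shape, iluM.solve)   # preconditioner for M
    def prec(Gx, Gy):
        # in spla.cg, the keyword M is the preconditioner, not our matrix M
        P = np.column_stack([spla.cg(K, Gx[:, j], M=PK, maxiter=maxiter)[0]
                             for j in range(Gx.shape[1])])
        Q = np.column_stack([spla.cg(M, Gy[:, j], M=PM, maxiter=maxiter)[0]
                             for j in range(Gy.shape[1])])
        return Q, P        # q = M^{-1} nabla_y rho, p = K^{-1} nabla_x rho
    return prec
```

The returned `prec` can be passed directly to the `lobp4dcg` sketch in Sect. 5.2.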

Numerical results

In this section, we present some numerical results to illustrate the essential convergence behaviors of the locally optimal 4-D CG algorithms of Sect. 5. The matrices \(K,M\succ 0\) in the LR problem (1.5) are chosen from [5], and \(E_+\) is a sparse random matrix. Specifically, \(n=3,600\), \(K\) is bcsstk21, \(M\) is the \(n\times n\) leading principal submatrix of sts4098, and \(E_+=\mathtt{sprandn(n,n,0.1)}\) in MATLAB. Both \(K\) and \(M\) are first symmetrically permuted through MATLAB's symamd (symmetric approximate minimum degree permutation) in an attempt to reduce the number of fill-ins in their respective incomplete Cholesky decompositions.

Our goal is to compute the \(4\) smallest positive eigenvalues \(0 < \lambda _1 < \lambda _2 < \lambda _3 < \lambda _4\) and corresponding eigenvectors \(z_1, z_2, z_3, z_4\) of \(H-\lambda E\). The initial approximations to the eigenvectors \(z_i\) are randomly chosen. Two different preconditioners are used to approximate

$$\begin{aligned} H^{-1}=\begin{bmatrix} 0&M^{-1} \\ K^{-1}&0 \\ \end{bmatrix}. \end{aligned}$$

The first preconditioner \(\Phi _1\) is constructed through incomplete Cholesky decompositions of \(K\) and \(M\):

$$\begin{aligned} \Phi _1=\begin{bmatrix} 0&(R_M^{{{\mathrm{T}}}}R_M)^{-1} \\ (R_K^{{{\mathrm{T}}}}R_K)^{-1}&0 \\ \end{bmatrix}, \end{aligned}$$

where \(R_K\) and \(R_M\) are the incomplete Cholesky factors of \(K\) and \(M\), respectively. It turns out that both \(\mathtt{cholinc(K,^{\prime }\!\!0^{\prime })}\) and \(\mathtt{cholinc(M,^{\prime }\!\!0')}\) with no fill-ins do not exist; so we end up using

$$\begin{aligned} R_K=\mathtt{cholinc(K,tol)}, \quad R_M=\mathtt{cholinc(M,tol)} \end{aligned}$$
(6.1)

with a tolerance \(\mathtt{tol}\). Among the various values of \(\mathtt{tol}\) we tested, we found that \(\Phi _1\) works very well for \(\mathtt{tol}=10^{-4}\) or smaller, but not for \(\mathtt{tol}=10^{-3}\) or bigger. In the results reported below, \(\mathtt{tol}=10^{-4}\).

The second preconditioner \(\Phi _2\) applies \(H^{-1}\) approximately by calculating the preconditioned vectors \(p\) and \(q\) as in (5.10) with the preconditioned linear CG method [6, 8], using a stopping tolerance of \(10^{-2}\) on the associated normalized residual norms or a maximum of 20 iterations. The preconditioners for calculating \(p\) and \(q\) are \((R_K^{{{\mathrm{T}}}}R_K)^{-1}\) and \((R_M^{{{\mathrm{T}}}}R_M)^{-1}\), respectively, again with \(R_K\) and \(R_M\) as given by (6.1). Note that both \(K\) and \(M\) are very ill-conditioned: \(\kappa (K)=4.5\cdot 10^7\) and \(\kappa (M)=4.3\cdot 10^8\). The plain (i.e., unpreconditioned) linear CG iteration for computing \(p\) and \(q\) converges extremely slowly, but the preconditioners \((R_K^{{{\mathrm{T}}}}R_K)^{-1}\) and \((R_M^{{{\mathrm{T}}}}R_M)^{-1}\) with the modest fill-in tolerance \(10^{-3}\) are sufficient for the linear CG iteration.

Note that \(\Phi _1\) can be regarded as a \(\Phi _2\) that uses just one step of the linear CG to compute the preconditioned vectors \(p\) and \(q\). This explains why a smaller tol in (6.1) is needed for constructing \(\Phi _1\), while a larger tol is fine for constructing \(\Phi _2\) so long as the associated linear systems are solved with adequate accuracy (recall the stopping tolerance \(10^{-2}\)).

Figure 1 shows the normalized residual norms of a MATLAB implementation of Algorithm 5.1 with the preconditioners \(\Phi _1\) and \(\Phi _2\). The normalized residual norms for the \(j\)th approximate eigenpair \((\lambda ^{(i)}_j, z^{(i)}_j)\) at the \(i\)th iterative step are defined by

$$\begin{aligned} \frac{ \Vert H z^{(i)}_j - \lambda ^{(i)}_j Ez^{(i)}_j \Vert _1}{ (\Vert H\Vert _1 + \lambda ^{(i)}_j\Vert E\Vert _1) \Vert z^{(i)}_j\Vert _1}, \end{aligned}$$

where \(\Vert \cdot \Vert _1\) is the \(\ell _1\)-norm of a vector or the \(\ell _1\)-operator norm of a matrix. We observe rather steady convergence towards the desired 4 eigenpairs. Other examples we have run but not reported here show similar behavior.
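In code, this quantity reads (our sketch):

```python
# Sketch: the normalized residual norm used in Fig. 1 for an approximate
# eigenpair (lam, z) of H - lambda*E, with the l1 vector/operator norms.
import numpy as np

def normalized_residual(H, E, lam, z):
    num = np.linalg.norm(H @ z - lam * (E @ z), 1)
    den = (np.linalg.norm(H, 1) + lam * np.linalg.norm(E, 1)) * np.linalg.norm(z, 1)
    return num / den
```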

Fig. 1

The convergence behaviors of the locally optimal block 4-D preconditioned CG algorithms for computing the 4 smallest positive eigenvalues of a made-up LR problem

Concluding remarks

We have presented minimization principles and Cauchy-like interlacing inequalities for the generalized LR eigenvalue problem. These new results mirror the three well-known results for the eigenvalue problem of a real symmetric matrix, and enable us to devise new efficient numerical methods for computing the first few smallest eigenvalues with the positive sign and corresponding eigenvectors simultaneously.

Although, throughout this paper, it is assumed that \(K\), \(M\), and \(E_{\pm }\) are real matrices, all results are valid for Hermitian positive semi-definite \(K\) and \(M\) with one of them definite, after minor changes: replacing all \({\mathbb R}\) by \({\mathbb C}\) and all superscripts \((\cdot )^{{{\mathrm{T}}}}\) by complex conjugate transposes \((\cdot )^{{{\mathrm{H}}}}\).

The numerical results in Sect. 6 demonstrate the effectiveness of the new algorithms. Although they are for an artificial generalized LR problem, we argue that the observed numerical behavior is rather suggestive. In the future, we would like to test the proposed method on realistic LR eigenvalue problems arising from excited-state calculations in computational quantum physics [14].

Notes

  1.

    This condition is equivalent to both \(A\pm B\) being positive definite. In [2, 3] and this article, we focus precisely on this case, except that one of \(A\pm B\) is allowed to be positive semi-definite.

  2.

    It suffices to assume one of \(E_{\pm }\) is nonsingular since \(E_{\pm }^{{{\mathrm{T}}}}=E_{\mp }\).

  3.

    A similar statement for the case in which \(K\succ 0\) but \(M\succeq 0\) can be made, noting that the decompositions in (2.7) no longer hold but similar decompositions exist.

  4.

    How this factorization is done is not essential mathematically. But it is included to accommodate cases when such a factorization may offer certain conveniences. In general, simply taking \(W_1=W^{{{\mathrm{T}}}}\) and \(W_2=I_{\ell }\) or \(W_1=I_{\ell }\) and \(W_2=W\) may be sufficient.

  5.

    Computationally, this can be realized by the QR decompositions of \(W_i^{{{\mathrm{T}}}}\). For more generality in presentation, we do not assume that they have to be QR decompositions.

References

  1. 1.

  1. Bai, Z., Li, R.C.: Minimization principles for the linear response eigenvalue problem III: general case. Technical Report 2013-01, Department of Mathematics, University of Texas at Arlington (2013). Available at http://www.uta.edu/math/preprint/

  2. Bai, Z., Li, R.C.: Minimization principles for the linear response eigenvalue problem I: theory. SIAM J. Matrix Anal. Appl. 33(4), 1075–1100 (2012)

  3. Bai, Z., Li, R.C.: Minimization principles for the linear response eigenvalue problem II: computation. SIAM J. Matrix Anal. Appl. 34(2), 392–416 (2013)

  4. Challacombe, M.: Linear scaling solution of the time-dependent self-consistent-field equations. e-print arXiv:1001.2586v2 (2010)

  5. Davis, T., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. 38(1), 1:1–1:25 (2011)

  6. Demmel, J.: Applied Numerical Linear Algebra. SIAM, Philadelphia (1997)

  7. Flaschka, U., Lin, W.W., Wu, J.L.: A KQZ algorithm for solving linear-response eigenvalue equations. Linear Algebra Appl. 165, 93–123 (1992)

  8. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  9. Grüning, M., Marini, A., Gonze, X.: Exciton-plasmon states in nanoscale materials: breakdown of the Tamm-Dancoff approximation. Nano Lett. 9, 2820–2824 (2009)

  10. Lucero, M.J., Niklasson, A.M.N., Tretiak, S., Challacombe, M.: Molecular-orbital-free algorithm for excited states in time-dependent perturbation theory. J. Chem. Phys. 129(6), 064114 (2008)

  11. Mehl, C., Mehrmann, V., Xu, H.: On doubly structured matrices and pencils that arise in linear response theory. Linear Algebra Appl. 380, 3–51 (2004)

  12. Muta, A., Iwata, J.I., Hashimoto, Y., Yabana, K.: Solving the RPA eigenvalue equation in real-space. Prog. Theor. Phys. 108(6), 1065–1076 (2002)

  13. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer, New York (2006)

  14. Olsen, J., Jensen, H.J.A., Jørgensen, P.: Solution of the large matrix equations which occur in response theory. J. Comput. Phys. 74(2), 265–282 (1988)

  15. Olsen, J., Jørgensen, P.: Linear and nonlinear response functions for an exact state and for an MCSCF state. J. Chem. Phys. 82(7), 3235–3264 (1985)

  16. Ring, P., Schuck, P.: The Nuclear Many-Body Problem. Springer, New York (1980)

  17. Rocca, D., Bai, Z., Li, R.C., Galli, G.: A block variational procedure for the iterative diagonalization of non-Hermitian random-phase approximation matrices. J. Chem. Phys. 136, 034111 (2012)

  18. Stratmann, R.E., Scuseria, G.E., Frisch, M.J.: An efficient implementation of time-dependent density-functional theory for the calculation of excitation energies of large molecules. J. Chem. Phys. 109, 8218–8224 (1998)

  19. Thouless, D.J.: Vibrational states of nuclei in the random phase approximation. Nucl. Phys. 22(1), 78–95 (1961)

  20. Thouless, D.J.: The Quantum Mechanics of Many-Body Systems. Academic Press, New York (1972)

  21. Tsiper, E.V.: Variational procedure and generalized Lanczos recursion for small-amplitude classical oscillations. JETP Lett. 70(11), 751–755 (1999)

Acknowledgments

We thank the referees for valuable comments and suggestions that improved the presentation of the paper. Bai is supported in part by NSF grants DMR-1035468 and DMS-1115817. Li is supported in part by NSF grant DMS-1115834.


Correspondence to Zhaojun Bai.

Dedicated to Professor Axel Ruhe on the occasion of his 70th birthday.

Communicated by Peter Benner.

Appendix: Best approximations: the singular/unequal dimension case

This appendix continues the investigation in Sect. 4 to seek best approximate eigenpairs of \(H-\lambda E\) for given \(\{\mathcal{U}, \mathcal{V}\}\), a pair of approximate deflating subspaces of \(H-\lambda E\) with \(\dim (\mathcal{U})=\ell _1\) and \(\dim (\mathcal{V})=\ell _2\). In Sect. 4, we have treated the case in which \(\ell _1=\ell _2\) and \(W \mathop {=}\limits ^{{\hbox {def}}}U^{{{\mathrm{T}}}}E_+V\) is nonsingular, where \(U\in {\mathbb R}^{n\times \ell _1},\,V\in {\mathbb R}^{n\times \ell _2}\) are the basis matrices of \(\mathcal{U}\) and \(\mathcal{V}\), respectively. In what follows, we will focus on the general case: \(\ell _1\) and \(\ell _2\) are not necessarily equal or \(W\) may be singular.

This case is much more complicated than the one in Sect. 4, but it can be handled in a similar way to that in [3], which treats \(E=I_{2n}\). We therefore only summarize the results; the reader is referred to [1, Appendix A] for details.

Factorize

$$\begin{aligned} W=W_1^{{{\mathrm{T}}}}W_2, \quad W_i\in {\mathbb R}^{r\times \ell _i},\quad r={{\mathrm{rank}}}(W)\le \min _i\ell _i. \end{aligned}$$
(8.1)

Both \(W_i\) have full row rank. Factorize

$$\begin{aligned} W_i^{{{\mathrm{T}}}}=Q_i\begin{bmatrix} R_i \\ 0 \end{bmatrix} \quad \hbox{for } i=1,2, \end{aligned}$$
(8.2)

where \(R_i\in {\mathbb R}^{r\times r}\) and \(Q_i\in {\mathbb R}^{\ell _i\times \ell _i}\) (\(i=1,2\)) are nonsingular. Partition

$$\begin{aligned} Q_1^{-1}U^{{{\mathrm{T}}}}KUQ_1^{-{{\mathrm{T}}}} =\begin{bmatrix} K_{11}&K_{12} \\ K_{12}^{{{\mathrm{T}}}}&K_{22} \end{bmatrix}, \quad Q_2^{-1}V^{{{\mathrm{T}}}}MVQ_2^{-{{\mathrm{T}}}} =\begin{bmatrix} M_{11}&M_{12} \\ M_{12}^{{{\mathrm{T}}}}&M_{22} \end{bmatrix}, \end{aligned}$$
(8.3)

where \(K_{11},\,M_{11}\in {\mathbb R}^{r\times r}\). Set

$$\begin{aligned} \widehat{H}_{\hbox {SR}}=\begin{bmatrix} 0&R_1^{-1}\mathcal {K}_{11}R_1^{-{{\mathrm{T}}}} \\ R_2^{-1}\mathcal {M}_{11}R_2^{-{{\mathrm{T}}}}&0 \end{bmatrix}\in {\mathbb R}^{2r\times 2r}, \end{aligned}$$
(8.4)

where \(K_{22}^{\dagger }\) and \(M_{22}^{\dagger }\) are the Moore-Penrose inverses of \(K_{22}\) and \(M_{22}\), respectively, and

$$\begin{aligned} \mathcal {K}_{11} =K_{11}-K_{12}K_{22}^{\dagger }K_{12}^{{{\mathrm{T}}}}, \quad \mathcal {M}_{11} =M_{11}-M_{12}M_{22}^{\dagger }M_{12}^{{{\mathrm{T}}}}. \end{aligned}$$
(8.5)
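To make the construction (8.1)–(8.5) concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the implementation behind our numerical results: the function name `build_HSR` is ours, the full-rank factorization (8.1) is realized by a truncated SVD, and the nonsingular \(Q_i\) in (8.2) are taken to be orthogonal, which is one admissible choice.

```python
import numpy as np

def build_HSR(K, M, U, V, Eplus, tol=1e-12):
    # Hypothetical helper sketching (8.1)-(8.5).  K, M are n x n
    # symmetric positive semidefinite; U (n x l1), V (n x l2) are
    # basis matrices of the approximate deflating subspaces.
    W = U.T @ Eplus @ V
    # (8.1): full-rank factorization W = W1^T W2 via truncated SVD.
    P, s, QT = np.linalg.svd(W)
    r = int(np.sum(s > tol * s[0]))               # numerical rank of W
    W1 = (P[:, :r] * np.sqrt(s[:r])).T            # r x l1
    W2 = np.sqrt(s[:r])[:, None] * QT[:r, :]      # r x l2
    # (8.2): W_i^T = Q_i [R_i; 0], taking Q_i square orthogonal.
    Q1, R1f = np.linalg.qr(W1.T, mode='complete')
    Q2, R2f = np.linalg.qr(W2.T, mode='complete')
    R1, R2 = R1f[:r, :r], R2f[:r, :r]
    # (8.3): partition Q_i^{-1}(.)Q_i^{-T}; here Q_i^{-1} = Q_i^T.
    Kt = Q1.T @ (U.T @ K @ U) @ Q1
    Mt = Q2.T @ (V.T @ M @ V) @ Q2
    K11, K12, K22 = Kt[:r, :r], Kt[:r, r:], Kt[r:, r:]
    M11, M12, M22 = Mt[:r, :r], Mt[:r, r:], Mt[r:, r:]
    # (8.5): Schur complements with Moore-Penrose inverses.
    Kcal = K11 - K12 @ np.linalg.pinv(K22) @ K12.T
    Mcal = M11 - M12 @ np.linalg.pinv(M22) @ M12.T
    # (8.4): assemble H_SR.
    R1i, R2i = np.linalg.inv(R1), np.linalg.inv(R2)
    HSR = np.block([[np.zeros((r, r)), R1i @ Kcal @ R1i.T],
                    [R2i @ Mcal @ R2i.T, np.zeros((r, r))]])
    return HSR, (Q1, Q2, R1, R2, K12, K22, M12, M22)
```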

Denote by \(\mu _j\), \(j=1,\ldots ,r\), the eigenvalues with the positive sign of \(\widehat{H}_{\hbox {SR}}\) in ascending order, and by \(\hat{z}_j\) the associated eigenvectors:

$$\begin{aligned} \widehat{H}_{\hbox {SR}}\hat{z}_j=\mu _j\hat{z}_j, \quad \hat{z}_j=\begin{bmatrix} \hat{y}_j \\ \hat{x}_j \end{bmatrix}. \end{aligned}$$
(8.6)

It can be verified that \( \rho (\tilde{x}_j,\tilde{y}_j)=\mu _j\quad \hbox{for } j=1,\ldots ,r, \) where

$$\begin{aligned} \tilde{x}_j=UQ_1^{-{{\mathrm{T}}}}\begin{bmatrix} R_1^{-{{\mathrm{T}}}}\hat{x}_j \\ u_j \end{bmatrix} , \quad \tilde{y}_j=VQ_2^{-{{\mathrm{T}}}}\begin{bmatrix} R_2^{-{{\mathrm{T}}}}\hat{y}_j \\ v_j \end{bmatrix} \end{aligned}$$
(8.7)

for any \(u_j\) and \(v_j\) satisfying

$$\begin{aligned} K_{22}u_j=-K_{12}^{{{\mathrm{T}}}}R_1^{-{{\mathrm{T}}}}\hat{x}_j, \quad M_{22}v_j=-M_{12}^{{{\mathrm{T}}}}R_2^{-{{\mathrm{T}}}}\hat{y}_j. \end{aligned}$$
(8.8)

Naturally, the approximate eigenvectors of \(H-\lambda E\) should be taken as

$$\begin{aligned} \tilde{z}_j=\begin{bmatrix} \tilde{y}_j \\ \tilde{x}_j \end{bmatrix}\quad \hbox{for } j=1,\ldots ,r. \end{aligned}$$
(8.9)
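Continuing the sketch above, the eigenpairs (8.6) and the recovery (8.7)–(8.9) might be coded as follows; `approx_eigenpairs` is again a hypothetical helper consuming the output of `build_HSR`, and the particular \(u_j,\,v_j\) in (8.8) are taken to be the minimum-norm solutions.

```python
import numpy as np

def approx_eigenpairs(HSR, aux, U, V):
    # Sketch of (8.6)-(8.9); `aux` is the tuple from build_HSR.
    Q1, Q2, R1, R2, K12, K22, M12, M22 = aux
    r = R1.shape[0]
    mu, Z = np.linalg.eig(HSR)            # HSR is real but nonsymmetric
    order = [i for i in np.argsort(mu.real) if mu[i].real > 0]
    pairs = []
    for i in order:
        yhat, xhat = Z[:r, i].real, Z[r:, i].real
        a = np.linalg.solve(R1.T, xhat)            # R1^{-T} xhat
        b = np.linalg.solve(R2.T, yhat)            # R2^{-T} yhat
        u = -np.linalg.pinv(K22) @ (K12.T @ a)     # one solution of (8.8)
        v = -np.linalg.pinv(M22) @ (M12.T @ b)
        xt = U @ (Q1 @ np.concatenate([a, u]))     # Q1^{-T} = Q1 (orthogonal)
        yt = V @ (Q2 @ np.concatenate([b, v]))
        pairs.append((mu[i].real, np.concatenate([yt, xt])))   # (8.9)
    return pairs
```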

Theorem 8.1

Let \(\{\mathcal{U}, \mathcal{V}\}\) be a pair of approximate deflating subspaces of \(H-\lambda E\) with \(\dim (\mathcal{U})=\ell _1\) and \(\dim (\mathcal{V})=\ell _2\), and let \(U\in {\mathbb R}^{n\times \ell _1},\,V\in {\mathbb R}^{n\times \ell _2}\) be the basis matrices of \(\mathcal{U}\) and \(\mathcal{V}\), respectively. Let \(\widehat{H}_{\hbox {SR}}\) be defined by (8.4). Then the best approximations to \(\lambda _j\) for \(1\le j\le k\) in the sense of (4.1) are the corresponding eigenvalues of \(\widehat{H}_{\hbox {SR}}\), with the corresponding approximate eigenvectors given by (8.7)–(8.9).

Despite the much more complicated appearance of \(\widehat{H}_{\hbox {SR}}\) compared to \(H_{\hbox {SR}}\) in Sect. 4, our next theorem, perhaps surprisingly, unifies both.

Theorem 8.2

The eigenvalues of \(\widehat{H}_{\hbox {SR}}\) in (8.4) are the same as the finite eigenvalues of

$$\begin{aligned} \check{H}-\lambda \check{E}:&=\begin{bmatrix} U&0\\ 0&V \end{bmatrix}^{{{\mathrm{T}}}}(H-\lambda E)\begin{bmatrix} V&0\\ 0&U \end{bmatrix} \\&=\begin{bmatrix} 0&U^{{{\mathrm{T}}}}KU \\ V^{{{\mathrm{T}}}}MV&0 \end{bmatrix}-\lambda \begin{bmatrix} U^{{{\mathrm{T}}}}E_+V&\\&V^{{{\mathrm{T}}}}E_-U \end{bmatrix} \nonumber \end{aligned}$$
(8.10)

and the eigenvector \(\hat{z}=\begin{bmatrix} \hat{y} \\ \hat{x} \end{bmatrix}\) of \(\widehat{H}_{\hbox {SR}}\) and the eigenvector \(\check{z}=\begin{bmatrix} \check{y} \\ \check{x} \end{bmatrix}\) of the pencil (8.10) associated with a finite eigenvalue are related by

$$\begin{aligned} \check{x}=Q_1^{-{{\mathrm{T}}}}\begin{bmatrix} R_1^{-{{\mathrm{T}}}}\hat{x} \\ -K_{22}^{\dagger }K_{12}^{{{\mathrm{T}}}}R_1^{-{{\mathrm{T}}}}\hat{x}+g \end{bmatrix}, \quad \check{y}=Q_2^{-{{\mathrm{T}}}}\begin{bmatrix} R_2^{-{{\mathrm{T}}}}\hat{y} \\ -M_{22}^{\dagger }M_{12}^{{{\mathrm{T}}}}R_2^{-{{\mathrm{T}}}}\hat{y}+h \end{bmatrix}, \end{aligned}$$
(8.11)

where \(g\) is any vector in the kernel of \(K_{22}\) and \(h\) is any vector in the kernel of \(M_{22}\). In particular, if \(\ell _1=\ell _2=r\), the relation in (8.11) simplifies to \(\hat{z}=(W_2\oplus W_1)\check{z}\), as in Theorem 4.2.

Proof

Let \(P_i=Q_i^{-{{\mathrm{T}}}}(R_i^{-{{\mathrm{T}}}}\oplus I_{\ell _i-r})\) for \(i=1,2\); both are nonsingular. It can be verified that

$$\begin{aligned} (P_1\oplus P_2)^{{{\mathrm{T}}}}(\check{H}-\lambda \check{E})(P_2\oplus P_1) =\begin{bmatrix} 0&\widehat{K} \\ \widehat{M}&0 \end{bmatrix}-\lambda \begin{bmatrix} \widehat{I}&\\ 0&\widehat{I}^{\,{{\mathrm{T}}}} \end{bmatrix}, \end{aligned}$$

where

$$\begin{aligned} \widehat{M}&=\begin{bmatrix} R_2^{-1}&\\&I_{\ell _2-r} \end{bmatrix} \begin{bmatrix} M_{11}&M_{12} \\ M_{12}^{{{\mathrm{T}}}}&M_{22} \end{bmatrix} \begin{bmatrix} R_2^{-{{\mathrm{T}}}}&\\&I_{\ell _2-r} \end{bmatrix}, \end{aligned}$$
(8.12)
$$\begin{aligned} \widehat{K}&=\begin{bmatrix} R_1^{-1}&\\&I_{\ell _1-r} \end{bmatrix} \begin{bmatrix} K_{11}&K_{12} \\ K_{12}^{{{\mathrm{T}}}}&K_{22} \end{bmatrix} \begin{bmatrix} R_1^{-{{\mathrm{T}}}}&\\&I_{\ell _1-r} \end{bmatrix}, \end{aligned}$$
(8.13)
$$\begin{aligned} \widehat{I}&=\begin{bmatrix} I_r&\\&0 \end{bmatrix}\in {\mathbb R}^{\ell _1\times \ell _2}, \end{aligned}$$
(8.14)

and \(K_{ij}\) and \(M_{ij}\) are defined by (8.3). Since \(K\) and \(M\) are positive (semi)definite, we have \({{\mathrm{span}}}(K_{12}^{{{\mathrm{T}}}})\subseteq {{\mathrm{span}}}(K_{22})\) and \({{\mathrm{span}}}(M_{12}^{{{\mathrm{T}}}})\subseteq {{\mathrm{span}}}(M_{22})\), and consequently

$$\begin{aligned} K_{22}K_{22}^{\dagger }K_{12}^{{{\mathrm{T}}}}=K_{12}^{{{\mathrm{T}}}}, \quad M_{22}M_{22}^{\dagger }M_{12}^{{{\mathrm{T}}}}=M_{12}^{{{\mathrm{T}}}}. \end{aligned}$$
(8.15)

Let

$$\begin{aligned} Z_1=\begin{bmatrix} I_r&0 \\ -K_{22}^{\dagger }K_{12}^{{{\mathrm{T}}}}R_1^{-{{\mathrm{T}}}}&I_{\ell _1-r} \end{bmatrix}, \quad Z_2=\begin{bmatrix} I_r&0 \\ -M_{22}^{\dagger }M_{12}^{{{\mathrm{T}}}}R_2^{-{{\mathrm{T}}}}&I_{\ell _2-r} \end{bmatrix}. \end{aligned}$$

It can be verified that \(Z_1^{{{\mathrm{T}}}}\widehat{I} Z_2=\widehat{I}\) and, after using (8.15),

$$\begin{aligned} Z_1^{{{\mathrm{T}}}}\widehat{K} Z_1=\begin{bmatrix} R_1^{-1}\mathcal {K}_{11} R_1^{-{{\mathrm{T}}}}&0 \\ 0&K_{22} \end{bmatrix}, \quad Z_2^{{{\mathrm{T}}}}\widehat{M} Z_2=\begin{bmatrix} R_2^{-1}\mathcal {M}_{11} R_2^{-{{\mathrm{T}}}}&0 \\ 0&M_{22} \end{bmatrix}, \end{aligned}$$

where \(\mathcal {K}_{11}\) and \(\mathcal {M}_{11}\) are defined in (8.5). Hence \((P_1Z_1\oplus P_2Z_2)^{{{\mathrm{T}}}}(\check{H}-\lambda \check{E})(P_2Z_2\oplus P_1Z_1)\) equals

$$\begin{aligned} \begin{bmatrix} 0&0&R_1^{-1}\mathcal {K}_{11}R_1^{-{{\mathrm{T}}}}&0 \\ 0&0&0&K_{22} \\ R_2^{-1}\mathcal {M}_{11}R_2^{-{{\mathrm{T}}}}&0&0&0 \\ 0&M_{22}&0&0 \end{bmatrix} -\lambda \begin{bmatrix} \widehat{I}&\\&\widehat{I}^{\,{{\mathrm{T}}}} \end{bmatrix}, \end{aligned}$$
(8.16)

whose finite eigenvalues are the eigenvalues of

$$\begin{aligned} \begin{bmatrix} 0&R_1^{-1}\mathcal {K}_{11} R_1^{-{{\mathrm{T}}}} \\ R_2^{-1}\mathcal {M}_{11} R_2^{-{{\mathrm{T}}}}&0 \end{bmatrix}-\lambda I_{2r}=\widehat{H}_{\hbox {SR}}-\lambda I_{2r}. \end{aligned}$$
(8.17)

We now turn to the eigenvector relation. Given an eigenvector \(\hat{z}=\begin{bmatrix} \hat{y} \\ \hat{x} \end{bmatrix}\) of \(\widehat{H}_{\hbox {SR}}\), we conclude by comparing (8.16) and (8.17) that the corresponding eigenvector of the matrix pencil (8.16) is

$$\begin{aligned} \begin{bmatrix} \hat{y} \\ h \\ \hat{x} \\ g \end{bmatrix}, \end{aligned}$$

where \(g\) is any vector in the kernel of \(K_{22}\) and \(h\) is any vector in the kernel of \(M_{22}\). Therefore the corresponding eigenvector \(\check{z}=\begin{bmatrix} \check{y} \\ \check{x} \end{bmatrix}\) of \(\check{H}-\lambda \check{E}\) is given by

$$\begin{aligned} \check{x}=P_1Z_1\begin{bmatrix} \hat{x} \\ g \end{bmatrix}, \quad \check{y}=P_2Z_2\begin{bmatrix} \hat{y} \\ h \end{bmatrix} \end{aligned}$$

which, after simplification, yields (8.11). \(\square \)
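Theorem 8.2 also suggests a simple numerical sanity check on small examples: assemble the pencil (8.10) directly, discard its infinite eigenvalues, and compare the rest with the spectrum of \(\widehat{H}_{\hbox {SR}}\). A SciPy sketch under the same assumptions as above (all names illustrative):

```python
import numpy as np
from scipy.linalg import eig

def finite_eigs_of_pencil(K, M, U, V, Eplus, Eminus):
    # Finite eigenvalues of the pencil (8.10), for comparison with
    # np.linalg.eigvals(HSR) from the sketch above.
    l1, l2 = U.shape[1], V.shape[1]
    Hc = np.block([[np.zeros((l1, l2)), U.T @ K @ U],
                   [V.T @ M @ V, np.zeros((l2, l1))]])
    Ec = np.block([[U.T @ Eplus @ V, np.zeros((l1, l1))],
                   [np.zeros((l2, l2)), V.T @ Eminus @ U]])
    lam = eig(Hc, Ec, right=False)        # infinite eigenvalues -> inf/nan
    return np.sort(lam[np.isfinite(lam)].real)
```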

The next theorem shows that Cauchy-like interlacing inequalities hold for \(\widehat{H}_{\hbox {SR}}\), too. We omit its proof because of its similarity to that of [3, Theorem 8.3] (see also [1, Appendix A]).

Theorem 8.3

Assume the conditions of Theorem 8.1. Then

$$\begin{aligned} \lambda _i\le \mu _i\le \lambda _{i+2n-(\ell _1+\ell _2)}\quad \hbox{for } 1\le i\le r, \end{aligned}$$
(8.18)

where \(\lambda _{i+2n-(\ell _1+\ell _2)}=\infty \) if \(i+2n-(\ell _1+\ell _2)>n\).
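On small dense examples, (8.18) can likewise be verified directly. The sketch below is illustrative only: the argument names are placeholders, and it assumes the eigenvalues of \(H-\lambda E\) are real with exactly \(n\) of them carrying the positive sign.

```python
import numpy as np
from scipy.linalg import eig

def check_interlacing(HSR, K, M, Eplus, Eminus, l1, l2, tol=1e-10):
    # Compare mu_i (from HSR) with lambda_i of H - lambda*E as in (8.18).
    n = K.shape[0]
    Z0 = np.zeros((n, n))
    H = np.block([[Z0, K], [M, Z0]])
    E = np.block([[Eplus, Z0], [Z0, Eminus]])
    lam = np.sort(eig(H, E, right=False).real)
    lam = lam[lam > 0]                    # lambda_1 <= ... <= lambda_n
    mu = np.sort(np.linalg.eigvals(HSR).real)
    mu = mu[mu > 0]                       # mu_1 <= ... <= mu_r
    ok = True
    for i, m in enumerate(mu):
        ok &= lam[i] <= m + tol           # lower bound in (8.18)
        j = i + 2 * n - (l1 + l2)         # upper bound, when finite
        if j < lam.size:
            ok &= m <= lam[j] + tol
    return bool(ok)
```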


Keywords

  • Eigenvalue
  • Eigenvector
  • Minimization principle
  • Conjugate gradient
  • Linear response

Mathematics Subject Classification (2000)

  • Primary 65L15
  • Secondary 15A18
  • 81Q15