## Abstract

We present an efficient way to solve the Bethe–Salpeter equation (BSE), a method for the computation of optical absorption spectra in molecules and solids that includes electron–hole interactions. Standard approaches to construct and diagonalize the Bethe–Salpeter Hamiltonian require at least \(\mathcal {O}(N_e^5)\) operations, where \(N_e\) is the number of electrons in the system, limiting its application to smaller systems. Our approach is based on the interpolative separable density fitting (ISDF) technique to construct low rank approximations to the bare exchange and screened direct operators associated with the BSE Hamiltonian. This approach reduces the complexity of the Hamiltonian construction to \(\mathcal {O}(N_e^3)\) with a much smaller pre-constant, and allows for a faster solution of the BSE. Here, we implement the ISDF method for BSE calculations within the Tamm–Dancoff approximation (TDA) in the BerkeleyGW software package. We show that this novel approach accurately reproduces exciton energies and optical absorption spectra in molecules and solids with a significantly reduced computational cost.

### Keywords

- Lowest Energy Exciton
- Optical Absorption Spectra
- Bethe-Salpeter Equation (BSE)
- Hamiltonian Construction
- Tamm-Dancoff Approximation (TDA)

*These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.*

## 1 Introduction

Many-Body Perturbation Theory is a powerful tool to describe one-particle and two-particle excitations and to obtain exciton energies and absorption spectra in molecules and solids. In particular, Hedin’s GW approximation [9] has been successfully used to compute quasi-particle (one-particle) excitation energies [11]. However, the Bethe–Salpeter equation (BSE) [23] is further needed to describe the excitations of an electron–hole pair (a two-particle excitation) in optical absorption in molecules and solids [22] and is often necessary to obtain a good agreement between theory and experiment. Solving the BSE problem requires constructing and diagonalizing a structured matrix Hamiltonian. In the context of optical absorption, the eigenvalues are the exciton energies and the corresponding eigenfunctions yield the exciton wavefunctions.

The Bethe–Salpeter Hamiltonian (BSH) consists of bare exchange and screened direct interaction kernels that depend on single-particle orbitals obtained from a quasiparticle (usually at the GW level) or mean-field calculation. The evaluation of these kernels requires at least \(\mathcal {O}(N_e^5)\) operations in a conventional approach, which is very costly for large systems that contain hundreds or thousands of atoms. Recent efforts have actively explored methods to generate a reduced basis set, in order to decrease the high computational cost of BSE calculations [1, 12, 16, 19, 21].

In this paper, we present an efficient way to construct the BSH, which, when coupled to an iterative diagonalization scheme, allows for an efficient solution of the BSE. Our approach is based on the recently-developed Interpolative Separable Density Fitting (ISDF) decomposition [18]. The ISDF decomposition has been applied to accelerate a number of applications in computational chemistry and materials science, including the computation of two-electrons integrals [18], correlation energy in the random phase approximation [17], density functional perturbation theory [15], and hybrid density functional calculations [10]. In this scheme, a matrix consisting of products of single-particle orbital pairs is approximated as the product between a matrix built with a small number of auxiliary basis vectors and an expansion coefficient matrix [10]. This decomposition effectively allows us to construct low-rank approximations to the bare exchange and screened direct kernels. The construction of the ISDF-compressed BSE Hamiltonian matrix only requires \(\mathcal {O}(N_{e}^3)\) operations when the rank of the numerical auxiliary basis is kept at \(\mathcal {O}(N_{e})\) and when the kernels are kept in a low-rank factored form, resulting in considerably faster computation than the \(\mathcal {O}(N_{e}^5)\) complexity required in a conventional approach. By keeping the interaction kernel in a decomposed form, the matrix–vector multiplications required in the iterative diagonalization procedures of the Hamiltonian \(H_\text {BSE}\) can be performed efficiently. We can further use these efficient matrix–vector multiplications in a structure preserving Lanczos algorithm [24] to obtain an approximate absorption spectrum without an explicit diagonalization of the approximate \(H_\text {BSE}\). 
We have implemented the ISDF-based BSH construction in the BerkeleyGW software package [4], and verified that this approach can reproduce accurate exciton energies and optical absorption spectra for molecules and solids, while significantly reducing the computational cost associated with the construction of the BSE Hamiltonian.

## 2 Bethe–Salpeter Equation

The Bethe–Salpeter equation is an eigenvalue problem of the form

\[ H_\text {BSE} X = E X, \tag{1} \]

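where *X* is the exciton wavefunction and *E* is the corresponding exciton energy. The Bethe–Salpeter Hamiltonian \(H_\text {BSE}\) has the following block structure

\[ H_\text {BSE} = \begin{bmatrix} D + 2V_A - W_A & 2V_B - W_B \\ -2\bar{V}_B + \bar{W}_B & -D - 2\bar{V}_A + \bar{W}_A \end{bmatrix}, \tag{2} \]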

where \(D(i_vi_c,j_vj_c) = (\epsilon _{i_c} - \epsilon _{i_v})\delta _{i_vj_v}\delta _{i_cj_c}\) is an \((N_vN_c)\times (N_vN_c)\) diagonal matrix with \(\epsilon _{i_v}\), \(i_v=1,2,\cdots ,N_v\) the quasi-particle energies associated with valence bands and \(\epsilon _{i_c}\), \(i_c=N_v+1,N_v+2,\cdots ,N_v+N_c\) the quasi-particle energies associated with conduction bands. These quasi-particle energies are typically obtained from a GW calculation [22]. The \(V_A\) and \(V_B\) matrices represent the bare *exchange* interaction of electron–hole pairs, and the \(W_A\) and \(W_B\) matrices are referred to as the screened *direct* interaction of electron–hole pairs. These matrices are defined as follows:

\[
\begin{aligned}
V_A(i_vi_c,j_vj_c) &= \int \bar{\psi }_{i_c}(\mathbf {r})\psi _{i_v}(\mathbf {r}) V(\mathbf {r},\mathbf {r'}) \bar{\psi }_{j_v}(\mathbf {r'})\psi _{j_c}(\mathbf {r'}) \,\mathrm {d}\mathbf {r}\,\mathrm {d}\mathbf {r'}, \\
V_B(i_vi_c,j_vj_c) &= \int \bar{\psi }_{i_c}(\mathbf {r})\psi _{i_v}(\mathbf {r}) V(\mathbf {r},\mathbf {r'}) \bar{\psi }_{j_c}(\mathbf {r'})\psi _{j_v}(\mathbf {r'}) \,\mathrm {d}\mathbf {r}\,\mathrm {d}\mathbf {r'}, \\
W_A(i_vi_c,j_vj_c) &= \int \bar{\psi }_{i_c}(\mathbf {r})\psi _{j_c}(\mathbf {r}) W(\mathbf {r},\mathbf {r'}) \bar{\psi }_{j_v}(\mathbf {r'})\psi _{i_v}(\mathbf {r'}) \,\mathrm {d}\mathbf {r}\,\mathrm {d}\mathbf {r'}, \\
W_B(i_vi_c,j_vj_c) &= \int \bar{\psi }_{i_c}(\mathbf {r})\psi _{j_v}(\mathbf {r}) W(\mathbf {r},\mathbf {r'}) \bar{\psi }_{j_c}(\mathbf {r'})\psi _{i_v}(\mathbf {r'}) \,\mathrm {d}\mathbf {r}\,\mathrm {d}\mathbf {r'},
\end{aligned} \tag{3}
\]

where \(\psi _{i_{v}}\) and \(\psi _{i_{c}}\) are the valence and conduction single-particle orbitals, respectively, typically obtained from a Kohn–Sham density functional theory (KSDFT) calculation, and \(V(\mathbf {r},\mathbf {r'})\) and \(W(\mathbf {r},\mathbf {r'})\) are the bare and screened Coulomb interactions. Both \(V_A\) and \(W_A\) are Hermitian, whereas \(V_B\) and \(W_B\) are complex symmetric. Within the so-called Tamm–Dancoff approximation (TDA) [20], both \(V_B\) and \(W_B\) are neglected in Eq. (2). In this case, \(H_\text {BSE}\) becomes Hermitian and we can focus on computing the upper left block of \(H_\text {BSE}\).

Let \(M_{cc}(\mathbf {r}) = \{\psi _{i_c}\bar{\psi }_{j_c}\}\), \(M_{vc}(\mathbf {r}) = \{\psi _{i_c}\bar{\psi }_{i_v}\}\), and \(M_{vv}(\mathbf {r}) = \{\psi _{i_v}\bar{\psi }_{j_v}\}\) be matrices built as the product between orbital pairs in real space, and \(\hat{M}_{cc}(\mathbf {G})\), \(\hat{M}_{vc}(\mathbf {G})\), \(\hat{M}_{vv}(\mathbf {G})\) be the reciprocal space representation of these matrices. Equations (3) can then be written succinctly as

\[ V_A = \hat{M}_{vc}^{*} \hat{V} \hat{M}_{vc}, \qquad W_A = \text {reshape}\left( \hat{M}_{cc}^{*} \hat{W} \hat{M}_{vv}\right) , \tag{4} \]

where \(\hat{V}\) and \(\hat{W}\) are reciprocal space representations of the operators *V* and *W* respectively, and the reshape function is used to map the \((i_cj_c,i_vj_v)\)th element on the right-hand side of (4) to the \((i_vi_c,j_vj_c)\)th element of \(W_A\). While in this paper we will focus, for simplicity, on the TDA model, we note that a similar set of equations can be derived for \(V_B\) and \(W_B\).

The reason to compute the right-hand sides of (4) in reciprocal space is that \(\hat{V}\) is diagonal and an energy cutoff is often adopted to limit the number of the Fourier components of \(\psi _i\). As a result, the leading dimension of \(\hat{M}_{cc}\), \(\hat{M}_{vc}\) and \(\hat{M}_{vv}\), denoted by \(N_g\), is often much smaller than that of \(M_{cc}\), \(M_{vc}\) and \(M_{vv}\), which we denote by \(N_r\).

In addition to performing \(\mathcal {O}(N_e^2)\) Fast Fourier transforms (FFTs) to obtain \(\hat{M}_{cc}\), \(\hat{M}_{vc}\) and \(\hat{M}_{vv}\) from \(M_{cc}\), \(M_{vc}\) and \(M_{vv}\), respectively, we need to perform at least \(\mathcal {O}(N_gN_c^2N_v^2)\) floating-point operations to obtain \(V_A\) and \(W_A\) using matrix–matrix multiplications.
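The conventional construction and the reshape step can be illustrated with a small NumPy sketch. It is only a toy under stated assumptions: everything stays on a real-space grid, the kernels `V` and `W` are small dense real symmetric stand-ins rather than the reciprocal-space operators used in the paper, and the pair ordering (valence index slow, conduction index fast) and the function names are chosen here for illustration.

```python
import numpy as np

def pair_matrix(psi_a, psi_b):
    """Columns psi_a[:, a] * conj(psi_b[:, b]); column index is a * n_b + b."""
    n_r = psi_a.shape[0]
    return np.einsum("ra,rb->rab", psi_a, psi_b.conj()).reshape(n_r, -1)

def conventional_kernels(psi_v, psi_c, V, W):
    """Dense V_A and W_A on a toy grid (V, W: n_r x n_r stand-in kernels)."""
    n_r = psi_v.shape[0]
    n_v, n_c = psi_v.shape[1], psi_c.shape[1]
    # M_vc column (i_v, i_c) holds psi_c[:, i_c] * conj(psi_v[:, i_v]).
    M_vc = np.einsum("rc,rv->rvc", psi_c, psi_v.conj()).reshape(n_r, -1)
    M_cc = pair_matrix(psi_c, psi_c)      # column (i_c, j_c)
    M_vv = pair_matrix(psi_v, psi_v)      # column (i_v, j_v)
    V_A = M_vc.conj().T @ V @ M_vc        # already indexed (i_v i_c, j_v j_c)
    W_raw = M_cc.conj().T @ W @ M_vv      # indexed (i_c j_c, i_v j_v)
    # Reshape: (i_c j_c, i_v j_v) -> (i_v i_c, j_v j_c).
    W_A = (W_raw.reshape(n_c, n_c, n_v, n_v)
                .transpose(2, 0, 3, 1)
                .reshape(n_v * n_c, n_v * n_c))
    return V_A, W_A

# Toy data (hypothetical sizes, random orbitals).
rng = np.random.default_rng(0)
n_r, n_v, n_c = 12, 2, 3
psi_v = rng.normal(size=(n_r, n_v)) + 1j * rng.normal(size=(n_r, n_v))
psi_c = rng.normal(size=(n_r, n_c)) + 1j * rng.normal(size=(n_r, n_c))
A = rng.normal(size=(n_r, n_r)); V = A + A.T
B = rng.normal(size=(n_r, n_r)); W = B + B.T
V_A, W_A = conventional_kernels(psi_v, psi_c, V, W)
```

The two triple products dominate the cost: each multiplies a pair matrix with \(\mathcal {O}(N_e^2)\) columns against another, which is the source of the \(\mathcal {O}(N_gN_c^2N_v^2)\) count quoted above.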

Note that, in order to achieve high accuracy with a large basis set, such as that of plane-waves, \(N_g\) is typically much larger than \(N_c\) or \(N_v\). The number of occupied bands is either \(N_{e}\) or \(N_e/2\) depending on how spin is counted. The number of conduction bands \(N_{c}\) included in the calculation is typically a small multiple of \(N_{v}\) (the precise number being a free parameter to be converged), whereas \(N_g\) is often as large as \(100{-}10000\times N_e\) (\(N_r \sim 10 \times N_g\)).

## 3 Interpolative Separable Density Fitting (ISDF) Decomposition

In order to reduce the computational complexity, we seek to minimize the number of integrals in Eq. (3). To this aim, we rewrite the matrix \(M_{ij}\), where the labels *i* and *j* are indices of either valence or conduction orbitals, as the product of a matrix \(\varTheta _{ij}\) that contains a set of \(N_{ij}^t\) linearly independent auxiliary basis vectors with \(N_{ij}^t \approx tN_e \ll \mathcal {O}(N_e^2)\) (*t* is a small constant referred to as the rank truncation parameter) [10] and an expansion coefficient matrix \(C_{ij}\). For large problems, the number of columns of \(M_{ij}\) (i.e. \(\mathcal {O}(N_vN_c)\), or \(\mathcal {O}(N_v^2)\), or \(\mathcal {O}(N_c^2)\)) is typically larger than the number of grid points \(N_r\) on which \(\psi _n(\mathbf {r})\) is sampled, i.e., the number of rows in \(M_{ij}\). As a result, \(N_{ij}^t\) is much smaller than the number of columns of \(M_{ij}\). Even when a cutoff is used to limit the size of \(N_c\) or \(N_v\) so that the number of columns in \(M_{ij}\) is much less than \(N_g\), we can still approximate \(M_{ij}\) by \(\varTheta _{ij} C_{ij}\) with a \(\varTheta _{ij}\) that has a smaller rank \(N_{ij}^t \sim t\sqrt{N_iN_j}\).

To simplify our discussion, let us drop the subscript of *M*, \(\varTheta \) and *C* for the moment, and describe the basic idea of ISDF. The optimal low rank approximation of *M* can be obtained from a singular value decomposition. However, the complexity of this decomposition is at least \(\mathcal {O}(N_r^2 N_e^2)\) or \(\mathcal {O}(N_e^4)\). Recently, an alternative decomposition has been developed, which is close to optimal but with a more favorable complexity. This type of decomposition is called Interpolative Separable Density Fitting (ISDF) [10], which we describe below.

In ISDF, instead of computing \(\varTheta \) and *C* simultaneously, we first fix the coefficient matrix *C*, and determine the auxiliary basis matrix \(\varTheta \) by solving a linear least squares problem

\[ \min _{\varTheta } \Vert M - \varTheta C \Vert _F^2, \tag{5} \]

where each column of *M* is given by \(\psi _{i}(\mathbf {r})\bar{\psi }_{j}(\mathbf {r})\) sampled on a dense real-space grid \(\{\mathbf {r}_{i}\}_{i=1}^{N_r}\), \(\varTheta = [\zeta _1, \zeta _2, \cdots , \zeta _{N^t}]\) contains the auxiliary basis vectors to be determined, and \(\Vert \cdot \Vert _F\) denotes the Frobenius norm.

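We choose *C* as a matrix consisting of \(\psi _{i}(\mathbf {r})\bar{\psi }_{j}(\mathbf {r})\) evaluated on a subset of \(N^t\) carefully chosen real space grid points, with \(N^t \ll N_r\) and \(N^t \ll N_e^2\), such that the (*i*, *j*)th column of *C* is given by

\[ \left[ \psi _{i}(\hat{\mathbf {r}}_1)\bar{\psi }_{j}(\hat{\mathbf {r}}_1), \cdots , \psi _{i}(\hat{\mathbf {r}}_{N^t})\bar{\psi }_{j}(\hat{\mathbf {r}}_{N^t}) \right] ^{\mathsf {T}}. \tag{6} \]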

The least squares minimizer is given by

\[ \varTheta = M C^{*} (C C^{*})^{-1}. \tag{7} \]

Because both multiplications in (7) can be carried out in \(\mathcal {O}(N_e^3)\) due to the separable structure of *M* and *C* [10], the computational complexity for computing the interpolation vectors is \(\mathcal {O}(N_{e}^{3})\).
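The role of the separable structure can be made concrete with a short NumPy sketch. It assumes the least squares minimizer has the standard normal-equation form \(\varTheta = MC^{*}(CC^{*})^{-1}\) and that the interpolation points are already known; the function name `isdf_fit` and the random toy data are illustrative only. Because each column of *M* is a pointwise product of two orbitals, \(MC^{*}\) factors into a Hadamard product of two small orbital products, so neither *M* nor *C* needs to be formed.

```python
import numpy as np

def isdf_fit(psi, phi, points):
    """Theta = M C^* (C C^*)^{-1} without forming M or C.

    psi: (n_r, n_i), phi: (n_r, n_j). M has columns psi[:, i] * conj(phi[:, j]);
    C consists of the rows of M at the grid indices in `points`.
    """
    P_psi = psi @ psi[points].conj().T     # (n_r, n_t)
    P_phi = phi @ phi[points].conj().T     # (n_r, n_t)
    MCh = P_psi * P_phi.conj()             # equals M @ C^* entry by entry
    CCh = MCh[points]                      # equals C @ C^*
    # Solve Theta @ CCh = MCh in the least-squares sense.
    return np.linalg.lstsq(CCh.T, MCh.T, rcond=None)[0].T

# Toy check against the explicit matrices (hypothetical sizes, random points).
rng = np.random.default_rng(0)
n_r, n_i, n_j, n_t = 40, 3, 4, 8
psi = rng.normal(size=(n_r, n_i)) + 1j * rng.normal(size=(n_r, n_i))
phi = rng.normal(size=(n_r, n_j)) + 1j * rng.normal(size=(n_r, n_j))
points = rng.choice(n_r, size=n_t, replace=False)
Theta = isdf_fit(psi, phi, points)
M = np.einsum("ri,rj->rij", psi, phi.conj()).reshape(n_r, -1)
C = M[points]
```

With \(N^t\) and the number of orbitals both proportional to \(N_e\), the two orbital products cost \(\mathcal {O}(N_rN_eN^t)\) each, which is consistent with the cubic scaling stated above. A useful sanity check is that \(\varTheta C\) reproduces *M* exactly on the interpolation rows.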

The interpolating points required in (6) can be selected by a permutation produced from a QR factorization of \(M^{\mathsf {T}}\) with Column Pivoting (QRCP) [3]. In QRCP, we choose a permutation \(\varPi \) such that the factorization

\[ M^{\mathsf {T}} \varPi = Q R \tag{8} \]

yields a unitary matrix *Q* and an upper triangular matrix *R* with decreasing matrix elements along the diagonal of *R*. The magnitude of each diagonal element of *R* indicates how important the corresponding column of the permuted \(M^{\mathsf {T}}\) is, and whether the corresponding grid point should be chosen as an interpolation point. The QRCP decomposition can be terminated when the \((N^t+1)\)-st diagonal element of *R* becomes less than a predetermined threshold, obtaining \(N^t\) leading columns of the permuted \(M^{\mathsf {T}}\) that are, within numerical accuracy, maximally linearly independent. The corresponding grid points are chosen as the interpolation points. The indices for the chosen interpolation points \(\hat{\mathbf {r}}_{N^t}\) can be obtained from indices of the nonzero entries of the first \(N^t\) columns of the permutation matrix \(\varPi \). Notice that the standard QRCP procedure has a high computational cost of \(\mathcal {O}(N_{e}^2N_{r}^2) \sim \mathcal {O}(N_{e}^4)\). However, this cost can be reduced to \(\mathcal {O}(N_{r}N_{e}^2) \sim \mathcal {O}(N_{e}^3)\) when QRCP is combined with the randomized sampling method [18].
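The point selection can be sketched as a greedy column-pivoted Gram–Schmidt process, which is one simple way to realize QRCP. This is a plain, unrandomized illustration and not the implementation used in the paper: the function name, the relative threshold `tol`, and the toy matrix are assumptions, and the randomized sampling step of [18] is not included.

```python
import numpy as np

def select_interpolation_points(M, tol=1e-8, max_points=None):
    """Greedy pivoted QR on M^T; returns grid indices and |diag(R)|.

    Rows of M are grid points, so columns of A = M^T correspond to grid points.
    """
    A = np.array(M.T, dtype=complex)
    n_max = min(A.shape) if max_points is None else min(max_points, *A.shape)
    points, diag = [], []
    for _ in range(n_max):
        norms = np.linalg.norm(A, axis=0)      # residual column norms
        p = int(np.argmax(norms))              # pivot: largest residual column
        if norms[p] == 0 or (diag and norms[p] < tol * diag[0]):
            break                              # next diagonal of R below threshold
        points.append(p)
        diag.append(float(norms[p]))
        q = A[:, p] / norms[p]
        A -= np.outer(q, q.conj() @ A)         # project out the chosen direction
    return points, diag

# Toy matrix of exact rank 3 (hypothetical data): three points should be picked.
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 12))
points, diag = select_interpolation_points(M)
```

Each pass touches the whole matrix, so selecting \(N^t\) points this way costs \(\mathcal {O}(N^t N_r N_e^2)\) operations, which is why the text above recommends pairing QRCP with randomized sampling for larger systems.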

## 4 Low Rank Representations of Bare and Screened Operators via ISDF

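The ISDF decomposition applied to \(M_{cc}\), \(M_{vc}\) and \(M_{vv}\) yields

\[ M_{cc} \approx \varTheta _{cc} C_{cc}, \qquad M_{vc} \approx \varTheta _{vc} C_{vc}, \qquad M_{vv} \approx \varTheta _{vv} C_{vv}. \tag{9} \]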

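It follows from Eqs. (3), (4) and (9) that the exchange and direct terms of the BSE Hamiltonian can be written as

\[
\begin{aligned}
V_A &= \hat{M}_{vc}^{*} \hat{V} \hat{M}_{vc} \approx C_{vc}^{*} \widetilde{V}_A C_{vc}, \\
W_A &= \text {reshape}\left( \hat{M}_{cc}^{*} \hat{W} \hat{M}_{vv}\right) \approx \text {reshape}\left( C_{cc}^{*} \widetilde{W}_A C_{vv}\right) ,
\end{aligned} \tag{10}
\]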

where \(\widetilde{V}_A=\hat{\varTheta }_{vc}^{*} \hat{V} \hat{\varTheta }_{vc}\) and \(\widetilde{W}_A = \hat{\varTheta }_{cc}^{*} \hat{W} \hat{\varTheta }_{vv}\) are the *projected* exchange and direct terms under the auxiliary basis \(\hat{\varTheta }_{vc}\), \(\hat{\varTheta }_{cc}\) and \(\hat{\varTheta }_{vv}\). Here, \(\hat{\varTheta }_{vc}\), \(\hat{\varTheta }_{cc}\) and \(\hat{\varTheta }_{vv}\) are reciprocal space representations of \(\varTheta _{vc}\), \(\varTheta _{cc}\) and \(\varTheta _{vv}\), respectively, that can be obtained via FFTs. Note that the dimension of the matrix \(C_{cc}^{*} \widetilde{W}_A C_{vv}\) on the right-hand side of Eq. (10) is \(N_c^2 \times N_v^2\). Therefore, it needs to be reshaped into a matrix of dimension \(N_v N_c \times N_vN_c\) according to the mapping \(W_A(i_cj_c,i_vj_v) \rightarrow W_A(i_vi_c,j_vj_c)\) before it can be used in the BSH together with the \(V_A\) matrix.

Once the ISDF approximations for \(M_{vc}\), \(M_{cc}\) and \(M_{vv}\) are available, the cost for constructing a low-rank approximation to the exchange and direct terms reduces to that of computing the projected exchange and direct kernels \(\hat{\varTheta }_{vc}^{*} \hat{V} \hat{\varTheta }_{vc}\) and \(\hat{\varTheta }_{cc}^{*} \hat{W} \hat{\varTheta }_{vv}\), respectively. If the ranks of \(\varTheta _{vc}\), \(\varTheta _{cc}\) and \(\varTheta _{vv}\) are \(N_{vc}^t\), \(N_{cc}^t\) and \(N_{vv}^t\), respectively, then the computational complexity for computing the compressed exchange and direct kernels is \(\mathcal {O}(N_{vc}^t N_{vc}^t N_g + N_{cc}^tN_{vv}^t N_g + N_{vv}^t N_g^2)\), which is significantly lower than the complexity of the conventional approach, which is \(\mathcal {O}(N_gN_c^2N_v^2)\). When \(N_{vc}^t \sim t\sqrt{N_vN_c}\), \(N_{cc}^t \sim t\sqrt{N_cN_c}\) and \(N_{vv}^t \sim t\sqrt{N_vN_v}\) are on the order of \(N_e\), the complexity of constructing the compressed kernels is \(\mathcal {O}(N_e^3)\).

## 5 Iterative Diagonalization of the BSE Hamiltonian

In the conventional approach, exciton energies and wavefunctions can be computed by using the recently developed BSEPACK library [25, 26] to diagonalize the BSE Hamiltonian \(H_\text {BSE}\).

When ISDF is used to construct low-rank approximations to the bare exchange and screened direct operators \(V_A\) and \(W_A\), we should keep both matrices in the factored form given by Eq. (10). We propose to use iterative methods to diagonalize the approximate BSH constructed via the ISDF decomposition.

Within the TDA, several iterative methods such as the Lanczos [14] and LOBPCG [13] algorithms can be used to compute a few desired eigenvalues of the \(H_\text {BSE}\). For each iterative step, we need to multiply \(H_\text {BSE}\) with a vector *x* of size \(N_vN_c\). When \(V_A\) is kept in the factored form given by (10), \(V_Ax\) can be evaluated as three matrix vector multiplications performed in sequence, i.e.,

\[ V_A x \approx C_{vc}^{*} \left[ \widetilde{V}_A \left( C_{vc}\, x \right) \right] . \tag{11} \]

The complexity of these calculations is \(\mathcal {O}(N_vN_c N_{vc}^t)\). If \(N_{vc}^t\) is on the order of \(N_e\), then each \(V_A x\) can be carried out in \(\mathcal {O}(N_e^3)\) operations.
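The three sequential products can be sketched in NumPy as follows. The names `apply_exchange`, `C_vc`, `V_tilde` and the random toy data are illustrative assumptions; the only point is that the products are evaluated from right to left so that the dense \(N_vN_c \times N_vN_c\) matrix is never formed.

```python
import numpy as np

def apply_exchange(C_vc, V_tilde, x):
    """y = C_vc^* (V_tilde (C_vc x)), evaluated right to left.

    C_vc: (n_t, n_v * n_c) coefficient matrix, V_tilde: (n_t, n_t) projected kernel.
    """
    t = C_vc @ x                 # O(n_t * n_v * n_c)
    t = V_tilde @ t              # O(n_t ** 2)
    return C_vc.conj().T @ t     # O(n_t * n_v * n_c)

# Toy data (hypothetical sizes).
rng = np.random.default_rng(0)
n_t, n_pairs = 5, 12
C_vc = rng.normal(size=(n_t, n_pairs)) + 1j * rng.normal(size=(n_t, n_pairs))
G = rng.normal(size=(n_t, n_t)) + 1j * rng.normal(size=(n_t, n_t))
V_tilde = G + G.conj().T         # Hermitian stand-in for the projected kernel
x = rng.normal(size=n_pairs) + 1j * rng.normal(size=n_pairs)
y = apply_exchange(C_vc, V_tilde, x)
```

Forming \(C_{vc}^{*}\widetilde{V}_AC_{vc}\) explicitly before multiplying would give the same vector but at the cost of storing and applying an \(N_vN_c \times N_vN_c\) matrix.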

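Because \(C_{cc}^* \widetilde{W}_A C_{vv}\) cannot be multiplied with a vector *x* of size \(N_vN_c\) before it is reshaped, a different multiplication scheme must be used. It follows from the separable nature of \(C_{vv}\) and \(C_{cc}\) that this multiplication can be succinctly written as

\[ W_A x \approx \text {reshape}\left[ \varPsi _{c}^{*} \left( \widetilde{W}_A \odot \left( \varPsi _{c} X \varPsi _{v}^{*} \right) \right) \varPsi _{v} \right] , \tag{12} \]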

where *X* is a \(N_c \times N_v\) matrix reshaped from the vector *x*, \(\varPsi _{c}\) is a \(N_{cc}^t \times N_c\) matrix containing \(\psi _{i_c}(\hat{r}_k)\) as its elements, \(\varPsi _{v}\) is a \(N_{vv}^t \times N_v\) matrix containing \(\psi _{i_v}(\hat{r}_k)\) as its elements, and \(\odot \) denotes componentwise multiplication (Hadamard product). The reshape function is used to turn the \(N_c\times N_v\) matrix–matrix product back into a size \(N_vN_c\) vector. If \(N_{vv}^t\) and \(N_{cc}^t\) are on the order of \(N_e\), then all matrix–matrix multiplications in Eq. (12) can be carried out in \(\mathcal {O}(N_e^3)\) operations. In this way, each step of the iterative method has a complexity \(\mathcal {O}(N_e^3)\) and, if the number of iterative steps required to reach convergence is small, the iterative diagonalization can be completed in \(\mathcal {O}(N_e^3)\) operations.
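The Hadamard-product scheme can be checked numerically against the explicitly reshaped matrix. The sketch below assumes the conventions used in this paper for the pair matrices (each pair is an orbital times the conjugate of its partner) and a vector ordering with the valence index slow and the conduction index fast; the function names and random data are illustrative.

```python
import numpy as np

def apply_direct(Psi_c, Psi_v, W_tilde, x):
    """W_A x via Psi_c^* (W_tilde * (Psi_c X Psi_v^*)) Psi_v, without forming W_A.

    Psi_c: (n_tc, n_c), Psi_v: (n_tv, n_v), W_tilde: (n_tc, n_tv).
    x has length n_v * n_c with index j_v * n_c + j_c.
    """
    n_c, n_v = Psi_c.shape[1], Psi_v.shape[1]
    X = x.reshape(n_v, n_c).T                         # X[j_c, j_v]
    inner = Psi_c @ X @ Psi_v.conj().T                # (n_tc, n_tv)
    Y = Psi_c.conj().T @ (W_tilde * inner) @ Psi_v    # Y[i_c, i_v]
    return Y.T.reshape(n_v * n_c)

def dense_direct(Psi_c, Psi_v, W_tilde):
    """Reference: build C_cc^* W_tilde C_vv and reshape it to (i_v i_c, j_v j_c)."""
    n_c, n_v = Psi_c.shape[1], Psi_v.shape[1]
    C_cc = np.einsum("ka,kb->kab", Psi_c, Psi_c.conj()).reshape(Psi_c.shape[0], -1)
    C_vv = np.einsum("la,lb->lab", Psi_v, Psi_v.conj()).reshape(Psi_v.shape[0], -1)
    raw = C_cc.conj().T @ W_tilde @ C_vv              # indexed (i_c j_c, i_v j_v)
    return (raw.reshape(n_c, n_c, n_v, n_v)
               .transpose(2, 0, 3, 1)
               .reshape(n_v * n_c, n_v * n_c))

# Toy data (hypothetical sizes).
rng = np.random.default_rng(0)
n_tc, n_tv, n_c, n_v = 6, 4, 3, 2
Psi_c = rng.normal(size=(n_tc, n_c)) + 1j * rng.normal(size=(n_tc, n_c))
Psi_v = rng.normal(size=(n_tv, n_v)) + 1j * rng.normal(size=(n_tv, n_v))
W_tilde = rng.normal(size=(n_tc, n_tv))
x = rng.normal(size=n_v * n_c) + 1j * rng.normal(size=n_v * n_c)
y_fast = apply_direct(Psi_c, Psi_v, W_tilde, x)
y_ref = dense_direct(Psi_c, Psi_v, W_tilde) @ x
```

The fast path only multiplies matrices whose dimensions are \(N_c\), \(N_v\), \(N_{cc}^t\) or \(N_{vv}^t\), whereas the reference path builds an \(N_c^2 \times N_v^2\) intermediate.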

## 6 Estimating Optical Absorption Spectra Without Diagonalization

The optical absorption spectrum can be readily computed from the eigenpairs of \(H_\text {BSE}\) as

where \(\varOmega \) is the volume of the primitive cell, *e* is the elementary charge, \(d_r\) and \(d_l\) are the right and left optical transition vectors, and \(\eta \) is a broadening factor used to account for the exciton lifetime.

To observe the absorption spectrum and identify its main peaks, it is possible to use a structure preserving iterative method instead of explicitly computing all eigenpairs of \(H_\text {BSE}\). In Refs. [2, 24], we developed a structure preserving Lanczos algorithm that has been implemented in the BSEPACK [26] library. When TDA is adopted, the structure preserving Lanczos reduces to a standard Lanczos algorithm.
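For the TDA case, the idea can be sketched with a standard Lanczos recursion. The sketch assumes, for illustration only, that the quantity of interest is a resolvent matrix element of the form \(d^{*}\left( (\omega - i\eta )I - H\right) ^{-1} d\) for a Hermitian *H* and a single transition vector *d*; prefactors are omitted, no reorthogonalization is done, and the function names are hypothetical. It is not the BSEPACK implementation.

```python
import numpy as np

def lanczos(matvec, d, k):
    """k-step Lanczos for a Hermitian operator, started from d."""
    norm_d = np.linalg.norm(d)
    q, q_prev, beta = d / norm_d, np.zeros_like(d), 0.0
    alphas, betas = [], []
    for _ in range(k):
        w = matvec(q) - beta * q_prev
        alpha = np.vdot(q, w).real
        w = w - alpha * q
        alphas.append(alpha)
        beta = np.linalg.norm(w)
        if len(alphas) == k or beta < 1e-12:
            break
        betas.append(beta)
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas), norm_d

def resolvent_estimate(alphas, betas, norm_d, z):
    """Continued fraction for norm_d^2 * e_1^T (z I - T)^{-1} e_1."""
    g = 0.0
    for j in reversed(range(len(alphas))):
        b2 = betas[j] ** 2 if j < len(betas) else 0.0
        g = 1.0 / (z - alphas[j] - b2 * g)
    return norm_d ** 2 * g

# Toy Hermitian matrix (hypothetical data); z = omega - i * eta.
rng = np.random.default_rng(0)
n = 6
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (G + G.conj().T) / 2
d = rng.normal(size=n) + 1j * rng.normal(size=n)
z = 0.3 - 0.1j
alphas, betas, norm_d = lanczos(lambda v: H @ v, d, k=n)
estimate = resolvent_estimate(alphas, betas, norm_d, z)
```

Only matrix–vector products with *H* are needed, so the factored forms of \(V_A\) and \(W_A\) can be used directly as the `matvec`. In practice the number of steps `k` would be far smaller than the matrix dimension, and the estimate is evaluated on a grid of frequencies by reusing the same tridiagonal coefficients.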

## 7 Numerical Results

In this section, we demonstrate the accuracy and efficiency of the ISDF method when it is used to compute exciton energies and optical absorption spectra in the BSE framework. We implemented the ISDF based BSH construction in the BerkeleyGW software package [4]. We use the *ab initio* software package Quantum ESPRESSO (QE) [6] to compute the ground-state quantities required in the GW and BSE calculations. We use Hartwigsen–Goedecker–Hutter (HGH) norm-conserving pseudopotentials [8] and the LDA [7] exchange–correlation functional in Quantum ESPRESSO. We also check these calculations with the KSSOLV software [27], which is a MATLAB toolbox for solving the Kohn–Sham equations. All calculations were carried out on a single core of the Cori system at the National Energy Research Scientific Computing Center (NERSC).

We performed calculations for three systems at the Gamma point. In particular, we choose a silicon Si\(_{8}\) system as a typical model of bulk crystals (in the \(\varvec{k}=0\) approximation, i.e. no sampling of the Brillouin zone) and two molecules: carbon monoxide (CO) and benzene (C\(_6\)H\(_6\)) as plotted in Fig. 1. All systems are closed shell systems, and the number of occupied bands is \(N_v = N_{e}/2\), where \(N_e\) is the number of valence electrons in the system. We compute the quasiparticle energies and the dielectric function of CO and C\(_6\)H\(_6\) with BerkeleyGW [4], and those of Si\(_8\) with KSSOLV [27].

### 7.1 Accuracy

We first measure the accuracy of the ISDF method by comparing the eigenvalues of the BSH computed with and without the ISDF decomposition.

In our test, we set the plane wave energy cutoff required in the QE calculations to \(E_\text {cut} = 10\) Ha, which is relatively low. However, this is sufficient for assessing the effectiveness of ISDF. Such a choice of \(E_\text {cut}\) results in \(N_r = 35937\) and \(N_g = 2301\) for the Si\(_{8}\) system in a cubic supercell of size 10.22 Bohr, \(N_r = 19683\) and \(N_g = 1237\) for the CO molecule (\(N_v = 5\)) in a cubic cell of size 13.23 Bohr, and \(N_r = 91125\) and \(N_g = 6235\) for the benzene molecule in a cubic cell of size 22.67 Bohr. The number of active conduction bands (\(N_c\)) and valence bands (\(N_v\)), the number of reciprocal-space grid points and the dimensions of the corresponding BSE Hamiltonian \(H_\text {BSE}\) for these three systems are listed in Table 1.

In Fig. 2, we plot the singular values of the matrices \(M_{vc}(\mathbf {r}) = \{\psi _{i_c}(\mathbf {r})\bar{\psi }_{i_v}(\mathbf {r})\}\), \(M_{cc}(\mathbf {r}) = \{ \psi _{i_c}(\mathbf {r})\bar{\psi }_{j_c}(\mathbf {r})\}\) and \(M_{vv}(\mathbf {r}) = \{\psi _{i_v}(\mathbf {r})\bar{\psi }_{j_v}(\mathbf {r})\}\) associated with the CO molecule. We observe that the singular values of these matrices decay rapidly. For example, the leading 500 (out of 3600) singular values of \(M_{cc}(\mathbf {r})\) decrease rapidly towards zero. All other singular values are below \(10^{-4}\). Therefore, the numerical rank \(N_{cc}^t\) of \(M_{cc}\) is roughly 500 (*t* = 8.3), or roughly 15% of the number of columns in \(M_{cc}\). Consequently, we expect that the rank of \(\varTheta _{cc}\) produced in the ISDF decomposition can be set to 15% of \(N_c^2\) without sacrificing the accuracy of the computed eigenvalues.
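The link between the singular value decay, the numerical rank and the truncation parameter *t* can be sketched in a few lines of NumPy. The helper names and the toy matrix are illustrative; the threshold \(10^{-4}\), the rank of roughly 500 and the value \(N_c = 60\) implied by the 3600 columns of \(M_{cc}\) are taken from the CO discussion above, and \(t = N_{cc}^t/\sqrt{N_cN_c}\) follows the definition used in Sect. 3.

```python
import numpy as np

def numerical_rank(M, threshold=1e-4):
    """Number of singular values of M above an absolute threshold."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > threshold))

def truncation_parameter(rank, n_i, n_j):
    """t such that rank = t * sqrt(n_i * n_j)."""
    return rank / np.sqrt(n_i * n_j)

# Values quoted for M_cc of the CO molecule.
n_c, rank_cc = 60, 500
t_cc = truncation_parameter(rank_cc, n_c, n_c)
fraction_cc = rank_cc / n_c ** 2          # share of the 3600 columns

# Toy matrix of exact rank 3 (hypothetical data) to exercise numerical_rank.
rng = np.random.default_rng(0)
toy = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 20))
toy_rank = numerical_rank(toy)
```

Here `fraction_cc` evaluates to about 0.14, in line with the "roughly 15%" quoted in the text.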

This prediction is confirmed in Fig. 3, where we plot the absolute difference between the lowest exciton energy of the model silicon system Si\(_{8}\) computed with and without using ISDF to construct \(H_{\text {BSE}}\). To be specific, the error in the desired eigenvalue is computed as \(\varDelta {E} = E_\text {ISDF} - E_\text {BGW}\), where \(E_\text {ISDF}\) is computed from the \(H_\mathrm{{BSE}}\) constructed with the ISDF approximation, and \(E_\text {BGW}\) is computed from a standard \(H_{\text {BSE}}\) constructed without using ISDF. We first vary one of the ratios \(N^{t}_{cc}/N_{cc}\), \(N^t_{vc}/N_{vc}\) and \(N^t_{vv}/N_{vv}\) while holding the others at a constant value of 1. We observe that the error in the lowest exciton energy (positive eigenvalue) is around \(10^{-3}\) Ha, when either \(N^{t}_{cc}/N_{cc}\) or \(N^t_{vc}/N_{vc}\) is set to 0.1 while the other ratios are held at 1. However, reducing \(N^t_{vv}/N_{vv}\) to 0.1 introduces a significant amount of error in the lowest exciton energy, likely because \(N_v=16\) is too small. We then hold \(N^t_{vv}/N_{vv}\) at 0.5 and let both \(N^{t}_{cc}/N_{cc}\) and \(N^t_{vc}/N_{vc}\) vary. The variation of \(\varDelta {E}\) with respect to these ratios is also plotted in Fig. 3. We observe that the error in the lowest exciton energy is still around \(10^{-3}\) Ha even when both \(N^{t}_{cc}/N_{cc}\) and \(N^t_{vc}/N_{vc}\) are set to 0.1.

We then check the absolute error \(\varDelta {E}\) (Ha) of all the exciton energies computed with the ISDF method by comparing them with the ones obtained from a conventional BSE calculation implemented in BerkeleyGW for the CO and benzene molecules. As we can see from Fig. 4, the errors associated with these eigenvalues are all below 0.002 Ha when \(N^t_{cc}/N_{cc}\) is 0.1.

### 7.2 Efficiency

At the moment, our preliminary implementation of the ISDF method within the BerkeleyGW software package is sequential. Therefore, our efficiency test is limited by the size of the problem as well as the number of conduction bands (\(N_c\)) we can include in the bare and screened operators. As a result, our performance measurement does not fully reflect the computational complexity analysis presented in the previous sections. In particular, taking benzene as an example, \(N_g = 6235\) is much larger than \(N_v = 15\) and \(N_c = 60 \), so the computational cost of the \(N_g^2N_v^2 \sim \mathcal {O}(N_e^4)\) term is much higher than that of the \(N_gN_v^2N_c^2 \sim \mathcal {O}(N_e^5)\) term in the conventional BSE calculations.

Nonetheless, in this section, we will demonstrate the benefit of using ISDF to reduce the cost for constructing the BSE Hamiltonian \(H_\text {BSE}\). In Table 2, we focus on the benzene example and report the wall-clock time required to construct the ISDF approximations of the \(M_{vc}\), \(M_{cc}\), and \(M_{vv}\) matrices at different rank truncation levels. Without using ISDF, it takes 746.0 s to construct the reciprocal space representations of \(M_{vc}\), \(M_{cc}\), and \(M_{vv}\) in BerkeleyGW. Most of the time is spent in the FFTs applied to \(M_{vc}\), \(M_{cc}\), and \(M_{vv}\) in order to obtain the reciprocal space representation of these matrices. We can clearly see that by reducing \(N^t_{cc}/N_{cc}\) from 0.5 (\(t = 30.0\)) to 0.1 (\(t = 6.0\)), the wall-clock time used to construct the low-rank approximation to \(M_{cc}\) reduces from 578.9 to 34.3 s. Furthermore, the total cost of computing \(M_{vc}\), \(M_{cc}\) and \(M_{vv}\) is reduced by a factor of 19 when compared with the cost of a conventional approach (39.3 vs. 746.0 s) if \(N^{t}_{vc}/N_{vc}\), \(N^{t}_{vv}/N_{vv}\) and \(N^{t}_{cc}/N_{cc}\) are all set to 0.1.

Since the ISDF decomposition is carried out on a real-space grid, most of the time is spent in performing the QRCP in real space. Even though QRCP with random sampling has \(\mathcal {O}(N_e^3)\) complexity, it has a relatively large pre-constant compared to the size of the problem. This cost can be further reduced by using the recently proposed centroidal Voronoi tessellation (CVT) method [5].

In Table 3, we report the wall-clock time required to construct the projected exchange and direct matrices \(\widetilde{V}_A\) and \(\widetilde{W}_A\) that appear in Eq. (10) from the ISDF approximations of \(M_{vc}\), \(M_{vv}\), and \(M_{cc}\). The current implementation in BerkeleyGW requires 103,154 s (28.65 h) in a serial run for the full construction of \(H_\text {BSE}\). In the present reimplementation, without ISDF, it takes \(1.574 + 4.198 = 5.772\) s to construct both \(W_A\) and \(V_A\). Note that the original implementation in BerkeleyGW is much slower as it requires a complete integration over G vectors for each pair of bands. When \(N_{cc}^{t}\)/\(N_{cc}\) is set to 0.1, the cost for constructing the full \(W_A\), which has the largest complexity, is reduced by a factor of 2.8. Furthermore, if \(N^t_{vc}/N_{vc}\), \(N^t_{vv}/N_{vv}\) and \(N^t_{cc}/N_{cc}\) are all set to 0.1, we reduce the cost for constructing \(\widetilde{V}_A\) and \(\widetilde{W}_A\) by factors of 63.0 and 10.1, respectively.

### 7.3 Optical Absorption Spectra

One important application of BSE is to compute the optical absorption spectrum, which is determined by the optical dielectric function in Eq. (13). Figure 5 plots the optical absorption spectra for both CO and benzene obtained from the approximate \(H_\text {BSE}\) constructed with the ISDF method and the \(H_\text {BSE}\) constructed in a conventional approach implemented in BerkeleyGW. When the rank truncation ratio \(N_{cc}^t\)/\(N_{cc}\) is set to only 0.10 (\(t = 6.0\)), the absorption spectrum obtained from the ISDF approximate \(H_\text {BSE}\) is nearly indistinguishable from that produced by the conventional approach. When \(N_{cc}^t\)/\(N_{cc}\) is set to 0.05 (\(t = 3.0\)), the absorption spectrum obtained from the ISDF approximate \(H_\text {BSE}\) still preserves the main features (peaks) of the absorption spectrum obtained in a conventional approach, even though some of the peaks are slightly shifted and the heights of some peaks are slightly off.

## 8 Conclusion and Outlook

In summary, we have demonstrated that the interpolative separable density fitting (ISDF) technique can be used to efficiently and accurately construct the Bethe–Salpeter Hamiltonian matrix. The ISDF method allows us to reduce the complexity of the Hamiltonian construction from \(\mathcal {O}(N_e^5)\) to \(\mathcal {O}(N_e^3)\) with a much smaller pre-constant. We show that ISDF based BSE calculations can efficiently produce accurate exciton energies and optical absorption spectra in molecules and solids.

In the future, we plan to replace the costly QRCP procedure with the centroidal Voronoi tessellation (CVT) method [5] for selecting the interpolation points in the ISDF method. The CVT method is expected to significantly reduce the computational cost of selecting interpolation points in the ISDF procedure for the BSE calculations.

The performance results reported here are based on a sequential implementation of the ISDF method. In the near future, we will implement a parallel version suitable for large-scale distributed memory parallel computers. Such an implementation will allow us to tackle much larger problems for which the favorable scaling of the ISDF approach will be more pronounced.

## References

1. Benner, P., Dolgov, S., Khoromskaia, V., Khoromskij, B.N.: Fast iterative solution of the Bethe–Salpeter eigenvalue problem using low-rank and QTT tensor approximation. J. Comput. Phys. **334**, 221–239 (2017)
2. Brabec, J., Lin, L., Shao, M., Govind, N., Saad, Y., Yang, C., Ng, E.G.: Efficient algorithms for estimating the absorption spectrum within linear response TDDFT. J. Chem. Theory Comput. **11**(11), 5197–5208 (2015)
3. Chan, T.F., Hansen, P.C.: Some applications of the rank revealing QR factorization. SIAM J. Sci. Statist. Comput. **13**, 727–741 (1992)
4. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. **183**(6), 1269–1289 (2012)
5. Dong, K., Hu, W., Lin, L.: Interpolative separable density fitting through centroidal Voronoi tessellation with applications to hybrid functional electronic structure calculations (2017). arXiv:1711.01531
6. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Corso, A.D., de Gironcoli, S., Fabris, S., Fratesi, G., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials. J. Phys.: Condens. Matter **21**(39), 395502 (2009)
7. Goedecker, S., Teter, M., Hutter, J.: Separable dual-space Gaussian pseudopotentials. Phys. Rev. B **54**, 1703 (1996)
8. Hartwigsen, C., Goedecker, S., Hutter, J.: Relativistic separable dual-space Gaussian pseudopotentials from H to Rn. Phys. Rev. B **58**, 3641 (1998)
9. Hedin, L.: New method for calculating the one-particle Green’s function with application to the electron–gas problem. Phys. Rev. **139**, A796 (1965)
10. Hu, W., Lin, L., Yang, C.: Interpolative separable density fitting decomposition for accelerating hybrid density functional calculations with applications to defects in silicon. J. Chem. Theory Comput. **13**(11), 5420–5431 (2017)
11. Hybertsen, M.S., Louie, S.G.: Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B **34**, 5390 (1986)
12. Benner, P., Khoromskaia, V., Khoromskij, B.N.: A reduced basis approach for calculation of the Bethe–Salpeter excitation energies by using low-rank tensor factorisations. Mol. Phys. **114**, 1148–1161 (2016)
13. Knyazev, A.V.: Toward the optimal preconditioned eigensolver: locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput. **23**(2), 517–541 (2001)
14. Lanczos, C.: An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Stand. **45**, 255–282 (1950)
15. Lin, L., Xu, Z., Ying, L.: Adaptively compressed polarizability operator for accelerating large scale *Ab initio* phonon calculations. Multiscale Model. Simul. **15**, 29–55 (2017)
16. Ljungberg, M.P., Koval, P., Ferrari, F., Foerster, D., Sánchez-Portal, D.: Cubic-scaling iterative solution of the Bethe–Salpeter equation for finite systems. Phys. Rev. B **92**, 075422 (2015)
17. Lu, J., Thicke, K.: Cubic scaling algorithms for RPA correlation using interpolative separable density fitting. J. Comput. Phys. **351**, 187–202 (2017)
18. Lu, J., Ying, L.: Compression of the electron repulsion integral tensor in tensor hypercontraction format with cubic scaling cost. J. Comput. Phys. **302**, 329–335 (2015)
19. Marsili, M., Mosconi, E., Angelis, F.D., Umari, P.: Large-scale GW-BSE calculations with \(N^3\) scaling: excitonic effects in dye-sensitized solar cells. Phys. Rev. B **95**, 075415 (2017)
20. Onida, G., Reining, L., Rubio, A.: Electronic excitations: density-functional versus many-body Green’s-function approaches. Rev. Mod. Phys. **74**, 601 (2002)
21. Rocca, D., Lu, D., Galli, G.: *Ab initio* calculations of optical absorption spectra: solution of the Bethe–Salpeter equation within density matrix perturbation theory. J. Chem. Phys. **133**, 164109 (2010)
22. Rohlfing, M., Louie, S.G.: Electron-hole excitations and optical spectra from first principles. Phys. Rev. B **62**, 4927 (2000)
23. Salpeter, E.E., Bethe, H.A.: A relativistic equation for bound-state problems. Phys. Rev. **84**, 1232 (1951)
24. Shao, M., da Jornada, F.H., Lin, L., Yang, C., Deslippe, J., Louie, S.G.: A structure preserving Lanczos algorithm for computing the optical absorption spectrum. SIAM J. Matrix. Anal. Appl.

**39**(2), 683–711 (2018)Shao, M., da Jornada, F.H., Yang, C., Deslippe, J., Louie, S.G.: Structure preserving parallel algorithms for solving the Bethe–Salpeter eigenvalue problem. Linear Algebra Appl.

**488**, 148–167 (2016)Shao, M., Yang, C.: BSEPACK user’s guide (2016). https://sites.google.com/a/lbl.gov/bsepack/

Yang, C., Meza, J.C., Lee, B., Wang, L.-W.: KSSOLV—a MATLAB toolbox for solving the Kohn-Sham equations. ACM Trans. Math. Softw.

**36**, 1–35 (2009)

## Acknowledgments

This work is supported by the Center for Computational Study of Excited-State Phenomena in Energy Materials (C2SEPEM) at the Lawrence Berkeley National Laboratory, which is funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231, as part of the Computational Materials Sciences Program, which provided support for developing, implementing and testing ISDF for BSE in BerkeleyGW. The Center for Applied Mathematics for Energy Research Applications (CAMERA) (L. L. and C. Y.) provided support for the algorithm development and mathematical analysis of ISDF. Finally, the authors acknowledge the computational resources of the National Energy Research Scientific Computing Center (NERSC).

## Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

## About this paper

### Cite this paper

Hu, W. *et al.* (2018). Accelerating Optical Absorption Spectra and Exciton Energy Computation via Interpolative Separable Density Fitting. In: Computational Science – ICCS 2018. Lecture Notes in Computer Science, vol 10861. Springer, Cham. https://doi.org/10.1007/978-3-319-93701-4_48

DOI: https://doi.org/10.1007/978-3-319-93701-4_48

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-93700-7

Online ISBN: 978-3-319-93701-4

eBook Packages: Computer Science, Computer Science (R0)