Abstract
The exact diagonalization method is a high accuracy numerical approach for solving the Hubbard model of a system of electrons with strong correlation. The method solves for the eigenvalues and eigenvectors of the Hamiltonian matrix derived from the Hubbard model. Since the Hamiltonian is a huge sparse symmetric matrix, it was expected that the LOBPCG method with an appropriate preconditioner could be used to solve the problem in a short time. This turned out to be the case as the LOBPCG method with a suitable preconditioner succeeded in solving the ground state (the smallest eigenvalue and its corresponding eigenvector) of the Hamiltonian. In order to solve for multiple eigenvalues of the Hamiltonian in a short time, we use a preconditioner based on the Neumann expansion which uses approximate eigenvalues and eigenvectors given by LOBPCG iteration. We apply a communication avoiding strategy, which was developed considering the physical properties of the Hubbard model, to the preconditioner. Our numerical experiments on two parallel computers show that the LOBPCG method coupled with the Neumann preconditioner and the communication avoiding strategy improves convergence and achieves excellent scalability when solving for multiple eigenvalues.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
1 Introduction
Since the High-\(T_c\) superconductor was discovered many physicists have tried to understand the mechanism behind the superconductivity. It is believed that strong electron correlations underlie the phenomenon, however the exact mechanism is not yet fully understood. One of the numerical approaches to the problem is the exact diagonalization method. In this method the eigenvalue problem is solved for the Hamiltonian derived exactly from the Hubbard model [1, 2], which is a model of a strongly-correlated electron system. When we solve the ground state (the smallest eigenvalue and its corresponding eigenvector) of the Hamiltonian, we can understand its properties at absolute zero (−273.15 \(^\circ \mathrm {C}\)). Many computational methods for the problem have been proposed. Since the Hamiltonian from the Hubbard model is a huge sparse symmetric matrix, an iteration method, such as the Lanczos method [3] or the LOBPCG method (see Fig. 1) [4, 5], is usually utilized for solving the eigenvalue problem.
The convergence of the LOBPCG method depends strongly on the use of a preconditioner. We previously confirmed that the zero-shift point Jacobi preconditioner, which is a shift-and-invert preconditioner [6] using an approximate eigenvalue, gives excellent convergence properties for the Hubbard model with the trapping potential [7]. However, we also reported that the benefit of the preconditioner strongly depends on the characteristics of the non-zero elements of the Hamiltonian and that the preconditioner does not always improve the convergence [8]. Therefore we proposed a novel preconditioner using the Neumann expansion for solving the ground state of the Hamiltonian and demonstrated that this preconditioner improves convergence for a Hamiltonian that is difficult to solve with the zero-shift point Jacobi preconditioner [8]. Moreover we applied a communication avoiding strategy, which was developed considering the properties of the Hubbard model, to the preconditioner.
In order to understand more details of strongly correlated electron systems in particular properties at temperatures near absolute zero, we must solve for the several smallest eigenvalues and corresponding eigenvectors of the Hamiltonian. The LOBPCG method can solve multiple eigenvalues by using a block of vectors.
In this paper, we extend the Neumann expansion preconditioner to the LOBPCG method for solving multiple eigenvalues and corresponding eigenvectors. Moreover, we demonstrate that the preconditioner improves the convergence properties and can achieve excellent parallel performance.
The paper is structured as follows. In Sect. 2 we briefly introduce related work for solving the ground state of the Hubbard model using the LOBPCG method. Section 3 describes the use of the Neumann expansion preconditioner with the communication avoiding strategy for solving for multiple eigenvalues and their corresponding eigenvectors. Section 4 demonstrates the parallel performance of the algorithm on the SGI ICE X and K supercomputers. A summary and conclusions are given in Sect. 5.
2 Related Work
2.1 Hamiltonian-Vector Multiplication
When solving the ground state of a symmetric matrix using the LOBPCG method, the most time-consuming operation is the matrix-vector multiplication. The Hamiltonian derived from the Hubbard model (see Fig. 2) is
where t is the hopping parameter from one site to another, and \(U_i\) is the repulsive energy for double occupation of the i-th site by two electrons [1, 2, 7]. Quantities \(c_{i,\sigma }\), \(c_{i,\sigma }^{\dagger }\) and \(n_{i,\sigma }\) are the annihilation, creation, and number operator of an electron with pseudo-spin \(\sigma \) on the i-th site, respectively. The indices in formula (1) for the Hamiltonian denote the possible states for electrons in the model. The dimension of the Hamiltonian for the \(n_s\)-site Hubbard model is
where \(n_\uparrow \) and \(n_\downarrow \) are the number of the up-spin and down-spin electrons, respectively.
The diagonal element in formula (1) is derived from the repulsive energy \(U_i\) in the corresponding state. The hopping parameter t affects non-zero elements with column-index corresponding to the original state and row-index corresponding to the state after hopping. Since the ratio UÂ /Â t greatly affects the properties of the model, we have to execute many simulations varying this ratio to reveal the properties of the model.
When considering the physical properties of the model, we can split the Hamiltonian-vector multiplication as
where \(I_{\uparrow (\downarrow )}\), \(A_{\uparrow (\downarrow )}\) and D are the identity matrix, a sparse symmetric matrix derived from the hopping of an up-spin electron (a down-spin electron), and a diagonal matrix from the repulsive energy, respectively [7]. Since there is no regularity in the state change by electron hopping, the distribution of non-zero elements in matrix \(A_{\uparrow (\downarrow )}\) is irregular.
Next, a matrix V is constructed by the following procedures from a vector v. First, decompose the vector v into n blocks, and order in the two-dimensional manner as follows,
![](http://media.springernature.com/full/springer-static/image/chp%3A10.1007%2F978-3-319-69953-0_14/459530_1_En_14_Equ11_HTML.gif)
where \(m_{\uparrow }\) and \(m_{\downarrow }\) are the dimensions of the Hamiltonian for up-spin and down-spin electrons, i.e.
The subscripts on each element of v formally indicate the row and column within the matrix V. Therefore V is a dense matrix. The k-th elements of the matrix D, \(d_k\), are used in the same manner to define a new matrix \(\bar{D}\). The multiplication in Eq. (2) can then be written as
where the subscript i, j of the matrix is represented as the (i, j)-th element and V and \(\bar{D}\). Accordingly we can parallelize the multiplication \(Y=HV(\equiv Hv)\) as follows: Â
- CAL 1::
-
  \(Y^{c}=\bar{D}^{c}\odot V^{c}+A_{\uparrow } V^{c}\),
- COM 1::
-
  all-to-all communication from \(V^{c}\) to \(V^{r}\),
- CAL 2::
-
  \(W^{r}= V^{r} A^T_{\downarrow }\),
- COM 2::
-
  all-to-all communication from \(W^{r}\) to \(W^{c}\),
- CAL 3::
-
  \(Y^{c}=Y^{c}+W^{c}\).
 where superscripts c and r denote column wise and row wise partitioning of the matrix data for the parallel calculation. The operator \(\odot \) means an element wise multiplication. The parallelization strategy requires two all-to-all communication operations per multiplication.
2.2 Preconditioner of LOBPCG Method for Solving the Ground State of Hubbard Model
Zero-Shift Point Jacobi Preconditioner. A suitable preconditioner improves the convergence properties of the LOBPCG method. As a consequence many preconditioners have been proposed. Preconditioners for the Hamiltonian derived from the Hubbard model also have been proposed. For the Hubbard model, the zero-shift point Jacobi (ZSPJ) preconditioner, which is a shift-and-invert preconditioner using an approximate eigenvalue obtained during LOBPCG iteration, has excellent convergence properties for Hamiltonians where the diagonal elements predominate over the off-diagonal elements, i.e. cases where the repulsive energy U is large [7, 8].
Neumann Expansion Preconditioner. For the Hubbard model with a small repulsive energy, a preconditioner using the Neumann expansion was previously proposed [8]. The expansion is
The expansion converges when the operator norm of the matrix M is less than 1 (\(||M||_{op}<1\)) [9]. Here the matrix M is
where \(\lambda _{\min }\) and \(\lambda _{\max }\) are the smallest and largest eigenvalues, respectively. When the exact eigenvalues are utilized for \(\lambda _{\min }\) and \(\lambda _{\max }\), \(||M||_{op}\) is equal to 1. Since the LOBPCG method calculates an approximation of the smallest eigenvalue, we consider the residual error of this approximation and make a low estimate of \(\lambda _{\min }\). The Gershgorin circle theorem is used to assign a \(\lambda _{\max }\) that is an overestimate of the true value. The inequality \(||M||_{op}<1\) is hence obeyed and the expansion (4) can converge, i.e. the inverse matrix of \(\frac{2}{\lambda _{\max }-\lambda _{\min }}(H-\lambda _{\min }I)\). The expansion is an effective preconditioner for the smallest eigenvalue \(\lambda _{\min }\). In practice the Gershgorin circle theorem may give estimates for \(\lambda _{\max }\) that are much too large. Multiplying by a damping factor \(\alpha \) can help alleviate this inefficiency. We found 0.9 to be an appropriate \(\alpha \) in numerical tests.
2.3 Communication Avoiding Neumann Expansion Preconditioner for Hubbard Model
When we execute the LOBPCG method with the s-th order Neumann expansion preconditioner, we calculate \(s+1\) matrix-vector multiplications, Hv, \(H^2v\), \(\ldots \), and \(H^{s+1}v\), per iteration. As the multiplications \((I_{\downarrow } \otimes A_{\uparrow })\) and \((A_{\downarrow } \otimes I_{\uparrow })\) are commutative Yamada et al. proposed a communication avoiding strategy for the Hamiltonian-vector multiplication [8]. Then, \(H^2\) is given as
As a result, \(Y_1=Hv\) and \(Y_2=H^2v\) can be calculated by the following algorithm: Â
- CAL 1::
-
 \(Y^{c}=\bar{D}^{c}\odot V^{c}+A_{\uparrow } V^{c}\),
- COM 1::
-
 all-to-all communication from \(V^{c}\) to \(V^{r}\),
- CAL 2::
-
  \(W^{r}=V^{r}A^{T}_\downarrow \),
- COM 2::
-
 all-to-all communication from \(W^{r}\) to \(W^{c}\),
- CAL 3::
-
  \(Y_1^{c}=Y^{c}+W^{c}\),
- CAL 4::
-
  \(Y^{c}=Y_1^{c}+W^{c}\),
- CAL 5::
-
  \(Y^{c}=\bar{D}^{c}\odot Y_1^{c}+A_{\uparrow }Y^{c}\),
- CAL 6::
-
  \(W^{r}=\bar{D}^{r}\odot V^{r}+W^{r}\),
- CAL 7::
-
  \(W^{r}=W^{r} A^{T}_\downarrow \) ,
- COM 3::
-
 all-to-all communication from \(W^{r}\) to \(W^{c}\),
- CAL 8::
-
  \(Y_2^{c}=Y^{c}+W^{c}\).
 The algorithm requires three all-to-all communication operations. On the other hand, when using the original algorithm described in Sect. 2.1, four all-to-all communication operations are required to calculate the same multiplication. However, the new algorithm has extra calculations, CAL 4 and CAL 6, as compared to the original one. Therefore when the cost of one all-to-all communication operation is larger than that of the extra calculations, we expect to achieve speedup with the communication avoiding strategy. The algorithm can not be directly applied to the multiplication \(H^{s+1}\) for \(s\ge 2\). In this case, the multiplication \(H^{s+1}\) is calculated by appropriately combining Hv and \(H^2v\) operations.
3 Neumann Expansion Preconditioner for Multiple Eigenvalues of Hubbard Model
3.1 How to Calculate Multiple Eigenvalues Using LOBPCG Method
The LOBPCG method for solving the m smallest eigenvalues and corresponding eigenvectors carries out recurrence with m vectors simultaneously (see Fig. 3). In this algorithm, the generalized eigenvalue problem has to be solved. We can solve the problem using the LAPACK function dsyev, if the matrix \(S_B\) is a positive definite matrix. Although theoretically \(S_B\) is always a positive definite matrix, numerically this is not always the case. The reason is that the norms of the vectors \(w_k^{(i)}\) and \(p_k^{(i)}\) (\(i=1,2,\ldots ,m\)) become small as the LOBPCG iteration converges, and it is possible that trailing digits are lost in the calculation of \(S_B\). Therefore we set the matrix \(S_B\) to the identity matrix by orthogonalizing the vectors per iteration. In the following numerical tests, we utilize the TSQR method for the orthgonalization [10, 11].
3.2 Neumann Expansion Preconditioner of LOBPCG Method for Solving Multiple Eigenvalues
When we calculate multiple eigenvalues (and corresponding eigenvectors) using the LOBPCG method, we can individually apply a preconditioning operation to each vector corresponding to the eigenvectors. We set the matrix \(M_i\) using the Neumann expansion preconditioner for the i-th smallest eigenvalue \(\lambda _i\) of the Hamiltonian as
Since we obtain approximate eigenvalues after each iteration of the LOBPCG method, we consider the residual errors of these approximations to define an appropriate \(\lambda _i\). The matrix \(M_i\) has \((i-1)\) eigenvalues whose absolute values are greater than or equal to 1. In this case, the Neumann expansion using \(M_i\) can not converge. The eigenvectors corresponding to the eigenvalues agree with those corresponding to the eigenvalues \(\lambda _1\), \(\lambda _2\), \(\ldots \), \(\lambda _{i-1}\) of the Hamiltonian, and then, they are calculated during the LOBPCG iteration simultaneously. Accordingly, we orthogonalize the vectors \(x_k^{(i)}\), \(w_k^{(i)}\), and \(p_k^{(i)}\) (\(i=1,2,\ldots ,m\)) in the order that takes away the components of the vectors \(x_k^{(1)}\), \(x_k^{(2)}\), \(\ldots \), \(x_k^{(i-1)}\) from \(w_k^{(i)}\) given by the Neumann expansion preconditioner using \(M_i\). That is, we orthogonal the vectors utilizing the algorithm including the following operation
The formula (5) can approximately remove the components of the eigenvectors corresponding to the eigenvalues, whose absolute values are greater than or equal to 1, from the preconditioned vectors. Therefore we expect that the Neumann expansion using \(M_i\) becomes an appropriate preconditioner for solving for multiple eigenvalues.
4 Performance Result
4.1 Computational Performance and Convergence Property
We examined the computational performance and convergence properties of the LOBPCG method. We solved the 2-D 4 \(\times \) 5-site Hubbard model with 5 up-spin electrons and 5 down-spin ones. The dimension of the Hamiltonian derived from the model is about 240 million. The number of non-zero off-diagonal elements is about 1.6 billion. We solved for one, five and 10 eigenvalues (and corresponding eigenvectors) of the Hamiltonian on 768 cores (64 MPI processes \(\times \) 12 OpenMP threads) of the SGI ICE X supercomputer (see Table 1) in Japan Atomic Energy Agency (JAEA). Table 2 shows the results for a weak interaction case (\(U/t=1\)) and a strong one (\(U/t=10\)). Table 3 shows the elapsed times of some representative operations.
The results for \(U/t=1\) indicate that point Jacobi (PJ) and zero-shift point Jacobi (ZSPJ) preconditioners hardly improve the convergence compared to without using a preconditioner at all. When we solve for many eigenvalues, the PJ and ZSPJ preconditioners have little effect on the speed of the calculation. On the other hand, the Neumann expansion preconditioner can decrease the number of iterations required for convergence. Moreover, the larger the Neumann expansion series s, the fewer iterations required. When we solve for only the smallest eigenvalue, the total elapsed time increases as s increases. The reason is that the elapsed time of the Hamiltonian-vector multiplication operation is dominant over the whole calculation for solving the only smallest eigenvalue (see Table 3). When we solve multiple eigenvalues, the TSQR operation becomes dominant. Therefore when the series number s becomes large, it is possible to achieve speedup of the computation.
Next, we discuss the results for \(U/t=10\). The results indicate that the PJ preconditioner improves the convergence properties. On the other hand, ZSPJ for small m improves convergence, however, its convergence properties when solving for multiple eigenvalues are almost the same as those for the PJ preconditioner. When we solve for multiple eigenvalues using the Neumann expansion preconditioner, the solution is obtained faster than using the PJ or ZSPJ preconditioners. Moreover, as the Neumann expansion series s increases, the Neumann expansion preconditioner improves the convergence properties and the total elapsed time decreases, especially when m is large.
Finally, we talk about the effect of the communication avoiding strategy. Table 4 shows the speedup ratio for the elapsed time using the Neumann expansion preconditioner per iteration and the communication avoiding strategy. In all cases the communication avoiding strategy realizes speedup. When we solve for only the smallest eigenvalue (and its corresponding eigenvector), the speedup ratio is almost the same as that for the matrix-vector multiplication, because the multiplication cost is dominant. On the other hand, when we solve for multiple eigenvalues, the calculation cost except the multiplication becomes dominant. Therefore the speedup ratio is a little smaller than that for only the multiplication. Furthermore, when the Neumann expansion series s is equal to 3, we confirm that the ratio improves. In this case, since four multiplications (Hw, \(H^2w\), \(H^3w\) and \(H^4w\)) are executed per iteration, the ratio of the multiplication cost increases. Moreover, we can execute four multiplication operations by two communication avoiding multiplications. Therefore, the ratio for \(s=3\) is better than that for \(s=1\).
4.2 Parallel Performance
In order to examine the parallel performance of the LOBPCG method using the Neumann expansion preconditioner, we solved for the 10 smallest eigenvalues and corresponding eigenvectors of the Hamiltonian derived from the 4 \(\times \) 5-site Hubbard model for \(U/t=1\) with 6 up-spin and 6 down-spin electrons. We used the LOBPCG method with ZSPJ, NE, and CANE preconditioners using hybrid parallelization on SGI ICEX in JAEA and the K computer in RIKEN (see Table 5). The results are shown in Table 6. The results indicate that all preconditioners achieve excellent parallel efficiency. The communication avoiding strategy on SGI ICEX decreases the elapsed time per iteration by about 15%. On the other hand, the communication avoiding strategy on the K computer did not realize speedup when using a small number of cores. The ratio of the network bandwidth to FLOPS per node of the K computer is larger than that of SGI ICEX, so it is possible that the cost of the extra calculations (CAL 4 & CAL 6) is larger than that of the all-to-all communication operation. However since the cost of the all-to-all communication operation increases as the number of the cores increases, the strategy realizes speedup on 4096 cores. Therefore, the strategy has a potential of speedup for parallel computing using a sufficiently large number of cores, even if the ratio of the network bandwidth to FLOPS is large.
Although the LOBPCG method using NE has four times more Hamiltonian-vector multiplications per iteration than the method with ZSPJ, the former takes about twice the elapsed time of the latter. The reason is that the calculation operations except the multiplication is dominant in this case. Therefore, we conclude that in order to solve for multiple eigenvalues of the Hamiltonian derived from the Hubbard model using the LOBPCG method in a short computation time, it is crucial to reduce the number of the iterations for the convergence even if the calculation cost of the preconditioner is large.
5 Conclusions
In this paper we applied the Neumann expansion preconditioner to the LOBPCG method to solve for multiple eigenvalues and corresponding eigenvectors of the Hamiltonian derived from the Hubbard model. We examined the convergence properties and parallel performance of the algorithms. Since the norm of the matrix used in the Neumann expansion should be less than 1, we transform it using approximate eigenvalues calculated by the LOBPCG iteration and the upper bounds of the eigenvalues by the Gershgorin circle theorem. Moreover, we orthogonalize the iteration vectors in the order that removes the components of the eigenvectors corresponding to the eigenvalues, whose absolute values are greater than or equal to 1, from the preconditioned vectors.
The Neumann expansion preconditioner with the communication avoiding strategy can achieve speedup even for problems which are hardly improved by the conventional preconditioners. Furthermore, a numerical experiment indicated that the LOBPCG method using this preconditioner has excellent parallel efficiency on thousands cores, and the communication avoiding strategy based on the property of the Hubbard model realizes speedup for parallel computers if a sufficiently large number of cores are used. Therefore, we confirm that the preconditioner based on the Neumann expansion is suitable for solving the eigenvalue problem derived from the Hubbard model using the LOBPCG method.
References
Rasetti, M. (ed.): The Hubbard Model: Recent Results. World Scientific, Singapore (1991)
Montorsi, A. (ed.): The Hubbard Model. World Scientific, Singapore (1992)
Cullum, J.K., Willoughby, R.A.: Lanczos Algorithms for Large Symmetric Eigenvalue Computations, vol. 1: Theory. SIAM, Philadelphia (2002)
Knyazev, A.V.: Preconditioned eigensolvers - an oxymoron? Electron. Trans. Numer. Anal. 7, 104–123 (1998)
Knyazev, A.V.: Toward the optimal eigensolver: locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput. 23, 517–541 (2001)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems: Revised Edition. SIAM (2011)
Yamada, S., Imamura, T., Machida, M.: 16.447 TFlops and 159-Billion-dimensional exact-diagonalization for trapped Fermion-Hubbard Model on the Earth Simulator. In: Proceedings of SC 2005 (2005)
Yamada, S., Imamura, T., Machida, M.: Communication avoiding Neumann expansion preconditioner for LOBPCG method: convergence property of exact diagonalization method for Hubbard model. In: Proceedings of ParCo 2017 (2017, accepted)
Barrett, R., et al.: Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia (1994)
Langou, J.: AllReduce algorithms: application to Householder QR factorization. In: Proceedings of the 2007 International Conference on Preconditioning Techniques for Large Sparse Matrix Problems in Scientific and Industrial Applications, pp. 103–106 (2007)
Demmel, J., Grigori, L., Hoemmen, M., Langou, J.: Communication-avoiding paralleland sequential QR factorizations, Technical report, Electrical Engineering and Computer Sciences, University of California Berkeley (2008)
Acknowledgments
Computations in this study were performed on the SGI ICE X at the JAEA and the K computer at RIKEN Advanced Institute for Computational Science (project ID:ra000005). This research was partially supported by JSPS KAKENHI Grant Number 15K00178.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this paper
Cite this paper
Yamada, S., Imamura, T., Machida, M. (2018). High Performance LOBPCG Method for Solving Multiple Eigenvalues of Hubbard Model: Efficiency of Communication Avoiding Neumann Expansion Preconditioner. In: Yokota, R., Wu, W. (eds) Supercomputing Frontiers. SCFA 2018. Lecture Notes in Computer Science(), vol 10776. Springer, Cham. https://doi.org/10.1007/978-3-319-69953-0_14
Download citation
DOI: https://doi.org/10.1007/978-3-319-69953-0_14
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69952-3
Online ISBN: 978-3-319-69953-0
eBook Packages: Computer ScienceComputer Science (R0)