Abstract
This article introduces HODLR3D, a class of hierarchical matrices arising out of N-body problems in three dimensions. HODLR3D relies on the fact that certain off-diagonal matrix sub-blocks arising out of N-body problems in three dimensions are numerically low rank. For the Laplace kernel in 3D, which is widely encountered, we prove that all the off-diagonal matrix sub-blocks are rank deficient in finite precision. We also obtain the growth of the rank as a function of the size of these matrix sub-blocks. For other kernels in three dimensions, we numerically illustrate a similar scaling in rank for the different off-diagonal sub-blocks. We leverage this hierarchical low-rank structure to construct the HODLR3D representation, with which we accelerate matrix-vector products. The storage and computational complexity of the HODLR3D matrix-vector product scale almost linearly with system size. We demonstrate the computational performance of the HODLR3D representation through various numerical experiments. Further, we explore the performance of the HODLR3D representation on distributed memory systems. HODLR3D, as described in this article, is based on a weak admissibility condition. Among the hierarchical matrices with different weak admissibility conditions in 3D, HODLR3D is the only one in which the rank of the admissible off-diagonal blocks does not grow as any positive power of the system size. Thus, the storage and the computational complexity of the HODLR3D matrix-vector product remain tractable for N-body problems with large system sizes.
Availability of data and materials
The code used to generate the results is available at the following locations.
- Code repository: https://github.com/SAFRAN-LAB/HODLR3D
- Documentation on reproducibility of results: https://hodlr3d.readthedocs.io/en/latest/reproducibility.html
Notes
We say a matrix algorithm has almost linear computational complexity if given \(A \in \mathbb {C}^{N \times N}\), the computational cost of the algorithm scales as \(\mathcal {O} \left( N^{1+\epsilon }\right) \) for all \(\epsilon >0\).
For an \(\epsilon >0\), the numerical rank of a matrix K, \(r_{\epsilon }(K)\), is defined as \(r_{\epsilon }(K) = \max \{k\in \{1, 2,\dots , N\}:\frac{\sigma _{k}}{\sigma _{1}}>\epsilon \}\), where \(\sigma _{1}\ge \sigma _{2}\ge \dots \ge \sigma _{N}\) are the singular values of K.
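The definition of the numerical rank above can be evaluated directly from the singular values. The following is a minimal sketch, assuming NumPy is available; the function name `numerical_rank` is hypothetical and is not taken from the HODLR3D code.

```python
import numpy as np

def numerical_rank(K, eps):
    """Return r_eps(K) = max{k : sigma_k / sigma_1 > eps}."""
    # numpy returns the singular values sorted in descending order
    sigma = np.linalg.svd(np.asarray(K, dtype=float), compute_uv=False)
    if sigma.size == 0 or sigma[0] == 0.0:
        return 0  # zero matrix: no singular value passes the test
    # Because sigma is sorted, counting the ratios above eps gives the largest such k
    return int(np.count_nonzero(sigma / sigma[0] > eps))

K = np.diag([1.0, 1e-3, 1e-9])
print(numerical_rank(K, 1e-6))
```

A full SVD costs cubic time in the block size, so this is only meant for checking small blocks, not as a compression routine.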
Consider a hypercube \(\varvec{B}\subset \mathbb {R}^3\) containing N particles. The particles inside \(\varvec{B}\) are said to be quasi-uniformly distributed if exactly one particle is located inside each of the smallest hypercubes resulting from the hierarchical subdivision of \(\varvec{B}\) using a \(\log _8 \left( N\right) \)-level octree.
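One simple instance of a quasi-uniform distribution places a particle at the centre of every leaf box of the octree. The sketch below assumes the domain \([-1,1]^3\) and the centre placement; both are illustrative choices, and the function name is hypothetical.

```python
from itertools import product

def quasi_uniform_particles(levels, lo=-1.0, hi=1.0):
    """One particle at the centre of each leaf box of a `levels`-level octree on [lo, hi]^3."""
    m = 2 ** levels            # number of leaf boxes along each coordinate direction
    h = (hi - lo) / m          # side length of a leaf box
    centres = [lo + (i + 0.5) * h for i in range(m)]
    # Cartesian product of the 1D centres gives one point per leaf box
    return list(product(centres, repeat=3))

particles = quasi_uniform_particles(levels=2)
print(len(particles))  # N = 8**levels particles
```

Any other placement with exactly one particle per leaf box (for example, a random point inside each box) satisfies the same definition.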
Acknowledgements
The authors acknowledge HPCE, IIT Madras, for providing access to the AQUA cluster. The authors would like to thank Ritesh Khan for his valuable comments on the draft of this article.
Funding
Vaishnavi Gujjula acknowledges the support of Women Leading IITM (India) 2022 in Mathematics (SB22230053MAIITM008880). Sivaram Ambikasaran acknowledges the support of the Young Scientist Research Award from the Board of Research in Nuclear Sciences, Department of Atomic Energy, India (No. 34/20/03/2017-BRNS/34278), and MATRICS grant from the Science and Engineering Research Board, India (Sanction number: MTR/2019/001241).
Author information
Contributions
K.V.A., V.G., and S.A. wrote the main manuscript text and made substantial contributions to the development of the article. All authors reviewed the manuscript.
Ethics declarations
Ethics approval
Not applicable
Conflict of interest
The authors declare no competing interests.
Additional information
Kandappan V. A and Vaishnavi Gujjula contributed equally to this work.
Appendices
Appendix A: Numerical experiment on HODLR3D matrix-vector product
In this section, we repeat the numerical experiment of Section 4.1 for the hierarchical structures considered, viz., HODLR, HODLR3D, and the \(\mathcal {H}_{\sqrt{3}}\) matrix, with the forward relative error of the matrix-vector product held at the same order for all three. The aim is to compare the performance and scalability of the different hierarchical structures at equal accuracy, so that the remaining benchmarks can be compared directly and inferences can be drawn. To achieve this, we vary the tolerance \(\epsilon \) in the ACA routine of each hierarchical structure over the range \(10^{-10}\) to \(10^{-6}\), adjusting it incrementally, and record the benchmarks at the values for which the forward relative errors are of the same order. The kernels used in the numerical experiment are as follows:
- Green’s function for the Laplace equation in 3D, which is \(\dfrac{1}{r}\)
- \(\dfrac{1}{r^4}\)
- Real part of Green’s function for the Helmholtz equation in 3D, which is \(\dfrac{\cos \left( r\right) }{r}\)
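The three kernels listed above can be probed on a small example by forming the interaction matrix between two boxes and counting its significant singular values. The sketch below is illustrative only: the two unit cubes sharing a single vertex, the uniform point layout, the tolerance, and all function names are assumptions and do not reproduce the experiment setup of the article.

```python
import numpy as np

def grid_in_cube(corner, side, m):
    """Return m**3 points at the cell centres of a uniform grid in corner + [0, side]^3."""
    t = (np.arange(m) + 0.5) * side / m
    X, Y, Z = np.meshgrid(t, t, t, indexing="ij")
    pts = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
    return pts + np.asarray(corner, dtype=float)

# The three kernels, written as functions of the distance r > 0
KERNELS = {
    "1/r": lambda r: 1.0 / r,
    "1/r^4": lambda r: 1.0 / r**4,
    "cos(r)/r": lambda r: np.cos(r) / r,
}

def interaction_rank(kernel, targets, sources, eps):
    """Numerical rank (relative tolerance eps) of the kernel matrix between two point sets."""
    diff = targets[:, None, :] - sources[None, :, :]
    r = np.linalg.norm(diff, axis=-1)           # pairwise distances, all nonzero here
    sigma = np.linalg.svd(kernel(r), compute_uv=False)
    return int(np.count_nonzero(sigma / sigma[0] > eps))

for m in (4, 6, 8):
    A = grid_in_cube((0.0, 0.0, 0.0), 1.0, m)   # cube [0, 1]^3
    B = grid_in_cube((1.0, 1.0, 1.0), 1.0, m)   # cube [1, 2]^3, touching A at one vertex
    ranks = {name: interaction_rank(k, A, B, eps=1e-6) for name, k in KERNELS.items()}
    print(m**3, ranks)
```

Printing the ranks against the number of points per box gives a quick visual check of how the rank grows with the block size for each kernel.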
From Figs. 14, 15, and 16, we observe that when the relative forward error is kept nearly equal, the computational complexity of the matrix-vector product using the HODLR3D and \(\mathcal {H}_{\sqrt{3}}\)-matrix representations still scales roughly as \(\mathcal {O} \left( N\log \left( N\right) \right) \), which is not the case with HODLR in 3D.
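To illustrate the role of the tolerance \(\epsilon \) in an ACA routine, the following is a minimal sketch of a textbook partially pivoted adaptive cross approximation for a real-valued block. It is not the routine used in the HODLR3D code; the function name, the callback interface, the absolute zero-pivot threshold `1e-14`, and the stopping rule based on a running Frobenius-norm estimate are all assumptions.

```python
import numpy as np

def aca(entry_row, entry_col, n_rows, n_cols, eps, max_rank=None):
    """Partially pivoted ACA for a real block: returns U, V with A ~= U @ V."""
    max_rank = min(n_rows, n_cols) if max_rank is None else max_rank
    U = np.zeros((n_rows, 0))
    V = np.zeros((0, n_cols))
    frob2 = 0.0                              # running estimate of ||U V||_F^2
    unused = np.ones(n_rows, dtype=bool)     # rows not yet used as pivots
    i = 0
    while U.shape[1] < max_rank and unused.any():
        unused[i] = False
        row = entry_row(i) - U[i, :] @ V     # residual of row i
        j = int(np.argmax(np.abs(row)))      # column pivot
        if abs(row[j]) < 1e-14:              # residual row is numerically zero
            if not unused.any():
                break
            i = int(np.argmax(unused))       # move to the first unused row
            continue
        v = row / row[j]
        u = entry_col(j) - U @ V[:, j]       # residual of column j
        # ||UV + u v^T||_F^2 = ||UV||_F^2 + 2 sum_l (u_l . u)(v_l . v) + |u|^2 |v|^2
        frob2 += 2.0 * np.sum((U.T @ u) * (V @ v)) + (u @ u) * (v @ v)
        U = np.column_stack([U, u])
        V = np.vstack([V, v])
        if np.linalg.norm(u) * np.linalg.norm(v) <= eps * np.sqrt(frob2):
            break                            # the new term is below the tolerance
        # next row pivot: largest entry of u among the unused rows
        i = int(np.argmax(np.where(unused, np.abs(u), -1.0)))
    return U, V
```

Only the rows and columns requested through `entry_row` and `entry_col` are evaluated, which is what makes the compression cheaper than forming the full block. A smaller \(\epsilon \) yields a higher rank and a smaller forward error in the resulting matrix-vector product.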
Appendix B: HODLR3D initialization in distributed memory systems
As discussed in Section 6, we consider the nodes in a particular level of the hierarchical tree as data-independent computational units. For a level l where the number of nodes \( \left( 8^l\right) \) is greater than the number of MPI processes \(n_p\), each MPI process handles \(\left\lceil \dfrac{8^l}{n_p} \right\rceil \) computational units. For a level l where the number of nodes is fewer than \(n_p\), each node in level l is shared by \(\left\lceil \dfrac{n_p}{8^l} \right\rceil \) MPI processes. The low-rank compression associated with a shared node is performed separately by each MPI process that shares that node; this eliminates the communication involved and reduces idle time. Table 9 in the Appendix shows the time taken by parallel HODLR3D to initialize the data structure. The scalability is ideal when \(8^l>n_p\). However, the scalability of HODLR3D initialization is limited by the construction of the low-rank approximations corresponding to the nodes of the hierarchical tree at levels l where \(8^l<n_p\).
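The per-level work distribution described above reduces to two ceiling divisions. The sketch below is a plain-Python illustration without MPI; the function name and the returned dictionary layout are hypothetical, and the case \(8^l = n_p\), which the text does not discuss, is assumed here to give one node per process.

```python
def level_distribution(level, n_p):
    """Computational units per MPI process, or MPI processes per node, at a tree level."""
    nodes = 8 ** level                       # number of nodes at this level of the octree
    if nodes >= n_p:
        # Enough nodes: each process handles ceil(8^l / n_p) nodes
        per_process = -(-nodes // n_p)       # integer ceiling division
        return {"nodes": nodes, "nodes_per_process": per_process, "processes_per_node": 1}
    # Fewer nodes than processes: each node is shared by ceil(n_p / 8^l) processes
    per_node = -(-n_p // nodes)
    return {"nodes": nodes, "nodes_per_process": 1, "processes_per_node": per_node}

for level in range(4):
    print(level, level_distribution(level, n_p=16))
```

With 16 processes, for instance, level 1 has 8 nodes that are each shared by 2 processes, while level 2 has 64 nodes, 4 per process.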
About this article
Cite this article
A, K.V., Gujjula, V. & Ambikasaran, S. HODLR3D: hierarchical matrices for N-body problems in three dimensions. Numer Algor (2024). https://doi.org/10.1007/s11075-024-01765-4
DOI: https://doi.org/10.1007/s11075-024-01765-4