HODLR3D: hierarchical matrices for N-body problems in three dimensions

A, Kandappan V.; Gujjula, Vaishnavi; Ambikasaran, Sivaram

doi:10.1007/s11075-024-01765-4

Published: 02 March 2024


Numerical Algorithms


Abstract

This article introduces HODLR3D, a class of hierarchical matrices arising from N-body problems in three dimensions. HODLR3D relies on the fact that certain off-diagonal sub-blocks of the matrices arising from such problems are numerically low rank. For the widely encountered Laplace kernel in 3D, we prove that all the off-diagonal sub-blocks are rank deficient in finite precision, and we obtain the growth of their rank as a function of the sub-block size. For other kernels in 3D, we numerically illustrate a similar scaling in rank for the different off-diagonal sub-blocks. We leverage this hierarchical low-rank structure to construct the HODLR3D representation, which we use to accelerate matrix-vector products. The storage and computational complexity of the HODLR3D matrix-vector product scale almost linearly with the system size. We demonstrate the computational performance of the HODLR3D representation through various numerical experiments and further explore its performance on distributed memory systems. HODLR3D is based on a weak admissibility condition. Among the hierarchical matrices with different weak admissibility conditions in 3D, HODLR3D is the only one in which the rank of the admissible off-diagonal blocks does not scale with any power of the system size. Thus, the storage and computational complexity of the HODLR3D matrix-vector product remain tractable for N-body problems with large system sizes.
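As a minimal illustration of the low-rank idea, the following is a generic NumPy sketch, not the HODLR3D implementation, and all names in it are hypothetical. It compresses a single 3D Laplace interaction block between two separated particle clusters with a truncated SVD and applies it as \(U(V^{T}x)\). For an \(m\times n\) block of numerical rank r, this replaces the roughly \(mn\) operations of the dense product with about \(r(m+n)\).

```python
import numpy as np

def laplace_block(targets: np.ndarray, sources: np.ndarray) -> np.ndarray:
    """Dense interaction block K[i, j] = 1 / |x_i - y_j| (3D Laplace kernel, unscaled)."""
    diff = targets[:, None, :] - sources[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=-1)

def low_rank_factors(K: np.ndarray, eps: float):
    """Truncated SVD: K ~= U @ V.T, keeping singular values with sigma_k / sigma_1 > eps."""
    W, s, Zt = np.linalg.svd(K, full_matrices=False)
    r = int(np.count_nonzero(s / s[0] > eps))
    return W[:, :r] * s[:r], Zt[:r].T          # U is m x r, V is n x r

rng = np.random.default_rng(0)
targets = rng.random((400, 3))                              # cluster in [0, 1]^3
sources = rng.random((400, 3)) + np.array([3.0, 0.0, 0.0])  # cluster in [3, 4] x [0, 1]^2
K = laplace_block(targets, sources)
U, V = low_rank_factors(K, 1e-8)

charges = rng.random(400)                  # nonnegative source strengths
y_dense = K @ charges                      # about m * n operations
y_low_rank = U @ (V.T @ charges)           # about r * (m + n) operations
rel_err = np.linalg.norm(y_low_rank - y_dense) / np.linalg.norm(y_dense)
print(f"rank {U.shape[1]} of {min(K.shape)}, forward relative error {rel_err:.1e}")
```

The truncated SVD is chosen only for clarity. Computing it costs far more than the dense product, which is why hierarchical codes rely on cheaper compression schemes such as the ACA routine mentioned in Appendix A.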


Availability of data and materials

The code used to generate the results is available at the following locations:
• Code repository: https://github.com/SAFRAN-LAB/HODLR3D
• Documentation on reproducibility of results: https://hodlr3d.readthedocs.io/en/latest/reproducibility.html

Notes

We say a matrix algorithm has almost linear computational complexity if given \(A \in \mathbb {C}^{N \times N}\), the computational cost of the algorithm scales as \(\mathcal {O} \left( N^{1+\epsilon }\right) \) for all \(\epsilon >0\).
For \(\epsilon >0\), the numerical rank of a matrix K, denoted \(r_{\epsilon }(K)\), is defined as \(r_{\epsilon }(K) = \max \{k\in \{1, 2,\dots , N\}:\frac{\sigma _{k}}{\sigma _{1}}>\epsilon \}\), where \(\sigma _{1}\ge \sigma _{2}\ge \dots \ge \sigma _{N}\) are the singular values of K.
Consider a hypercube \(\varvec{B}\subset \mathbb {R}^3\) that contains N particles. The particles inside \(\varvec{B}\) are said to be quasi-uniformly distributed if exactly one particle lies inside each of the smallest hypercubes obtained by hierarchically subdividing \(\varvec{B}\) with a \(\log _8 \left( N\right) \)-level octree.
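The two definitions above (numerical rank and quasi-uniform distribution) can be made concrete with a small NumPy sketch. It assumes that \(\varvec{B}\) is the unit cube \([0,1]^3\) and that the particle inside each leaf cube is placed uniformly at random. The note only requires one particle per leaf, so this placement is an assumption. The helper names `numerical_rank` and `quasi_uniform_points` are hypothetical and are not part of the HODLR3D code base. Returning 0 for the zero matrix is also an assumed convention, since the definition does not cover that case.

```python
import numpy as np

def numerical_rank(K: np.ndarray, eps: float) -> int:
    """r_eps(K) = max{k : sigma_k / sigma_1 > eps}, as in the note above."""
    s = np.linalg.svd(K, compute_uv=False)   # singular values in descending order
    if s[0] == 0.0:
        return 0                             # assumed convention for the zero matrix
    return int(np.count_nonzero(s / s[0] > eps))

def quasi_uniform_points(levels: int, rng: np.random.Generator) -> np.ndarray:
    """N = 8**levels points in the unit cube [0, 1]^3, exactly one per leaf
    cube of a uniform octree with log_8(N) = levels levels."""
    m = 2 ** levels                          # leaf cubes per coordinate axis
    grid = np.meshgrid(np.arange(m), np.arange(m), np.arange(m), indexing="ij")
    leaf = np.stack(grid, axis=-1).reshape(-1, 3)   # integer index of every leaf cube
    return (leaf + rng.random(leaf.shape)) / m      # one random point inside each leaf

rng = np.random.default_rng(0)
pts = quasi_uniform_points(3, rng)                      # N = 512 quasi-uniform particles
src, tgt = pts[pts[:, 0] < 0.5], pts[pts[:, 0] >= 0.5]  # two face-sharing halves of B
K = 1.0 / np.linalg.norm(tgt[:, None, :] - src[None, :, :], axis=-1)  # Laplace 1/r block
print(K.shape, numerical_rank(K, 1e-8))
```

The usage lines build a 512-particle quasi-uniform configuration. They then compute \(r_{\epsilon}\) of the \(1/r\) interaction block between the two halves of the cube, which share a face.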


Acknowledgements

The authors acknowledge HPCE, IIT Madras, for providing access to the AQUA cluster. The authors would like to thank Ritesh Khan for his valuable comments on the draft of this article.

Funding

Vaishnavi Gujjula acknowledges the support of Women Leading IITM (India) 2022 in Mathematics (SB22230053MAIITM008880). Sivaram Ambikasaran acknowledges the support of the Young Scientist Research Award from the Board of Research in Nuclear Sciences, Department of Atomic Energy, India (No. 34/20/03/2017-BRNS/34278), and MATRICS grant from the Science and Engineering Research Board, India (Sanction number: MTR/2019/001241).

Author information

Authors and Affiliations

Department of Mathematics, Indian Institute of Technology Madras, Chennai, 600036, Tamil Nadu, India
Kandappan V. A, Vaishnavi Gujjula & Sivaram Ambikasaran
Wadhwani School of Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, 600036, Tamil Nadu, India
Sivaram Ambikasaran
Department of Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, 600036, Tamil Nadu, India
Sivaram Ambikasaran
Robert Bosch Centre for Data Science and Artificial Intelligence, Indian Institute of Technology Madras, Chennai, 600036, Tamil Nadu, India
Sivaram Ambikasaran


Contributions

K.V.A., V.G., and S.A. wrote the main manuscript text and made substantial contributions to the development of the article. All authors reviewed the manuscript.

Corresponding author

Correspondence to Kandappan V. A.

Ethics declarations

Ethics approval

Not applicable

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kandappan V. A and Vaishnavi Gujjula contributed equally to this work.

Appendices

Appendix A: Numerical experiment on HODLR3D matrix-vector product

In this section, we repeat the numerical experiment of Section 4.1 for the three hierarchical structures considered, viz., HODLR, HODLR3D, and the \(\mathcal {H}_{\sqrt{3}}\) matrix. The parameters are chosen so that the forward relative errors of their matrix-vector products are of the same order. The purpose of this experiment is to compare the performance and scalability of the different hierarchical structures at a common forward relative error. The remaining benchmarks of HODLR3D, HODLR, and the \(\mathcal {H}_{\sqrt{3}}\) matrix can be meaningfully compared only when their forward relative errors are nearly equal. To achieve this, we use a different value of \(\epsilon \) in the ACA routine of each hierarchical structure, in the range \(10^{-10}\) to \(10^{-6}\). We adjust \(\epsilon \) incrementally and record the benchmarks of each hierarchical structure once their forward relative errors are of the same order. The kernels used in this numerical experiment are as follows:

• Green’s function for the Laplace equation in 3D, \(\dfrac{1}{r}\)
• \(\dfrac{1}{r^4}\)
• Real part of the Green’s function for the Helmholtz equation in 3D, \(\dfrac{\cos \left( r\right) }{r}\)
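This appendix does not spell out the ACA routine. The following is a hedged sketch of the standard textbook variant of adaptive cross approximation with partial pivoting, in which \(\epsilon\) plays the role of the relative stopping tolerance tuned in this experiment. The function names and the row/column callback interface are assumptions for illustration, not the HODLR3D API. Kernel entries are evaluated on demand, so any of the three kernels above can be plugged in.

```python
import numpy as np

def aca_partial_pivoting(row_of, col_of, m, n, eps, max_rank=None):
    """Adaptive cross approximation with partial pivoting: K ~= U @ V.T.

    row_of(i) -> row i of the m x n block K (length n)
    col_of(j) -> column j of K (length m)
    Stops once ||u_k|| ||v_k|| <= eps * ||U V^T||_F (estimated incrementally).
    """
    max_rank = min(m, n) if max_rank is None else max_rank
    U, V = np.zeros((m, 0)), np.zeros((n, 0))
    used_rows = np.zeros(m, dtype=bool)
    i, frob2 = 0, 0.0                       # first pivot row; running ||U V^T||_F^2
    for _ in range(max_rank):
        used_rows[i] = True
        row = row_of(i) - U[i] @ V.T        # residual row i
        j = int(np.argmax(np.abs(row)))
        if row[j] == 0.0:                   # simplification: stop on a zero residual row
            break
        v = row / row[j]
        u = col_of(j) - U @ V[j]            # residual column j
        # ||S_k||_F^2 = ||S_{k-1}||_F^2 + 2 sum_l (u_l.u)(v_l.v) + |u|^2 |v|^2
        frob2 += 2.0 * np.dot(U.T @ u, V.T @ v) + np.dot(u, u) * np.dot(v, v)
        U, V = np.column_stack([U, u]), np.column_stack([V, v])
        if np.linalg.norm(u) * np.linalg.norm(v) <= eps * np.sqrt(frob2):
            break
        score = np.abs(u)
        score[used_rows] = -1.0             # never reuse a pivot row
        i = int(np.argmax(score))
    return U, V

def make_block(kernel, X, Y):
    """On-demand row/column access to K[i, j] = kernel(|x_i - y_j|)."""
    def row_of(i):
        return kernel(np.linalg.norm(Y - X[i], axis=1))
    def col_of(j):
        return kernel(np.linalg.norm(X - Y[j], axis=1))
    return row_of, col_of

KERNELS = {
    "1/r": lambda r: 1.0 / r,
    "1/r^4": lambda r: 1.0 / r**4,
    "cos(r)/r": lambda r: np.cos(r) / r,
}

rng = np.random.default_rng(0)
X = rng.random((300, 3))                              # targets in [0, 1]^3
Y = rng.random((300, 3)) + np.array([3.0, 0.0, 0.0])  # sources in [3, 4] x [0, 1]^2
errors = {}
for name, kernel in KERNELS.items():
    row_of, col_of = make_block(kernel, X, Y)
    U, V = aca_partial_pivoting(row_of, col_of, 300, 300, eps=1e-8)
    K = kernel(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1))  # dense, for checking only
    errors[name] = (U.shape[1], np.linalg.norm(K - U @ V.T) / np.linalg.norm(K))
    print(name, *errors[name])
```

The stopping test is a heuristic. It compares the newest cross \(u_k v_k^T\) with an incrementally updated Frobenius norm of the current approximation, so the achieved error is typically close to, but not guaranteed below, \(\epsilon\). This is one reason \(\epsilon\) has to be tuned per structure to reach comparable forward errors.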

From Figs. 14, 15, and 16, we observe that when the forward relative errors are kept nearly equal, the computational cost of the matrix-vector product using the HODLR3D and \(\mathcal {H}_{\sqrt{3}}\)-matrix representations still scales roughly as \(\mathcal {O} \left( N\log \left( N\right) \right) \). This is not the case for HODLR in 3D.
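For reference, an \(\mathcal {O} \left( N\log \left( N\right) \right)\) cost is almost linear in the sense defined in the Notes. The short argument, for any fixed power k of the logarithm, is:

```latex
\lim_{N\to\infty}\frac{N\log^{k}N}{N^{1+\epsilon}}
  = \lim_{N\to\infty}\frac{\log^{k}N}{N^{\epsilon}} = 0
  \quad\text{for every fixed } k\ge 0 \text{ and every } \epsilon>0,
\qquad\text{hence}\qquad
N\log^{k}N = \mathcal{O}\left(N^{1+\epsilon}\right).
```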

Appendix B: HODLR3D initialization in distributed memory systems

As discussed in Section 6, we treat the nodes at a given level of the hierarchical tree as data-independent computational units. At a level l where the number of nodes, \(8^l\), exceeds the number of MPI processes \(n_p\), each MPI process is assigned at most \(\left\lceil \dfrac{8^l}{n_p} \right\rceil \) computational units. At a level l where the number of nodes is smaller than \(n_p\), each node is shared by at most \(\left\lceil \dfrac{n_p}{8^l} \right\rceil \) MPI processes. Each MPI process that shares a node performs the low-rank compression for that node independently. This redundant computation eliminates inter-process communication and reduces idle time. Table 9 shows the time taken by parallel HODLR3D to initialize the data structure. The scaling is ideal at levels where \(8^l>n_p\). However, the scalability of HODLR3D initialization is limited by the construction of the low-rank approximations for nodes at levels where \(8^l<n_p\), since every process that shares such a node repeats its compression.
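The node-to-process assignment described above can be sketched without MPI as a pure index computation. The round-robin mapping and the names `nodes_of_rank` and `ranks_sharing` are illustrative assumptions. The text fixes only the counts \(\left\lceil 8^l/n_p \right\rceil\) and \(\left\lceil n_p/8^l \right\rceil\), not the exact mapping used by HODLR3D.

```python
import math

def nodes_of_rank(rank: int, level: int, n_procs: int) -> list[int]:
    """Tree nodes (indexed 0 .. 8**level - 1) that MPI rank `rank` works on at `level`.

    Round-robin assignment (an assumption): with at least as many nodes as
    processes, each rank gets at most ceil(8**level / n_procs) nodes; with fewer
    nodes than processes, each node is shared by at most ceil(n_procs / 8**level)
    ranks, each of which compresses it redundantly (no communication needed).
    """
    n_nodes = 8 ** level
    if n_nodes >= n_procs:
        return list(range(rank, n_nodes, n_procs))
    return [rank % n_nodes]

def ranks_sharing(node: int, level: int, n_procs: int) -> list[int]:
    """All ranks that process `node` at `level`."""
    return [r for r in range(n_procs) if node in nodes_of_rank(r, level, n_procs)]

n_procs = 20
for level in range(3):   # 1, 8 and 64 nodes per level
    n_nodes = 8 ** level
    most_nodes = max(len(nodes_of_rank(r, level, n_procs)) for r in range(n_procs))
    most_sharing = max(len(ranks_sharing(v, level, n_procs)) for v in range(n_nodes))
    # compare the observed maxima with the ceilings quoted in the text
    print(level, n_nodes,
          most_nodes, math.ceil(n_nodes / n_procs),
          most_sharing, math.ceil(n_procs / n_nodes))
```

At the levels with fewer nodes than processes, the sharing count is exactly the redundant work that limits the scalability of initialization.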

Table 9 Parallel HODLR3D initialization


Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

A, K.V., Gujjula, V. & Ambikasaran, S. HODLR3D: hierarchical matrices for N-body problems in three dimensions. Numer Algor (2024). https://doi.org/10.1007/s11075-024-01765-4


Received: 28 September 2023
Accepted: 15 January 2024
Published: 02 March 2024
DOI: https://doi.org/10.1007/s11075-024-01765-4
