HODLR3D: hierarchical matrices for N-body problems in three dimensions

  • Original Paper
Numerical Algorithms

Abstract

This article introduces HODLR3D, a class of hierarchical matrices arising from N-body problems in three dimensions. HODLR3D relies on the fact that certain off-diagonal matrix sub-blocks arising from these problems are numerically low rank. For the Laplace kernel in 3D, which is widely encountered, we prove that all the off-diagonal matrix sub-blocks are rank deficient in finite precision. We also obtain the growth of the rank as a function of the size of these sub-blocks. For other kernels in three dimensions, we numerically illustrate a similar scaling of the rank for the different off-diagonal sub-blocks. We leverage this hierarchical low-rank structure to construct the HODLR3D representation, with which we accelerate matrix-vector products. The storage and computational complexity of the HODLR3D matrix-vector product scale almost linearly with system size. We demonstrate the computational performance of the HODLR3D representation through various numerical experiments and explore its performance on distributed memory systems. HODLR3D, as described in this article, is based on a weak admissibility condition. Among the hierarchical matrices with different weak admissibility conditions in 3D, HODLR3D is the only one in which the rank of the admissible off-diagonal blocks does not grow as a power of the system size. Thus, the storage and computational complexity of the HODLR3D matrix-vector product remain tractable for N-body problems with large system sizes.

Availability of data and materials

The code used to generate the results is available at the following repository.

  • Code repository: https://github.com/SAFRAN-LAB/HODLR3D

  • Documentation on reproducibility of results: https://hodlr3d.readthedocs.io/en/latest/reproducibility.html

Notes

  1. We say a matrix algorithm has almost linear computational complexity if, given \(A \in \mathbb {C}^{N \times N}\), the computational cost of the algorithm scales as \(\mathcal {O} \left( N^{1+\epsilon }\right) \) for all \(\epsilon >0\).

  2. For an \(\epsilon >0\), the numerical rank of a matrix K, denoted \(r_{\epsilon }(K)\), is defined as \(r_{\epsilon }(K) = \max \{k\in \{1, 2,\dots , N\}:\frac{\sigma _{k}}{\sigma _{1}}>\epsilon \}\), where \(\sigma _{1}\ge \sigma _{2}\ge \dots \ge \sigma _{N}\) are the singular values of K.

  3. Consider a hypercube \(\varvec{B}\subset \mathbb {R}^3\) containing N particles. The particles inside \(\varvec{B}\) are said to be quasi-uniformly distributed if exactly one particle is located inside each smallest hypercube resulting from the hierarchical subdivision of \(\varvec{B}\) using a \(\log _8 \left( N\right) \)-level octree.
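Notes 2 and 3 can be made concrete with a short sketch. The following Python/NumPy code is illustrative only and is not taken from the HODLR3D repository: the function names `numerical_rank` and `quasi_uniform_points` are hypothetical, and placing each particle at the centre of its leaf cube of the unit cube is an assumption (the definition only requires one particle per leaf cube).

```python
import numpy as np

def numerical_rank(K, eps):
    """r_eps(K): the largest k such that sigma_k / sigma_1 > eps."""
    sigma = np.linalg.svd(K, compute_uv=False)  # singular values, descending
    if sigma[0] == 0.0:
        return 0  # zero matrix: no singular value passes the test
    # Because sigma is sorted, counting the ratios above eps gives the max index k.
    return int(np.sum(sigma / sigma[0] > eps))

def quasi_uniform_points(levels):
    """One particle per leaf cube of a `levels`-level octree on [0, 1]^3.

    Assumption: each particle sits at the centre of its leaf cube.
    Returns an array of shape (8**levels, 3).
    """
    m = 2 ** levels                      # leaf cubes per coordinate direction
    c = (np.arange(m) + 0.5) / m         # leaf-cube centres along one axis
    X, Y, Z = np.meshgrid(c, c, c, indexing="ij")
    return np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])

pts = quasi_uniform_points(2)                       # N = 8^2 = 64 particles
r = numerical_rank(np.diag([1.0, 1e-3, 1e-9]), 1e-6)  # -> 2
```

With N = 8^levels particles, the octree depth is log_8(N), matching the definition in Note 3.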

Acknowledgements

The authors acknowledge HPCE, IIT Madras, for providing access to the AQUA cluster. The authors would like to thank Ritesh Khan for his valuable comments on the draft of this article.

Funding

Vaishnavi Gujjula acknowledges the support of Women Leading IITM (India) 2022 in Mathematics (SB22230053MAIITM008880). Sivaram Ambikasaran acknowledges the support of the Young Scientist Research Award from the Board of Research in Nuclear Sciences, Department of Atomic Energy, India (No. 34/20/03/2017-BRNS/34278), and MATRICS grant from the Science and Engineering Research Board, India (Sanction number: MTR/2019/001241).

Author information

Contributions

K.V.A., V.G., and S.A. wrote the main manuscript text and made substantial contributions to the development of the article. All authors reviewed the manuscript.

Corresponding author

Correspondence to Kandappan V. A.

Ethics declarations

Ethics approval

Not applicable

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kandappan V. A and Vaishnavi Gujjula contributed equally to this work.

Appendices

Appendix A: Numerical experiment on HODLR3D matrix-vector product

Fig. 14 Various benchmarks of the HODLR3D matrix-vector product in comparison with those of the HODLR and \(\mathcal {H}_{\sqrt{3}}\) matrix-vector products for the kernel \(\frac{1}{r}\), with the relative forward errors of the three algorithms of the same order

Fig. 15 Various benchmarks of the HODLR3D matrix-vector product in comparison with those of the HODLR and \(\mathcal {H}_{\sqrt{3}}\) matrix-vector products for the kernel \(\frac{1}{r^4}\), with the relative forward errors of the three algorithms of the same order

In this section, we repeat the numerical experiment of Section 4.1 for the three hierarchical structures considered, viz., HODLR, HODLR3D, and the \(\mathcal {H}_{\sqrt{3}}\) matrix, this time tuning each so that the relative forward error of its matrix-vector product is of the same order. The aim is to compare the performance and scalability of the structures at equal accuracy, so that the remaining benchmarks can be compared directly and inferences drawn. To achieve this, we use different values of \(\epsilon \) in the ACA routine of the three hierarchical structures, in the range \(10^{-10}\) to \(10^{-6}\). We vary \(\epsilon \) incrementally and record the benchmarks of the hierarchical structures once their relative forward errors are of the same order. The kernels used in the numerical experiment are as follows:

  • The Green’s function for the Laplace equation in 3D, which is \(\dfrac{1}{r}\)

  • \(\dfrac{1}{r^4}\)

  • The real part of the Green’s function for the Helmholtz equation in 3D, which is \(\dfrac{\cos \left( r\right) }{r}\)
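The three kernels listed above, and the relative forward error used to compare the structures, can be written down as a minimal Python/NumPy sketch. All names here are hypothetical and not part of the HODLR3D code. Two points are assumptions: the diagonal (self-interaction, r = 0) entries are set to a user-supplied value defaulting to zero, and the relative forward error is measured in the 2-norm.

```python
import numpy as np

def laplace_3d(r):
    """Kernel 1/r."""
    return 1.0 / r

def inv_r4(r):
    """Kernel 1/r^4."""
    return 1.0 / r**4

def helmholtz_real_3d(r):
    """Kernel cos(r)/r."""
    return np.cos(r) / r

def kernel_matrix(kernel, targets, sources, diagonal_value=0.0):
    """Dense matrix K[i, j] = kernel(|targets[i] - sources[j]|).

    Assumption: entries with r == 0 are set to `diagonal_value`.
    """
    diff = targets[:, None, :] - sources[None, :, :]
    r = np.linalg.norm(diff, axis=2)
    K = np.full(r.shape, float(diagonal_value))
    mask = r > 0.0
    K[mask] = kernel(r[mask])
    return K

def relative_forward_error(b_approx, b_exact):
    """||b_approx - b_exact||_2 / ||b_exact||_2 (2-norm is an assumption)."""
    return np.linalg.norm(b_approx - b_exact) / np.linalg.norm(b_exact)

# Usage: dense reference product for three particles.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
K = kernel_matrix(laplace_3d, pts, pts)
b = K @ np.ones(3)
```

A dense product such as `K @ x` costs O(N^2) and serves only as the reference against which a hierarchical matrix-vector product would be checked.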

From Figs. 14, 15, and 16, we observe that when the relative forward errors are kept nearly equal, the computational cost of the matrix-vector product using the HODLR3D and \(\mathcal {H}_{\sqrt{3}}\)-matrix representations still scales roughly as \(\mathcal {O} \left( N\log \left( N\right) \right) \), which is not the case for HODLR in 3D.
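The tolerance \(\epsilon \) in this experiment controls when adaptive cross approximation (ACA) stops adding rank-one terms. The following is a minimal sketch of a textbook partially pivoted ACA in Python/NumPy, given as an assumption about the general technique rather than a transcription of the routine used in the article. The function name `aca`, the callbacks `get_row` and `get_col`, and the stopping rule \(\Vert u_k\Vert \Vert v_k\Vert \le \epsilon \Vert U V\Vert _F\) are choices made for this illustration, and the sketch handles real-valued kernels only.

```python
import numpy as np

def aca(get_row, get_col, n_rows, n_cols, eps, max_rank=None):
    """Partially pivoted ACA: returns U (n_rows x k), V (k x n_cols), A ~= U @ V."""
    max_rank = min(n_rows, n_cols) if max_rank is None else max_rank
    U, V = [], []
    used_rows = set()
    i = 0          # current pivot row
    norm2 = 0.0    # running value of ||U @ V||_F^2
    for _ in range(max_rank):
        used_rows.add(i)
        # Residual of row i: A[i, :] minus the current approximation.
        row = np.array(get_row(i), dtype=float)
        for u, v in zip(U, V):
            row -= u[i] * v
        j = int(np.argmax(np.abs(row)))  # pivot column
        if row[j] == 0.0:
            # Row i is already reproduced exactly; try an unused row.
            unused = [r for r in range(n_rows) if r not in used_rows]
            if not unused:
                break
            i = unused[0]
            continue
        v_new = row / row[j]
        # Residual of column j.
        u_new = np.array(get_col(j), dtype=float)
        for u, v in zip(U, V):
            u_new -= v[j] * u
        # Update ||U V||_F^2 without forming the full matrix.
        cross = sum((u_new @ u) * (v_new @ v) for u, v in zip(U, V))
        norm2 = max(norm2 + 2.0 * cross + (u_new @ u_new) * (v_new @ v_new), 0.0)
        U.append(u_new)
        V.append(v_new)
        if np.linalg.norm(u_new) * np.linalg.norm(v_new) <= eps * np.sqrt(norm2):
            break
        # Next pivot row: largest residual-column entry among unused rows.
        cand = np.abs(u_new)
        cand[list(used_rows)] = -1.0
        i = int(np.argmax(cand))
        if cand[i] < 0.0:
            break
    if not U:
        return np.zeros((n_rows, 0)), np.zeros((0, n_cols))
    return np.column_stack(U), np.vstack(V)

# Usage on an exactly rank-2 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
U, V = aca(lambda i: A[i, :], lambda j: A[:, j], 30, 20, eps=1e-10)
```

Only the requested rows and columns of the block are evaluated, which is why ACA suits kernel matrices whose entries are generated on demand. A smaller \(\epsilon \) generally yields a larger rank and a smaller forward error.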

Fig. 16 Various benchmarks of the HODLR3D matrix-vector product in comparison with those of the HODLR and \(\mathcal {H}_{\sqrt{3}}\) matrix-vector products for the kernel \(\frac{\cos \left( r\right) }{r}\), with the relative forward errors of the three algorithms of the same order

Appendix B: HODLR3D initialization in distributed memory systems

As discussed in Section 6, we treat the nodes at a given level of the hierarchical tree as data-independent computational units. At a level l where the number of nodes \( \left( 8^l\right) \) is greater than the number of MPI processes \(n_p\), each MPI process handles \(\left\lceil \dfrac{8^l}{n_p} \right\rceil \) computational units. At a level l where the number of nodes is less than \(n_p\), each node is shared by \(\left\lceil \dfrac{n_p}{8^l} \right\rceil \) MPI processes. The low-rank compression associated with a shared node is performed separately by every MPI process that shares it; this redundancy eliminates the communication that would otherwise be needed and reduces idle time. Table 9 shows the time taken by parallel HODLR3D to initialize the data structure. The scalability is ideal when \(8^l>n_p\). However, the scalability of HODLR3D initialization is limited by the construction of the low-rank approximations for nodes at levels l where \(8^l<n_p\).
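The node-to-process mapping described above can be sketched in plain Python, without MPI. This is an illustration under stated assumptions, not the scheme implemented in the HODLR3D code: the function name `nodes_for_rank` is hypothetical, and the contiguous, balanced assignment is an assumption, since the text only fixes the counts \(\lceil 8^l/n_p \rceil \) and \(\lceil n_p/8^l \rceil \).

```python
def nodes_for_rank(level, rank, n_p):
    """Indices of the level-`level` octree nodes handled by MPI rank `rank`.

    Assumption: nodes are numbered 0 .. 8**level - 1 and assigned in
    contiguous, balanced blocks.
    """
    n_nodes = 8 ** level
    if n_nodes >= n_p:
        # More nodes than processes: each rank owns a block of at most
        # ceil(n_nodes / n_p) nodes.
        return [node for node in range(n_nodes) if node * n_p // n_nodes == rank]
    # Fewer nodes than processes: each node is shared by at most
    # ceil(n_p / n_nodes) ranks, and every sharing rank compresses it itself.
    return [rank * n_nodes // n_p]

# Usage with n_p = 16 processes.
shared = nodes_for_rank(1, 5, 16)   # level 1 has 8 nodes -> [2]
owned = nodes_for_rank(2, 0, 16)    # level 2 has 64 nodes -> [0, 1, 2, 3]
```

At the shared levels every rank repeats the compression of its node, so adding processes does not shorten that step. This is consistent with the observation above that initialization scales ideally only when \(8^l>n_p\).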

Table 9 Parallel HODLR3D initialization

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

A, K.V., Gujjula, V. & Ambikasaran, S. HODLR3D: hierarchical matrices for N-body problems in three dimensions. Numer Algor (2024). https://doi.org/10.1007/s11075-024-01765-4

  • DOI: https://doi.org/10.1007/s11075-024-01765-4
