# A Cache-Optimal Alternative to the Unidirectional Hierarchization Algorithm

Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 109)

## Abstract

The sparse grid combination technique provides a framework to solve high-dimensional numerical problems with standard solvers by assembling a sparse grid from many coarse and anisotropic full grids called component grids. Hierarchization is one of the most fundamental tasks for sparse grids. It describes the transformation from the nodal basis to the hierarchical basis. In settings where the component grids have to be frequently combined and distributed in a massively parallel compute environment, hierarchization on component grids is relevant to minimize communication overhead.
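To make the nodal-to-hierarchical transformation concrete, the following is a minimal sketch for piecewise-linear basis functions. It is not the paper's implementation. It assumes a 1D grid with `2**level + 1` equidistant points including the boundary. The function names `hierarchize_1d` and `hierarchize_unidirectional_2d` are hypothetical. The 2D helper applies the 1D transform along one dimension after the other, which is the idea behind the unidirectional approach named in the title.

```python
def hierarchize_1d(nodal):
    """Return hierarchical surpluses for nodal values on a 1D grid.

    Assumption: the grid has 2**level + 1 equidistant points, boundary included,
    and the basis functions are the standard piecewise-linear hat functions.
    """
    n = len(nodal) - 1
    if n < 1 or n & (n - 1):
        raise ValueError("expected 2**level + 1 grid points")
    u = list(nodal)
    stride = 1
    # Work from the finest level to the coarsest. A point at distance `stride`
    # from its two hierarchical parents only reads coarser-level values, and
    # those have not been modified yet, so the update can be done in place.
    while stride < n:
        for i in range(stride, n, 2 * stride):
            u[i] -= 0.5 * (u[i - stride] + u[i + stride])
        stride *= 2
    return u


def hierarchize_unidirectional_2d(grid):
    """Hierarchize a 2D grid (list of rows) one dimension at a time."""
    # Sweep 1: hierarchize along the first dimension (every row).
    rows = [hierarchize_1d(row) for row in grid]
    # Sweep 2: hierarchize along the second dimension (every column).
    cols = [hierarchize_1d(list(col)) for col in zip(*rows)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    # Nodal values of f(x) = 4 x (1 - x) on a level-3 grid.
    values = [4 * (k / 8) * (1 - k / 8) for k in range(9)]
    print(hierarchize_1d(values))
```

In this sketch each dimension requires its own sweep over the whole grid, so a d-dimensional grid is traversed d times.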

We present a cache-oblivious hierarchization algorithm for the component grids of the combination technique. It causes $$\left \vert \mathbf{G}_{\boldsymbol{\ell}}\right \vert \cdot \left ( \frac{1} {B} + \mathcal{O}\left ( \frac{1} {\sqrt[d]{M}}\right )\right )$$ cache misses under the tall cache assumption $$M =\omega \left (B^{d}\right )$$. Here, $$\mathbf{G}_{\boldsymbol{\ell}}$$ denotes the component grid, $$d$$ the dimension, $$M$$ the size of the cache, and $$B$$ the cache line size. The algorithm decreases the leading term of the cache misses by a factor of $$d$$ compared to the unidirectional algorithm, which has been the standard approach so far. The new algorithm is also optimal in the sense that the leading term of the cache misses is reduced to scanning complexity, i.e., every degree of freedom has to be touched once. We also present a variant of the algorithm that causes $$\left \vert \mathbf{G}_{\boldsymbol{\ell}}\right \vert \cdot \left ( \frac{2} {B} + \mathcal{O}\left ( \frac{1} {\sqrt[d-1]{M\cdot B^{d-2}}} \right )\right )$$ cache misses under the weaker assumption $$M =\omega \left (B\right )$$. The new algorithms have been implemented and outperform previously existing software. In several cases, the measured performance is close to the best possible.
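The leading terms stated above can be compared numerically. The sketch below is only an illustration under assumptions: the cache line size is measured in grid values per line, the lower-order terms are ignored, and the unidirectional leading term is taken as d times the scanning term, as implied by the stated factor of d. The function name `leading_term_cache_misses` is hypothetical.

```python
def leading_term_cache_misses(num_points, line_size, dim):
    """Estimate the leading term of the cache misses for each algorithm.

    num_points: number of grid points |G_l| of the component grid
    line_size:  cache line size B, assumed to be counted in grid values
    dim:        dimension d of the component grid
    Lower-order terms (the O(...) parts of the bounds) are deliberately ignored.
    """
    scan = num_points / line_size  # scanning complexity: touch every value once
    return {
        "unidirectional": dim * scan,  # one full sweep per dimension
        "cache_oblivious": scan,       # leading term |G_l| / B
        "variant": 2 * scan,           # leading term 2 |G_l| / B
    }


if __name__ == "__main__":
    # Hypothetical example: 2**20 grid points, 8 values per cache line, d = 4.
    for name, misses in leading_term_cache_misses(2**20, 8, 4).items():
        print(f"{name}: {misses:.0f}")
```

The comparison only covers the leading terms. Which algorithm is preferable in practice also depends on whether the respective cache assumption, $$M =\omega \left (B^{d}\right )$$ or $$M =\omega \left (B\right )$$, holds for the machine at hand.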
