Abstract
Stencil computations on regular grids are widely used in scientific simulations. Optimization techniques for such stencil computations typically exploit temporal locality across time steps. More complex stencil applications, such as those in meteorology and seismic simulation, cannot easily take advantage of these techniques: the number of physical fields and computation stages processed at each time step is large enough that all data present in the cache is flushed before the next time step begins. In this paper we present a technique for improving the performance of such computations that is based only on spatial tiling and is implemented as a generic algorithm.
More specifically, we investigate how to take advantage of producer-consumer relations between stencil loops within a single time step to improve memory hierarchy utilization. This approach makes it possible to balance computation and communication and thereby improve resource usage. We implement our methods using generic programming constructs of C++ and compare them with hand-tuned implementations of the stencils. The results show that this technique can improve both single-threaded and multi-threaded performance to closely match that of hand-tuned implementations, with the convenience of a high-level specification.
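As an illustration of the producer-consumer idea, the following minimal sketch fuses two dependent stencil stages within a spatial tile. It is not the generic library described in the paper: the `Field` type, the `laplacian` helper, the two `two_stage_*` functions, and the choice of a five-point Laplacian applied twice are all assumptions made for the example. The producer stage is evaluated on each tile plus a one-cell halo into a small buffer, and the consumer stage reads that buffer directly, so the intermediate field is never written out at full size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense 2D field, row-major; not a type from the paper.
struct Field {
    std::size_t nx, ny;
    std::vector<double> data;
    Field(std::size_t nx_, std::size_t ny_)
        : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * ny + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * ny + j]; }
};

// Five-point Laplacian at (i, j), reading through any callable accessor.
template <typename Access>
double laplacian(const Access& f, std::size_t i, std::size_t j) {
    return f(i - 1, j) + f(i + 1, j) + f(i, j - 1) + f(i, j + 1) - 4.0 * f(i, j);
}

// Reference version: stage 1 fills a full-size temporary, stage 2 reads it back.
void two_stage_unfused(const Field& in, Field& out) {
    assert(in.nx == out.nx && in.ny == out.ny);
    if (in.nx < 5 || in.ny < 5) return;  // no interior point two cells from the edge
    Field lap(in.nx, in.ny);
    for (std::size_t i = 1; i < in.nx - 1; ++i)
        for (std::size_t j = 1; j < in.ny - 1; ++j)
            lap(i, j) = laplacian(in, i, j);
    for (std::size_t i = 2; i < in.nx - 2; ++i)
        for (std::size_t j = 2; j < in.ny - 2; ++j)
            out(i, j) = laplacian(lap, i, j);
}

// Tiled version: per tile, the producer (stage 1) is computed on the tile plus
// a one-cell halo into a small buffer that the consumer (stage 2) reads at once.
void two_stage_tiled(const Field& in, Field& out, std::size_t tile) {
    assert(tile > 0 && in.nx == out.nx && in.ny == out.ny);
    if (in.nx < 5 || in.ny < 5) return;
    const std::size_t w = tile + 2;  // tile width plus halo on both sides
    std::vector<double> buf(w * w);
    for (std::size_t i0 = 2; i0 < in.nx - 2; i0 += tile) {
        const std::size_t i1 = std::min(i0 + tile, in.nx - 2);
        for (std::size_t j0 = 2; j0 < in.ny - 2; j0 += tile) {
            const std::size_t j1 = std::min(j0 + tile, in.ny - 2);
            // Maps global (i, j) to a slot in the tile-local buffer.
            auto slot = [&](std::size_t i, std::size_t j) {
                return (i - (i0 - 1)) * w + (j - (j0 - 1));
            };
            // Producer: stage 1 on [i0-1, i1] x [j0-1, j1] (halo cells are
            // recomputed by neighbouring tiles).
            for (std::size_t i = i0 - 1; i <= i1; ++i)
                for (std::size_t j = j0 - 1; j <= j1; ++j)
                    buf[slot(i, j)] = laplacian(in, i, j);
            // Consumer: stage 2 on the tile interior, reading only the buffer.
            auto tmp = [&](std::size_t i, std::size_t j) { return buf[slot(i, j)]; };
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                    out(i, j) = laplacian(tmp, i, j);
        }
    }
}
```

The sketch trades a small amount of redundant computation in the halo for a working set that can stay in cache between the two stages. The tile size is left as a parameter because a suitable value depends on the cache capacity and on the number of fields involved.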
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bianco, M., Cumming, B. (2014). A Generic Strategy for Multi-stage Stencils. In: Silva, F., Dutra, I., Santos Costa, V. (eds) Euro-Par 2014 Parallel Processing. Euro-Par 2014. Lecture Notes in Computer Science, vol 8632. Springer, Cham. https://doi.org/10.1007/978-3-319-09873-9_49
DOI: https://doi.org/10.1007/978-3-319-09873-9_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09872-2
Online ISBN: 978-3-319-09873-9
eBook Packages: Computer Science, Computer Science (R0)