Abstract
We present Patus, a code generation and auto-tuning framework for stencil computations that targets multicore and manycore processors, such as multicore CPUs and graphics processing units (GPUs). Patus, which stands for “Parallel Autotuned Stencils,” generates a compute kernel from two inputs: a specification of the stencil operation, and a strategy that describes the parallelization and optimizations to apply. It then uses auto-tuning to optimize the strategy-specific parameters for the given hardware architecture.
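To illustrate the separation the abstract describes, the following is a minimal sketch in plain Python. It is not Patus code and does not use the Patus specification or strategy languages. The stencil (a 5-point averaging stencil on a 2D grid), the blocked traversal standing in for a "strategy," and all names (`make_grid`, `stencil_sweep`, `autotune`, the `block` parameter) are assumptions chosen for illustration. The tuner simply times each candidate block size and keeps the fastest, which is one simple form of auto-tuning; the search method Patus actually uses is not stated here.

```python
import time


def make_grid(n):
    # Hypothetical test data: an n x n grid with a deterministic, non-uniform state.
    return [[float((i * 31 + j * 17) % 7) for j in range(n)] for i in range(n)]


def stencil_sweep(src, block):
    """One Jacobi sweep of a 5-point averaging stencil.

    The interior is traversed in block x block tiles. `block` plays the role
    of a strategy-specific parameter: it changes the traversal order, not the
    result.
    """
    n = len(src)
    dst = [row[:] for row in src]  # boundary cells are copied unchanged
    for ii in range(1, n - 1, block):
        for jj in range(1, n - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    dst[i][j] = 0.2 * (
                        src[i][j]
                        + src[i - 1][j]
                        + src[i + 1][j]
                        + src[i][j - 1]
                        + src[i][j + 1]
                    )
    return dst


def autotune(src, candidates, repeats=3):
    """Exhaustive search: time each candidate block size, return the fastest.

    Returns (best_block, best_time_in_seconds). The best of `repeats` runs is
    used per candidate to reduce timing noise.
    """
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            stencil_sweep(src, block)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block, best_time


if __name__ == "__main__":
    grid = make_grid(64)
    block, seconds = autotune(grid, [4, 8, 16, 32])
    print(f"fastest block size on this machine: {block} ({seconds:.6f} s)")
```

Because every block size computes each interior point with the same expression, the output grid is identical across candidates; only the run time differs, and the winning block size depends on the machine. In an interpreted language the timing differences are mostly loop overhead rather than cache effects, so the sketch shows the tuning loop rather than realistic performance behavior.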
About this article
Cite this article
Christen, M., Schenk, O. & Burkhart, H. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. Comput Sci Res Dev 26, 205–210 (2011). https://doi.org/10.1007/s00450-011-0160-6
DOI: https://doi.org/10.1007/s00450-011-0160-6
