Automatic code generation and tuning for stencil kernels on modern shared memory architectures

  • Special Issue Paper
Computer Science - Research and Development

Abstract

In this paper, we present Patus, a code generation and auto-tuning framework for stencil computations targeted at multi- and manycore processors, such as multicore CPUs and graphics processing units. Patus, which stands for “Parallel Autotuned Stencils,” generates a compute kernel from a specification of the stencil operation and a strategy which describes the parallelization and optimization to be applied, and leverages the autotuning methodology to optimize strategy-specific parameters for the given hardware architecture.
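The division of labor described above (a stencil operation, a strategy with tunable parameters, and a search over those parameters) can be illustrated with a minimal sketch. This is not Patus code and does not use its specification language. The function names `jacobi_step_blocked`, `autotune`, and `make_grid`, the choice of a 5-point Jacobi stencil, square loop tiling as the "strategy", and exhaustive search as the tuner are all assumptions made for illustration only.

```python
import time


def make_grid(n):
    """Build a deterministic n x n test grid (hypothetical helper)."""
    return [[float((i * 31 + j * 17) % 7) for j in range(n)] for i in range(n)]


def jacobi_step_blocked(u, block):
    """One 5-point Jacobi sweep over the interior of u.

    The interior is traversed in block x block tiles. The tile size is the
    tunable, strategy-specific parameter in this sketch.
    """
    n, m = len(u), len(u[0])
    out = [row[:] for row in u]  # boundary values are carried over unchanged
    for ii in range(1, n - 1, block):
        for jj in range(1, m - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, m - 1)):
                    out[i][j] = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
    return out


def autotune(u, candidates, repeats=3):
    """Time every candidate block size and return the fastest one.

    Exhaustive search is an assumption of this sketch, chosen for brevity.
    """
    best_block, best_time = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        for _ in range(repeats):
            jacobi_step_blocked(u, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block


if __name__ == "__main__":
    grid = make_grid(64)
    print("fastest block size on this machine:", autotune(grid, [4, 8, 16, 32]))
```

The stencil arithmetic stays fixed while the traversal order varies with `block`, so every tile size produces the same result and only the running time differs. That separation is what lets a tuner pick the parameter per machine without touching the stencil definition.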

Author information

Corresponding author

Correspondence to Matthias Christen.

About this article

Cite this article

Christen, M., Schenk, O. & Burkhart, H. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. Comput Sci Res Dev 26, 205–210 (2011). https://doi.org/10.1007/s00450-011-0160-6

  • DOI: https://doi.org/10.1007/s00450-011-0160-6
