A parallel pattern for iterative stencil + reduce

The Journal of Supercomputing

Abstract

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, and stencil-reduce and, crucially, their use in a loop, in data-parallel applications, in streaming applications, or in a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for implementing applications based on parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.
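To make the shape of the pattern concrete, the following is a minimal sequential sketch in Python, not the FastFlow API. It assumes the pattern repeats a stencil step followed by a reduce step until a condition on the reduced value no longer holds. The names `neighbourhood`, `loop_of_stencil_reduce`, `keep_going`, and `max_iters` are hypothetical and chosen only for illustration.

```python
from functools import reduce
from operator import add


def neighbourhood(a, i, k):
    """Return the 2k+1 values centred on index i; out-of-range slots are None."""
    return [a[j] if 0 <= j < len(a) else None for j in range(i - k, i + k + 1)]


def loop_of_stencil_reduce(a, k, f, combine, keep_going, max_iters=1000):
    """Sequential sketch (assumed semantics): iterate stencil, then reduce.

    a          -- non-empty 1D list (input array)
    k          -- stencil radius; k = 0 degenerates to a plain map
    f          -- elemental function applied to each neighbourhood
    combine    -- associative binary operator used by the reduce
    keep_going -- predicate on the reduced value; the loop stops when it is False
    """
    reduced, iters = None, 0
    while iters < max_iters:
        a = [f(neighbourhood(a, i, k)) for i in range(len(a))]  # stencil step
        reduced = reduce(combine, a)                            # reduce step
        iters += 1
        if not keep_going(reduced):
            break
    return a, reduced, iters


def mean_of_window(window):
    """Average a neighbourhood, treating out-of-range slots as 0.0."""
    return sum(0.0 if v is None else v for v in window) / len(window)


# Map-reduce as a special case: radius 0 and a single iteration.
print(loop_of_stencil_reduce([1, 2, 3], 0, lambda w: w[0] * 2, add, lambda r: False))
# -> ([2, 4, 6], 12, 1)

# Iterative smoothing: keep going while the largest value exceeds 1.0.
smoothed, peak, steps = loop_of_stencil_reduce(
    [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0], 1, mean_of_window, max, lambda r: r > 1.0
)
print(steps, peak)
```

In this sketch the simpler patterns fall out as parameter choices: radius 0 gives a map, a single iteration gives a stencil-reduce, and an identity elemental function gives a plain reduce. A parallel implementation would distribute the stencil and reduce steps across devices, which this sequential version does not attempt.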


Notes

  1. We omit the dimension n in \(\sigma ^n_k\) here, as we assume that n is the same as the dimension of the array a: a one-dimensional array has \(n=1\), a 2D matrix has \(n=2\), and so on.

  2. The current implementation does not allow mixing of CPU and GPUs (or other accelerators) for deploying a single Loop-of-stencil-reduce instance.

  3. An n-GPU pattern is a pattern deployed onto n GPU devices.

  4. We implicitly define a FastFlow task as the computation to be performed over a single stream item by a FastFlow pattern.
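As an illustration of note 1, the sketch below enumerates a neighbourhood whose dimension follows that of the array. It assumes, since the definition is not given in this excerpt, that \(\sigma ^n_k\) selects every element within distance k along each of the n axes. The function names and the use of `None` for out-of-range positions are hypothetical choices for this example.

```python
from itertools import product


def neighbourhood_offsets(n, k):
    """All index offsets within distance k along each of n axes: (2k+1)**n tuples."""
    return list(product(range(-k, k + 1), repeat=n))


def neighbourhood_nd(a, index, k):
    """Collect the values around `index` in a nested-list array.

    The dimension n is taken from len(index), so it always matches the array;
    positions that fall outside the array are reported as None.
    """
    n = len(index)
    values = []
    for offset in neighbourhood_offsets(n, k):
        cell = a
        for axis in range(n):
            pos = index[axis] + offset[axis]
            if not 0 <= pos < len(cell):
                cell = None  # outside the array along this axis
                break
            cell = cell[pos]
        values.append(cell)
    return values


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

print(neighbourhood_nd([10, 20, 30], (0,), 1))  # n = 1 -> [None, 10, 20]
print(neighbourhood_nd(grid, (1, 1), 1))        # n = 2 -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because n is derived from the index tuple, the same function serves a vector and a matrix, which is the sense in which the dimension can be left implicit.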


Acknowledgments

This work was supported by the EU FP7 project REPARA (No. 609666), the EU H2020 project RePhrase (No. 644235), and the NVIDIA GPU Research Center at the University of Torino.

Author information

Corresponding author

Correspondence to M. Drocco.

About this article

Cite this article

Aldinucci, M., Danelutto, M., Drocco, M. et al. A parallel pattern for iterative stencil + reduce. J Supercomput 74, 5690–5705 (2018). https://doi.org/10.1007/s11227-016-1871-z

  • DOI: https://doi.org/10.1007/s11227-016-1871-z
