A parallel pattern for iterative stencil + reduce

Aldinucci, M.; Danelutto, M.; Drocco, M.; Kilpatrick, P.; Misale, C.; Peretti Pezzi, G.; Torquati, M.

doi:10.1007/s11227-016-1871-z

Published: 08 September 2016

The Journal of Supercomputing, volume 74, pages 5690–5705 (2018)

Abstract

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for implementing applications based on parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.
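To make the shape of the pattern concrete, the following is a minimal sequential sketch of a loop of stencil + reduce over a 1D array. It is an illustration only, not the FastFlow API: the names `loop_of_stencil_reduce`, `LoopResult` and `average3` are hypothetical, and the sketch assumes that each iteration applies an elemental function to the neighbourhood of every element, reduces the resulting array with an associative operator, and tests a termination condition on the reduced value. In a parallel implementation the two inner loops are the parts that would be offloaded to one or more devices.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Final array, last reduced value and number of iterations executed.
struct LoopResult {
    std::vector<double> data;
    double reduced;
    std::size_t iterations;
};

// Hypothetical helper (NOT the FastFlow API): sequential semantics of a
// loop of stencil + reduce on a 1D array.
//   elem(in, i) : new value at index i, computed from the neighbourhood of i
//   combine     : associative binary operator used by the reduce
//   identity    : identity element of combine
//   done(r, it) : termination test on the reduced value r after it iterations
LoopResult loop_of_stencil_reduce(
    std::vector<double> a,
    const std::function<double(const std::vector<double>&, std::size_t)>& elem,
    const std::function<double(double, double)>& combine,
    double identity,
    const std::function<bool(double, std::size_t)>& done)
{
    std::vector<double> b(a.size());
    double r = identity;
    std::size_t it = 0;
    do {
        // Stencil phase: every element depends only on the read-only input,
        // so the iterations of this loop are independent of each other.
        for (std::size_t i = 0; i < a.size(); ++i) b[i] = elem(a, i);
        // Reduce phase: fold the freshly computed array into a single value.
        r = identity;
        for (std::size_t i = 0; i < b.size(); ++i) r = combine(r, b[i]);
        std::swap(a, b);  // the output becomes the input of the next iteration
        ++it;
    } while (!done(r, it));
    return {std::move(a), r, it};
}

// Hypothetical elemental function: 3-point average with fixed end points.
double average3(const std::vector<double>& in, std::size_t i) {
    if (i == 0 || i + 1 == in.size()) return in[i];
    return (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}
```

With `average3` as the elemental function, `std::max` as the combining operator and a condition such as "stop when the maximum falls below a threshold", the loop behaves like a simple diffusion process. Dropping the loop (a condition that is immediately true) gives a single stencil-reduce, and an elemental function that reads only `in[i]` gives a map-reduce, which is the sense in which the pattern subsumes the simpler ones.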

Notes

We omit the dimension n in \(\sigma ^n_k\) here, as we assume the dimension n is the same as that of the array a: a single dimensional array will have \(n=1\), a 2D matrix \(n=2\), and so on.
The current implementation does not allow mixing of CPU and GPUs (or other accelerators) for deploying a single Loop-of-stencil-reduce instance.
An n-GPU pattern is a pattern deployed onto n GPU devices.
We implicitly define a FastFlow task as the computation to be performed over a single stream item by a FastFlow pattern.

Acknowledgments

This work was supported by EU FP7 project REPARA (No. 609666), the EU H2020 Project RePhrase (No. 644235), and by the NVidia GPU Research Center at the University of Torino.

Author information

Authors and Affiliations

Department of Computer Science, University of Pisa, Pisa, Italy
M. Danelutto & M. Torquati
Department of Computer Science, University of Turin, Turin, Italy
M. Aldinucci, M. Drocco & C. Misale
Department of Computer Science, Queen’s University Belfast, Belfast, UK
P. Kilpatrick
Swiss National Supercomputing Centre, Lugano, Switzerland
G. Peretti Pezzi

Corresponding author

Correspondence to M. Drocco.

About this article

Cite this article

Aldinucci, M., Danelutto, M., Drocco, M. et al. A parallel pattern for iterative stencil + reduce. J Supercomput 74, 5690–5705 (2018). https://doi.org/10.1007/s11227-016-1871-z

Issue Date: November 2018
DOI: https://doi.org/10.1007/s11227-016-1871-z
