Abstract
Dataflow computing is an attractive paradigm for high-performance computing, as it triggers each computation as soon as its inputs become available. UPC++ DepSpawn is a novel task-based library that supports this model in hybrid shared/distributed memory systems on top of a Partitioned Global Address Space (PGAS) environment. While the initial version of the library achieved good results, it suffered from a key restriction that severely limited its performance and scalability: every process had to consider all the tasks in the application rather than only those relevant to it, an overhead that naturally grows with both the number of processes and the number of tasks in the system. In this paper we lift this restriction, enabling the library to reach higher levels of performance. In experiments on 768 cores, the new version improves performance by up to 40.1%, with an average improvement of 16.1%.
Acknowledgements
This research was supported by the Ministry of Science and Innovation of Spain (TIN2016-75845-P and PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033), and by the Xunta de Galicia co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04). We also acknowledge the support of the Centro Singular de Investigación de Galicia "CITIC", funded by the Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014–2020 Program) through Grant ED431G 2019/01, as well as the Centro de Supercomputación de Galicia (CESGA) for the use of its computers.
Cite this article
Fraguela, B.B., Andrade, D. High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn. J Supercomput 77, 7676–7689 (2021). https://doi.org/10.1007/s11227-020-03607-1