
High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn

Published in The Journal of Supercomputing.

Abstract

Dataflow computing is a very attractive paradigm for high-performance computing, given its ability to trigger computations as soon as their inputs are available. UPC++ DepSpawn is a novel task-based library that supports this model in hybrid shared/distributed memory systems on top of a Partitioned Global Address Space environment. While the initial version of the library provided good results, it suffered from a key restriction that heavily limited its performance and scalability: each process had to consider all the tasks in the application rather than only those relevant to it, an overhead that naturally grows with both the number of processes and the number of tasks in the system. In this paper, this restriction is lifted, enabling our library to reach higher levels of performance. In experiments using 768 cores, performance improved by up to 40.1%, with an average improvement of 16.1%.



Acknowledgements

This research was supported by the Ministry of Science and Innovation of Spain (TIN2016-75845-P and PID2019-104184RB-I00, AEI/FEDER/EU, 10.13039/501100011033) and by the Xunta de Galicia, co-funded by the European Regional Development Fund (ERDF), under the Consolidation Programme of Competitive Reference Groups (ED431C 2017/04). We also acknowledge the support of the Centro Singular de Investigación de Galicia "CITIC", funded by the Xunta de Galicia and the European Union (European Regional Development Fund, Galicia 2014–2020 Program) through Grant ED431G 2019/01, and we thank the Centro de Supercomputación de Galicia (CESGA) for the use of its computers.

Author information

Corresponding author: Basilio B. Fraguela.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fraguela, B.B., Andrade, D. High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn. J Supercomput 77, 7676–7689 (2021). https://doi.org/10.1007/s11227-020-03607-1
