Extending Hedgehog’s Dataflow Graphs to Multi-node GPU Architectures

Shingde, Nitish; Berzins, Martin; Blattner, Timothy; Keyrouz, Walid; Bardakoff, Alexandre

doi:10.1007/978-3-031-32316-4_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13861)

Included in the following conference series:

Workshop on Asynchronous Many-Task Systems and Applications

Abstract

Asynchronous task-based systems can make it easier to take advantage of scalable heterogeneous architectures. This paper extends the National Institute of Standards and Technology’s Hedgehog dataflow graph models, which target a single high-end compute node, to run on a cluster by borrowing aspects of Uintah’s cluster-scale task graphs, and applies the extended models to a sample implementation of matrix multiplication. The results are compared, for illustrative purposes only, with implementations using the leading libraries SLATE and DPLASMA. The motivation behind this work is to demonstrate that using general-purpose high-level abstractions, such as Hedgehog’s dataflow graphs, does not negatively impact performance.

Supported by NIST.

Author information

Authors and Affiliations

University of Utah, Salt Lake City, UT, 84112, USA
Nitish Shingde & Martin Berzins
National Institute of Standards and Technology, Gaithersburg, MD, USA
Timothy Blattner, Walid Keyrouz & Alexandre Bardakoff

Corresponding author

Correspondence to Nitish Shingde.

Editor information

Editors and Affiliations

Louisiana State University, CCT, Baton Rouge, LA, USA
Patrick Diehl
University of Innsbruck, Innsbruck, Austria
Peter Thoman
Louisiana State University, CCT, Baton Rouge, LA, USA
Hartmut Kaiser
University of Illinois at Urbana-Champaign, Urbana, IL, USA
Laxmikant Kale

Cite this paper

Shingde, N., Berzins, M., Blattner, T., Keyrouz, W., Bardakoff, A. (2023). Extending Hedgehog’s Dataflow Graphs to Multi-node GPU Architectures. In: Diehl, P., Thoman, P., Kaiser, H., Kale, L. (eds) Asynchronous Many-Task Systems and Applications. WAMTA 2023. Lecture Notes in Computer Science, vol 13861. Springer, Cham. https://doi.org/10.1007/978-3-031-32316-4_1

DOI: https://doi.org/10.1007/978-3-031-32316-4_1
Published: 11 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32315-7
Online ISBN: 978-3-031-32316-4
eBook Packages: Computer Science, Computer Science (R0)
