Abstract
Asynchronous task-based systems can make scalable heterogeneous architectures easier to exploit. This paper extends the National Institute of Standards and Technology’s (NIST) Hedgehog dataflow graph models, which target a single high-end compute node, so that they run on a cluster. It does so by borrowing aspects of Uintah’s cluster-scale task graphs and applying them to a sample implementation of matrix multiplication. The results are compared, for illustrative purposes only, with implementations built on the leading libraries SLATE and DPLASMA. The goal of this work is to demonstrate that using general-purpose high-level abstractions, such as Hedgehog’s dataflow graphs, does not negatively impact performance.
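The abstract does not describe how the matrix multiplication is decomposed. One common way to express it as a task graph is to split the matrices into tiles, so that C_ij = Σ_k A_ik · B_kj. Each tile product A_ik · B_kj is an independent task, and its result flows to an accumulation step for tile C_ij. The C++ sketch below illustrates only that decomposition on a single node using `std::async`. It is not Hedgehog’s API and omits the inter-node communication the paper addresses. The function names (`multiplyTile`, `accumulateTile`, `tiledMultiply`), the square row-major layout, and the requirement that `n` be divisible by `tile` are all hypothetical choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical layout: an n x n matrix stored row-major, viewed as
// (n / tile) x (n / tile) square tiles.
using Matrix = std::vector<double>;

// Product step: computes the partial tile A_ik * B_kj (tile x tile).
Matrix multiplyTile(const Matrix& A, const Matrix& B, std::size_t n,
                    std::size_t tile, std::size_t i, std::size_t j,
                    std::size_t k) {
    Matrix partial(tile * tile, 0.0);
    for (std::size_t r = 0; r < tile; ++r)
        for (std::size_t m = 0; m < tile; ++m) {
            const double a = A[(i * tile + r) * n + (k * tile + m)];
            for (std::size_t c = 0; c < tile; ++c)
                partial[r * tile + c] += a * B[(k * tile + m) * n + (j * tile + c)];
        }
    return partial;
}

// Accumulation step: adds one partial tile into tile C_ij.
void accumulateTile(Matrix& C, const Matrix& partial, std::size_t n,
                    std::size_t tile, std::size_t i, std::size_t j) {
    for (std::size_t r = 0; r < tile; ++r)
        for (std::size_t c = 0; c < tile; ++c)
            C[(i * tile + r) * n + (j * tile + c)] += partial[r * tile + c];
}

// Launches every tile product asynchronously (they are independent), then
// feeds each partial result to the accumulation step for its C tile.
Matrix tiledMultiply(const Matrix& A, const Matrix& B, std::size_t n,
                     std::size_t tile) {
    assert(tile > 0 && n % tile == 0 && "n must be divisible by tile");
    const std::size_t t = n / tile;  // tiles per dimension
    Matrix C(n * n, 0.0);

    struct Pending {
        std::size_t i, j;
        std::future<Matrix> partial;
    };
    std::vector<Pending> pending;
    for (std::size_t i = 0; i < t; ++i)
        for (std::size_t j = 0; j < t; ++j)
            for (std::size_t k = 0; k < t; ++k)
                pending.push_back({i, j,
                    std::async(std::launch::async, multiplyTile, std::cref(A),
                               std::cref(B), n, tile, i, j, k)});

    // Accumulation runs on the calling thread only, so C needs no locking.
    for (auto& p : pending)
        accumulateTile(C, p.partial.get(), n, tile, p.i, p.j);
    return C;
}
```

Launching one thread per tile product is only reasonable for small tile counts; a practical implementation would bound concurrency, for example with a fixed pool of worker threads, and would distribute tiles across nodes.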
Supported by the National Institute of Standards and Technology (NIST).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shingde, N., Berzins, M., Blattner, T., Keyrouz, W., Bardakoff, A. (2023). Extending Hedgehog’s Dataflow Graphs to Multi-node GPU Architectures. In: Diehl, P., Thoman, P., Kaiser, H., Kale, L. (eds) Asynchronous Many-Task Systems and Applications. WAMTA 2023. Lecture Notes in Computer Science, vol 13861. Springer, Cham. https://doi.org/10.1007/978-3-031-32316-4_1
DOI: https://doi.org/10.1007/978-3-031-32316-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32315-7
Online ISBN: 978-3-031-32316-4
eBook Packages: Computer Science, Computer Science (R0)