Extending Hedgehog’s Dataflow Graphs to Multi-node GPU Architectures

  • Conference paper
Asynchronous Many-Task Systems and Applications (WAMTA 2023)

Abstract

Asynchronous task-based systems may make it easier to exploit scalable heterogeneous architectures. This paper extends the National Institute of Standards and Technology’s Hedgehog dataflow graph models, which target a single high-end compute node, to run on a cluster by borrowing aspects of Uintah’s cluster-scale task graphs, and applies the extended models to a sample implementation of matrix multiplication. The results are compared, for illustrative purposes only, with implementations that use the leading libraries SLATE and DPLASMA. The motivation for this work is to demonstrate that using general-purpose high-level abstractions, such as Hedgehog’s dataflow graphs, does not negatively impact performance.

Supported by NIST.

Author information

Corresponding author

Correspondence to Nitish Shingde.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shingde, N., Berzins, M., Blattner, T., Keyrouz, W., Bardakoff, A. (2023). Extending Hedgehog’s Dataflow Graphs to Multi-node GPU Architectures. In: Diehl, P., Thoman, P., Kaiser, H., Kale, L. (eds) Asynchronous Many-Task Systems and Applications. WAMTA 2023. Lecture Notes in Computer Science, vol 13861. Springer, Cham. https://doi.org/10.1007/978-3-031-32316-4_1

  • DOI: https://doi.org/10.1007/978-3-031-32316-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32315-7

  • Online ISBN: 978-3-031-32316-4

  • eBook Packages: Computer Science, Computer Science (R0)
