An Illustration of Extending Hedgehog to Multi-Node GPU Architectures Using GEMM

  • Original Research
Asynchronous task-based systems can make it easier to take advantage of scalable heterogeneous architectures. This paper extends previous work that demonstrated how Hedgehog, a dataflow graph-based model developed at the National Institute of Standards and Technology, can be used to obtain high performance for numerical linear algebra operations as a starting point for complex algorithms. While those earlier results were promising, it was unclear how to scale them to larger matrices and compute node counts. The aim here is to show that a new, improved algorithm inspired by DPLASMA performs equally well when implemented with Hedgehog. The results are compared against the leading library, DPLASMA, to illustrate the performance of different asynchronous dataflow models. The work demonstrates that general-purpose, high-level abstractions, such as Hedgehog’s dataflow graphs, can achieve performance similar to that of specialized linear algebra codes such as DPLASMA.


Data Availability

The matrix data are generated by a random generator function, which is included in the repository linked in the Code Availability section.

Code Availability

Our code is available here. The application v3_benchmark2, at the commit that is also tagged v3_benchmark2, was used for benchmarking on all the systems.


  1. Compared to other AMT systems, HPX brings a "future-proof C++ conforming API" and an exposed asynchronous programming model.


Author information


Corresponding author

Correspondence to Nitish Shingde.

Certain equipment, instruments, software, or materials, commercial or non-commercial, are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement of any product or service by NIST, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

Shingde, N., Blattner, T., Bardakoff, A. et al. An Illustration of Extending Hedgehog to Multi-Node GPU Architectures Using GEMM. SN COMPUT. SCI. 5, 654 (2024).

