Abstract
The increasing availability of machines relying on non-GPU architectures in high-performance computing, such as the ARM A64FX, provides a set of interesting challenges to application developers. In addition to requiring code portability across different parallelization schemes, programs targeting these architectures have to be highly adaptable in terms of compute kernel sizes to accommodate different execution characteristics for various heterogeneous workloads. In this paper, we demonstrate an approach to writing compute kernels using Kokkos' abstraction layer so that they can be executed on x86 and A64FX CPUs as well as NVIDIA GPUs. In addition to applying Kokkos as an abstraction over the execution of compute kernels in different heterogeneous execution environments, we show that the use of standard C++ constructs, as exposed by the HPX runtime system, enables platform portability, based on the real-world Octo-Tiger astrophysics application. We report our experience with porting Octo-Tiger to the ARM A64FX architecture provided by Stony Brook's Ookami and RIKEN's Supercomputer Fugaku and compare the resulting performance with that achieved on well-established GPU-oriented HPC machines such as ORNL's Summit, NERSC's Perlmutter, and CSCS's Piz Daint. Thanks to the abstraction levels provided by HPX and Kokkos, Octo-Tiger scaled well on Supercomputer Fugaku without any major code changes, and adding vectorization support for ARM's SVE to Octo-Tiger was trivial thanks to the use of standard C++ interfaces.
Notes
Available at https://github.com/STEllAR-GROUP/hpx-kokkos.
https://github.com/kokkos/kokkos/pull/5628.
Acknowledgments
This research used resources of the National Energy Research Scientific Computing Center, a U.S. Department of Energy Office of Science User Facility, operated under Contract No. DE-AC02-05CH11231. This work used computational resources of the Supercomputer Fugaku provided by RIKEN through the HPCI System Research Project (Project ID: hp210311). A grant from the Swiss National Supercomputing Centre (CSCS) supported this work under Project ID: s1078. The authors would like to thank Stony Brook Research Computing and Cyberinfrastructure and the Institute for Advanced Computational Science at Stony Brook University for access to the innovative high-performance Ookami computing system, which was made possible by a $5M National Science Foundation Grant (#1927880).
Ethics declarations
Disclaimer
The experiments on NERSC's Perlmutter were conducted during Phase 1 of the system; these results should not be taken as representative of the final system. Numerous upgrades planned for Phase 2 will substantially change Perlmutter's final size and network capabilities.
About this article
Cite this article
Diehl, P., Daiß, G., Huck, K. et al. Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku. J Supercomput (2024). https://doi.org/10.1007/s11227-024-06113-w