A C++ Library for Memory Layout and Performance Portability of Scientific Applications

Incardona, Pietro; Gupta, Aryaman; Yaskovets, Serhii; Sbalzarini, Ivo F.

doi:10.1007/978-3-031-31209-0_8

Pietro Incardona, Aryaman Gupta, Serhii Yaskovets & Ivo F. Sbalzarini (ORCID: orcid.org/0000-0003-4414-4340)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13835)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

We present a C++14 library for performance portability of scientific computing codes across CPU and GPU architectures. Our library combines generic data structures like vectors, multi-dimensional arrays, maps, graphs, and sparse grids with basic, reusable algorithms like convolutions, sorting, prefix sum, reductions, and scan. The memory layout of the data structures is adapted at compile-time using tuples with optional memory mirroring between CPU and GPU. We combine this transparent memory mapping with generic algorithms under two alternative programming interfaces: a CUDA-like kernel interface for multi-core CPUs, Nvidia GPUs, and AMD GPUs, as well as a lambda interface. We validate and benchmark the presented library using micro-benchmarks, showing that the abstractions introduce negligible performance overhead, and we compare performance against the current state of the art.
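The compile-time layout adaptation mentioned in the abstract can be illustrated with a minimal C++14 sketch. It assumes a container whose element properties are described by a type list and whose storage is selected by a tag: either one tuple per element (array of structures) or a tuple of arrays (structure of arrays). The names `layout_aos`, `layout_soa`, `prop_vector`, and `scale_and_sum` are hypothetical and chosen for illustration only; they are not taken from the library, and the sketch omits GPU mirroring entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical layout tags; illustrative names, not the library's API.
struct layout_aos {};  // array of structures: one tuple per element
struct layout_soa {};  // structure of arrays: one array per property

template <typename Layout, typename... Ts>
class prop_vector;  // primary template, specialized per layout below

// Array-of-structures storage.
template <typename... Ts>
class prop_vector<layout_aos, Ts...> {
    std::vector<std::tuple<Ts...>> data_;
public:
    explicit prop_vector(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }

    // Reference to property I of element i.
    template <std::size_t I>
    auto& get(std::size_t i) {
        assert(i < size());
        return std::get<I>(data_[i]);
    }
};

// Structure-of-arrays storage: each property is contiguous in memory.
template <typename... Ts>
class prop_vector<layout_soa, Ts...> {
    std::tuple<std::vector<Ts>...> data_;
public:
    explicit prop_vector(std::size_t n) : data_(std::vector<Ts>(n)...) {}
    std::size_t size() const { return std::get<0>(data_).size(); }

    // Same access signature as the AoS specialization.
    template <std::size_t I>
    auto& get(std::size_t i) {
        assert(i < size());
        return std::get<I>(data_)[i];
    }
};

// Layout-agnostic algorithm: scales property 0 in place and returns its sum.
template <typename Vec>
double scale_and_sum(Vec& v, double factor) {
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        v.template get<0>(i) *= factor;
        sum += v.template get<0>(i);
    }
    return sum;
}
```

In this sketch, declaring `prop_vector<layout_aos, double, int>` or `prop_vector<layout_soa, double, int>` changes only the memory layout; `scale_and_sum` compiles unchanged against both because the access expression `get<I>(i)` is identical.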

Notes

1. https://www.boost.org/

Acknowledgments

We thank Christian Trott from the Kokkos project for his help and advice in tuning the Kokkos benchmarks for optimal performance. The authors are grateful to the Centre for Information Services and High Performance Computing (ZIH) of TU Dresden and the Scientific Computing Facility of MPI-CBG for providing their facilities for the benchmarks. This work was supported by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF) under grants 01/S18026A-F (competence center for Big Data and AI “ScaDS.AI Dresden/Leipzig”) and 031L0160 (project “SPlaT-DM – computer simulation platform for topology-driven morphogenesis”).

Author information

Authors and Affiliations

Technische Universität Dresden, Faculty of Computer Science, Dresden, Germany
Pietro Incardona, Aryaman Gupta, Serhii Yaskovets & Ivo F. Sbalzarini
Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Pietro Incardona, Aryaman Gupta & Ivo F. Sbalzarini
Center for Systems Biology Dresden, 01307, Dresden, Germany
Pietro Incardona, Aryaman Gupta & Ivo F. Sbalzarini
Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
Ivo F. Sbalzarini

Corresponding author

Correspondence to Ivo F. Sbalzarini.

Editor information

Editors and Affiliations

University of Glasgow, Glasgow, UK
Jeremy Singer
University of Glasgow, Glasgow, UK
Yehia Elkhatib
University of Santiago de Compostela, Santiago de Compostela, La Coruña, Spain
Dora Blanco Heras
Louisiana State University, Baton Rouge, LA, USA
Patrick Diehl
University of Edinburgh, Edinburgh, UK
Nick Brown
Universidade de Lisboa, Lisbon, Portugal
Aleksandar Ilic

Ethics declarations

Code availability

The source code of the presented library is available under the GPLv3 license as part of the OpenFPM project for scalable scientific computing (http://openfpm.mpi-cbg.de/) at: https://git.mpi-cbg.de/mosaic/software/parallel-computing/openfpm.

About this paper

Cite this paper

Incardona, P., Gupta, A., Yaskovets, S., Sbalzarini, I.F. (2023). A C++ Library for Memory Layout and Performance Portability of Scientific Applications. In: Singer, J., Elkhatib, Y., Blanco Heras, D., Diehl, P., Brown, N., Ilic, A. (eds) Euro-Par 2022: Parallel Processing Workshops. Euro-Par 2022. Lecture Notes in Computer Science, vol 13835. Springer, Cham. https://doi.org/10.1007/978-3-031-31209-0_8

DOI: https://doi.org/10.1007/978-3-031-31209-0_8
Published: 02 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31208-3
Online ISBN: 978-3-031-31209-0
eBook Packages: Computer Science, Computer Science (R0)
