Abstract
The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.
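The abstract's central claim is that bypassing the host bounce buffer (as GPUDirect RDMA allows) removes one leg of the transfer. As an illustrative back-of-envelope model only (not taken from the paper; the bandwidth figures below are hypothetical), the staged and direct pathways can be compared like this:

```python
# Illustrative cost model (hypothetical numbers, not from the paper):
# compare a host-staged transfer (GPU -> host bounce buffer -> NIC)
# with a GPUDirect RDMA transfer (NIC reads GPU memory directly).

def staged_time(nbytes, bw_d2h, bw_net):
    """Unpipelined staged path: device-to-host copy, then network send."""
    return nbytes / bw_d2h + nbytes / bw_net

def direct_time(nbytes, bw_net):
    """Direct path: the NIC reads GPU memory, so no staging copy."""
    return nbytes / bw_net

if __name__ == "__main__":
    GiB = 1 << 30
    n = 4 * GiB            # hypothetical message size
    bw_d2h = 12 * GiB      # hypothetical PCIe device-to-host bandwidth (B/s)
    bw_net = 10 * GiB      # hypothetical NIC injection bandwidth (B/s)
    t_staged = staged_time(n, bw_d2h, bw_net)
    t_direct = direct_time(n, bw_net)
    saving = 1 - t_direct / t_staged
    print(f"staged: {t_staged:.3f}s  direct: {t_direct:.3f}s  saving: {saving:.0%}")
```

Under this simple unpipelined model, the direct path saves exactly the device-to-host copy time; the paper's measured 53%/40% reductions reflect the real, more complex pipeline on actual hardware.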
References
ADIOS 2 Documentation (2022). https://adios2.readthedocs.io/en/latest/advanced/gpu_aware.html
AMD ROCm Information Portal - v4.5 (2022). https://rocmdocs.amd.com/en/latest/Remote_Device_Programming/Remote-Device-Programming.html
NVIDIA GPUDirect RDMA Documentation (2022). https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
Zhang, B., Davis, P.E., Morales, N., Zhang, Z., Teranishi, K., Parashar, M.: Artifact and instructions to generate experimental results for Euro-Par 2023 paper: Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA (2023). https://doi.org/10.6084/m9.figshare.23535855
Ahrens, J., Rhyne, T.M.: Increasing scientific data insights about exascale class simulations under power and storage constraints. IEEE Comput. Graph. Appl. 35(2), 8–11 (2015)
Asch, M., et al.: Big data and extreme-scale computing: pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. Int. J. High Perform. Comput. Appl. 32(4), 435–479 (2018)
Beckingsale, D.A., et al.: RAJA: portable performance for large-scale scientific applications. In: 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 71–81. IEEE (2019)
Bethel, E., et al.: In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report (2021)
Brown, W.M.: GPU acceleration in LAMMPS. In: LAMMPS User’s Workshop and Symposium (2011)
Docan, C., Parashar, M., Klasky, S.: DataSpaces: an interaction and coordination framework for coupled simulation workflows. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 25–36 (2010)
Godoy, W.F., et al.: ADIOS 2: the Adaptable Input Output System. A framework for high-performance data management. SoftwareX 12, 100561 (2020)
Goswami, A., et al.: Landrush: rethinking in-situ analysis for GPGPU workflows. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 32–41. IEEE (2016)
Jeaugey, S.: NCCL 2.0. In: GPU Technology Conference (GTC), vol. 2 (2017)
Karlin, I., Keasler, J., Neely, R.: LULESH 2.0 updates and changes. Technical report LLNL-TR-641973, August 2013
Kress, J., Klasky, S., Podhorszki, N., Choi, J., Childs, H., Pugmire, D.: Loosely coupled in situ visualization: a perspective on why it’s here to stay. In: Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, pp. 1–6 (2015)
Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20(12), 2674–2683 (2014)
Moreland, K.: The tensions of in situ visualization. IEEE Comput. Graph. Appl. 36(2), 5–9 (2016)
Potluri, S., Hamidouche, K., Venkatesh, A., Bureddy, D., Panda, D.K.: Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 80–89. IEEE (2013)
Pulatov, D., Zhang, B., Suresh, S., Miller, C.: Porting IDL programs into Python for GPU-Accelerated In-situ Analysis (2021)
Reyes, R., Brown, G., Burns, R., Wong, M.: SYCL 2020: more than meets the eye. In: Proceedings of the International Workshop on OpenCL, p. 1 (2020)
Ross, R.B., et al.: Mochi: composing data services for high-performance computing environments. J. Comput. Sci. Technol. 35(1), 121–144 (2020)
Shi, R., et al.: Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters. In: 2014 21st International Conference on High Performance Computing (HiPC), pp. 1–10. IEEE (2014)
Strohmaier, E., Dongarra, J., Simon, H., Meuer, M.: TOP500 List, November 2022. https://www.top500.org/lists/top500/2022/11/
Trott, C.R., et al.: Kokkos 3: programming model extensions for the exascale era. IEEE Trans. Parallel Distrib. Syst. 33(4), 805–817 (2021)
Wang, D., Foran, D.J., Qi, X., Parashar, M.: Enabling asynchronous coupled data intensive analysis workflows on GPU-accelerated platforms via data staging
Wang, H., Potluri, S., Luo, M., Singh, A.K., Sur, S., Panda, D.K.: MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Comput. Sci.-Res. Dev. 26(3), 257–266 (2011)
Zhang, B., Subedi, P., Davis, P.E., Rizzi, F., Teranishi, K., Parashar, M.: Assembling portable in-situ workflow from heterogeneous components using data reorganization. In: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 41–50. IEEE (2022)
Acknowledgements and Data Availability
This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration (NNSA) under contract DE-NA0003525. This work was funded by NNSA’s Advanced Simulation and Computing (ASC) Program. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). This work is also based upon work by the RAPIDS2 Institute supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research through the Scientific Discovery through Advanced Computing (SciDAC) program under Award Number DE-SC0023130. The datasets and code generated during and/or analysed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.23535855 [4]. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, B., Davis, P.E., Morales, N., Zhang, Z., Teranishi, K., Parashar, M. (2023). Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA. In: Cano, J., Dikaiakos, M.D., Papadopoulos, G.A., Pericàs, M., Sakellariou, R. (eds) Euro-Par 2023: Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, vol 14100. Springer, Cham. https://doi.org/10.1007/978-3-031-39698-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39697-7
Online ISBN: 978-3-031-39698-4
eBook Packages: Computer Science, Computer Science (R0)