Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA

  • Conference paper
Euro-Par 2023: Parallel Processing (Euro-Par 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14100))

Abstract

The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.
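The benefit described in the abstract can be illustrated with a simple back-of-the-envelope model (not from the paper): a conventional pathway stages data from GPU memory through a host bounce buffer before the NIC sends it, while GPUDirect RDMA lets the NIC read GPU memory directly, eliminating the host copy. The bandwidth figures below are hypothetical placeholders chosen only to make the comparison concrete.

```python
# Illustrative cost model comparing two GPU-to-network data movement pathways.
# All bandwidth numbers are hypothetical; real values depend on the PCIe/NVLink
# topology and NIC of the target system.

def staged_time(nbytes: float, d2h_bw: float, nic_bw: float) -> float:
    """GPU -> host bounce buffer -> NIC: two sequential transfers."""
    return nbytes / d2h_bw + nbytes / nic_bw

def direct_time(nbytes: float, nic_bw: float) -> float:
    """GPUDirect RDMA: the NIC reads GPU memory directly, one transfer."""
    return nbytes / nic_bw

size = 1 << 30        # 1 GiB payload
d2h = 20e9            # hypothetical device-to-host copy bandwidth (B/s)
nic = 12.5e9          # hypothetical NIC bandwidth (B/s)

t_staged = staged_time(size, d2h, nic)
t_direct = direct_time(size, nic)
print(f"staged: {t_staged:.3f}s  direct: {t_direct:.3f}s  "
      f"reduction: {100 * (1 - t_direct / t_staged):.0f}%")
```

Under these assumed numbers the direct pathway saves roughly the cost of the device-to-host copy; the paper's measured 40-53% reductions arise from a more detailed analysis of the available pathways, not from this toy model.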


References

  1. ADIOS 2 Documentation (2022). https://adios2.readthedocs.io/en/latest/advanced/gpu_aware.html

  2. AMD ROCm Information Portal - v4.5 (2022). https://rocmdocs.amd.com/en/latest/Remote_Device_Programming/Remote-Device-Programming.html

  3. NVIDIA GPUDirect RDMA Documentation (2022). https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

  4. Zhang, B., Davis, P.E., Morales, N., Zhang, Z., Teranishi, K., Parashar, M.: Artifact and instructions to generate experimental results for Euro-Par 2023 paper: Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA (2023). https://doi.org/10.6084/m9.figshare.23535855

  5. Ahrens, J., Rhyne, T.M.: Increasing scientific data insights about exascale class simulations under power and storage constraints. IEEE Comput. Graph. Appl. 35(2), 8–11 (2015)

  6. Asch, M., et al.: Big data and extreme-scale computing: pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. Int. J. High Perform. Comput. Appl. 32(4), 435–479 (2018)

  7. Beckingsale, D.A., et al.: RAJA: portable performance for large-scale scientific applications. In: 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), pp. 71–81. IEEE (2019)

  8. Bethel, E., et al.: In Situ Methods, Infrastructures, and Applications on High Performance Computing Platforms, a State-of-the-art (STAR) Report (2021)

  9. Brown, W.M.: GPU acceleration in LAMMPS. In: LAMMPS User's Workshop and Symposium (2011)

  10. Docan, C., Parashar, M., Klasky, S.: DataSpaces: an interaction and coordination framework for coupled simulation workflows. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pp. 25–36 (2010)

  11. Godoy, W.F., et al.: ADIOS 2: the adaptable input output system. A framework for high-performance data management. SoftwareX 12, 100561 (2020)

  12. Goswami, A., et al.: Landrush: rethinking in-situ analysis for GPGPU workflows. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 32–41. IEEE (2016)

  13. Jeaugey, S.: NCCL 2.0. In: GPU Technology Conference (GTC), vol. 2 (2017)

  14. Karlin, I., Keasler, J., Neely, R.: LULESH 2.0 updates and changes. Technical report LLNL-TR-641973, August 2013

  15. Kress, J., Klasky, S., Podhorszki, N., Choi, J., Childs, H., Pugmire, D.: Loosely coupled in situ visualization: a perspective on why it's here to stay. In: Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, pp. 1–6 (2015)

  16. Lindstrom, P.: Fixed-rate compressed floating-point arrays. IEEE Trans. Vis. Comput. Graph. 20(12), 2674–2683 (2014)

  17. Moreland, K.: The tensions of in situ visualization. IEEE Comput. Graph. Appl. 36(2), 5–9 (2016)

  18. Potluri, S., Hamidouche, K., Venkatesh, A., Bureddy, D., Panda, D.K.: Efficient inter-node MPI communication using GPUDirect RDMA for InfiniBand clusters with NVIDIA GPUs. In: 2013 42nd International Conference on Parallel Processing, pp. 80–89. IEEE (2013)

  19. Pulatov, D., Zhang, B., Suresh, S., Miller, C.: Porting IDL programs into Python for GPU-accelerated in-situ analysis (2021)

  20. Reyes, R., Brown, G., Burns, R., Wong, M.: SYCL 2020: more than meets the eye. In: Proceedings of the International Workshop on OpenCL, p. 1 (2020)

  21. Ross, R.B., et al.: Mochi: composing data services for high-performance computing environments. J. Comput. Sci. Technol. 35(1), 121–144 (2020)

  22. Shi, R., et al.: Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters. In: 2014 21st International Conference on High Performance Computing (HiPC), pp. 1–10. IEEE (2014)

  23. Strohmaier, E., Dongarra, J., Simon, H., Meuer, M.: TOP500 List, November 2022. https://www.top500.org/lists/top500/2022/11/

  24. Trott, C.R., et al.: Kokkos 3: programming model extensions for the exascale era. IEEE Trans. Parallel Distrib. Syst. 33(4), 805–817 (2021)

  25. Wang, D., Foran, D.J., Qi, X., Parashar, M.: Enabling asynchronous coupled data intensive analysis workflows on GPU-accelerated platforms via data staging

  26. Wang, H., Potluri, S., Luo, M., Singh, A.K., Sur, S., Panda, D.K.: MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Comput. Sci.-Res. Dev. 26(3), 257–266 (2011)

  27. Zhang, B., Subedi, P., Davis, P.E., Rizzi, F., Teranishi, K., Parashar, M.: Assembling portable in-situ workflow from heterogeneous components using data reorganization. In: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 41–50. IEEE (2022)


Acknowledgements and Data Availability

This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration (NNSA) under contract DE-NA0003525. This work was funded by NNSA's Advanced Simulation and Computing (ASC) Program. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). This work is also based upon work by the RAPIDS2 Institute supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research through the Scientific Discovery through Advanced Computing (SciDAC) program under Award Number DE-SC0023130. The datasets and code generated during and/or analysed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.23535855 [4]. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Author information
Corresponding author

Correspondence to Bo Zhang.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Zhang, B., Davis, P.E., Morales, N., Zhang, Z., Teranishi, K., Parashar, M. (2023). Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA. In: Cano, J., Dikaiakos, M.D., Papadopoulos, G.A., Pericàs, M., Sakellariou, R. (eds) Euro-Par 2023: Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, vol 14100. Springer, Cham. https://doi.org/10.1007/978-3-031-39698-4_22

  • DOI: https://doi.org/10.1007/978-3-031-39698-4_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39697-7

  • Online ISBN: 978-3-031-39698-4

  • eBook Packages: Computer Science (R0)
