Abstract
HPC applications rely on a distributed-memory parallel programming model to reduce overall execution time. This means spawning multiple processes that must communicate with each other for the code to progress, and these communications incur overheads from network latency and inter-process synchronization. One way to reduce these overheads is to overlap communication with computation. MPI enables this through its nonblocking communication mode: a nonblocking communication consists of an initialization call and a completion call, so the communication can be overlapped by inserting computation between the two. The use of nonblocking collective calls, however, remains marginal and adds a new layer of complexity. In this paper we propose an automatic static optimization that (i) transforms blocking MPI communications into their nonblocking counterparts and (ii) performs extensive code motion to widen the overlap interval between the initialization and completion calls. Our method is implemented as an LLVM compilation pass and shows promising results on two mini-applications.
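The transformation described in the abstract can be illustrated with a minimal sketch. A blocking collective is replaced by its nonblocking counterpart (initialization call) plus a matching `MPI_Wait` (completion call), and computation that does not depend on the collective's result is moved into the window between them. The variable names (`local_sum`, `grid`, `N`) and the choice of `MPI_Allreduce` are illustrative assumptions, not taken from the paper:

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double local_sum = 1.0, global_sum;
    static double grid[N];

    /* Before the transformation: a blocking collective, no overlap possible.
     *   MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
     *                 MPI_SUM, MPI_COMM_WORLD);
     */

    /* After: the initialization call starts the collective ... */
    MPI_Request req;
    MPI_Iallreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                   MPI_SUM, MPI_COMM_WORLD, &req);

    /* ... computation independent of global_sum is hoisted into the
     * overlap window between initialization and completion ... */
    for (int i = 0; i < N; i++)
        grid[i] = 0.5 * (double)i;

    /* ... and the completion call closes the window; only here may
     * global_sum be read. */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("global_sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}
```

The wider the hoisted computation, the more of the collective's latency can be hidden, which is exactly what the paper's code-motion pass tries to maximize.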
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Nguyen, V.M., Saillard, E., Jaeger, J., Barthou, D., Carribault, P. (2020). Automatic Code Motion to Extend MPI Nonblocking Overlap Window. In: Jagode, H., Anzt, H., Juckeland, G., Ltaief, H. (eds) High Performance Computing. ISC High Performance 2020. Lecture Notes in Computer Science(), vol 12321. Springer, Cham. https://doi.org/10.1007/978-3-030-59851-8_4
DOI: https://doi.org/10.1007/978-3-030-59851-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59850-1
Online ISBN: 978-3-030-59851-8