Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication

  • Conference paper
  • High Performance Computing (ISC High Performance 2017)
  • Part of the book series: Lecture Notes in Computer Science, volume 10266

Abstract

Broadly, there exist two protocols for point-to-point data transfer in the Message Passing Interface (MPI) programming model: Eager and Rendezvous. State-of-the-art MPI libraries decide the switch point between these protocols based on the trade-off between the memory footprint of the MPI library and communication performance, without considering the overlap potential of these communication protocols. This results in sub-par overlap of communication and computation at the application level. While application developers can manually tune this threshold to achieve better overlap, doing so involves significant effort. Further, the communication pattern may change with the size of the job and the input, requiring constant re-tuning and making such a solution impractical. In this paper, we take up this challenge and propose designs for point-to-point data transfer in MPI that account for overlap in addition to performance and memory footprint. The proposed designs dynamically adapt to the communication characteristics of each communicating pair of processes at runtime. Our proposed full in-band design is able to transition from one eager threshold to another without impacting the communication throughput of the application. The proposed enhancements to limit the memory footprint by dynamically freeing unused internal communication buffers significantly cut down on the memory footprint of the MPI library without affecting communication performance.
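
To make the overlap issue concrete, the sketch below (not taken from the paper) shows the standard non-blocking MPI pattern whose effectiveness depends on the protocol in use: a message below the eager threshold is typically buffered and progressed immediately by MPI_Isend, so the compute phase overlaps with the transfer, whereas a larger message handled by the Rendezvous protocol may not start moving until MPI_Wait, yielding little overlap unless the library progresses it asynchronously.

```c
/*
 * Minimal sketch (not from the paper) of the communication/computation
 * overlap pattern that the Eager/Rendezvous switch point affects.
 */
#include <mpi.h>
#include <stdlib.h>

/* placeholder for application work done while the message is in flight */
static void compute_phase(void) { /* ... application kernels ... */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t count = 1 << 20;                  /* 1M doubles = 8 MB */
    double *buf = malloc(count * sizeof(double));
    MPI_Request req = MPI_REQUEST_NULL;

    if (rank == 0) {
        MPI_Isend(buf, (int)count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        MPI_Irecv(buf, (int)count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    }

    compute_phase();        /* overlap window: its size depends on the protocol */

    if (rank < 2)
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* a Rendezvous transfer may only start here */

    free(buf);
    MPI_Finalize();
    return 0;
}
```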

Experimental evaluations show that the proposed dynamic and adaptive design delivers performance on par with what exhaustive manual tuning provides, while limiting the memory consumed to the minimum necessary to deliver the desired benefits. For instance, with the Amber molecular dynamics application at 1,024 processes, the proposed design performs on par with the best manually tuned versions while reducing the memory footprint of the MPI library by 25%. With the 3D-Stencil benchmark at 8,192 processes, the proposed design delivers much better overlap of computation and communication as well as improved overall execution time compared to the default version. To the best of our knowledge, this is the first point-to-point communication protocol design that is capable of dynamically adapting to the communication requirements of end applications.
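
For contrast with the adaptive design, the sketch below illustrates the manual-tuning route via the standard MPI_T tools interface. The control-variable name "eager_threshold_cvar" and the 64 KB value are illustrative placeholders; the actual variable, if exposed at all, is implementation-specific and is often only reachable through environment variables.

```c
/*
 * Sketch of the manual-tuning path the adaptive design replaces: using the
 * MPI_T tools interface to look up and set an eager-threshold control
 * variable.  The cvar name below is a hypothetical stand-in.
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_cvar_get_num(&num_cvars);
    for (int i = 0; i < num_cvars; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, desc, &desc_len, &bind, &scope);

        /* hypothetical name: replace with the cvar your MPI library exposes */
        if (strcmp(name, "eager_threshold_cvar") == 0) {
            MPI_T_cvar_handle handle;
            int new_threshold = 64 * 1024;   /* 64 KB, chosen only for illustration */
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_write(handle, &new_threshold);
            MPI_T_cvar_handle_free(&handle);
            printf("set %s to %d bytes\n", name, new_threshold);
        }
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```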

This research is supported in part by National Science Foundation grants #CNS-1419123, #CNS-1513120, #ACI-1450440 and #CCF-1565414.

Author information

Correspondence to Hari Subramoni.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Subramoni, H., Chakraborty, S., Panda, D.K. (2017). Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_18

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science (R0)
