Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication

  • Conference paper
  • High Performance Computing (ISC High Performance 2017)
  • Part of the book series: Lecture Notes in Computer Science, volume 10266

Abstract

Broadly, there exist two protocols for point-to-point data transfer in the Message Passing Interface (MPI) programming model: Eager and Rendezvous. State-of-the-art MPI libraries decide the switch point between these protocols based on the trade-off between the memory footprint of the MPI library and communication performance, without considering the overlap potential of these communication protocols. This results in sub-par overlap of communication and computation at the application level. While application developers can manually tune this threshold to achieve better overlap, doing so involves significant effort. Further, the communication pattern may change with the size of the job and the input, requiring constant re-tuning and making such a solution impractical. In this paper, we take up this challenge and propose designs for point-to-point data transfer in MPI that account for overlap in addition to performance and memory footprint. The proposed designs dynamically adapt to the communication characteristics of each communicating pair of processes at runtime. Our proposed full in-band design is able to transition from one eager threshold to another without impacting the communication throughput of the application. The proposed enhancements to limit the memory footprint by dynamically freeing unused internal communication buffers significantly cut down on the memory footprint of the MPI library without affecting communication performance.
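
To make the overlap issue concrete, the sketch below (not taken from the paper) shows the standard non-blocking MPI pattern whose effectiveness depends on the protocol in use: a message below the eager threshold is typically buffered and progressed immediately by MPI_Isend, so the compute phase overlaps with the transfer, whereas a larger message handled by the Rendezvous protocol may not start moving until MPI_Wait, yielding little overlap unless the library progresses it asynchronously.

```c
/*
 * Minimal sketch (not from the paper) of the communication/computation
 * overlap pattern that the Eager/Rendezvous switch point affects.
 */
#include <mpi.h>
#include <stdlib.h>

/* placeholder for application work done while the message is in flight */
static void compute_phase(void) { /* ... application kernels ... */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const size_t count = 1 << 20;                  /* 1M doubles = 8 MB */
    double *buf = malloc(count * sizeof(double));
    MPI_Request req = MPI_REQUEST_NULL;

    if (rank == 0) {
        MPI_Isend(buf, (int)count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    } else if (rank == 1) {
        MPI_Irecv(buf, (int)count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
    }

    compute_phase();        /* overlap window: its size depends on the protocol */

    if (rank < 2)
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* a Rendezvous transfer may only start here */

    free(buf);
    MPI_Finalize();
    return 0;
}
```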

Experimental evaluations show that the proposed dynamic and adaptive design delivers performance on par with what exhaustive manual tuning provides, while limiting the memory consumed to the minimum necessary to deliver the desired benefits. For instance, with the Amber molecular dynamics application at 1,024 processes, the proposed design performs on par with the best manually tuned versions while reducing the memory footprint of the MPI library by 25%. With the 3D-Stencil benchmark at 8,192 processes, the proposed design delivers much better overlap of computation and communication as well as improved overall execution time compared to the default version. To the best of our knowledge, this is the first point-to-point communication protocol design that is capable of dynamically adapting to the communication requirements of end applications.
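
For contrast with the adaptive design, the sketch below illustrates the manual-tuning route via the standard MPI_T tools interface. The control-variable name "eager_threshold_cvar" and the 64 KB value are illustrative placeholders; the actual variable, if exposed at all, is implementation-specific and is often only reachable through environment variables.

```c
/*
 * Sketch of the manual-tuning path the adaptive design replaces: using the
 * MPI_T tools interface to look up and set an eager-threshold control
 * variable.  The cvar name below is a hypothetical stand-in.
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int provided, num_cvars;
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
    MPI_Init(&argc, &argv);

    MPI_T_cvar_get_num(&num_cvars);
    for (int i = 0; i < num_cvars; i++) {
        char name[256], desc[256];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, bind, scope, count;
        MPI_Datatype dtype;
        MPI_T_enum enumtype;

        MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                            &enumtype, desc, &desc_len, &bind, &scope);

        /* hypothetical name: replace with the cvar your MPI library exposes */
        if (strcmp(name, "eager_threshold_cvar") == 0) {
            MPI_T_cvar_handle handle;
            int new_threshold = 64 * 1024;   /* 64 KB, chosen only for illustration */
            MPI_T_cvar_handle_alloc(i, NULL, &handle, &count);
            MPI_T_cvar_write(handle, &new_threshold);
            MPI_T_cvar_handle_free(&handle);
            printf("set %s to %d bytes\n", name, new_threshold);
        }
    }

    MPI_Finalize();
    MPI_T_finalize();
    return 0;
}
```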

This research is supported in part by National Science Foundation grants #CNS-1419123, #CNS-1513120, #ACI-1450440 and #CCF-1565414.

Author information

Correspondence to Hari Subramoni.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Subramoni, H., Chakraborty, S., Panda, D.K. (2017). Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication. In: Kunkel, J.M., Yokota, R., Balaji, P., Keyes, D. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10266. Springer, Cham. https://doi.org/10.1007/978-3-319-58667-0_18

  • DOI: https://doi.org/10.1007/978-3-319-58667-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58666-3

  • Online ISBN: 978-3-319-58667-0

  • eBook Packages: Computer Science (R0)
