ExaMPI: A Modern Design and Implementation to Accelerate Message Passing Interface Innovation

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1087)


The difficulty of deep experimentation with Message Passing Interface (MPI) implementations—which are quite large and complex—substantially raises the cost and complexity of proof-of-concept activities and limits the community of potential contributors to new and better MPI features and implementations alike. Our goal is to enable researchers to experiment rapidly and easily with new concepts, algorithms, and internal protocols for MPI, we introduce ExaMPI, a modern MPI-3.x subset with a robust MPI-4.x roadmap. We discuss design, early implementation, and ongoing utilization in parallel programming research, plus specific research activities enabled by ExaMPI.

Architecturally, ExaMPI is a C++17-based library designed for modularity, extensibility, and understandability. The code base supports both native C++ threading with thread-safe data structures and a modular progress engine. In addition, the transport abstraction implements UDP, TCP, OFED verbs, and LibFabrics for high-performance networks.

By enabling researchers with ExaMPI, we seek to accelerate innovations and increase the number of new experiments and experimenters, all while expanding MPI’s applicability.


MPI Middleware architecture Parallel programming models Performance portability Cost of portability 


  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
    Intel MPI library, August 2018.
  6. 6.
    Bangalore, P., Rabenseifner, R., Holmes, D., Jaeger, J., Mercier, G., Blaas-Schenner, C., Skjellum, A.: Exposition, clarification, and expansion of MPI semantic terms and conventions (2019). Under reviewGoogle Scholar
  7. 7.
    Barigou, Y., Venkatesan, V., Gabriel, E.: Auto-tuning non-blocking collective communication operations. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 1204–1213, May 2015.
  8. 8.
    Castillo, E., et al.: Optimizing computation-communication overlap in asynchronous task-based programs: poster. In: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, PPoPP 2019, pp. 415–416. ACM, New York (2019).
  9. 9.
    Denis, A., Trahay, F.: MPI overlap: benchmark and analysis. In: 2016 45th International Conference on Parallel Processing (ICPP), pp. 258–267, August 2016.
  10. 10.
    Didelot, S., Carribault, P., Pérache, M., Jalby, W.: Improving MPI communication overlap with collaborative polling. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds.) EuroMPI 2012. LNCS, vol. 7490, pp. 37–46. Springer, Heidelberg (2012). Scholar
  11. 11.
    Dimitrov, R.P.: Overlapping of communication and computation and early binding: fundamental mechanisms for improving parallel performance on clusters of workstations. Ph.D. thesis, Mississippi State, MS, USA (2001)Google Scholar
  12. 12.
    Graham, R.L., Shipman, G.M., Barrett, B.W., Castain, R.H., Bosilca, G., Lumsdaine, A.: Open MPI: a high-performance, heterogeneous MPI. In: Cluster 2006, pp. 1–9, September 2006Google Scholar
  13. 13.
    Grant, R.E., Dosanjh, M.G.F., Levenhagen, M.J., Brightwell, R., Skjellum, A.: Finepoints: partitioned multithreaded MPI communication. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds.) ISC High Performance 2019. LNCS, vol. 11501, pp. 330–350. Springer, Cham (2019). Scholar
  14. 14.
    Gropp, W.: MPICH2: a new start for MPI implementations. In: Kranzlmüller, D., Volkert, J., Kacsuk, P., Dongarra, J. (eds.) EuroPVM/MPI 2002. LNCS, vol. 2474, pp. 7–7. Springer, Heidelberg (2002). Scholar
  15. 15.
    Guo, J., Yi, Q., Meng, J., Zhang, J., Balaji, P.: Compiler-assisted overlapping of communication and computation in MPI applications. In: 2016 IEEE International Conference on Cluster Computing (CLUSTER), pp. 60–69, September 2016.
  16. 16.
    Hager, G., Schubert, G., Wellein, G.: Prospects for truly asynchronous communication with pure MPI and hybrid MPI/OpenMP on current supercomputing platforms (2011)Google Scholar
  17. 17.
    Hassani, A.: Toward a scalable, transactional, fault-tolerant message passing interface for petascale and exascale machines. Ph.D. thesis, UAB (2016)Google Scholar
  18. 18.
    Hoefler, T., Lumsdaine, A.: Message progression in parallel computing - to thread or not to thread? In: 2008 IEEE International Conference on Cluster Computing, pp. 213–222, September 2008.
  19. 19.
    Holmes, D., et al.: MPI sessions: leveraging runtime infrastructure to increase scalability of applications at exascale. In: EuroMPI 2016, pp. 121–129. ACM, New York (2016)Google Scholar
  20. 20.
    Holmes, D.J., Morgan, B., Skjellum, A., Bangalore, P.V., Sridharan, S.: Planning for performance: Enhancing achievable performance for MPI through persistent collective operations. PARCOMP 81, 32–57 (2019)MathSciNetGoogle Scholar
  21. 21.
    ISO: ISO/IEC 14882:2017 Information technology – Programming languages – C++. Fifth edn., December 2017.
  22. 22.
    Liu, J., et al.: Performance comparison of MPI implementations over Infiniband, Myrinet and Quadrics. In: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, SC 2003, pp. 58–58, November 2003.
  23. 23.
    Laguna, I., Mohror, K., Sultana, N., Rüfenacht, M., Marshall, R., Skjellum, A.: A large-scale study of MPI usage in open-source HPC applications. In: Proceedings of the SC 2019, November 2019 (2019, in press).
  24. 24.
    Lu, H., Seo, S., Balaji, P.: MPI+ULT: overlapping communication and computation with user-level threads. In: 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, pp. 444–454, August 2015.
  25. 25.
    Panda, D.K., Tomko, K., Schulz, K., Majumdar, A.: The MVAPICH project: evolution and sustainability of an open source production quality MPI library for HPC. In: WSPPE (2013)Google Scholar
  26. 26.
    Skjellum, A., et al.: Object-oriented analysis and design of the message passing interface. Concurrency Comput.: Practice Exp. 13(4), 245–292 (2001).
  27. 27.
    Sridharan, S., Dinan, J., Kalamkar, D.D.: Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, pp. 487–498. IEEE Press, Piscataway (2014).
  28. 28.
    Sultana, N., Rüfenacht, M., Skjellum, A., Laguna, I., Mohror, K.: Failure recovery for bulk synchronous applications with MPI stages. PARCOMP 84, 1–14 (2019)Google Scholar
  29. 29.
    Wittmann, M., Hager, G., Zeiser, T., Wellein, G.: Asynchronous MPI for the masses. arXiv preprint arXiv:1302.4280 (2013)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. 1.University of Tennessee at ChattanoogaChattanoogaUSA
  2. 2.Auburn UniversityAuburnUSA
  3. 3.EPCCUniversity of EdinburghEdinburghScotland, UK
  4. 4.Tennessee Tech UniversityCookevilleUSA
  5. 5.Lawrence Livermore National LaboratoryLivermoreUSA

Personalised recommendations