MPI on a Million Processors

  • Pavan Balaji
  • Darius Buntinas
  • David Goodell
  • William Gropp
  • Sameer Kumar
  • Ewing Lusk
  • Rajeev Thakur
  • Jesper Larsson Träff
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5759)


Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine the issue of how scalable MPI is. We first examine the MPI specification itself and discuss areas with scalability concerns and how they can be overcome. We then investigate issues that an MPI implementation must address in order to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in application algorithmic scalability to large process counts and features of MPI that enable the use of other techniques to overcome scalability limitations in applications.
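The abstract's memory-consumption concern can be made concrete with a toy calculation (this is an illustration, not data from the paper): if an MPI implementation reserves a fixed amount of connection or buffer state for every possible peer, per-process memory grows linearly with the job size, which becomes untenable at a million processes. The 1 KiB-per-peer figure below is an assumed value chosen only for illustration.

```python
# Toy model of per-peer MPI state (illustrative; the per-peer size is assumed).

def per_process_buffer_memory(num_procs: int, bytes_per_peer: int) -> int:
    """Bytes one process spends on per-peer state if every peer gets a fixed slot."""
    return (num_procs - 1) * bytes_per_peer

BYTES_PER_PEER = 1024  # assumption: 1 KiB of connection/buffer state per peer

for p in (1024, 131072, 1_000_000):
    mib = per_process_buffer_memory(p, BYTES_PER_PEER) / 2**20
    print(f"{p:>9} processes -> {mib:8.1f} MiB of per-peer state per process")
```

Under these assumptions the per-peer state grows from about 1 MiB at 1,024 processes to nearly 1 GiB per process at a million processes, which motivates the kind of footprint tuning the abstract describes.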


Keywords: Fault Tolerance, Memory Consumption, Graph Topology, Collective Operation, Broadcast Algorithm





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Pavan Balaji (1)
  • Darius Buntinas (1)
  • David Goodell (1)
  • William Gropp (2)
  • Sameer Kumar (3)
  • Ewing Lusk (1)
  • Rajeev Thakur (1)
  • Jesper Larsson Träff (4)

  1. Argonne National Laboratory, Argonne, USA
  2. University of Illinois, Urbana, USA
  3. IBM T.J. Watson Research Center, Yorktown Heights, USA
  4. NEC Laboratories Europe, Sankt Augustin, Germany
