Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation

  • Edgar Gabriel
  • Graham E. Fagg
  • George Bosilca
  • Thara Angskun
  • Jack J. Dongarra
  • Jeffrey M. Squyres
  • Vishal Sahay
  • Prabhanjan Kambadur
  • Brian Barrett
  • Andrew Lumsdaine
  • Ralph H. Castain
  • David J. Daniel
  • Richard L. Graham
  • Timothy S. Woodall
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3241)

Abstract

A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical challenges for end users. Building upon prior research, and influenced by experience gained from the code bases of the LAM/MPI, LA-MPI, and FT-MPI projects, Open MPI is an all-new, production-quality MPI-2 implementation that is fundamentally centered around component concepts. Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI. Its component architecture provides a stable platform for third-party research and enables the run-time composition of independent software add-ons. This paper presents a high-level overview of the goals, design, and implementation of Open MPI.



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  1. Innovative Computing Laboratory, University of Tennessee: Edgar Gabriel, Graham E. Fagg, George Bosilca, Thara Angskun, Jack J. Dongarra
  2. Open Systems Laboratory, Indiana University: Jeffrey M. Squyres, Vishal Sahay, Prabhanjan Kambadur, Brian Barrett, Andrew Lumsdaine
  3. Advanced Computing Laboratory, Los Alamos National Laboratory: Ralph H. Castain, David J. Daniel, Richard L. Graham, Timothy S. Woodall
