FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World

  • Graham E. Fagg
  • Jack J. Dongarra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1908)

Abstract

Initial versions of MPI were designed to work efficiently on multiprocessors that had very little job control and thus static process models. Subsequently forcing them to support dynamic process operations would have affected their performance. As current HPC systems increase in size, with higher potential levels of individual node failure, the need arises for new fault-tolerant systems to be developed. Here we present a new implementation of MPI called FT-MPI that allows the semantics and associated failure modes to be completely controlled by the application. We give an overview of the FT-MPI semantics, design, and some performance issues, as well as the HARNESS g_hcore implementation it is built upon.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Graham E. Fagg (1)
  • Jack J. Dongarra (1)
  1. Department of Computer Science, University of Tennessee, Knoxville, USA
