User Level Failure Mitigation in MPI

  • Wesley Bland
Conference paper

DOI: 10.1007/978-3-642-36949-0_57

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7640)
Cite this paper as:
Bland W. (2013) User Level Failure Mitigation in MPI. In: Caragiannis I. et al. (eds) Euro-Par 2012: Parallel Processing Workshops. Euro-Par 2012. Lecture Notes in Computer Science, vol 7640. Springer, Berlin, Heidelberg

Introduction

In a constant effort to deliver steady performance improvements, the size of High Performance Computing (HPC) systems, as observed in the Top 500 ranking, has grown tremendously over the last decade. This trend, along with the resulting decrease in the Mean Time Between Failures (MTBF), is unlikely to stop; consequently, many computing nodes will inevitably fail during application execution [5]. It is alarming that the most popular fault-tolerant approaches see their efficiency plummet at Exascale [3,4], calling for more efficient approaches revolving around application-centric failure mitigation strategies [7].

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Wesley Bland (1)
  1. Innovative Computing Laboratory, University of Tennessee, USA
