FlipIt: An LLVM Based Fault Injector for HPC

  • Jon Calhoun
  • Luke Olson
  • Marc Snir
Conference paper

DOI: 10.1007/978-3-319-14325-5_47

Volume 8805 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Calhoun J., Olson L., Snir M. (2014) FlipIt: An LLVM Based Fault Injector for HPC. In: Lopes L. et al. (eds) Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8805. Springer, Cham

Abstract

High performance computing (HPC) is increasingly subjected to faulty computations. The frequency of silent data corruptions (SDCs) in particular is expected to increase in emerging machines, requiring HPC applications to handle SDCs. In this paper, we propose a robust fault injector structured through an LLVM compiler pass that allows simulation of SDCs in various applications. Although fault injection locations are enumerated at compile time, their activation occurs purely at runtime and is based on a user-provided fault distribution. The robustness of our fault injector lies in the ability to augment the runtime injection logic on a per-application basis. This allows tighter control over the spatial and temporal placement and the probability of injected faults. The usability, scalability, and robustness of our fault injector are demonstrated by injecting faults into an algebraic multigrid solver.
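
The split the abstract describes, with injection sites fixed ahead of time and activation decided at runtime from a user-provided distribution, can be illustrated with a small sketch. This is not FlipIt's code or API: the names `flip_bit`, `Injector`, `maybe_corrupt`, and the site labels are hypothetical, and plain Python stands in for the instrumentation that an LLVM pass would insert. It assumes a fault is modeled as a single bit flip in a 64-bit floating-point value.

```python
import random
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding inverted.

    Bit 0 is the least significant mantissa bit; bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 64)")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


class Injector:
    """Hypothetical runtime logic: decides per call whether a site fires."""

    def __init__(self, site_probability, seed=0):
        # Assumption: the user supplies a per-site fault probability.
        self.site_probability = site_probability
        self.rng = random.Random(seed)
        self.log = []  # (site, bit) for every fault actually injected

    def maybe_corrupt(self, site, value):
        # Sites are known in advance; activation happens only here, at runtime.
        p = self.site_probability.get(site, 0.0)
        if self.rng.random() < p:
            bit = self.rng.randrange(64)
            self.log.append((site, bit))
            return flip_bit(value, bit)
        return value


# Usage: one site that never fires and one that always fires.
injector = Injector({"axpy.mul": 0.0, "axpy.add": 1.0}, seed=42)
clean = injector.maybe_corrupt("axpy.mul", 2.0)
faulty = injector.maybe_corrupt("axpy.add", 2.0)
```

Keeping the decision inside one runtime object is what would let an application swap in its own logic, for example restricting faults to a code region or a time window, without changing the enumerated sites.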

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jon Calhoun (1)
  • Luke Olson (1)
  • Marc Snir (1)
  1. University of Illinois at Urbana-Champaign, Urbana, USA