European Conference on Parallel Processing

Euro-Par 2011: Parallel Processing Workshops, pp 282–291

Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience

  • Nathan DeBardeleben,
  • Sean Blanchard,
  • Qiang Guan,
  • Ziming Zhang &
  • Song Fu

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7156)

Abstract

As the high performance computing (HPC) community continues to push for ever larger machines, reliability remains a serious obstacle. Further, as feature sizes and voltages decrease, the rate of transient soft errors is rising. Today's HPC programmers must already deal with these faults to a small degree, and the problem is expected to grow as systems continue to scale.

In this paper we present SEFI, the Soft Error Fault Injection framework, a tool for profiling software for its susceptibility to soft errors. We focus here on logic soft error injection. Using QEMU, an open source virtual machine and processor emulator, we demonstrate modifying emulated machine instructions to introduce soft errors. Because the experiments modify the virtual machine itself, they do not require intimate knowledge of the tested application. With this technique, we show that we can inject simulated soft errors into the logic operations of a target application without affecting other applications or the operating system sharing the VM. We present some initial results and discuss where we think this work will be useful in next-generation hardware/software co-design.
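
The idea of injecting a logic soft error at the emulator level can be illustrated with a small sketch. This is not SEFI's code and does not use any QEMU API: `ToyEmulator`, `fmul`, `flip_bit_f64`, the process-id check, and the injection probability are all hypothetical names and parameters chosen for illustration. The sketch assumes a fault is modeled as a single bit flip in the result of an emulated floating-point multiply, applied only when the operation belongs to the targeted process.

```python
import random
import struct


def flip_bit_f64(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a double's IEEE 754 encoding."""
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 64)")
    # Reinterpret the double as a 64-bit unsigned integer, flip, convert back.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


class ToyEmulator:
    """Hypothetical stand-in for an emulator's floating-point multiply helper."""

    def __init__(self, target_pid: int, probability: float, seed: int = 0):
        self.target_pid = target_pid    # only this process sees faults (assumption)
        self.probability = probability  # chance of a fault per multiply (assumption)
        self.rng = random.Random(seed)  # seeded so experiments are repeatable
        self.injections = 0

    def fmul(self, pid: int, a: float, b: float) -> float:
        """Emulate a multiply; corrupt the result only for the target process."""
        result = a * b
        if pid == self.target_pid and self.rng.random() < self.probability:
            result = flip_bit_f64(result, self.rng.randrange(64))
            self.injections += 1
        return result
```

For example, `flip_bit_f64(1.0, 63)` flips the sign bit and returns `-1.0`, while a `ToyEmulator(target_pid=42, probability=1.0)` leaves multiplies issued by any other process id untouched. Keeping the fault logic inside the emulated operation is what lets the application under test run unmodified.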

Keywords

  • soft errors
  • resilience
  • fault tolerance
  • reliability
  • fault injection
  • virtual machines
  • high performance computing
  • supercomputing

Author information

Authors and Affiliations

  1. High Performance Computing Division, Los Alamos National Laboratory, Ultrascale Systems Research Center, Los Alamos, NM, 87544, USA

    Nathan DeBardeleben, Sean Blanchard, Qiang Guan & Ziming Zhang

  2. Department of Computer Science and Engineering, University of North Texas, Dependable Computing Systems Lab, Denton, TX, 76203, USA

    Qiang Guan, Ziming Zhang & Song Fu

Editor information

Editors and Affiliations

  1. Scilytics, Koellnerhofgasse 3/15A, 1010, Vienna, Austria

    Michael Alexander

  2. ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy

    Pasqua D’Ambra

  3. University of Amsterdam, 1090, Amsterdam, Netherlands

    Adam Belloum

  4. Innovative Computing Laboratory, The University of Tennessee, US

    George Bosilca

  5. Department of Experimental Medicine and Clinic, University Magna Græcia, 88100, Catanzaro, Italy

    Mario Cannataro

  6. Computer Science Department, University of Pisa, Italy

    Marco Danelutto

  7. Second University of Naples, Italy

    Beniamino Di Martino

  8. TU München, Boltzmannstr. 3, 85748, Garching, Germany

    Michael Gerndt

  9. Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Emmanuel Jeannot & Raymond Namyst

  10. Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France

    Jean Roman

  11. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831-6164, Oak Ridge, TN, USA

    Stephen L. Scott

  12. Department of Scientific Computing, University of Vienna, Nordbergstr. 15/3C, 1090, Vienna, Austria

    Jesper Larsson Traff

  13. Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA

    Geoffroy Vallée

  14. Technische Universität München, Germany

    Josef Weidendorfer

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

DeBardeleben, N., Blanchard, S., Guan, Q., Zhang, Z., Fu, S. (2012). Experimental Framework for Injecting Logic Errors in a Virtual Machine to Profile Applications for Soft Error Resilience. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_32

  • DOI: https://doi.org/10.1007/978-3-642-29740-3_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29739-7

  • Online ISBN: 978-3-642-29740-3

  • eBook Packages: Computer Science (R0)
