Rapid Fault-Space Exploration by Evolutionary Pruning

  • Horst Schirmeier
  • Christoph Borchert
  • Olaf Spinczyk
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8666)


Recent studies suggest that future microprocessors need low-cost fault-tolerance solutions for reliable operation. Several competing software-implemented error-detection methods have been shown to increase the overall resiliency when applied to critical spots in the system. Fault injection (FI) is a common approach to assess a system’s vulnerability to hardware faults. In an FI campaign comprising multiple runs of an application benchmark, each run simulates the impact of a fault in a specific hardware location at a specific point in time. Unfortunately, exhaustive FI campaigns covering all possible fault locations are infeasible even for small target applications. Commonly used sampling techniques, while sufficient to measure overall resilience improvements, lack the level of detail and accuracy needed for the identification of critical spots, such as important variables or program phases. Many faults are sampled out, leaving the developer without any information on the application parts they would have targeted.
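A rough sketch of why exhaustive campaigns become infeasible and what sampling gives up. The fault model here (single-bit flips in memory, one fault coordinate per bit and cycle) is an assumption made for illustration and is not stated in the abstract; `fault_space_size` and `sample_faults` are hypothetical helper names.

```python
import random

def fault_space_size(memory_bits: int, cycles: int) -> int:
    """Count (cycle, bit) coordinates, assuming single-bit-flip faults."""
    return memory_bits * cycles

def sample_faults(memory_bits: int, cycles: int, n: int, seed: int = 0):
    """Draw n distinct fault coordinates uniformly at random.

    Every coordinate that is not drawn is never tested, so the campaign
    yields no information about the code or data it would have hit.
    """
    rng = random.Random(seed)
    total = fault_space_size(memory_bits, cycles)
    flat_indices = rng.sample(range(total), n)
    # Decode each flat index into a (cycle, bit) pair.
    return [divmod(index, memory_bits) for index in flat_indices]

# Assumed toy target: 1 KiB of memory (8192 bits), one million cycles.
total = fault_space_size(8192, 1_000_000)
picked = sample_faults(8192, 1_000_000, n=1000)
print(total, len(picked) / total)
```

Even this small, made-up target has more than eight billion fault coordinates, and a 1000-run sample touches only a tiny fraction of them.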

We present a methodology and tool implementation that reduces experimentation effort in an application-specific way, lets the user freely trade the number of FI runs for result accuracy, and provides information on all possible fault locations. Once a set of Pareto-optimal heuristics has been trained, the user can specify a maximum number of FI experiments. A detailed evaluation with a set of benchmarks running on the eCos embedded OS, including MiBench’s automotive benchmark category, demonstrates the applicability and effectiveness of our approach: for example, when the user chooses to run only 1.5% of all FI experiments, the average result accuracy is still 99.84%.
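The trade-off between experiment count and accuracy can be pictured as selecting from a Pareto front under a budget. The following is a minimal sketch of that selection step only, not the authors' tool: the `Heuristic` type, both function names, and all numbers are hypothetical, and how the heuristics are trained is outside its scope.

```python
from typing import NamedTuple, Optional

class Heuristic(NamedTuple):
    name: str
    cost: float      # fraction of all FI experiments that must be run
    accuracy: float  # fraction of fault-space results reproduced correctly

def pareto_front(candidates: list[Heuristic]) -> list[Heuristic]:
    """Keep candidates that no other candidate beats on both objectives."""
    front = []
    for c in candidates:
        dominated = any(
            o.cost <= c.cost and o.accuracy >= c.accuracy
            and (o.cost < c.cost or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda h: h.cost)

def best_within_budget(front: list[Heuristic], max_cost: float) -> Optional[Heuristic]:
    """Pick the most accurate heuristic whose cost fits the user's budget."""
    affordable = [h for h in front if h.cost <= max_cost]
    return max(affordable, key=lambda h: h.accuracy) if affordable else None

# Made-up candidates for illustration; not results from the paper.
candidates = [
    Heuristic("A", 0.01, 0.95),
    Heuristic("B", 0.02, 0.99),
    Heuristic("C", 0.03, 0.97),   # dominated by B: costs more, less accurate
    Heuristic("D", 0.10, 0.999),
]
front = pareto_front(candidates)
choice = best_within_budget(front, max_cost=0.05)
print([h.name for h in front], choice.name if choice else None)
```

Because every heuristic on the front is the best available at its cost, the user only has to state a budget; the lookup then returns the most accurate option that fits.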





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Horst Schirmeier (1)
  • Christoph Borchert (1)
  • Olaf Spinczyk (1)
  1. Computer Science 12, Technische Universität Dortmund, Dortmund, Germany
