Effectiveness of Software-Based Hardening for Radiation-Induced Soft Errors in Real-Time Operating Systems

  • Thiago Santini
  • Christoph Borchert
  • Christian Dietrich
  • Horst Schirmeier
  • Martin Hoffmann
  • Olaf Spinczyk
  • Daniel Lohmann
  • Flávio Rech Wagner
  • Paolo Rech
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10172)

Abstract

For decades, radiation-induced failures have been a known issue for aerospace systems, in which redundancy mechanisms are employed as a protection method. Due to the shrinking of structures and operating voltages, these failures are increasingly becoming an issue even for terrestrial applications. Unfortunately, redundancy increases costs, area usage, and power consumption, which can hinder its use in cost- and power-sensitive safety-critical applications, such as automotive systems. To overcome this limitation, multiple software-based approaches have been proposed, but they assume the existence of an underlying error-free operating system. In this paper, we investigate the radiation reliability of two dependability-oriented real-time operating systems: the popular eCos operating system, hardened through aspect-oriented programming methods, and dOSEK, an embedded kernel designed from the ground up with reliability as a major concern. Both operating systems were evaluated through extensive neutron-beam testing on a state-of-the-art 28 nm ARM-based system-on-chip. Their fault-tolerance mechanisms reduced the overall cross-sections by up to 91% and 74%, respectively, relative to their baselines.
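
The reductions quoted in the abstract are relative changes in cross-section. As a rough illustration only, the sketch below assumes the conventional beam-test definition (observed errors divided by particle fluence) and uses made-up counts and fluence; the function names are hypothetical and none of the numbers come from the paper.

```python
def cross_section(observed_errors: int, fluence: float) -> float:
    """Cross-section in cm^2: observed errors per unit fluence (particles/cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return observed_errors / fluence


def relative_reduction(baseline: float, hardened: float) -> float:
    """Fraction by which the hardened cross-section is below the baseline."""
    if baseline <= 0:
        raise ValueError("baseline cross-section must be positive")
    return 1.0 - hardened / baseline


# Hypothetical beam-test outcome (illustrative values, not measured data).
FLUENCE = 1e11  # neutrons/cm^2, assumed equal for both runs
baseline_sigma = cross_section(observed_errors=50, fluence=FLUENCE)
hardened_sigma = cross_section(observed_errors=20, fluence=FLUENCE)

reduction = relative_reduction(baseline_sigma, hardened_sigma)
print(f"baseline: {baseline_sigma:.2e} cm^2")
print(f"hardened: {hardened_sigma:.2e} cm^2")
print(f"relative reduction: {reduction:.0%}")
```

A reduction of 91% in this sense would mean the hardened system's cross-section is 9% of its baseline's.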

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Thiago Santini (1)
  • Christoph Borchert (2)
  • Christian Dietrich (3)
  • Horst Schirmeier (2)
  • Martin Hoffmann (3)
  • Olaf Spinczyk (2)
  • Daniel Lohmann (3)
  • Flávio Rech Wagner (4)
  • Paolo Rech (4)

  1. University of Tübingen, Tübingen, Germany
  2. Technische Universität Dortmund, Dortmund, Germany
  3. Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen-Nürnberg, Germany
  4. Federal University of Rio Grande do Sul, Porto Alegre, Brazil
