RAT: A Lightweight Architecture Independent System-Level Soft Error Mitigation Technique

  • Conference paper
VLSI-SoC: Design Trends (VLSI-SoC 2020)

Abstract

To achieve a substantial reliability and safety level, it is imperative to provide electronic computing systems with appropriate mechanisms to tackle soft errors. This paper proposes a low-cost system-level soft error mitigation technique, which allocates the critical application function to a pool of specific general-purpose processor registers. Both the critical function and the register pool are automatically selected by a developed profiling tool. The proposed technique was validated through more than 400K fault injections considering a Linux kernel, different benchmarks, and two multicore Arm processor architectures (ARMv7-A and ARMv8-A). Results show that our technique significantly reduces the code size and performance overheads while providing soft error reliability improvement compared with the Triple Modular Redundancy (TMR) technique.
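For context on the TMR baseline named in the abstract, a bitwise majority voter can be sketched as follows. This is a generic illustration of TMR voting, not the paper's RAT technique or its tooling; the function name `tmr_vote` and the example values are assumptions made for illustration only.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a value.

    Each output bit takes the value held by at least two of the inputs,
    so a single-bit upset in any one copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


# Hypothetical example: one of three copies suffers a single bit flip (bit 3).
golden = 0x000000F0
faulty = golden ^ (1 << 3)
voted = tmr_vote(golden, faulty, golden)
```

Keeping three copies of every protected value and voting on each use is the source of the code size and performance overheads that the abstract compares against.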

Author information

Correspondence to Jonas Gava, Ricardo Reis or Luciano Ost.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Gava, J., Reis, R., Ost, L. (2021). RAT: A Lightweight Architecture Independent System-Level Soft Error Mitigation Technique. In: Calimera, A., Gaillardon, PE., Korgaonkar, K., Kvatinsky, S., Reis, R. (eds) VLSI-SoC: Design Trends. VLSI-SoC 2020. IFIP Advances in Information and Communication Technology, vol 621. Springer, Cham. https://doi.org/10.1007/978-3-030-81641-4_11

  • DOI: https://doi.org/10.1007/978-3-030-81641-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81640-7

  • Online ISBN: 978-3-030-81641-4

  • eBook Packages: Computer Science, Computer Science (R0)
