Abstract
Achieving a substantial level of reliability and safety requires electronic computing systems to include appropriate mechanisms for tackling soft errors. This paper proposes a low-cost, system-level soft error mitigation technique that allocates the critical application function to a pool of specific general-purpose processor registers. A profiling tool developed for this purpose automatically selects both the critical function and the register pool. The technique was validated through more than 400K fault injections covering a Linux kernel, different benchmarks, and two multicore Arm processor architectures (ARMv7-A and ARMv8-A). Results show that our technique significantly reduces code size and performance overheads while providing a soft error reliability improvement compared with the Triple Modular Redundancy (TMR) technique.
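For readers unfamiliar with the TMR baseline named in the abstract, the following is a minimal sketch of software triple modular redundancy with a single injected bit flip. It is not the register allocation technique proposed in the paper. The names `critical_function`, `flip_bit`, `tmr_vote`, and `run_with_tmr`, the 32-bit word width, and the arithmetic inside `critical_function` are all illustrative assumptions.

```python
from typing import Optional

WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit results, chosen only for illustration


def flip_bit(word: int, bit: int, width: int = 32) -> int:
    """Model a soft error by inverting one bit of a width-bit word."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (word ^ (1 << bit)) & ((1 << width) - 1)


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


def critical_function(x: int) -> int:
    """Hypothetical stand-in for an application's critical function."""
    return (x * 3 + 7) & WORD_MASK


def run_with_tmr(x: int, faulty_copy: Optional[int] = None, bit: int = 0) -> int:
    """Run the critical function three times, optionally corrupt one result, then vote."""
    results = [critical_function(x) for _ in range(3)]
    if faulty_copy is not None:
        results[faulty_copy] = flip_bit(results[faulty_copy], bit)
    return tmr_vote(*results)


if __name__ == "__main__":
    golden = critical_function(5)
    # A single corrupted copy is outvoted by the two correct ones.
    print(run_with_tmr(5, faulty_copy=1, bit=4) == golden)
```

In this sketch the critical function runs three times and a voter is added. That replication is the kind of code size and execution time cost the abstract compares against.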
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Gava, J., Reis, R., Ost, L. (2021). RAT: A Lightweight Architecture Independent System-Level Soft Error Mitigation Technique. In: Calimera, A., Gaillardon, PE., Korgaonkar, K., Kvatinsky, S., Reis, R. (eds) VLSI-SoC: Design Trends. VLSI-SoC 2020. IFIP Advances in Information and Communication Technology, vol 621. Springer, Cham. https://doi.org/10.1007/978-3-030-81641-4_11
DOI: https://doi.org/10.1007/978-3-030-81641-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81640-7
Online ISBN: 978-3-030-81641-4
eBook Packages: Computer Science, Computer Science (R0)