Abstract
Technological advances allow the production of increasingly complex electronic systems. However, technology and voltage scaling have dramatically increased the susceptibility of new devices not only to Single Bit Upsets (SBU) but also to Multiple Bit Upsets (MBU). Safety-critical applications require fault-tolerant systems that provide high reliability while meeting application requirements. The reliability problem is particularly acute in memory, which accounts for more than 80 % of a system on chip. To tackle this problem, we propose a new memory reliability technique referred to as DPSR: Double Parity Single Redundancy. DPSR is designed to enhance the resilience of computing systems to SBU and MBU. In thorough fault injection experiments, DPSR shows promising results: it detects and corrects more than 99.6 % of the encountered MBUs, with an average time overhead of less than 3 %.
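The abstract does not spell out how DPSR works. Purely as an illustration, the Python sketch below assumes what the name suggests: each word carries two parity bits (one over its even-indexed bits, one over its odd-indexed bits) and a single redundant copy used for recovery. The names `ProtectedWord`, `double_parity`, `inject_burst` and `campaign` are hypothetical, and the toy fault injection only flips adjacent bits in the primary word; this is not the authors' design or experimental setup.

```python
import random

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1
EVEN_MASK = 0x55555555  # bits 0, 2, 4, ... of a 32-bit word
ODD_MASK = 0xAAAAAAAA   # bits 1, 3, 5, ... of a 32-bit word


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def double_parity(word: int) -> tuple:
    """Hypothetical 'double parity': one bit over even-indexed bits, one over odd-indexed bits."""
    return parity(word & EVEN_MASK), parity(word & ODD_MASK)


class ProtectedWord:
    """Hypothetical entry: a primary word and a single redundant copy, each with its own parity pair."""

    def __init__(self, value: int):
        self.primary = value & MASK
        self.copy = value & MASK
        self.p_primary = double_parity(self.primary)
        self.p_copy = double_parity(self.copy)

    def read(self):
        # No parity mismatch: return the primary word (it may still hold an undetected error).
        if double_parity(self.primary) == self.p_primary:
            return self.primary, "ok"
        # Mismatch detected: recover from the redundant copy if its own parity still checks.
        if double_parity(self.copy) == self.p_copy:
            self.primary = self.copy
            return self.copy, "corrected"
        return None, "uncorrectable"


def inject_burst(word: int, length: int, rng: random.Random) -> int:
    """Flip `length` adjacent bits at a random position (a simple MBU model)."""
    start = rng.randrange(WORD_BITS - length + 1)
    return word ^ (((1 << length) - 1) << start)


def campaign(length: int, trials: int = 10_000, seed: int = 0) -> dict:
    """Toy fault injection: corrupt only the primary word and classify each read."""
    rng = random.Random(seed)
    counts = {"ok": 0, "corrected": 0, "uncorrectable": 0, "silent": 0}
    for _ in range(trials):
        value = rng.getrandbits(WORD_BITS)
        entry = ProtectedWord(value)
        entry.primary = inject_burst(entry.primary, length, rng)
        result, status = entry.read()
        if status == "ok" and result != value:
            status = "silent"  # wrong data returned without any error being flagged
        counts[status] += 1
    return counts


if __name__ == "__main__":
    for burst_len in range(1, 6):
        print(burst_len, campaign(burst_len))
```

Under these assumptions, bursts of one to three adjacent bits are always detected and corrected from the copy, whereas a 4-bit burst flips two even-indexed and two odd-indexed bits, leaves both parities unchanged and is returned silently. This illustrates why the spatial pattern of upsets matters when evaluating an MBU protection scheme.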
Cite this article
Chabot, A., Alouani, I., Nouacer, R. et al. A Memory Reliability Enhancement Technique for Multi Bit Upsets. J Sign Process Syst 93, 439–459 (2021). https://doi.org/10.1007/s11265-020-01603-5
DOI: https://doi.org/10.1007/s11265-020-01603-5