Soft Error Mitigation in Soft-Core Processors

  • Antonio Martínez-Álvarez
  • Sergio Cuenca-Asensi
  • Felipe Restrepo-Calle

Abstract

This chapter presents the approaches and techniques available in the literature for fault mitigation in soft-core processors, with special emphasis on hybrid hardware/software solutions.

Keywords

Fault Injection; Transient Fault; Control Flow Graph; Register Transfer Level; Very Long Instruction Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Antonio Martínez-Álvarez (1)
  • Sergio Cuenca-Asensi (1)
  • Felipe Restrepo-Calle (2)

  1. Department of Computer Technology, University of Alicante, Alicante, Spain
  2. Department of Systems and Industrial Engineering, Universidad Nacional de Colombia, Bogotá, Colombia
