
Recovery Time and Fault Tolerance Improvement for Circuits mapped on SRAM-based FPGAs

Journal of Electronic Testing

Abstract

The rapid adoption of FPGA-based systems in space and avionics demands dependability rules at every stage from design to layout to protect against radiation effects. Triple Modular Redundancy (TMR) is a widely used fault-tolerance technique for protecting circuits implemented on SRAM-based FPGAs against radiation-induced Single Event Upsets (SEUs). The accumulation of SEUs in the configuration memory can cause the TMR replicas to fail, so the configuration bitstream must be periodically written back (scrubbed). Both the system downtime caused by scrubbing and the probability of simultaneous failures in two TMR domains increase with growing device densities. We propose a methodology that reduces the recovery time of TMR circuits and increases their resilience to Cross-Domain Errors. The methodology consists of an automated tool flow for fine-grain error detection, error-flag convergence, and non-overlapping domain placement. The fine-grain error detection logic identifies the faulty domain using gate-level functions, while the error-flag convergence logic reduces the otherwise overwhelming number of flag signals. The non-overlapping placement enables selective domain reconfiguration and greatly reduces the number of Cross-Domain Errors. Our results demonstrate a clear reduction in recovery time, owing to fast error detection and selective partial reconfiguration of only the faulty domains. Moreover, the methodology drastically reduces Cross-Domain Errors in Look-Up Tables (LUTs) and routing resources. These improvements in recovery time and fault tolerance are achieved at an area overhead of a single LUT per majority voter in the TMR circuit.
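The detection and convergence steps described in the abstract can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the abstract does not give the actual gate-level functions, so the XOR comparison against the voted value and the OR reduction of flags used here are illustrative choices, and the names `vote_with_flags`, `converge_flags` and `domains_to_reconfigure` are hypothetical. It does not model the FPGA mapping or the area overhead reported by the authors.

```python
from typing import List, Tuple

Flags = Tuple[int, int, int]


def vote_with_flags(a: int, b: int, c: int) -> Tuple[int, Flags]:
    """Majority-vote three 1-bit replica outputs and flag any dissenting domain.

    Assumption: a domain is flagged when its output differs from the voted
    value (XOR comparison). The paper's real detection function may differ.
    """
    voted = (a & b) | (b & c) | (a & c)
    flags = (a ^ voted, b ^ voted, c ^ voted)
    return voted, flags


def converge_flags(per_voter_flags: List[Flags]) -> Flags:
    """Reduce the flags of all voters to one flag per TMR domain.

    Assumption: convergence is modelled as a plain OR reduction per domain.
    """
    d0 = d1 = d2 = 0
    for f0, f1, f2 in per_voter_flags:
        d0 |= f0
        d1 |= f1
        d2 |= f2
    return d0, d1, d2


def domains_to_reconfigure(domain_flags: Flags) -> List[int]:
    """Return the indices of domains whose converged flag is raised."""
    return [index for index, flag in enumerate(domain_flags) if flag]


if __name__ == "__main__":
    # Three voters; the second one sees a wrong value from domain 2.
    replica_outputs = [(1, 1, 1), (1, 1, 0), (0, 0, 0)]
    flags = [vote_with_flags(a, b, c)[1] for a, b, c in replica_outputs]
    print(domains_to_reconfigure(converge_flags(flags)))
```

In this model only the flagged domain would be selected for partial reconfiguration, which is the idea behind replacing a full-device scrub with selective repair.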


Author information

Corresponding author

Correspondence to Anees Ullah.

Additional information

Responsible Editor: C.-W. Wu

About this article

Cite this article

Ullah, A., Sterpone, L. Recovery Time and Fault Tolerance Improvement for Circuits mapped on SRAM-based FPGAs. J Electron Test 30, 425–442 (2014). https://doi.org/10.1007/s10836-014-5463-7

  • DOI: https://doi.org/10.1007/s10836-014-5463-7
