Abstract
The rapid adoption of FPGA-based systems in space and avionics demands dependability rules, from the design phase through the layout phase, to protect against radiation effects. Triple Modular Redundancy (TMR) is a widely used fault-tolerance methodology for protecting circuits implemented on SRAM-based FPGAs against radiation-induced Single Event Upsets (SEUs). The accumulation of SEUs in the configuration memory can cause the TMR replicas to fail, requiring a periodic write-back (scrubbing) of the configuration bitstream. Both the system downtime due to scrubbing and the probability of simultaneous failure of two TMR domains increase with growing device density. We propose a methodology that reduces the recovery time of TMR circuits and increases their resilience to Cross-Domain Errors. The methodology consists of an automated tool flow for fine-grain error detection, error flag convergence, and non-overlapping domain placement. The fine-grain error detection logic identifies the faulty domain using gate-level functions, while the error flag convergence logic reduces the large number of flag signals. The non-overlapping placement enables selective domain reconfiguration and greatly reduces the number of Cross-Domain Errors. Our results show a clear reduction in recovery time, owing to fast error detection and selective partial reconfiguration of faulty domains. Moreover, the methodology drastically reduces Cross-Domain Errors in Look-Up Tables (LUTs) and routing resources. These improvements in recovery time and fault tolerance come at an area overhead of a single LUT per majority voter in the TMR circuit.
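The voter-level error detection and flag convergence described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' tool flow or their LUT-level logic: the function names `majority`, `domain_flags`, and `converge` are hypothetical, each replica output is modeled as a single bit, and convergence is assumed to be a plain OR-reduction of the flags belonging to each domain.

```python
from functools import reduce
from operator import or_


def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three replica outputs."""
    return (a & b) | (b & c) | (a & c)


def domain_flags(a, b, c):
    """Per-voter error flags (hypothetical helper).

    A domain is flagged when its output bit differs from the voted value.
    """
    v = majority(a, b, c)
    return (a ^ v, b ^ v, c ^ v)


def converge(flags_per_voter):
    """OR-reduce the flags of many voters into one flag per domain.

    Assumption: convergence is modeled as a simple OR over all voters.
    """
    return tuple(reduce(or_, column, 0) for column in zip(*flags_per_voter))


# Three voters; only the second one sees a mismatch, in the second domain.
voter_outputs = [(1, 1, 1), (0, 1, 0), (1, 1, 1)]
flags = [domain_flags(*outputs) for outputs in voter_outputs]
faulty = converge(flags)  # one flag per TMR domain
```

In this model a single faulty domain is outvoted and its converged flag is the only one set, which is the information needed to reconfigure that domain alone. If two domains fail with the same wrong value, the vote itself is wrong and the healthy domain is flagged instead, which is why reducing Cross-Domain Errors matters.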
Additional information
Responsible Editor: C.-W. Wu
About this article
Cite this article
Ullah, A., Sterpone, L. Recovery Time and Fault Tolerance Improvement for Circuits mapped on SRAM-based FPGAs. J Electron Test 30, 425–442 (2014). https://doi.org/10.1007/s10836-014-5463-7
DOI: https://doi.org/10.1007/s10836-014-5463-7