Specification and Verification of Soft Error Performance in Reliable Electronic Systems
This chapter describes the modeling, analysis, and verification methods used to meet a reliability target for transient outages in equipment that forms the backbone routing infrastructure of the Internet. We focus on the ASIC design and analysis techniques applied to achieve the targeted behavior in 65-nm technology. Considerable attention is paid to single event upsets (SEUs) in flip-flops and their potential to produce network-impacting events that are not systematically detected and controlled. Using random fault injection in large-scale RTL simulations and slack time distributions from static timing analysis, we estimated functional and temporal soft error masking effects and applied them to a system soft error model to drive decisions on interventions such as the choice of flip-flops, parity protection of register groupings, and designed responses to detected upsets. Central to the design process is a modeling framework that accounts for the upset effects and relates them to the target specification. This enables the final system to be tested under large-area neutron beam radiation to confirm that the specification has been met.
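As a rough illustration of how masking estimates can feed a system soft error model, the sketch below derates a raw flip-flop upset rate by a functional and a temporal masking factor and sums the contribution of groups that have no detection mechanism. It is a minimal sketch under stated assumptions, not the chapter's actual model: the class `FlopGroup`, the functions `effective_fit` and `undetected_fit`, the multiplicative form of the derating, and every numeric value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlopGroup:
    """One group of flip-flops in a hypothetical ASIC (illustrative only)."""
    name: str
    count: int                   # number of flip-flops in the group
    raw_fit_per_flop: float      # assumed raw upset rate per flip-flop, in FIT
    functional_derating: float   # assumed fraction of upsets that change behavior
    temporal_derating: float     # assumed fraction of upsets captured downstream
    detected: bool               # True if upsets are detected, e.g. by parity


def effective_fit(group: FlopGroup) -> float:
    """Raw group upset rate scaled by both masking factors (assumed independent)."""
    return (group.count * group.raw_fit_per_flop
            * group.functional_derating * group.temporal_derating)


def undetected_fit(groups: list[FlopGroup]) -> float:
    """Sum the effective rate over groups with no detection mechanism."""
    return sum(effective_fit(g) for g in groups if not g.detected)


# Hypothetical inputs; none of these numbers come from the chapter.
groups = [
    FlopGroup("datapath", 1000, 0.001, 0.1, 0.5, detected=False),
    FlopGroup("config_regs", 2000, 0.001, 0.3, 0.5, detected=True),
]

total = undetected_fit(groups)
print(f"Undetected effective rate: {total:.3f} FIT")
```

In a model of this shape, adding parity to a register group moves its contribution out of the undetected total, and choosing a hardened flip-flop lowers `raw_fit_per_flop`, which is how such interventions could be compared against a target.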
The authors would like to thank D. Mah, W. Obeidi, M. Pugi, M. Kucinska, A. Agrawal, M. Ruthruff, P. Narayan, J. Bolduc, N. Taranath, and P. Therrien of Cisco Systems for their contributions to the neutron beam testing effort. We would also like to thank M. Olmos and D. Gauthier at iROC Inc, as well as E. Blackmore at TRIUMF and A. Prokofiev at TSL, for their critical support at the neutron test facilities. Finally, we would like to thank David Jones at Xtreme EDA for his contributions to the error masking investigations. The TRIUMF facility is supported by funding from the National Research Council of Canada.