Abstract
Dormant faults and latent errors can impair the recovery of fault-tolerant systems. Some may remain undetected for a relatively long time. This can jeopardize recovery because further hardware faults or transient errors can accumulate in the meantime. Most fault recovery mechanisms are designed to cope with only a single fault, so faults that accumulate during the recovery process may cause system failure. Several studies have shown that latent faults cannot be ignored in highly reliable systems [ShMc 75], [ShMc 76], [Chil 86], [Shin 86] and [Swer 87]. This is a particular problem in aerospace systems that must operate in very severe environments, where high rates of transient errors are expected and external disturbances may cause multiple faults. This paper investigates VLSI design techniques that actively search for latent faults, allowing rapid recovery before multiple errors build up. The basic principle of these techniques is to implement a system with concurrent error detection. Very short test cycles are then inserted periodically while the system performs its normal program execution, in order to expose and detect latent errors and faults. The insertion rate (typically one test cycle every 100 execution cycles) is chosen so that the entire system can be exercised in a fraction of a second. The goal is to achieve self-exercising without excessive hardware overhead (beyond the initial design for concurrent fault detection, which is needed for fault tolerance anyway) and without significantly degrading system performance. These techniques are also used to provide rapid hardware diagnosis, to simplify initial testing (a major expense in space programs), and to provide the possibility of on-line acceptance testing (i.e., testing a circuit simply by running application programs). A previous paper described how self-checking, self-exercising design could be employed effectively in memory [Renn 86]. This paper extends the approach to the processor.
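The trade-off behind the insertion rate can be sketched with a little arithmetic. This is a minimal illustration, not the authors' design. The 100-cycle interval is the typical value quoted above. The 100 ns cycle time, the 10,000 test cycles assumed necessary to exercise the whole system, and the names `SelfExerciseSchedule` and `interleave` are hypothetical. The sketch also assumes that each inserted test cycle lasts as long as one execution cycle.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class SelfExerciseSchedule:
    # Hypothetical parameters. Only `interval` follows the abstract's typical value;
    # the cycle time and the number of test cycles are illustrative assumptions.
    cycle_time_s: float = 100e-9       # assumed duration of one execution (or test) cycle
    interval: int = 100                # normal execution cycles between inserted test cycles
    test_cycles_needed: int = 10_000   # assumed test cycles needed to exercise the whole system

    def overhead(self) -> float:
        """Fraction of all cycles spent testing: one test cycle per (interval + 1) cycles."""
        return 1 / (self.interval + 1)

    def full_sweep_seconds(self) -> float:
        """Time until every test cycle has been applied once (entire system exercised)."""
        return self.test_cycles_needed * (self.interval + 1) * self.cycle_time_s


def interleave(n_exec_cycles: int, interval: int) -> Iterator[Tuple[str, int]]:
    """Yield ('exec', i) for normal cycles, inserting ('test', k) after every `interval` of them."""
    k = 0
    for i in range(1, n_exec_cycles + 1):
        yield ("exec", i)
        if i % interval == 0:
            yield ("test", k)
            k += 1


if __name__ == "__main__":
    s = SelfExerciseSchedule()
    print(f"overhead   = {s.overhead():.2%}")               # 0.99%
    print(f"full sweep = {s.full_sweep_seconds():.3f} s")  # 0.101 s
    events = list(interleave(250, 100))
    print(sum(1 for kind, _ in events if kind == "test"))  # 2
```

Under these assumed figures, testing consumes 1/101 of all cycles (just under 1%), and the whole system is swept in about 0.1 s. A longer interval lowers the performance cost. However, it also lengthens the sweep, and with it the window in which latent faults can accumulate.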
References
Avizienis A., “Arithmetic Error Codes: Cost and Effectiveness Studies for Application in Digital System Design”, IEEE Trans. Comp., Vol. C-20, No. 11, Nov. 1971, pp. 1322–1331.
Bozorgui-Nesbat S., McCluskey E., “Design for Autonomous Test”, Proc. 1980 IEEE Test Conf., pp. 15–21.
Chau S., “Self-Exercising in Self-Checking Fault Tolerant Computers”, PhD Dissertation, UCLA, in preparation.
Chillarege R., Iyer R., “Fault Latency in the Memory: An Experimental Study on VAX 11/780”, Digest 16th FTCS, Vienna, Austria, July 1986.
Fujiwara H., Kinoshita K., “A Design of Programmable Logic Arrays with Universal Tests”, IEEE Trans. Comp., Nov. 1981, pp. 823–828.
Mead C., Conway L., Introduction to VLSI Systems, Addison-Wesley, 1979.
Rennels D., Chau S., “A Self-Exercising Self-Checking Memory Design”, Digest Int. Symp. Fault-Tolerant Computing, Vienna, June 1986, pp. 358–363.
Sedmak R., Liebergot H., “Fault Tolerance of a General Purpose Computer Implemented by Very Large Scale Integration”, IEEE Trans. Comp., Vol. C-29, No. 6, June 1980, pp. 492–500.
Shin K., Lee Y-H., “Measurement and Application of Fault Latency”, IEEE Trans. Comp., Vol. C-35, Apr. 1986, pp. 370–375.
Shedletsky J., McCluskey E., “The Error Latency of a Fault in a Combinational Digital Circuit”, Digest 5th FTCS, Paris, France, June 1975, pp. 210–214.
Shedletsky J., McCluskey E., “The Error Latency of a Fault in a Sequential Circuit”, IEEE Trans. Comp., June 1976, pp. 655–658.
Swern F. et al., “The Effects of Latent Faults on Highly Reliable Computer Systems”, IEEE Trans. Comp., Aug. 1987, pp. 1000–1005.
Tamir Y., Séquin C. H., “Reducing Common Mode Failures in Duplicate Modules”, Proc. Int. Conf. on Computer Design, Port Chester, NY, Oct. 1984, pp. 302–307.
Wang S., Avizienis A., “The Design of Totally Self-Checking Circuits Using Programmable Logic Arrays”, Proc. 1979 Int. Symp. on Fault-Tolerant Computing, Madison, WI, June 1979, pp. 173–180.
Weste N., Eshraghian K., Principles of CMOS VLSI Design: A Systems Perspective, Addison-Wesley, 1985.
Williams T., Parker K., “Design for Testability: A Survey”, IEEE Trans. Comp., Vol. C-31, No. 1, Jan. 1982.
Copyright information
© 1989 Plenum Press, New York
About this chapter
Cite this chapter
Chau, S., Rennels, D. (1989). Design Techniques for a Self-Checking Self-Exercising Processor. In: Koren, I. (ed.) Defect and Fault Tolerance in VLSI Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-6799-8_18
DOI: https://doi.org/10.1007/978-1-4615-6799-8_18
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4615-6801-8
Online ISBN: 978-1-4615-6799-8
eBook Packages: Springer Book Archive