Abstract
This research deals with fault-tolerant computers capable of operating for extended periods without external maintenance. Conventional fault-tolerance techniques such as majority voting are unsuitable for these applications, because performance is too low, power consumption is too high, and an excessive number of spares must be included to keep all of the replicated systems working over an extended life. The preferred design approach is to operate as many different computations as possible on single computers, thus maximizing the amount of processing available from limited hardware resources. Fault tolerance is implemented in a hierarchic fashion. Fault recovery is either done locally within an afflicted computer or, if that is unsuccessful, by the other working computers when one fails. Concurrent error detection is required in the computers making up these systems, since errors must be quickly detected and isolated to allow recovery to begin.
This chapter discusses ways of implementing concurrent error detection (i.e., self-checking) and, in addition, providing self-exercising capabilities that can rapidly expose dormant faults and latent errors. The fundamentals of self-checking design are presented along with an example: the design of a self-checking, self-exercising memory system. A new methodology for implementing self-checking in asynchronous subsystems is discussed, along with error simulation results that examine its effectiveness.
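As a rough illustration of the two ideas named above, the following Python sketch models a toy memory that checks a parity bit on every read (concurrent error detection) and offers a background sweep that visits every location to expose latent errors (self-exercising). It is not the chapter's VLSI design: the class `CheckedMemory`, its methods, the single even-parity bit per word, and the `inject_bit_flip` helper are all assumptions introduced here for illustration.

```python
def parity(word):
    """Return the even-parity bit (0 or 1) of a non-negative integer word."""
    return bin(word).count("1") & 1


class CheckedMemory:
    """Toy word memory with one parity bit per word (hypothetical, illustrative)."""

    def __init__(self, size):
        self.data = [0] * size
        self.check = [0] * size  # parity(0) == 0, so the initial state is consistent

    def write(self, addr, word):
        # Store the word together with its freshly computed parity bit.
        self.data[addr] = word
        self.check[addr] = parity(word)

    def read(self, addr):
        # Concurrent error detection: every read verifies the stored parity.
        word = self.data[addr]
        if parity(word) != self.check[addr]:
            raise ValueError(f"parity error at address {addr}")
        return word

    def scrub(self):
        # Self-exercising pass: sweep all locations so that errors in words
        # the workload has not touched recently are found early.
        return [addr for addr in range(len(self.data))
                if parity(self.data[addr]) != self.check[addr]]

    def inject_bit_flip(self, addr, bit):
        # Test helper (assumption): simulate a single-bit upset in stored data.
        self.data[addr] ^= 1 << bit
```

In this sketch a single flipped bit goes unnoticed until the word is next read, unless `scrub()` runs first; that gap is the latent-error window that a self-exercising mechanism is meant to shorten. A single parity bit detects only an odd number of bit errors per word, so it stands in here for whatever stronger code a real design would use.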
This work was supported by the Office of Naval Research, grant N00014-91-J-1009.
Copyright information
© 1994 Kluwer Academic Publishers
About this chapter
Cite this chapter
Rennels, D., Kim, H. (1994). Self-Checking and Self-Exercising Design for Hierarchic Long-Life Fault-Tolerant Systems. In: Koob, G.M., Lau, C.G. (eds) Foundations of Dependable Computing. The Kluwer International Series in Engineering and Computer Science, vol 285. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-28002-8_1
DOI: https://doi.org/10.1007/978-0-585-28002-8_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-9486-0
Online ISBN: 978-0-585-28002-8
eBook Packages: Springer Book Archive