Systematic Design of Fault-Tolerant Computers

  • Algirdas Avižienis
Conference paper

Abstract

The origin of the concept of fault tolerance and the evolution of guidelines for the systematic design of fault-tolerant systems are reviewed. The current formulation of the guidelines, called a design paradigm, is presented. The problem of using off-the-shelf subsystems in a fault-tolerant system is discussed. In conclusion, an analogy between complex fault-tolerant systems and living organisms is suggested as a means to advance the understanding of fault tolerance.

Copyright information

© Springer-Verlag London Limited 1997

Authors and Affiliations

  • Algirdas Avižienis (1, 2)
  1. UCLA Computer Science Department, University of California, Los Angeles, USA
  2. Vytautas Magnus University, Kaunas, Lithuania
