Software Fault Tolerance

  • Chapter
Fault Tolerance

Part of the book series: Dependable Computing and Fault-Tolerant Systems ((DEPENDABLECOMP,volume 3))

Abstract

Fault tolerance techniques for coping with the occurrence and effects of anticipated hardware component failures are now well established and form a vital part of any reliable computing system. However, it is more unusual to find that strategies for fault tolerance have been included in a system for coping with design faults, although such strategies are becoming increasingly common in systems with high reliability requirements. For instance, applications in railway systems, nuclear reactor control and aircraft control are reported by Voges [1]. Design faults may not have been a problem in hardware systems (or at least not recognized as such) but are of major concern in software systems.

References

  1. U. Voges (ed.), Software Diversity in Computerized Control Systems, Springer-Verlag, Wien (1988).

  2. J.G. Robinson and E.S. Roberts, “Software Fault-Tolerance in the Pluribus,” AFIPS Conference Proceedings 1978 NCC 47, Anaheim (CA), pp. 563–569 (June 1978).

  3. J.H. Wensley et al., “SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control,” Proceedings of the IEEE 66 (10), pp. 1240–1255 (October 1978).

  4. J.J. Horning et al., “A Program Structure for Error Detection and Recovery,” pp. 171–187 in Lecture Notes in Computer Science 16, (ed. E. Gelenbe and C. Kaiser), Springer-Verlag, Berlin (1974).

  5. T. Anderson and R. Kerr, “Recovery Blocks in Action: A System Supporting High Reliability,” Proceedings of 2nd International Conference on Software Engineering, San Francisco (CA), pp. 447–457 (October 1976).

  6. P.A. Lee, N. Ghani, and K. Heron, “A Recovery Cache for the PDP-11,” IEEE Transactions on Computers C-29 (6), pp. 546–549 (June 1980).

  7. F. Cristian, “Exception Handling and Software-Fault Tolerance,” Digest of Papers FTCS-10: 10th International Symposium on Fault-Tolerant Computing Systems, Kyoto, pp. 97–103 (October 1980).

  8. P.M. Melliar-Smith and B. Randell, “Software Reliability: The Role of Programmed Exception Handling,” SIGPLAN Notices 12 (3), pp. 95–100 (March 1977).

  9. D.E. Knuth, The Art of Computer Programming Vols. 1–3, Addison-Wesley, Reading (MA) (1968).

  10. T. Gilb, “Distinct Software: A Redundancy Technique for Reliable Software,” pp. 117–133 in State of the Art Report on Software Reliability, Infotech, Maidenhead (1977).

  11. H. Kopetz, “Software Redundancy in Real Time Systems,” IFIP Congress 74, Stockholm, pp. 182–186 (August 1974).

  12. M.A. Fischler, O. Firschein, and D.L. Drew, “Distinct Software: An Approach to Reliable Computing,” Proceedings of Second USA-Japan Computer Conference, Tokyo, pp. 573–579 (August 1975).

  13. H. Hecht, “Fault Tolerant Software for Real-Time Applications,” Computing Surveys 8 (4), pp. 391–407 (December 1976).

  14. A.B. Long et al., “A Methodology for the Development and Validation of Critical Software for Nuclear Power Plants,” Proceedings COMPSAC 77, Chicago (IL), pp. 620–626 (November 1977).

  15. O.B. von Linde, “Computers Can Now Perform Vital Functions Safely,” Railway Gazette International 135 (11), pp. 1004–1006 (November 1979).

  16. J.P.J. Kelly and A. Avizienis, “A Specification-Oriented Multi-Version Software Experiment,” Digest of Papers FTCS-13: Thirteenth Annual International Symposium on Fault-Tolerant Computing, Milano, pp. 120–126 (June 1983).

  17. T. Anderson et al., “Software Fault Tolerance: An Evaluation,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1502–1510 (December 1985).

  18. J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering SE-12 (1), pp. 96–109 (January 1986).

  19. D.E. Eckhardt and L.D. Lee, “A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1511–1517 (December 1985).

  20. B. Littlewood and D.R. Miller, “A Conceptual Model of the Effect of Diverse Methodologies on Coincident Failures in Multi-version Software,” pp. 321–333 in Measurement for Software Control and Assurance, (ed. B.A. Kitchenham and B. Littlewood), Elsevier Applied Science (1989).

  21. E. Best and F. Cristian, “Systematic Detection of Exception Occurrences,” Technical Report 165, Computing Laboratory, University of Newcastle upon Tyne (April 1981).

  22. R.H. Campbell, K.H. Horton, and G.G. Belford, “Simulations of a Fault-Tolerant Deadline Mechanism,” Digest of Papers FTCS-9: Ninth Annual International Symposium on Fault-Tolerant Computing, Madison (WI), pp. 95–101 (June 1979).

  23. E.J. Salzman, “An Experiment in Producing Highly Reliable Software,” M.Sc. Dissertation, Computing Laboratory, University of Newcastle upon Tyne (1978).

  24. S.K. Shrivastava and A.A. Akinpelu, “Fault Tolerant Sequential Programming Using Recovery Blocks,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, p. 207 (June 1978).

  25. H.O. Welch, “Distributed Recovery Block Performance in a Real-Time Control Loop,” Proceedings of Real-Time Systems Symposium, Arlington (VA), pp. 268–276 (1983).

  26. A. Avizienis, “The N-Version Approach to Fault-Tolerant Software,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1491–1501 (December 1985).

  27. L. Chen and A. Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, pp. 3–9 (June 1978).

  28. S.S. Brilliant, J.C. Knight, and N.G. Leveson, “The Consistent Comparison Problem in N-Version Software,” ACM SIGSOFT Software Engineering Notes 12 (1), pp. 29–34 (January 1987).

  29. A. Avizienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault-Tolerance During Program Execution,” Proceedings COMPSAC 77, Chicago (IL), pp. 149–155 (November 1977).

  30. J.C. Knight and N.G. Leveson, “An Empirical Study of Failure Probabilities in Multi-Version Software,” Digest of Papers FTCS-16: Sixteenth Annual International Symposium on Fault-Tolerant Computing, Wien, pp. 165–170 (July 1986).

  31. A. Avizienis, “DEDIX 87 – A Supervisory System for Design Diversity Experiments at UCLA,” pp. 129–168 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

  32. K.S. Tso and A. Avizienis, “Community Error Recovery in N-Version Software: A Design Study With Experimentation,” Digest of Papers FTCS-17: Seventeenth Annual International Symposium on Fault-Tolerant Computing, Pittsburgh, pp. 127–133 (July 1987).

  33. R.M. Sedmak and H.L. Liebergot, “Fault-Tolerance of a General Purpose Computer Implemented by Very Large Scale Integration,” IEEE Transactions on Computers C-29 (6), pp. 492–500 (June 1980).

  34. P. Traverse, “AIRBUS and ATR System Architecture and Specification,” pp. 95–104 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

  35. P.G. Bishop, “The PODS Diversity Experiment,” pp. 51–84 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

  36. J.R. Garman, “The Bug Heard Around The World,” ACM SIGSOFT Software Engineering Notes 6 (5), pp. 3–10 (October 1981).

  37. G. Hagelin, “ERICSSON Safety System For Railway Control,” pp. 11–22 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

Copyright information

© 1990 Springer-Verlag/Wien

About this chapter

Cite this chapter

Lee, P.A., Anderson, T. (1990). Software Fault Tolerance. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_9

  • DOI: https://doi.org/10.1007/978-3-7091-8990-0_9

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-8992-4

  • Online ISBN: 978-3-7091-8990-0

  • eBook Packages: Springer Book Archive
