Software Fault Tolerance

Lee, Peter Alan; Anderson, Thomas

doi:10.1007/978-3-7091-8990-0_9

Part of the book series: Dependable Computing and Fault-Tolerant Systems (volume 3)

Abstract

Fault tolerance techniques for coping with the occurrence and effects of anticipated hardware component failures are now well established and form a vital part of any reliable computing system. However, it is more unusual to find that strategies for fault tolerance have been included in a system for coping with design faults, although such strategies are becoming increasingly common in systems with high reliability requirements. For instance, applications in railway systems, nuclear reactor control and aircraft control are reported by Voges.¹ Design faults may not have been a problem in hardware systems (or at least not recognized as such) but are of major concern in software systems.

References

U. Voges (ed.), Software Diversity in Computerized Control Systems, Springer-Verlag, Wien (1988).
J.G. Robinson and E.S. Roberts, “Software Fault-Tolerance in the Pluribus,” AFIPS Conference Proceedings 1978 NCC 47, Anaheim (CA), pp. 563–569 (June 1978).
J.H. Wensley et al., “SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control,” Proceedings of the IEEE 66 (10), pp. 1240–1255 (October 1978).
J.J. Horning et al., “A Program Structure for Error Detection and Recovery,” pp. 171–187 in Lecture Notes in Computer Science 16, (ed. E. Gelenbe and C. Kaiser), Springer-Verlag, Berlin (1974).
T. Anderson and R. Kerr, “Recovery Blocks in Action: A System Supporting High Reliability,” Proceedings of 2nd International Conference on Software Engineering, San Francisco (CA), pp. 447–457 (October 1976).
P.A. Lee, N. Ghani, and K. Heron, “A Recovery Cache for the PDP-11,” IEEE Transactions on Computers C-29 (6), pp. 546–549 (June 1980).
F. Cristian, “Exception Handling and Software-Fault Tolerance,” Digest of Papers FTCS-10: 10th International Symposium on Fault-Tolerant Computing Systems, Kyoto, pp. 97–103 (October 1980).
P.M. Melliar-Smith and B. Randell, “Software Reliability: The Role of Programmed Exception Handling,” SIGPLAN Notices 12 (3), pp. 95–100 (March 1977).
D.E. Knuth, The Art of Computer Programming Vols. 1–3, Addison-Wesley, Reading (MA) (1968).
T. Gilb, “Distinct Software: A Redundancy Technique for Reliable Software,” pp. 117–133 in State of the Art Report on Software Reliability, Infotech, Maidenhead (1977).
H. Kopetz, “Software Redundancy in Real Time Systems,” IFIP Congress 74, Stockholm, pp. 182–186 (August 1974).
M.A. Fischler, O. Firschein, and D.L. Drew, “Distinct Software: An Approach to Reliable Computing,” Proceedings of Second USA-Japan Computer Conference, Tokyo, pp. 573–579 (August 1975).
H. Hecht, “Fault Tolerant Software for Real-Time Applications,” Computing Surveys 8 (4), pp. 391–407 (December 1976).
A.B. Long et al., “A Methodology for the Development and Validation of Critical Software for Nuclear Power Plants,” Proceedings COMPSAC 77, Chicago (IL), pp. 620–626 (November 1977).
O.B. von Linde, “Computers Can Now Perform Vital Functions Safely,” Railway Gazette International 135 (11), pp. 1004–1006 (November 1979).
J.P.J. Kelly and A. Avizienis, “A Specification-Oriented Multi-Version Software Experiment,” Digest of Papers FTCS-13: Thirteenth Annual International Symposium on Fault-Tolerant Computing, Milano, pp. 120–126 (June 1983).
T. Anderson et al., “Software Fault Tolerance: An Evaluation,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1502–1510 (December 1985).
J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering SE-12 (1), pp. 96–109 (January 1986).
D.E. Eckhardt and L.D. Lee, “A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1511–1517 (December 1985).
B. Littlewood and D.R. Miller, “A Conceptual Model of the Effect of Diverse Methodologies on Coincident Failures in Multi-version Software,” pp. 321–333 in Measurement for Software Control and Assurance, (ed. B.A. Kitchenham and B. Littlewood), Elsevier Applied Science (1989).
E. Best and F. Cristian, “Systematic Detection of Exception Occurrences,” Technical Report 165, Computing Laboratory, University of Newcastle upon Tyne (April 1981).
R.H. Campbell, K.H. Horton, and G.G. Belford, “Simulations of a Fault-Tolerant Deadline Mechanism,” Digest of Papers FTCS-9: Ninth Annual International Symposium on Fault-Tolerant Computing, Madison (WI), pp. 95–101 (June 1979).
E.J. Salzman, “An Experiment in Producing Highly Reliable Software,” M.Sc. Dissertation, Computing Laboratory, University of Newcastle upon Tyne (1978).
S.K. Shrivastava and A.A. Akinpelu, “Fault Tolerant Sequential Programming Using Recovery Blocks,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, p. 207 (June 1978).
H.O. Welch, “Distributed Recovery Block Performance in a Real-Time Control Loop,” Proceedings of Real-Time Systems Symposium, Arlington (VA), pp. 268–276 (1983).
A. Avizienis, “The N-Version Approach to Fault-Tolerant Software,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1491–1501 (December 1985).
L. Chen and A. Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, pp. 3–9 (June 1978).
S.S. Brilliant, J.C. Knight, and N.G. Leveson, “The Consistent Comparison Problem in N-Version Software,” ACM SIGSOFT Software Engineering Notes 12 (1), pp. 29–34 (January 1987).
A. Avizienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault-Tolerance During Program Execution,” Proceedings COMPSAC 77, Chicago (IL), pp. 149–155 (November 1977).
J.C. Knight and N.G. Leveson, “An Empirical Study of Failure Probabilities in Multi-Version Software,” Digest of Papers FTCS-16: Sixteenth Annual International Symposium on Fault-Tolerant Computing, Wien, pp. 165–170 (July 1986).
A. Avizienis, “DEDIX 87–A Supervisory System for Design Diversity Experiments at UCLA,” pp. 129–168 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
K.S. Tso and A. Avizienis, “Community Error Recovery in N-Version Software: A Design Study With Experimentation,” Digest of Papers FTCS-17: Seventeenth Annual International Symposium on Fault-Tolerant Computing, Pittsburgh, pp. 127–133 (July 1987).
R.M. Sedmak and H.L. Liebergot, “Fault-Tolerance of a General Purpose Computer Implemented by Very Large Scale Integration,” IEEE Transactions on Computers C-29 (6), pp. 492–500 (June 1980).
P. Traverse, “AIRBUS and ATR System Architecture and Specification,” pp. 95–104 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
P.G. Bishop, “The PODS Diversity Experiment,” pp. 51–84 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
J.R. Garman, “The Bug Heard Around The World,” ACM Software Engineering Notes 6 (5), pp. 3–10 (October 1981).
G. Hagelin, “ERICSSON Safety System For Railway Control,” pp. 11–22 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).

Author information

Authors and Affiliations

Computing Laboratory, University of Newcastle upon Tyne, UK
Peter Alan Lee & Thomas Anderson

About this chapter

Cite this chapter

Lee, P.A., Anderson, T. (1990). Software Fault Tolerance. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_9

DOI: https://doi.org/10.1007/978-3-7091-8990-0_9
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-8992-4
Online ISBN: 978-3-7091-8990-0
eBook Packages: Springer Book Archive