Abstract
Fault tolerance techniques for coping with the occurrence and effects of anticipated hardware component failures are now well established and form a vital part of any reliable computing system. However, it is more unusual to find that strategies for fault tolerance have been included in a system for coping with design faults, although such strategies are becoming increasingly common in systems with high reliability requirements. For instance, applications in railway systems, nuclear reactor control and aircraft control are reported by Voges [1]. Design faults may not have been a problem in hardware systems (or at least not recognized as such) but are of major concern in software systems.
References
U. Voges (ed.), Software Diversity in Computerized Control Systems, Springer-Verlag, Wien (1988).
J.G. Robinson and E.S. Roberts, “Software Fault-Tolerance in the Pluribus,” AFIPS Conference Proceedings 1978 NCC 47, Anaheim (CA), pp. 563–569 (June 1978).
J.H. Wensley et al., “SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control,” Proceedings of the IEEE 66 (10), pp. 1240–1255 (October 1978).
J.J. Horning et al., “A Program Structure for Error Detection and Recovery,” pp. 171–187 in Lecture Notes in Computer Science 16, (ed. E. Gelenbe and C. Kaiser), Springer-Verlag, Berlin (1974).
T. Anderson and R. Kerr, “Recovery Blocks in Action: A System Supporting High Reliability,” Proceedings of 2nd International Conference on Software Engineering, San Francisco (CA), pp. 447–457 (October 1976).
P.A. Lee, N. Ghani, and K. Heron, “A Recovery Cache for the PDP-11,” IEEE Transactions on Computers C-29 (6), pp. 546–549 (June 1980).
F. Cristian, “Exception Handling and Software-Fault Tolerance,” Digest of Papers FTCS-10: 10th International Symposium on Fault-Tolerant Computing Systems, Kyoto, pp. 97–103 (October 1980).
P.M. Melliar-Smith and B. Randell, “Software Reliability: The Role of Programmed Exception Handling,” SIGPLAN Notices 12 (3), pp. 95–100 (March 1977).
D.E. Knuth, The Art of Computer Programming Vols. 1–3, Addison-Wesley, Reading (MA) (1968).
T. Gilb, “Distinct Software: A Redundancy Technique for Reliable Software,” pp. 117–133 in State of the Art Report on Software Reliability, Infotech, Maidenhead (1977).
H. Kopetz, “Software Redundancy in Real Time Systems,” IFIP Congress 74, Stockholm, pp. 182–186 (August 1974).
M.A. Fischler, O. Firschein, and D.L. Drew, “Distinct Software: An Approach to Reliable Computing,” Proceedings of Second USA-Japan Computer Conference, Tokyo, pp. 573–579 (August 1975).
H. Hecht, “Fault Tolerant Software for Real-Time Applications,” Computing Surveys 8 (4), pp. 391–407 (December 1976).
A.B. Long et al., “A Methodology for the Development and Validation of Critical Software for Nuclear Power Plants,” Proceedings COMPSAC 77, Chicago (IL), pp. 620–626 (November 1977).
O.B. von Linde, “Computers Can Now Perform Vital Functions Safely,” Railway Gazette International 135 (11), pp. 1004–1006 (November 1979).
J.P.J. Kelly and A. Avizienis, “A Specification-Oriented Multi-Version Software Experiment,” Digest of Papers FTCS-13: Thirteenth Annual International Symposium on Fault-Tolerant Computing, Milano, pp. 120–126 (June 1983).
T. Anderson et al., “Software Fault Tolerance: An Evaluation,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1502–1510 (December 1985).
J.C. Knight and N.G. Leveson, “An Experimental Evaluation of the Assumption of Independence in Multiversion Programming,” IEEE Transactions on Software Engineering SE-12 (1), pp. 96–109 (January 1986).
D.E. Eckhardt and L.D. Lee, “A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1511–1517 (December 1985).
B. Littlewood and D.R. Miller, “A Conceptual Model of the Effect of Diverse Methodologies on Coincident Failures in Multi-version Software,” pp. 321–333 in Measurement for Software Control and Assurance, (ed. B.A. Kitchenham and B. Littlewood), Elsevier Applied Science (1989).
E. Best and F. Cristian, “Systematic Detection of Exception Occurrences,” Technical Report 165, Computing Laboratory, University of Newcastle upon Tyne (April 1981).
R.H. Campbell, K.H. Horton, and G.G. Belford, “Simulations of a Fault-Tolerant Deadline Mechanism,” Digest of Papers FTCS-9: Ninth Annual International Symposium on Fault-Tolerant Computing, Madison (WI), pp. 95–101 (June 1979).
E.J. Salzman, “An Experiment in Producing Highly Reliable Software,” M.Sc. Dissertation, Computing Laboratory, University of Newcastle upon Tyne (1978).
S.K. Shrivastava and A.A. Akinpelu, “Fault Tolerant Sequential Programming Using Recovery Blocks,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, p. 207 (June 1978).
H.O. Welch, “Distributed Recovery Block Performance in a Real-Time Control Loop,” Proceedings of Real-Time Systems Symposium, Arlington (VA), pp. 268–276 (1983).
A. Avizienis, “The N-Version Approach to Fault-Tolerant Software,” IEEE Transactions on Software Engineering SE-11 (12), pp. 1491–1501 (December 1985).
L. Chen and A. Avizienis, “N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation,” Digest of Papers FTCS-8: Eighth Annual International Conference on Fault-Tolerant Computing, Toulouse, pp. 3–9 (June 1978).
S.S. Brilliant, J.C. Knight, and N.G. Leveson, “The Consistent Comparison Problem in N-Version Software,” ACM SIGSOFT Software Engineering Notes 12 (1), pp. 29–34 (January 1987).
A. Avizienis and L. Chen, “On the Implementation of N-Version Programming for Software Fault-Tolerance During Program Execution,” Proceedings COMPSAC 77, Chicago (IL), pp. 149–155 (November 1977).
J.C. Knight and N.G. Leveson, “An Empirical Study of Failure Probabilities in Multi-Version Software,” Digest of Papers FTCS-16: Sixteenth Annual International Symposium on Fault-Tolerant Computing, Wien, pp. 165–170 (July 1986).
A. Avizienis, “DEDIX 87 – A Supervisory System for Design Diversity Experiments at UCLA,” pp. 129–168 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
K.S. Tso and A. Avizienis, “Community Error Recovery in N-Version Software: A Design Study With Experimentation,” Digest of Papers FTCS-17: Seventeenth Annual International Symposium on Fault-Tolerant Computing, Pittsburgh, pp. 127–133 (July 1987).
R.M. Sedmak and H.L. Liebergot, “Fault-Tolerance of a General Purpose Computer Implemented by Very Large Scale Integration,” IEEE Transactions on Computers C-29 (6), pp. 492–500 (June 1980).
P. Traverse, “AIRBUS and ATR System Architecture and Specification,” pp. 95–104 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
P.G. Bishop, “The PODS Diversity Experiment,” pp. 51–84 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
J.R. Garman, “The Bug Heard Around The World,” ACM SIGSOFT Software Engineering Notes 6 (5), pp. 3–10 (October 1981).
G. Hagelin, “ERICSSON Safety System For Railway Control,” pp. 11–22 in Software Diversity in Computerized Control Systems, (ed. U. Voges), Springer-Verlag, Wien (1988).
Copyright information
© 1990 Springer-Verlag/Wien
About this chapter
Cite this chapter
Lee, P.A., Anderson, T. (1990). Software Fault Tolerance. In: Fault Tolerance. Dependable Computing and Fault-Tolerant Systems, vol 3. Springer, Vienna. https://doi.org/10.1007/978-3-7091-8990-0_9
DOI: https://doi.org/10.1007/978-3-7091-8990-0_9
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-8992-4
Online ISBN: 978-3-7091-8990-0
eBook Packages: Springer Book Archive