Abstract
Fault tolerance has so far been considered a property of a system. Instead, we introduce a Generalized Algorithm of Fault Tolerance (GAFT) that treats fault tolerance as a system process. A rigorous analysis of a GAFT implementation should rely on a classification of redundancy types, because different redundancy types have different “power” at different steps of GAFT. The properties of a GAFT implementation affect the overall performance of the system, its fault coverage, and its ability to reconfigure. Separating malfunctions from permanent faults is clearly necessary, and we analyze the reliability gain it brings. The ratio of malfunctions to permanent faults reaches 10^5 to 10^7, so simply excluding a malfunctioning element from the working configuration is no longer feasible. We further consider extending GAFT, both by generalizing it and by applying it to support the safety of complex systems. Our algorithms for searching for a correct state, for locating the “guilty” element, and for analyzing potential damage form a powerful extension of GAFT for challenging applications such as avionic systems and the aircraft as a whole.

In Chap. 3, we showed that fault tolerance should be treated as a process. In this chapter, we elaborate this process further into a clearly defined algorithm, the generalized algorithm of fault tolerance (GAFT), and develop a framework for the design of fault-tolerant systems. We also introduce a theoretical model that quantifies the impact of additional redundancy on the reliability of the whole system. We use it to answer how much added redundancy yields the most reliable system. One question GAFT cannot answer is how to identify the real source of a detected fault. The fault may have manifested in another hardware element and spread through the system because fault containment was absent.
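The separation of malfunctions (transient faults) from permanent faults can be illustrated with a minimal sketch. It assumes a common retry-based policy, which is not necessarily the one used in GAFT. A hypothetical `run_check` callable re-executes a self-test of the element. If the error disappears on a repeat, it is treated as a malfunction and handled by recovering the computation. If it persists, it is treated as permanent, and only then is the element excluded by reconfiguration. The names `FaultKind`, `classify_fault`, `choose_action`, and the retry limit are illustrative assumptions.

```python
from enum import Enum
from typing import Callable


class FaultKind(Enum):
    NONE = "none"
    MALFUNCTION = "malfunction"  # transient: disappears when the check is repeated
    PERMANENT = "permanent"      # persists across every repeated check


def classify_fault(run_check: Callable[[], bool], retries: int = 3) -> FaultKind:
    """Classify a detected error by re-running a hypothetical self-test.

    run_check returns True when the element passes the check.
    """
    if run_check():
        return FaultKind.NONE
    # The first check failed: repeat it to see whether the error persists.
    for _ in range(retries):
        if run_check():
            return FaultKind.MALFUNCTION
    return FaultKind.PERMANENT


def choose_action(kind: FaultKind) -> str:
    """Map a fault class to a handling step (illustrative policy only)."""
    if kind is FaultKind.MALFUNCTION:
        # Recover the computation; the element stays in the working configuration.
        return "recover"
    if kind is FaultKind.PERMANENT:
        # Only a permanent fault justifies excluding the element.
        return "reconfigure"
    return "continue"
```

Consider the ratio of 10^5 to 10^7 malfunctions per permanent fault. A policy that reconfigured on every detected error would exclude elements roughly 10^5 to 10^7 times more often than necessary. Routing malfunctions to recovery avoids this.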
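The question of how much added redundancy yields the most reliable system can be made concrete with a toy model. This model is an assumption for illustration, not the model derived in the chapter. Let the base hardware have failure rate λ. Let added redundancy of relative size k contribute its own failure rate kλ, assumed never to be covered. Let that redundancy cover a fraction c(k) = 1 − e^(−ak) of base-hardware faults. The uncovered failure rate is then f(k) = λ(e^(−ak) + k). Setting f′(k) = λ(1 − a·e^(−ak)) = 0 gives an optimum k* = ln(a)/a when a > 1. Beyond that point, adding redundancy lowers reliability R(t) = e^(−f(k)t).

```python
import math


def uncovered_rate(k: float, lam: float, a: float) -> float:
    """Toy model: failure rate not masked by redundancy of relative size k.

    lam * exp(-a*k): base-hardware faults left uncovered (coverage 1 - exp(-a*k)).
    lam * k:         faults in the added redundancy itself, assumed never covered.
    """
    return lam * (math.exp(-a * k) + k)


def reliability(t: float, k: float, lam: float, a: float) -> float:
    """Exponential reliability R(t) = exp(-f(k) * t) under the toy model."""
    return math.exp(-uncovered_rate(k, lam, a) * t)


def optimal_redundancy(a: float) -> float:
    """Minimizer of f(k) for k >= 0: k* = ln(a)/a if a > 1, else 0."""
    if a <= 1:
        # f'(k) >= 0 for all k >= 0, so any redundancy only adds failure rate.
        return 0.0
    return math.log(a) / a
```

With a = 10, the optimum is k* ≈ 0.230 and f(k*) ≈ 0.330λ. Without redundancy, f(0) = λ. With k = 1, f(1) ≈ 1.00005λ, because almost all of the coverage gain is eaten by the extra hardware.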
We will show an algorithm that, based on the dependencies among the elements of a system, can identify possible fault sources and also predict which elements an identified fault might have affected. As a first step, we elaborate further on the process of fault tolerance.
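The dependency-based search for fault sources can be sketched minimally. This sketch assumes the dependencies are given as a directed graph. An edge u → v means a fault in u can propagate to v. This is a hypothetical representation and not necessarily the one used in the chapter. Under it, the possible sources of a fault detected at element x are all elements from which x is reachable. The elements that an identified fault at s might have affected are all elements reachable from s. Both are breadth-first searches, the first over reversed edges. The element names in the usage example are made up.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # element -> elements its faults can propagate to


def _reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from start (including start), by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def _reverse(graph: Graph) -> Graph:
    """Flip every edge u -> v into v -> u."""
    rev: Graph = {}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev


def possible_sources(graph: Graph, detected_at: str) -> Set[str]:
    """Elements whose fault could have propagated to where it was detected."""
    return _reachable(_reverse(graph), detected_at)


def possibly_affected(graph: Graph, source: str) -> Set[str]:
    """Elements that a fault in `source` could have spread to."""
    return _reachable(graph, source)


# Usage with a made-up dependency graph:
g = {"power": {"cpu", "mem"}, "cpu": {"bus"}, "mem": {"bus"}, "bus": {"io"}}
print(sorted(possible_sources(g, "bus")))   # candidate sources of an error seen at bus
print(sorted(possibly_affected(g, "cpu")))  # elements to check if cpu is the source
```

An error detected at `bus` yields the candidate sources bus, cpu, mem, and power. Once one of them is confirmed, `possibly_affected` lists the elements whose state should be checked or recovered.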
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Schagaev, I., Zouev, E., Thomas, K. (2020). Generalized Algorithm of Fault Tolerance (GAFT). In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_4
DOI: https://doi.org/10.1007/978-3-030-21244-5_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21243-8
Online ISBN: 978-3-030-21244-5
eBook Packages: Engineering, Engineering (R0)