Generalized Algorithm of Fault Tolerance (GAFT)

Schagaev, Igor; Zouev, Eugene; Thomas, Kaegi

doi:10.1007/978-3-030-21244-5_4

Abstract

Fault tolerance has so far been considered a property of a system. Instead, we introduce a Generalized Algorithm of Fault Tolerance (GAFT) that treats fault tolerance as a system process. A rigorous analysis of a GAFT implementation should use a classification of redundancy types. Different redundancy types have different “power” at different steps of GAFT. The properties of a GAFT implementation affect the overall performance of the system, its fault coverage, and its ability to reconfigure. It is clear that malfunctions must be separated from permanent faults, and the resulting reliability gain is analyzed. The ratio of malfunctions to permanent faults reaches 10⁵ to 10⁷, so simply excluding a malfunctioning element from the working configuration is no longer feasible. Further, we consider extending GAFT, in terms of generalization and application, to support the safety of complex systems. Our algorithms for finding a correct state, locating the “guilty” element, and analyzing potential damage become a powerful extension of GAFT for challenging applications such as avionic systems and the aircraft as a whole.

In Chap. 3, we showed that fault tolerance should be treated as a process. In this chapter, we elaborate this process further into a clearly defined algorithm and develop a framework for the design of fault-tolerant systems: the generalized algorithm of fault tolerance, GAFT. We also introduce a theoretical model to quantify the impact of additional redundancy on the reliability of the whole system and derive an answer to the question of how much added redundancy leads to the system with the highest reliability. A question that GAFT cannot answer is how the real source of a detected fault can be identified, as the fault might have manifested in another hardware element and spread through the system because of nonexistent fault containment. We will show an algorithm that, based on the dependencies among the elements of a system, can identify the possible fault sources and also predict which elements an identified fault might have affected. As a first step, we now further elaborate the process of fault tolerance.
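The dependency-based search mentioned above can be illustrated with a small graph traversal. This is only a minimal sketch under stated assumptions, not the chapter's algorithm: it assumes the system is modelled as a directed graph in which an edge from A to B means a fault in A can propagate to B. The element names and the helpers `reachable`, `invert`, `possible_sources`, and `possibly_affected` are hypothetical and chosen for illustration only.

```python
from collections import deque

# Hypothetical system model: an edge A -> B means a fault in A may spread to B.
PROPAGATES_TO = {
    "power": {"cpu", "memory"},
    "cpu": {"bus"},
    "memory": {"bus"},
    "bus": {"io"},
    "io": set(),
}


def reachable(graph, start):
    """Breadth-first search: every element reachable from `start` along edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def invert(graph):
    """Reverse every edge, so B -> A means 'B may have received a fault from A'."""
    inverted = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, set()).add(source)
    return inverted


def possible_sources(graph, detected_at):
    """Elements where the fault may have originated, including the detecting one."""
    return reachable(invert(graph), detected_at) | {detected_at}


def possibly_affected(graph, faulty):
    """Elements that a fault located in `faulty` may have spread to."""
    return reachable(graph, faulty)


if __name__ == "__main__":
    print(sorted(possible_sources(PROPAGATES_TO, "bus")))
    print(sorted(possibly_affected(PROPAGATES_TO, "cpu")))
```

In this sketch a fault detected at the hypothetical `bus` element yields every upstream element as a candidate source, while a fault located in `cpu` yields every downstream element as potentially damaged. A real system would need a dependency model derived from its actual hardware and software structure.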

Author information

Authors and Affiliations

IT-ACS Ltd, Stevenage, UK
Igor Schagaev & Kaegi Thomas
Department of Informatics, Technopolis, Innopolis, Kazan, Russia
Eugene Zouev

Corresponding author

Correspondence to Igor Schagaev.

About this chapter

Cite this chapter

Schagaev, I., Zouev, E., Thomas, K. (2020). Generalized Algorithm of Fault Tolerance (GAFT). In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_4

DOI: https://doi.org/10.1007/978-3-030-21244-5_4
Published: 10 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21243-8
Online ISBN: 978-3-030-21244-5
eBook Packages: Engineering, Engineering (R0)
