Generalized Algorithm of Fault Tolerance (GAFT)

Chapter in: Software Design for Resilient Computer Systems

Abstract

Fault tolerance has so far been considered a property of a system. Instead, we introduce a Generalized Algorithm of Fault Tolerance (GAFT) that treats fault tolerance as a system process. A rigorous analysis of a GAFT implementation should use a classification of redundancy types, because different redundancy types have different “power” at the various steps of GAFT. The properties of a GAFT implementation affect the overall performance of the system, its coverage of faults, and its ability to reconfigure. Malfunctions must clearly be separated from permanent faults, and the resulting reliability gain is analyzed. The ratio of malfunctions to permanent faults reaches 10^5 to 10^7, so simply excluding a malfunctioning element from the working configuration is no longer feasible. Further, we consider the extension of GAFT in terms of generalization and its application to support the safety of complex systems. Our algorithms for searching for a correct state, for finding the “guilty” element, and for analyzing potential damage become a powerful extension of GAFT for challenging applications such as avionic systems and aircraft as a whole.

In Chap. 3, we showed that fault tolerance should be treated as a process. In this chapter, we elaborate this process into a clearly defined algorithm and develop a framework for the design of fault-tolerant systems: the generalized algorithm of fault tolerance, GAFT. We also introduce a theoretical model to quantify the impact of additional redundancy on the reliability of the whole system, and we derive an answer to the question of how much added redundancy yields the system with the highest reliability. A question that GAFT cannot answer is how to identify the real source of a detected fault, since the fault may have originated in another hardware element and spread through the system because fault containment is absent. We will show an algorithm that, based on the dependencies among the elements of a system, can identify the possible fault sources and also predict which elements an identified fault might have affected. We start by further elaborating the process of fault tolerance.
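The idea of using element dependencies to find possible fault sources and potentially affected elements can be illustrated with a plain graph traversal. The sketch below is not the chapter's algorithm; it is a minimal illustration that assumes the system is modeled as a directed graph in which an edge from A to B means a fault in A can propagate to B. The element names and the helper functions `reachable`, `invert`, `possible_sources`, and `potentially_affected` are hypothetical.

```python
from collections import deque

# Hypothetical dependency model (an assumption for illustration):
# an edge A -> B means a fault in element A can propagate to element B.
PROPAGATES_TO = {
    "power": {"cpu", "memory"},
    "cpu": {"bus"},
    "memory": {"bus"},
    "bus": {"io"},
    "io": set(),
}


def reachable(graph, start):
    """Return all nodes reachable from `start` (excluding `start`) via BFS."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Return the graph with every edge reversed."""
    inverted = {}
    for src, dsts in graph.items():
        inverted.setdefault(src, set())
        for dst in dsts:
            inverted.setdefault(dst, set()).add(src)
    return inverted


def possible_sources(graph, detected_at):
    """Elements where a fault detected at `detected_at` may have originated."""
    # The detecting element itself is always a candidate source.
    return reachable(invert(graph), detected_at) | {detected_at}


def potentially_affected(graph, faulty):
    """Elements that a fault in `faulty` might have reached."""
    return reachable(graph, faulty)


if __name__ == "__main__":
    print(sorted(possible_sources(PROPAGATES_TO, "bus")))
    print(sorted(potentially_affected(PROPAGATES_TO, "cpu")))
```

In this toy model, a fault detected at `bus` could have originated in `bus`, `cpu`, `memory`, or `power`, while a fault located in `cpu` might have affected `bus` and `io`. A real system would need a dependency model derived from its actual hardware and software structure.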

Author information

Correspondence to Igor Schagaev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Schagaev, I., Zouev, E., Thomas, K. (2020). Generalized Algorithm of Fault Tolerance (GAFT). In: Software Design for Resilient Computer Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-21244-5_4

  • DOI: https://doi.org/10.1007/978-3-030-21244-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21243-8

  • Online ISBN: 978-3-030-21244-5

  • eBook Packages: Engineering, Engineering (R0)
