Fault Tolerance: Theory and Concepts

  • Igor Schagaev
  • Eugene Zouev
  • Kaegi Thomas


This chapter briefly introduces how the reliability of a system can be considered in combination with fault tolerance. Having introduced hardware faults in the previous chapter, we present elements of the theory of fault tolerance and reliability and show how the hardware components of a computing system can be made more resilient to hardware faults. We then introduce the mathematical definition of reliability and show how to calculate the reliability of a system from the topology of its components. Next, we describe the connection between reliability and fault tolerance: we show how applying different types of redundancy, implemented in software and hardware, increases the reliability of a system. Some design advice is also given.
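To make the idea of topology-dependent reliability concrete, the following is a minimal sketch using the standard textbook formulas for series and parallel arrangements, not necessarily the notation or derivation used in the chapter itself. It assumes independent component failures, a constant failure rate (exponential law), and, for the triple modular redundancy case, a perfect voter. All function names are hypothetical and chosen for illustration only.

```python
import math


def exponential_reliability(failure_rate, t):
    """R(t) = exp(-lambda * t), assuming a constant failure rate lambda."""
    return math.exp(-failure_rate * t)


def series(reliabilities):
    """Series topology: the system works only if every component works."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """Parallel topology: the system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def tmr(r):
    """Triple modular redundancy with an assumed perfect voter:
    the system works if at least 2 of 3 identical modules work."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    r = 0.9  # assumed reliability of a single component
    print("series of two:  ", series([r, r]))
    print("parallel of two:", parallel([r, r]))
    print("TMR:            ", tmr(r))
```

Under these assumptions, adding components in series lowers system reliability, while adding them in parallel (redundancy) raises it, which is the connection between redundancy and reliability that the abstract describes.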



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. IT-ACS Ltd, Stevenage, UK
  2. Department of Informatics, Technopolis Innopolis, Kazan, Russia
