Byzantine Fault Tolerance, from Theory to Reality

  • Kevin Driscoll
  • Brendan Hall
  • Håkan Sivencrona
  • Phil Zumsteg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2788)

Abstract

Since its introduction nearly 20 years ago, the Byzantine Generals Problem has been the subject of many papers and has received close scrutiny from the fault tolerance community. Numerous Byzantine-tolerant algorithms and architectures have been proposed. However, this problem is not yet sufficiently understood by those who design, build, and maintain systems with high dependability requirements. Today, there are still many misconceptions about Byzantine failure, what makes a system vulnerable, and indeed the very nature and reality of Byzantine faults. This paper revisits the Byzantine problem from a practitioner’s perspective. It aims to give the reader a working appreciation of Byzantine failure from both a practical and a theoretical perspective. It discusses typical failure properties and the difficulties in preventing the associated failure propagation, and illustrates them with real Byzantine failure observations. Finally, it presents various architectural solutions to the Byzantine problem.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Kevin Driscoll (1)
  • Brendan Hall (1)
  • Håkan Sivencrona (2)
  • Phil Zumsteg (1)
  1. Honeywell International, Minneapolis, USA
  2. Department of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden
