Low-overhead time-triggered group membership

  • Shmuel Katz
  • Pat Lincoln
  • John Rushby
Contributed Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1320)


A group membership protocol is presented and proven correct for a synchronous time-triggered model of computation with processors in a ring that broadcast in turn. The protocol, derived from one used for critical control functions in automobiles, accepts a very restrictive fault model to achieve low overhead and requires only one bit of membership information piggybacked on regular broadcasts. Given its strong fault model, the protocol guarantees that a faulty processor will be promptly diagnosed and removed from the agreed group of processors, and will also diagnose itself as faulty. The protocol is correct under a fault-arrival assumption that new faults arrive at least n + 1 time units apart, when there are n processors. Exploiting this assumption leads to unusual real-time reasoning in the correctness proof.
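
The timing model and the fault-arrival assumption stated in the abstract can be illustrated with a minimal sketch. It does not implement the membership protocol itself, which the abstract does not specify. It assumes that time is divided into integer slots, that one slot is one time unit, and that processor `slot % n` broadcasts in each slot (the abstract says only that processors broadcast in turn). The function names `broadcaster` and `satisfies_fault_arrival` are hypothetical.

```python
def broadcaster(slot: int, n: int) -> int:
    """Return the index of the processor that owns the given slot.

    Assumption: slots are numbered from 0 and ownership rotates
    round-robin around the ring of n processors.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return slot % n


def satisfies_fault_arrival(fault_times: list[int], n: int) -> bool:
    """Check that consecutive fault arrivals are at least n + 1 units apart.

    Assumption: fault_times holds integer arrival times, one per new fault.
    """
    ordered = sorted(fault_times)
    return all(
        later - earlier >= n + 1
        for earlier, later in zip(ordered, ordered[1:])
    )
```

For example, with n = 4 processors, arrivals at times 0, 5, and 12 meet the assumption (gaps of 5 and 7), while arrivals at times 0 and 4 do not (gap of 4, which is less than n + 1 = 5).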


time-triggered protocol, group membership, synchronous algorithms, fault tolerance, formal modeling





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Shmuel Katz (1, 2)
  • Pat Lincoln (1)
  • John Rushby (1)
  1. Computer Science Laboratory, SRI International, Menlo Park, USA
  2. Computer Science Department, The Technion, Haifa, Israel
