Low-overhead time-triggered group membership
A group membership protocol is presented and proven correct for a synchronous time-triggered model of computation with processors in a ring that broadcast in turn. The protocol, derived from one used for critical control functions in automobiles, accepts a very restrictive fault model to achieve low overhead and requires only one bit of membership information piggybacked on regular broadcasts. Given its strong fault model, the protocol guarantees that a faulty processor will be promptly diagnosed and removed from the agreed group of processors, and will also diagnose itself as faulty. The protocol is correct under a fault-arrival assumption that new faults arrive at least n + 1 time units apart, when there are n processors. Exploiting this assumption leads to unusual real-time reasoning in the correctness proof.
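The broadcast schedule and the fault-arrival assumption can be made concrete with a minimal sketch. It does not implement the membership protocol itself. It only illustrates the two timing ideas named in the abstract: processors broadcasting in turn around a ring, and new faults arriving at least n + 1 time units apart. The function names, the zero-based slot numbering, and the use of integer time units are assumptions made for illustration and do not come from the paper.

```python
from typing import Iterable


def broadcaster_at(t: int, n: int) -> int:
    """Index of the processor that owns time slot t.

    Assumes a simple round-robin order 0, 1, ..., n-1, 0, 1, ...
    (hypothetical numbering, chosen for illustration).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return t % n


def faults_respect_spacing(fault_times: Iterable[int], n: int) -> bool:
    """Check the fault-arrival assumption for n processors.

    Returns True when every pair of consecutive fault arrivals is
    at least n + 1 time units apart.
    """
    times = sorted(fault_times)
    return all(later - earlier >= n + 1
               for earlier, later in zip(times, times[1:]))
```

With four processors, for example, faults arriving at times 0, 5, and 10 satisfy the assumption, while faults at times 0 and 4 do not, because they are only n time units apart.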
Keywords: time-triggered protocol, group membership, synchronous algorithms, fault tolerance, formal modeling