Continuous Consensus with Failures and Recoveries
A continuous consensus (CC) protocol maintains for each process i at each time k an up-to-date core M_i[k] of information about the past, so that the cores at all processes are guaranteed to be identical. This is a generalization of simultaneous consensus that provides processes with the ability to perform simultaneously coordinated actions, and removes the need to compute multiple instances of simultaneous consensus at any given time. For an indefinite ongoing service of this type, it is somewhat unreasonable to assume a bound on the number of processes that ever fail. Moreover, over time, we can expect failed processes to be corrected. A failure assumption called (m,t) interval-bounded failures, closely related to the window of vulnerability model of Castro and Liskov, is considered for this type of service. The assumption is that in any given interval of m rounds, at most t processes can display faulty behavior.
This paper presents an efficient CC protocol for the (m,t) bound in the crash and sending omissions failure models. A matching lower bound proof shows that the protocol is optimal in all runs (and not just in the worst case): For each and every behavior of the adversary, and at each time instant m, the core that our protocol maintains at time m is a superset of the core maintained by any other correct CC protocol under the same adversary. The lower bound is a significant generalization of previous proofs for common knowledge, and it applies to continuous consensus in a wide class of benign failure models, including the general omissions model, for which no similar proof existed.
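The (m,t) interval-bounded assumption can be illustrated with a small check over a recorded failure pattern. The sketch below is not taken from the paper: the function name `satisfies_interval_bound`, the per-round dictionary `faulty_by_round`, and the reading of "displays faulty behavior in round r" as membership in that round's set are all assumptions made for illustration, and the paper's formal definition may differ in detail.

```python
from typing import Dict, Set


def satisfies_interval_bound(
    faulty_by_round: Dict[int, Set[str]],  # hypothetical: round -> processes faulty in that round
    m: int,
    t: int,
    num_rounds: int,
) -> bool:
    """Return True if every window of m consecutive rounds (rounds 1..num_rounds)
    contains at most t distinct processes that behaved faultily."""
    for start in range(1, num_rounds + 1):
        # Windows near the end of the run are truncated at num_rounds.
        end = min(start + m - 1, num_rounds)
        window_faulty: Set[str] = set()
        for r in range(start, end + 1):
            window_faulty |= faulty_by_round.get(r, set())
        if len(window_faulty) > t:
            return False
    return True


# Three different processes fail over the run, yet no window of
# two rounds contains more than two of them.
pattern = {1: {"p1"}, 2: {"p2"}, 4: {"p3"}}
print(satisfies_interval_bound(pattern, m=2, t=2, num_rounds=5))
print(satisfies_interval_bound(pattern, m=2, t=1, num_rounds=5))
```

In this example the total number of processes that ever fail (three) exceeds t = 2, which is the point of the interval-bounded model: the bound applies per window of m rounds, not to the whole run.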
Keywords: Agreement problem, Consensus, Continuous Consensus, Distributed algorithm, Early decision, Common Knowledge, Lower bound, Modularity, Process crash failure, Omission failures, Process recovery, Round-based computation model, Simultaneity, Synchronous message-passing system
- 1. Burns, J.E., Lynch, N.A.: The byzantine firing squad problem. Technical Report MIT/LCS/TM-275 (1985)
- 2. Castro, M., Liskov, B.: Proactive recovery in a Byzantine-fault-tolerant system. In: Proc. 4th OSDI: Symp. Op. Sys. Design and Implementation, pp. 273–288 (2000)
- 3. Charron-Bost, B., Schiper, A.: The Heard-Of Model: Unifying all Benign Failures. EPFL LSR-REPORT-2006-004 (2006)
- 5. Dolev, D., Reischuk, R., Strong, H.R.: Eventual is earlier than immediate. In: Proc. 23rd IEEE Symp. on Foundations of Computer Science, pp. 196–203 (1982)
- 10. Merritt, M.J.: Unpublished notes on the Dolev-Strong lower bound for Byzantine Agreement (1984)
- 12. Moses, Y., Raynal, M.: Revisiting Simultaneous Consensus with Crash Failures. Tech Report 1885, 17 pages, IRISA, Université de Rennes 1, France (2008), http://hal.inria.fr/inria-00260643/en/
- 18. Santoro, N., Widmayer, P.: Time is not a healer. In: Proc. 6th Symp. Theo. Asp. Comp. Sci (STACS), pp. 304–313 (1989)