Continuous Consensus with Failures and Recoveries

  • Tal Mizrahi
  • Yoram Moses
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5218)

Abstract

A continuous consensus (CC) protocol maintains for each process i at each time k an up-to-date core M_i[k] of information about the past, so that the cores at all processes are guaranteed to be identical. This is a generalization of simultaneous consensus that provides processes with the ability to perform simultaneously coordinated actions, and saves the need to compute multiple instances of simultaneous consensus at any given time. For an indefinite ongoing service of this type, it is somewhat unreasonable to assume a bound on the number of processes that ever fail. Moreover, over time, we can expect failed processes to be corrected. A failure assumption called (m,t) interval-bounded failures, closely related to the window of vulnerability model of Castro and Liskov, is considered for this type of service. The assumption is that in any given interval of m rounds, at most t processes can display faulty behavior.

This paper presents an efficient CC protocol for the (m,t) bound in the crash and sending omissions failure models. A matching lower bound proof shows that the protocol is optimal in all runs (and not just in the worst case): For each and every behavior of the adversary, and at each time instant m, the core that our protocol maintains at time m is a superset of the core maintained by any other correct CC protocol under the same adversary. The lower bound is a significant generalization of previous proofs for common knowledge, and it applies to continuous consensus in a wide class of benign failure models, including the general omissions model, for which no similar proof existed.

Keywords

Agreement problem Consensus Continuous Consensus Distributed algorithm Early decision Common Knowledge Lower bound Modularity Process crash failure Omission failures Process recovery Round-based computation model Simultaneity Synchronous message-passing system 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Burns, J.E., Lynch, N.A.: The byzantine firing squad problem. Technical Report MIT/LCS/TM-275 (1985)Google Scholar
  2. 2.
    Castro, M., Liskov, B.: Proactive recovery in a Byzantine-fault-tolerant system. In: Proc. 4th OSDI: Symp. Op. Sys. Design and Implementation, pp. 273–288 (2000)Google Scholar
  3. 3.
    Charron-Bost, B., Schiper, A.: The Heard-Of Model: Unifying all Benign Failures. EPFL LSR-REPORT-2006-004 (2006)Google Scholar
  4. 4.
    Coan, B.A., Dolev, D., Dwork, C., Stockmeyer, L.J.: The distributed firing squad problem. SIAM J. Comput. 18(5), 990–1012 (1989)MATHCrossRefMathSciNetGoogle Scholar
  5. 5.
    Dolev, D., Reischuk, R., Strong, H.R.: Eventual is earlier than immediate. In: Proc. 23rd IEEE Symp. on Foundations of Computer Science, pp. 196–203 (1982)Google Scholar
  6. 6.
    Dolev, S., Rajsbaum, S.: Stability of long-lived consensus. J. Comput. Syst. Sci. 67(1), 26–45 (2003)MATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    Dwork, C., Moses, Y.: Knowledge and common knowledge in a Byzantine environment: crash failures. Information and Computation 88(2), 156–186 (1990)MATHCrossRefMathSciNetGoogle Scholar
  8. 8.
    Fagin, R., Halpern, J.Y., Moses, Y., Vardi, M.Y.: Reasoning about Knowledge. MIT Press, Cambridge (1995) (revised 2003)MATHGoogle Scholar
  9. 9.
    Halpern, J.Y., Moses, Y.: Knowledge and common knowledge in a distributed environment. Journal of the ACM 37(3), 549–587 (1990)MATHCrossRefMathSciNetGoogle Scholar
  10. 10.
    Merritt, M.J.: Unpublished notes on the Dolev-Strong lower bound for Byzantine Agreement (1984)Google Scholar
  11. 11.
    Mizrahi, T., Moses, Y.: Continuous consensus via common knowledge. Distributed Computing 20(5), 305–321 (2008)CrossRefGoogle Scholar
  12. 12.
    Moses, Y., Raynal, M.: Revisiting Simultaneous Consensus with Crash Failures. Tech Report 1885, 17 pages, IRISA, Université de Rennes 1, France (2008), http://hal.inria.fr/inria-00260643/en/
  13. 13.
    Moses, Y., Tuttle, M.R.: Programming simultaneous actions using common knowledge. Algorithmica 3, 121–169 (1988)MATHCrossRefMathSciNetGoogle Scholar
  14. 14.
    Mostéfaoui, A., Rajsbaum, S., Raynal, M.: Synchronous condition-based consensus. Distributed Computing 18(5), 325–343 (2006)CrossRefGoogle Scholar
  15. 15.
    Neiger, G., Bazzi, R.A.: Using knowledge to optimally achieve coordination in distributed systems. Theor. Comput. Sci. 220(1), 31–65 (1999)MATHCrossRefMathSciNetGoogle Scholar
  16. 16.
    Neiger, G., Tuttle, M.R.: Common knowledge and consistent simultaneous coordination. Distributed Computing 6(3), 181–192 (1993)MATHCrossRefGoogle Scholar
  17. 17.
    Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. Journal of the ACM 27(2), 228–234 (1980)MATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Santoro, N., Widmayer, P.: Time is not a healer. In: Proc. 6th Symp. Theo. Asp. Comp.  Sci (STACS), pp. 304–313 (1989)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Tal Mizrahi
    • 1
  • Yoram Moses
    • 1
  1. 1.Department of Electrical EngineeringTechnionHaifaIsrael

Personalised recommendations