Distributed Computing, Volume 22, Issue 1, pp 49–71

The Heard-Of model: computing in distributed systems with benign faults

Abstract

Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas: (1) degree of synchrony and failure model are two independent parameters that determine a particular type of system, (2) the notion of faulty component is helpful and even necessary for the analysis of distributed computations when faults occur. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthy to renounce them in the context of benign faults: we present a computational model based only on the notion of transmission faults. In this model, computations evolve in rounds, and messages missed in a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p “hears of” at round r (heard-of set), namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. We demonstrate how this approach leads to shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that the Consensus problem cannot be generally solved without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise new interesting solutions.
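The round structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' formalization: the names `run_rounds`, `min_of_heard`, and `nonempty_kernel`, the integer states, and the example predicate are all assumptions made here for illustration. The sketch shows only the core idea that a process p receives, at round r, exactly the messages sent by the processes in its heard-of set HO(p, r), and that a system is characterized by a predicate over the collection of these sets.

```python
from typing import Callable, Dict, List, Set

Process = int
# ho[r][p] is the heard-of set HO(p, r): the processes p hears of at round r.
HOCollection = List[Dict[Process, Set[Process]]]


def run_rounds(init: Dict[Process, int],
               ho: HOCollection,
               transition: Callable[[int, Dict[Process, int]], int]) -> Dict[Process, int]:
    """Run one round per entry of `ho` and return the final states."""
    state = dict(init)
    for ho_r in ho:
        # Every process sends its current state to all processes.
        sent = dict(state)
        # p receives only from HO(p, r); every other message of the round is lost.
        state = {p: transition(state[p], {q: sent[q] for q in ho_r[p]})
                 for p in state}
    return state


def min_of_heard(own: int, received: Dict[Process, int]) -> int:
    """Hypothetical transition function: keep the smallest value seen so far."""
    return min([own, *received.values()])


def nonempty_kernel(ho: HOCollection) -> bool:
    """Example predicate (an assumption): in each round, some process is heard by all."""
    return all(set.intersection(*ho_r.values()) for ho_r in ho)


# Two rounds over processes 0, 1, 2; process 2 never hears of process 0.
example_ho: HOCollection = [
    {0: {0, 1}, 1: {0, 1}, 2: {1, 2}},
    {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2}},
]
final = run_rounds({0: 1, 1: 3, 2: 7}, example_ho, min_of_heard)
```

In this example the predicate holds because process 1 is heard by everyone in both rounds, and the value 1 reaches process 2 through process 1 in the second round even though process 2 never hears of process 0 directly. Note that the sketch has no notion of a faulty process: a lost message is just an absence from a heard-of set.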

Keywords

Consensus, Benign fault, Computational model, HO (Heard-Of) model, Transmission fault

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  1. École polytechnique, CNRS, Palaiseau Cédex, France
  2. EPFL, Lausanne, Switzerland
