The Heard-Of model: computing in distributed systems with benign faults

Abstract

Problems in fault-tolerant distributed computing have been studied in a variety of models. These models are structured around two central ideas: (1) the degree of synchrony and the failure model are two independent parameters that determine a particular type of system, and (2) the notion of a faulty component is helpful, and even necessary, for the analysis of distributed computations when faults occur. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthwhile to renounce them in the context of benign faults: we present a computational model based only on the notion of transmission faults. In this model, computations evolve in rounds, and messages missed in a round are lost. Only information transmission is represented: for each round r and each process p, our model provides the set of processes that p “hears of” at round r (the heard-of set), namely the processes from which p receives some message at round r. The features of a specific system are thus captured as a whole, just by a predicate over the collection of heard-of sets. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. We demonstrate how this approach leads to shorter and simpler proofs of important results (non-solvability, lower bounds). In particular, we prove that the Consensus problem cannot be solved in general without an implicit and permanent consensus on heard-of sets. We also examine Consensus algorithms in our model. In light of this specific agreement problem, we show how our approach allows us to devise interesting new solutions.
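As a rough illustration of the round structure described in the abstract, the following Python sketch runs rounds in which each process receives only the messages sent by the processes in its heard-of set. It is a minimal sketch under stated assumptions, not the paper's formalism or any of its algorithms: the names `run_round` and `majority_predicate`, the "keep the minimum" transition rule, and the majority predicate are all hypothetical choices made for the example.

```python
from typing import Any, Callable, Dict, List, Set

Process = int
# collection[r][p] is the heard-of set of process p at round r (illustrative encoding).
HOCollection = List[Dict[Process, Set[Process]]]


def run_round(
    states: Dict[Process, Any],
    ho: Dict[Process, Set[Process]],
    send: Callable[[Process, Any], Any],
    transition: Callable[[Process, Any, Dict[Process, Any]], Any],
) -> Dict[Process, Any]:
    """Run one round: p receives only the messages of processes in ho[p].

    Messages from processes outside ho[p] are simply lost, which is how
    transmission faults are represented in this sketch.
    """
    outgoing = {q: send(q, state) for q, state in states.items()}
    return {
        p: transition(p, states[p], {q: outgoing[q] for q in ho[p]})
        for p in states
    }


def majority_predicate(collection: HOCollection, n: int) -> bool:
    """Hypothetical predicate: every heard-of set has more than n/2 members."""
    return all(len(ho_p) > n / 2 for ho in collection for ho_p in ho.values())


# Three processes with arbitrary initial values (assumed for the example).
final_states: Dict[Process, int] = {0: 5, 1: 3, 2: 8}

# Two rounds of heard-of sets; in most cases one message is missed.
collection: HOCollection = [
    {0: {0, 1}, 1: {1, 2}, 2: {0, 2}},
    {0: {0, 1, 2}, 1: {0, 1}, 2: {1, 2}},
]

for ho in collection:
    final_states = run_round(
        final_states,
        ho,
        send=lambda q, state: state,  # each process sends its current value
        # keep the smallest value among the own state and the values heard of
        transition=lambda p, state, msgs: min([state, *msgs.values()]),
    )

print(final_states)
print(majority_predicate(collection, n=3))
```

Note that nothing in the sketch names a faulty process or a synchrony assumption: the only description of the system is the list of heard-of sets, and a property such as `majority_predicate` is checked over that list as a whole.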

Author information

Correspondence to Bernadette Charron-Bost.

Additional information

A. Schiper’s research was funded by the Swiss National Science Foundation under grant number 200021-111701 and by the Hasler Foundation under grant number 2070.

About this article

Cite this article

Charron-Bost, B., Schiper, A. The Heard-Of model: computing in distributed systems with benign faults. Distrib. Comput. 22, 49–71 (2009). https://doi.org/10.1007/s00446-009-0084-6

Keywords

  • Consensus
  • Benign fault
  • Computational model
  • HO (Heard-Of) model
  • Transmission fault