
Randomization can be a healer: consensus with dynamic omission failures

Distributed Computing

Abstract

Wireless ad-hoc networks are being increasingly used in diverse contexts, ranging from casual meetings to disaster recovery operations. A promising approach is to model these networks as distributed systems prone to dynamic communication failures. This captures transitory disconnections in communication due to phenomena like interference and collisions, and permits an efficient use of the wireless broadcasting medium. This model, however, is bound by the impossibility result of Santoro and Widmayer, which states that, even with strong synchrony assumptions, there is no deterministic solution to any non-trivial form of agreement if n − 1 or more messages can be lost per communication round in a system with n processes. In this paper we propose a novel way to circumvent this impossibility result by employing randomization. We present a consensus protocol that ensures safety in the presence of an unrestricted number of omission faults, and guarantees progress in rounds where such faults are bounded by \(f \leq \lceil \frac{n}{2} \rceil (n - k) + k - 2\), where k is the number of processes required to decide, eventually assuring termination with probability 1.
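To make the progress bound concrete, the following minimal sketch evaluates the expression ⌈n/2⌉(n − k) + k − 2 from the abstract and sets it beside the n − 1 loss threshold of the Santoro and Widmayer result. The function names are hypothetical, and the range check 1 ≤ k ≤ n is an assumption of this sketch; the paper itself may restrict k further.

```python
def progress_bound(n: int, k: int) -> int:
    """Evaluate ceil(n/2) * (n - k) + k - 2, the per-round omission-fault
    bound under which the abstract states that progress is guaranteed."""
    if not 1 <= k <= n:
        # Assumption of this sketch only; the paper may constrain k further.
        raise ValueError("expected 1 <= k <= n")
    half_up = (n + 1) // 2  # integer form of ceil(n / 2)
    return half_up * (n - k) + k - 2


def deterministic_threshold(n: int) -> int:
    """Messages lost per round at which, per Santoro and Widmayer,
    no deterministic solution to non-trivial agreement exists."""
    return n - 1


if __name__ == "__main__":
    for n, k in [(4, 4), (5, 3), (10, 6)]:
        print(f"n={n}, k={k}: progress bound f <= {progress_bound(n, k)}, "
              f"deterministic threshold = {deterministic_threshold(n)}")
```

For example, with n = 10 and k = 6 the expression evaluates to 5 · 4 + 6 − 2 = 24. When every process must decide (k = n), it reduces to n − 2.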


References

  1. Aguilera, M., Chen, W., Toueg, S.: Failure detection and consensus in the crash-recovery model. Distrib. Comput. 13(2), 99–125 (2000)

  2. Akkoyunlu, E.A., Ekanadham, K., Huber, R.V.: Some constraints and tradeoffs in the design of network communications. In: Proceedings of the 5th ACM Symposium on Operating Systems Principles, pp. 67–74 (1975)

  3. Ben-Or, M.: Another advantage of free choice: Completely asynchronous agreement protocols. In: Proceedings of the 2nd ACM Symposium on Principles of Distributed Computing, pp. 27–30 (1983)

  4. Biely, M., Widder, J., Charron-Bost, B., Gaillard, A., Hutle, M., Schiper, A.: Tolerating corrupted communication. In: Proceedings of the 26th ACM Symposium on Principles of Distributed Computing, pp. 244–253 (2007)

  5. Bracha, G.: An asynchronous \({\lfloor(n-1)/3\rfloor}\)-resilient consensus protocol. In: Proceedings of the 3rd ACM Symposium on Principles of Distributed Computing, pp. 154–162 (1984)

  6. Cachin, C., Kursawe, K., Shoup, V.: Random oracles in Constantinople: Practical asynchronous byzantine agreement using cryptography. J. Cryptol. 18(3), 219–246 (2005)

  7. Canetti, R., Rabin, T.: Fast asynchronous byzantine agreement with optimal resilience. In: Proceedings of the 25th Annual ACM Symposium on Theory of Computing, pp. 42–51 (1993)

  8. Chandra, T., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)

  9. Charron-Bost, B., Schiper, A.: The heard-of model: Computing in distributed systems with benign failures. Technical Report LSR-REPORT-2007-001, EPFL (2007)

  10. Chockler, G., Demirbas, M., Gilbert, S., Lynch, N., Newport, C., Nolte, T.: Consensus and collision detectors in radio networks. Distrib. Comput. 21(1), 55–84 (2008)

  11. Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77–97 (1987)

  12. Dolev, D., Friedman, R., Keidar, I., Malkhi, D.: Failure detectors in omission failure environments. In: Proceedings of the 16th ACM Symposium on Principles of Distributed Computing, pp. 286–295 (1997)

  13. Fischer, M.J.: The consensus problem in unreliable distributed systems (A brief survey). In: Karpinski, M. (ed.) Foundations of Computation Theory, Lecture Notes in Computer Science, vol. 158, pp. 127–140. Springer (1983)

  14. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)

  15. Gray, J.: Notes on data base operating systems. In: Bayer, R., Graham, R.M., Seegmüller, G. (eds.) Operating Systems: An Advanced Course, Lecture Notes in Computer Science, vol. 60. Springer (1978)

  16. Hurfin, M., Mostefaoui, A., Raynal, M.: Consensus in asynchronous systems where processes can crash and recover. In: Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems, pp. 280–286 (1998)

  17. Lamport, L.: Lower bounds for asynchronous consensus. Distrib. Comput. 19(2), 104–125 (2006)

  18. Lamport, L., Shostak, R., Pease, M.: The byzantine generals problem. ACM Trans. Program. Lang. Syst. 4(3), 382–401 (1982)

  19. Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann (1997)

  20. Moniz, H., Neves, N.F., Correia, M., Veríssimo, P.: Experimental comparison of local and shared coin randomized consensus protocols. In: Proceedings of the 25th IEEE Symposium on Reliable Distributed Systems, pp. 235–244 (2006)

  21. Moniz, H., Neves, N.F., Correia, M., Veríssimo, P.: RITAS: Services for randomized intrusion tolerance. IEEE Trans. Dependable Secure Comput. (to appear) (2010)

  22. Neves, N.F., Correia, M., Veríssimo, P.: Solving vector consensus with a wormhole. IEEE Trans. Parallel Distrib. Syst. 16(12), 1120–1131 (2005)

  23. Oliveira, R., Guerraoui, R., Schiper, A.: Consensus in the crash-recover model. Technical Report 97-239, EPFL (1997)

  24. Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)

  25. Perry, K.J., Toueg, S.: Distributed agreement in the presence of processor and communication faults. IEEE Trans. Softw. Eng. 12(3), 477–482 (1986)

  26. Rabin, M.O.: Randomized Byzantine generals. In: Proceedings of the 24th Annual IEEE Symposium on Foundations of Computer Science, pp. 403–409 (1983)

  27. Raynal, M., Roy, M.: A note on a simple equivalence between round-based synchronous and asynchronous models. In: Proceedings of the 11th IEEE Pacific Rim International Symposium on Dependable Computing, pp. 387–392 (2005)

  28. Santoro, N., Widmayer, P.: Time is not a healer. In: Proceedings of the 6th Symposium on Theoretical Aspects of Computer Science, pp. 304–313 (1989)

  29. Santoro, N., Widmayer, P.: Agreement in synchronous networks with ubiquitous faults. Theor. Comput. Sci. 384(2–3), 232–249 (2007)

  30. Schmid, U., Weiss, B., Keidar, I.: Impossibility results and lower bounds for consensus under link failures. SIAM J. Comput. 38(5), 1912–1951 (2009)

  31. Varghese, G., Lynch, N.A.: A tradeoff between safety and liveness for randomized coordinated attack. Inf. Comput. 128(1), 57–71 (1996)


Author information

Corresponding author

Correspondence to Henrique Moniz.

Additional information

This work was partially supported by the FCT through the Multiannual and the CMU-Portugal Programmes, and the project PTDC/EIAEIA/100894/2008 (DIVERSE).

About this article

Cite this article

Moniz, H., Neves, N.F., Correia, M. et al. Randomization can be a healer: consensus with dynamic omission failures. Distrib. Comput. 24, 165–175 (2011). https://doi.org/10.1007/s00446-010-0116-2

  • DOI: https://doi.org/10.1007/s00446-010-0116-2
