On the weakest failure detector ever

Distributed Computing

Abstract

Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., that of consensus, atomic commit, or mutual exclusion. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted \({\Upsilon}\), that provides very little information about failures. In every run of the distributed system, \({\Upsilon}\) eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. \({\Upsilon}\) is seemingly weak: it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run. It nevertheless captures non-trivial failure information. We show that \({\Upsilon}\) is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving (n + 1)-process consensus using n-process consensus. We show that \({\Upsilon}\) is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as \({\Upsilon}\) does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction \({\Upsilon^f}\) that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
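The guarantee of \({\Upsilon}\) described above can be illustrated with a small checker over a finite, recorded run. This is only a sketch under stated assumptions, not the paper's formal definition: it assumes that every correct process eventually outputs the same fixed set forever (the abstract says only that some set is eventually excluded), and it approximates "eventually" by a caller-chosen index `stable_from`. The names `satisfies_upsilon`, `stable_suffix_value`, `history`, and `stable_from` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

Process = int
ProcSet = FrozenSet[Process]


def stable_suffix_value(outputs: List[ProcSet], stable_from: int) -> Optional[ProcSet]:
    """Return the single set output from index stable_from on, or None if it varies."""
    suffix = outputs[stable_from:]
    if not suffix:
        return None
    first = suffix[0]
    return first if all(o == first for o in suffix) else None


def satisfies_upsilon(
    history: Dict[Process, List[ProcSet]],
    correct: ProcSet,
    stable_from: int,
) -> bool:
    """Check a finite trace against the assumed Upsilon guarantee.

    history[p] lists the detector outputs observed at process p, in order.
    Assumption: after stable_from, all correct processes output one common
    set, and that set differs from the actual set of correct processes.
    """
    values = {stable_suffix_value(history[p], stable_from) for p in correct}
    if len(values) != 1:
        return False  # no common stable output (or no correct process at all)
    (value,) = values
    return value is not None and value != correct


# Three processes 0, 1, 2; process 2 crashes, so the correct set is {0, 1}.
correct = frozenset({0, 1})
everyone = frozenset({0, 1, 2})

# Early outputs are arbitrary; from index 2 on, both correct processes
# output {0, 1, 2}, which is not the set of correct processes.
good_run = {
    0: [frozenset({0, 1}), frozenset({2}), everyone, everyone],
    1: [frozenset({1}), frozenset({0, 1}), everyone, everyone],
}

# Here the outputs stabilize on {0, 1}, which IS the correct set.
bad_run = {
    0: [everyone, everyone, correct, correct],
    1: [everyone, everyone, correct, correct],
}

print(satisfies_upsilon(good_run, correct, stable_from=2))
print(satisfies_upsilon(bad_run, correct, stable_from=2))
```

In the first run the detector excludes only one candidate, `{0, 1, 2}`, and never reveals which process actually failed. That is the sense in which the information is weak yet non-trivial.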

Author information

Corresponding author

Correspondence to Petr Kuznetsov.

About this article

Cite this article

Guerraoui, R., Herlihy, M., Kuznetsov, P. et al. On the weakest failure detector ever. Distrib. Comput. 21, 353–366 (2009). https://doi.org/10.1007/s00446-009-0079-3

  • DOI: https://doi.org/10.1007/s00446-009-0079-3