Abstract
Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, or mutual exclusion. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted \({\Upsilon}\), that provides very little information about failures. In every run of the distributed system, \({\Upsilon}\) eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. \({\Upsilon}\) may seem weak: it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run. It nevertheless captures non-trivial failure information. We show that \({\Upsilon}\) is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving (n + 1)-process consensus using n-process consensus. We show that \({\Upsilon}\) is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as \({\Upsilon}\) does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction \({\Upsilon^f}\) that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
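The property that the abstract attributes to \({\Upsilon}\) can be illustrated with a small check over a finite trace of outputs. This is only a sketch under simplifying assumptions that are not part of the paper: a run is modelled as a finite list of outputs seen by one process, the point from which the output is stable is supplied by the caller, and the names `stable_output` and `satisfies_upsilon` are hypothetical.

```python
from typing import FrozenSet, Sequence

ProcSet = FrozenSet[int]  # a set of process identifiers (assumed to be ints)


def stable_output(trace: Sequence[ProcSet], stable_from: int) -> ProcSet:
    """Return the single value the trace holds from index `stable_from` on.

    Raises ValueError if that suffix is empty or is not constant.
    """
    suffix = trace[stable_from:]
    if not suffix or any(s != suffix[0] for s in suffix):
        raise ValueError("trace does not stabilize at the given index")
    return suffix[0]


def satisfies_upsilon(
    trace: Sequence[ProcSet], stable_from: int, correct: ProcSet
) -> bool:
    """Check the abstract's property on a finite trace (an assumption:
    real runs are infinite, so this only approximates 'eventually').

    Before `stable_from` the outputs may be arbitrary. From `stable_from`
    on, the output must be one fixed set that differs from `correct`,
    the set of correct processes in the run.
    """
    try:
        excluded = stable_output(trace, stable_from)
    except ValueError:
        return False
    return excluded != correct
```

For instance, with correct processes {1, 2}, a trace that starts with arbitrary outputs and then settles on {1, 2, 3} passes the check, while a trace that settles on {1, 2} itself does not.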
Cite this article
Guerraoui, R., Herlihy, M., Kuznetsov, P. et al. On the weakest failure detector ever. Distrib. Comput. 21, 353–366 (2009). https://doi.org/10.1007/s00446-009-0079-3
DOI: https://doi.org/10.1007/s00446-009-0079-3