On the weakest failure detector ever

Guerraoui, Rachid; Herlihy, Maurice; Kuznetsov, Petr; Lynch, Nancy; Newport, Calvin

doi:10.1007/s00446-009-0079-3


Published: 30 January 2009

Volume 21, pages 353–366 (2009)

Distributed Computing

Rachid Guerraoui¹,², Maurice Herlihy³, Petr Kuznetsov⁴, Nancy Lynch¹ & Calvin Newport¹


Abstract

Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent a specific impossibility, such as that of consensus, atomic commit, or mutual exclusion. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted \(\Upsilon\), that provides very little information about failures. In every run of the distributed system, \(\Upsilon\) eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. \(\Upsilon\) may seem weak, since it might provide random information for an arbitrarily long period of time and eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, yet it still captures non-trivial failure information. We show that \(\Upsilon\) is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving \((n+1)\)-process consensus using \(n\)-process consensus. We show that \(\Upsilon\) is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as \(\Upsilon\) does. Our results are generalized, from the wait-free to the \(f\)-resilient case, through an abstraction \(\Upsilon^f\) that we introduce and prove minimal to solve any problem that cannot be solved in an \(f\)-resilient manner, and yet sufficient to solve \(f\)-resilient \(f\)-set-agreement.
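The guarantee described in the abstract can be illustrated with a small trace checker. This is only a sketch under stated assumptions, not the paper's formal definition: the names `check_upsilon_trace`, `outputs`, `correct` and `stable_from` are hypothetical, "eventually" is approximated by a cut-off index on a finite trace, and the requirement that all correct processes settle on one non-empty set is an assumption suggested by the abstract's reference to stable failure detectors.

```python
from typing import Dict, FrozenSet, List

Process = int
ProcSet = FrozenSet[Process]


def check_upsilon_trace(
    outputs: Dict[Process, List[ProcSet]],
    correct: ProcSet,
    stable_from: int,
) -> bool:
    """Check a finite trace against an Upsilon-like guarantee.

    outputs[p][t] is the set that the detector reports to process p at
    step t. Before `stable_from` the outputs may be arbitrary. From
    `stable_from` onwards (our finite stand-in for "eventually"), every
    correct process must see one common non-empty set, and that set
    must differ from the actual set of correct processes.
    """
    # Collect every value seen by a correct process in the suffix.
    tail_values = {
        value
        for p in correct
        for value in outputs[p][stable_from:]
    }
    # Assumed stability: exactly one value survives in the suffix.
    if len(tail_values) != 1:
        return False
    (excluded,) = tail_values
    # The reported set is non-empty (assumption) and is not the correct set.
    return len(excluded) > 0 and excluded != correct
```

For example, with three processes where process 2 crashes early, a detector that outputs arbitrary sets for two steps and then permanently outputs `{0, 1, 2}` passes the check, because `{0, 1, 2}` is not the set of correct processes `{0, 1}`. The same trace fails if the detector settles on `{0, 1}`.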



Author information

Authors and Affiliations

Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, USA
Rachid Guerraoui, Nancy Lynch & Calvin Newport
School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
Rachid Guerraoui
Computer Science Department, Brown University, Providence, USA
Maurice Herlihy
Deutsche Telekom Laboratories, Technische Universität Berlin, Berlin, Germany
Petr Kuznetsov


Corresponding author

Correspondence to Petr Kuznetsov.


About this article

Cite this article

Guerraoui, R., Herlihy, M., Kuznetsov, P. et al. On the weakest failure detector ever. Distrib. Comput. 21, 353–366 (2009). https://doi.org/10.1007/s00446-009-0079-3


Received: 24 August 2007
Accepted: 25 November 2008
Published: 30 January 2009
Issue Date: February 2009
DOI: https://doi.org/10.1007/s00446-009-0079-3
