
Distributed Computing, Volume 21, Issue 5, pp. 353–366

On the weakest failure detector ever

  • Rachid Guerraoui
  • Maurice Herlihy
  • Petr Kuznetsov
  • Nancy Lynch
  • Calvin Newport
Article

Abstract

Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, or mutual exclusion. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information? We present an abstraction, denoted \(\Upsilon\), that provides very little information about failures. In every run of the distributed system, \(\Upsilon\) eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. \(\Upsilon\) seems weak: it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run. Nevertheless, \(\Upsilon\) still captures non-trivial failure information. We show that \(\Upsilon\) is sufficient to circumvent the fundamental wait-free set-agreement impossibility. In doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement, and (b) we prove that solving set-agreement with registers is strictly weaker than solving \((n+1)\)-process consensus using \(n\)-process consensus. We show that \(\Upsilon\) is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as \(\Upsilon\) does. We generalize our results from the wait-free to the \(f\)-resilient case through an abstraction \(\Upsilon^f\), which we introduce and prove to be minimal for solving any problem that cannot be solved in an \(f\)-resilient manner, and yet sufficient to solve \(f\)-resilient \(f\)-set-agreement.
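To make the property of \(\Upsilon\) more concrete, the following is a minimal Python sketch that checks a finite, recorded history of failure-detector outputs against it. It rests on assumptions that are not part of the text above: the helper name `satisfies_upsilon` and the history format are hypothetical; "eventually" is approximated by inspecting only the last `window` samples, since a finite trace can never establish "forever"; and "excludes only one set" is read as all correct processes settling on one and the same set U. The paper's formal definition may impose further conditions that this sketch does not check.

```python
from typing import Dict, FrozenSet, List

Process = int
Output = FrozenSet[Process]


def satisfies_upsilon(
    outputs: Dict[Process, List[Output]],
    correct: FrozenSet[Process],
    window: int,
) -> bool:
    """Hypothetical finite-trace check of the Upsilon property.

    outputs[p] is the sequence of sets that process p read from its
    failure-detector module. Only the last `window` samples of each correct
    process are inspected: earlier samples may be arbitrary, and outputs of
    faulty processes are ignored entirely.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not correct:
        raise ValueError("this sketch assumes at least one correct process")

    settled = set()  # distinct sets seen in the inspected suffixes
    for p in correct:
        history = outputs.get(p, [])
        if len(history) < window:
            return False  # too few samples to judge stabilization
        settled.update(history[-window:])

    # Assumption: all correct processes settle on one and the same set U ...
    if len(settled) != 1:
        return False
    (u,) = settled
    # ... and U must differ from the set of correct processes in this run.
    return u != correct


if __name__ == "__main__":
    correct = frozenset({1, 2})
    outputs = {
        1: [frozenset({1, 2}), frozenset({3}), frozenset({3})],
        2: [frozenset({2}), frozenset({3}), frozenset({3})],
        3: [frozenset({1})],  # process 3 crashed; its outputs are not inspected
    }
    print(satisfies_upsilon(outputs, correct, window=2))  # True
```

In the example, the early outputs are useless (process 1 even outputs the correct set {1, 2}), which \(\Upsilon\) permits; what matters is that both correct processes eventually settle on {3}, a set that is not the set of correct processes.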

Keywords

Correct Process, Failure Detector, Failure Pattern, Asynchronous System, Faulty Process



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Rachid Guerraoui (1, 2)
  • Maurice Herlihy (3)
  • Petr Kuznetsov (4)
  • Nancy Lynch (1)
  • Calvin Newport (1)

  1. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, USA
  2. School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
  3. Computer Science Department, Brown University, Providence, USA
  4. Deutsche Telekom Laboratories, Technische Universität Berlin, Berlin, Germany
