Encyclopedia of Algorithms

2008 Edition
Editors: Ming-Yang Kao

Failure Detectors

1996; Chandra, Toueg
  • Rachid Guerraoui
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30162-4_140

Keywords and Synonyms

Partial synchrony; Time-outs; Failure information; Distributed oracles

Problem Definition

A distributed system consists of a collection of processes. The processes typically seek to achieve some common task by communicating through message passing or shared memory. Most interesting tasks require, at least at certain points of the computation, some form of agreement between the processes. An abstract form of such agreement is consensus, in which the processes must agree on a single value chosen from the set of proposed values. Solving this seemingly elementary problem is at the heart of reliable distributed computing and, in particular, of distributed database commitment, total ordering of messages, and emulations of many shared object types.
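The two conditions stated above (a single decided value, drawn from the proposed values) can be illustrated with a minimal Python sketch that checks one finished run. The function name `check_consensus`, the process names, and the use of `None` to mark a process that has not decided are assumptions made for this illustration only; they do not come from the entry, and the sketch checks a run rather than solving consensus.

```python
from typing import Dict, Hashable, Optional


def check_consensus(
    proposals: Dict[str, Hashable],
    decisions: Dict[str, Optional[Hashable]],
) -> Dict[str, bool]:
    """Check one finished run against the two conditions in the text.

    Hypothetical helper: `None` in `decisions` means the process has
    not decided (for example, because it crashed), so `None` cannot be
    used as a proposal value here.
    """
    decided = [v for v in decisions.values() if v is not None]

    # "Agree on a single value": no two processes decide differently.
    agreement = len(set(decided)) <= 1

    # "Among a set of proposed values": every decision was proposed.
    proposed = set(proposals.values())
    validity = all(v in proposed for v in decided)

    return {"agreement": agreement, "validity": validity}


if __name__ == "__main__":
    # p3 never decides; the remaining processes agree on a proposed value.
    print(check_consensus({"p1": 3, "p2": 7, "p3": 7},
                          {"p1": 7, "p2": 7, "p3": None}))
```

The sketch covers only these two safety conditions. Whether every correct process eventually decides depends on the failure and timing assumptions of the system and is not something a check over a single finite run can establish.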

Fischer, Lynch, and Paterson's seminal result in the theory of distributed computing [13] states that consensus cannot be solved deterministically in an asynchronous distributed system that is prone to process failures. This...


Recommended Reading

  1. Aguilera, M.K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: On implementing Omega with weak reliability and synchrony assumptions. In: 22nd ACM Symposium on Principles of Distributed Computing, pp. 306–314 (2003)
  2. Bertier, M., Marin, O., Sens, P.: Performance analysis of a hierarchical failure detector. In: International Conference on Dependable Systems and Networks (DSN 2003), San Francisco, CA, USA, Proceedings, pp. 635–644, 22–25 June 2003
  3. Borowsky, E., Gafni, E.: Generalized FLP impossibility result for t-resilient asynchronous computations. In: Proceedings of the 25th ACM Symposium on Theory of Computing, pp. 91–100, ACM Press, May 1993
  4. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)
  5. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)
  6. Chaudhuri, S.: More choices allow more faults: Set consensus problems in totally asynchronous systems. Inf. Comput. 105(1), 132–158 (1993)
  7. Chen, W., Toueg, S., Aguilera, M.K.: On the quality of service of failure detectors. IEEE Trans. Comput. 51(1), 13–32 (2002)
  8. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: Failure detection lower bounds on registers and consensus. In: Proceedings of the 16th International Symposium on Distributed Computing, LNCS 2508 (2002)
  9. Delporte-Gallet, C., Fauconnier, H., Guerraoui, R.: Implementing atomic objects in a message passing system. Technical report, EPFL Lausanne (2005)
  10. Dwork, C., Lynch, N.A., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)
  11. Felber, P., Guerraoui, R., Fayad, M.: Putting OO distributed programming to work. Commun. ACM 42(11), 97–101 (1999)
  12. Fernández, A., Jiménez, E., Raynal, M.: Eventual leader election with weak assumptions on initial knowledge, communication reliability and synchrony. In: Proceedings of the International Symposium on Dependable Systems and Networks (DSN), pp. 166–178 (2006)
  13. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)
  14. Guerraoui, R.: Indulgent algorithms. In: Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing, Portland, Oregon, USA, pp. 289–297, ACM, July 2000
  15. Herlihy, M.: Wait-free synchronization. ACM Trans. Programm. Lang. Syst. 13(1), 123–149 (1991)
  16. Herlihy, M., Shavit, N.: The asynchronous computability theorem for t-resilient tasks. In: Proceedings of the 25th ACM Symposium on Theory of Computing, pp. 111–120, May 1993
  17. Keidar, I., Rajsbaum, S.: On the cost of fault-tolerant consensus when there are no faults - a tutorial. In: Tutorial, 21st ACM Symposium on Principles of Distributed Computing, July 2002
  18. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
  19. Lo, W.-K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared memory systems. In: Proceedings of the 8th International Workshop on Distributed Algorithms, LNCS 857, pp. 280–295, September 1994
  20. Lynch, N.: Distributed Algorithms. Morgan Kaufmann (1996)
  21. Raynal, M., Travers, C.: In search of the holy grail: Looking for the weakest failure detector for wait-free set agreement. Technical Report TR 06-1811, INRIA, August 2006
  22. Saks, M., Zaharoglou, F.: Wait-free k-set agreement is impossible: The topology of public knowledge. In: Proceedings of the 25th ACM Symposium on Theory of Computing, pp. 101–110, ACM Press, May 1993

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Rachid Guerraoui (1)
  1. School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland