Distributed Computing

, Volume 25, Issue 4, pp 313–333 | Cite as

Failure detectors encapsulate fairness

  • Scott M. Pike
  • Srikanth Sastry
  • Jennifer L. Welch
Article

Abstract

Failure detectors have long been viewed as abstractions for the synchronism present in distributed system models. However, investigations into the exact amount of synchronism encapsulated by a given failure detector have met with limited success. The reason for this is that traditionally, models of partial synchrony are specified with respect to real time, but failure detectors do not encapsulate real time. Instead, we argue that failure detectors encapsulate the fairness in computation and communication. Fairness is a measure of the number of steps executed by one process relative either to the number of steps taken by another process or relative to the duration for which a message is in transit. We argue that failure detectors are substitutable for the fairness properties (rather than real-time properties) of partially synchronous systems. We propose four fairness-based models of partial synchrony and demonstrate that they are, in fact, the ‘weakest system models’ to implement the canonical failure detectors from the Chandra-Toueg hierarchy. We also propose a set of fairness-based models which encapsulate the \({\mathcal{G}_c}\) parametric failure detectors which eventually and permanently suspect crashed processes, and eventually and permanently trust some fixed set of c correct processes.

Keywords

Failure detectors Partial synchrony Fairness Crash faults Fault tolerance Schedulers 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Aguilera M.K., Chen W., Toueg S.: On quiescent reliable communication. SIAM J. Comput. 29(6), 2040–2073 (2000)MathSciNetMATHCrossRefGoogle Scholar
  2. 2.
    Aguilera, M. K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Stable leader election. In: Proceedings of the 15th International Conference on Distributed Computing, pp. 108–122. Springer, London, UK (2001)Google Scholar
  3. 3.
    Aguilera, M. K., Delporte-Gallet, C., Fauconnier, H., Toueg, S.: Communication-efficient leader election and consensus with limited link synchrony. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 328–337 (2004)Google Scholar
  4. 4.
    Aguilera M.K., Delporte-Gallet C., Fauconnier H., Toueg S.: On implementing omega in systems with weak reliability and synchrony assumptions. Distrib. Comput. 21(4), 285–314 (2008)CrossRefGoogle Scholar
  5. 5.
    Anta A.F., Raynal M.: From an asynchronous intermittent rotating star to an eventual leader. IEEE Trans. Parallel Distrib. Syst. 21(9), 1290–1303 (2010)CrossRefGoogle Scholar
  6. 6.
    Biely, M., Hutle, M., Penso, L. D., Widder, J.: Relating stabilizing timing assumptions to stabilizing failure detectors regarding solvability and efficiency. In: Proceedings of the 9th International Symposium on Stabilization, Safety, and Security of Distributed Systems, pp. 4–20 (2007)Google Scholar
  7. 7.
    Biely, M., Hutle, M., Penso, L.D., Widder, J.: Relating stabilizing timing assumptions to stabilizing failure detectors regarding solvability and efficiency. Technical Report 54/2007, Technische Universität Wien, Institut für Technische Informatik (2007)Google Scholar
  8. 8.
    Biely, M., Robinson, P., Schmid, U.: Weak synchrony models and failure detectors for message passing k-set agreement. In: Proceedings of the 13th International Conference on Principles of Distributed Systems, pp. 285–299 (2009)Google Scholar
  9. 9.
    Bonnet, F., Raynal, M.: Looking for the weakest failure detector for k-set agreement in message-passing systems: is πk the end of the road? In: Procceding of 11th International Symposium on Stabilization, Safety, and Security of Distributed Systems, pp. 149–164 (2009)Google Scholar
  10. 10.
    Chandra T.D., Toueg S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225–267 (1996)MathSciNetMATHCrossRefGoogle Scholar
  11. 11.
    Chandra T.D., Hadzilacos V., Toueg S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685–722 (1996)MathSciNetMATHCrossRefGoogle Scholar
  12. 12.
    Chandy K.M., Misra J.: The drinking philosophers problem. ACM Trans. Program. Lang. Syst. 6(4), 632–646 (1984)CrossRefGoogle Scholar
  13. 13.
    Charron-Bost, B., Guerraoui, R., Schiper, A.: Synchronous system and perfect failure detector: solvability and efficiency issues. In: International Conference on Dependable Systems and Networks, pp. 523–532 (2000)Google Scholar
  14. 14.
    Chen W., Toueg S., Aguilera M.K.: On the quality of service of failure detectors. IEEE Trans. Comput. 51(5), 561–580 (2002)MathSciNetCrossRefGoogle Scholar
  15. 15.
    Delporte-Gallet, C., Fauconnier, H., Guerraoui, R., Hadzilacos, V., Kouznetsov, P., Toueg, S.: The weakest failure detectors to solve certain fundamental problems in distributed computing. In: Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, pp. 338–346 (2004)Google Scholar
  16. 16.
    Delporte-Gallet C., Fauconnier H., Guerraoui R., Kouznetsov P.: Mutual exclusion in asynchronous systems with failure detectors. J. Parallel Distrib. Comput. 65(4), 492–505 (2005)MATHCrossRefGoogle Scholar
  17. 17.
    Dolev D., Dwork C., Stockmeyer L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77–97 (1987)MathSciNetMATHCrossRefGoogle Scholar
  18. 18.
    Dwork C., Lynch N.A., Stockmeyer L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288–323 (1988)MathSciNetCrossRefGoogle Scholar
  19. 19.
    Fetzer, C.: The message classification model. In: Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pp. 153–162 (1998)Google Scholar
  20. 20.
    Fetzer, C., Schmid, U., Susskraut, M.: On the possibility of consensus in asynchronous systems with finite average response times. In: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems, pp. 271–280 (2005)Google Scholar
  21. 21.
    Fich F., Ruppert E.: Hundreds of impossibility results for distributed computing. Distrib. Comput. 16(2–3), 121–163 (2003)CrossRefGoogle Scholar
  22. 22.
    Fischer M.J., Lynch N.A., Paterson M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985)MathSciNetMATHCrossRefGoogle Scholar
  23. 23.
    Gafni, E., Kouznetsov, P.: The weakest failure detector for solving k-set agreement. In: Proceedings of the 28th ACM Symposium on Principles of Distributed Computing, pp. 83–91 (2009)Google Scholar
  24. 24.
    Guerraoui, R., Oliveira, R., Schiper, A.: Stubborn communication channels. Technical Report LSR-REPORT-1998-009, Ecole Polytechnique Federale de Lausanne (1998)Google Scholar
  25. 25.
    Guerraoui R., Kapalka M., Kouznetsov P.: The weakest failure detectors to boost obstruction-freedom. Distrib. Comput. 20(6), 415–433 (2008)CrossRefGoogle Scholar
  26. 26.
    Hermant, J. F., Widder, J.: Implementing reliable distributed real-time systems with the Θ-model. In: Proceedings of the 9th International Conference on the Principles of Distributed Systems, pp. 334–350 (2005)Google Scholar
  27. 27.
    Hutle M., Malkhi D., Schmid U., Zhou L.: Chasing the weakest system model for implementing Ω and consensus. IEEE Trans. Dependable Secur. Comput. 6(4), 269–281 (2009)CrossRefGoogle Scholar
  28. 28.
    Jayanti, P., Toueg, S.: Every problem has a weakest failure detector. In: Proceedings of the 27th ACM Symposium on Principles of Distributed Computing, pp. 75–84 (2008)Google Scholar
  29. 29.
    Lo, W. K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared-memory systems (extended abstract). In: Proceedings of the 8th International Workshop on Distributed Algorithms, pp. 280–295 (1994)Google Scholar
  30. 30.
    Malkhi, D., Oprea, F., Zhou, L.:Ω meets Paxos: Leader election and stability without eventual timely links. In: Proceedings of the 19th International Symposium on Distributed Computing, pp. 199–213 (2005)Google Scholar
  31. 31.
    Mostefaoui A., Mourgaya E., Raynal M.: An introduction to oracles for asynchronous distributed systems. Futur. Gener. Comput. Syst. 18(6), 757–767 (2002)MATHCrossRefGoogle Scholar
  32. 32.
    Mostéfaoui, A., Mourgaya, E., Raynal, M.: Asynchronous implementation of failure detectors. In: Proceedings of the 33rd International Conference on Dependable Systems and Networks, pp. 351–360 (2003)Google Scholar
  33. 33.
    Mostéfaoui A., Mourgaya E., Raynal M., Travers C.: A time-free assmption to implement eventual leadership. Parallel Process. Lett. 16(2), 189–207 (2006)MathSciNetCrossRefGoogle Scholar
  34. 34.
    Mostefaoui A., Rajsbaum S., Raynal M., Travers C.: On the computability power and the robustness of set agreement-oriented failure detector classes. Distrib. Comput. 21(3), 201–222 (2008)CrossRefGoogle Scholar
  35. 35.
    Pike, S. M., Sivilotti, P. A.: Dining philosophers with crash locality 1. In: Proceedings of the 24th IEEE International Conference on Distributed Computing Systems, pp. 22–29 (2004)Google Scholar
  36. 36.
    Pike, S. M., Song, Y., Sastry, S.: Wait-free dining under eventual weak exclusion. In: Proceedings of the 9th International Conference on Distributed Computing and Networking, pp. 135–146 (2008)Google Scholar
  37. 37.
    Rajsbaum, S., Raynal, M., Travers, C.: Failure detectors as schedulers (an algorithmically-reasoned characterization). Technical Report 1838, IRISA, Université de Rennes, France (2007)Google Scholar
  38. 38.
    Rajsbaum, S., Raynal, M., Travers, C.:The iterated restricted immediate snapshot model. In: Proceedings of 14th International Conference on Computing and Combinatorics, pp. 487–497 (2008)Google Scholar
  39. 39.
    Robinson, P., Schmid, U.: The Asynchronous Bounded-Cycle Model. In: Proceedings of the 10th International Symposium on Stabilization, Safety, and Security of Distributed Systems, pp. 246–262 (2008)Google Scholar
  40. 40.
    Sastry, S., Pike, S. M.: Eventually perfect failure detection using ADD channels. In: Proceedings of the 5th international Symposium on Parallel and Distributed Processing and Applications, pp. 483–496 (2007)Google Scholar
  41. 41.
    Sastry, S., Pike, S. M., Welch, J. L.: Crash fault detection in celerating environments. In: Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium, pp. 1–12 (2009)Google Scholar
  42. 42.
    Yang, J., Neiger, G., Gafni, E.:Structured derivations of consensus algorithms for failure detectors. In: Proceedings of the 17th ACM Symposium on Principles of Distributed Computing, pp. 297–306 (1998)Google Scholar

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Scott M. Pike
    • 1
  • Srikanth Sastry
    • 2
  • Jennifer L. Welch
    • 1
  1. 1.Department of Computer Science and EngineeringTexas A&M UniversityCollege StationUSA
  2. 2.Computer Science and Artificial Intelligence LaboratoryMassachusetts Institute of TechnologyCambridgeUSA

Personalised recommendations