Theory of Computing Systems, Volume 62, Issue 5, pp 1085–1108

A Closer Look at Fault Tolerance

  • Gadi Taubenfeld


The traditional notion of fault tolerance requires that all the correct participating processes eventually terminate, and thus is not sensitive to the number of correct processes that should terminate as a result of failures. Intuitively, an algorithm that, in the presence of any number of faults, always guarantees that all the correct processes except maybe one terminate is more resilient to faults than an algorithm that, in the presence of a single fault, does not even guarantee that a single correct process ever terminates. However, according to the standard notion of fault tolerance, both algorithms are classified as algorithms that cannot tolerate a single fault. To overcome this difficulty, we generalize the traditional notion of fault tolerance in a way that makes it possible to capture more sensitive information about the resiliency of an algorithm. Then, we present several algorithms for solving classical problems which are resilient under the new notion. It is well known that, in asynchronous systems where processes communicate either by reading and writing atomic registers or by sending and receiving messages, important problems such as consensus, set-consensus, election, perfect renaming, and implementations of a test-and-set bit, a shared stack, a swap object and a fetch-and-add object have no deterministic solutions which can tolerate even a single fault. We show that while some of these problems have solutions which guarantee that in the presence of any number of faults most of the correct processes will terminate, other problems do not even have solutions which guarantee that in the presence of just one fault at least one correct process terminates. All our results are presented in the context of crash failures in asynchronous systems.
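The contrast drawn above between the two algorithms can be made concrete with a small sketch. It is only an illustration under stated assumptions: the abstract does not give the formal definition of the generalized notion, so the `Run` record, the parameter `e` (the number of correct processes allowed not to terminate) and the function names below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Outcome of one execution (hypothetical model, not from the paper)."""
    processes: frozenset   # all participating processes
    crashed: frozenset     # processes that suffered a crash failure
    terminated: frozenset  # processes that properly terminated

    @property
    def correct(self):
        # A process is correct if it never crashes.
        return self.processes - self.crashed


def stuck_correct(run):
    """Correct processes that never terminate in this run."""
    return run.correct - run.terminated


def satisfies_traditional(run):
    # Traditional requirement: every correct process terminates.
    return not stuck_correct(run)


def satisfies_relaxed(run, e):
    # Assumed relaxed requirement: at most e correct processes
    # fail to terminate (e is an illustrative parameter).
    return len(stuck_correct(run)) <= e


procs = frozenset(range(5))
# First algorithm: one crash, and exactly one correct process never terminates.
run_a = Run(procs, crashed=frozenset({0}), terminated=frozenset({1, 2, 3}))
# Second algorithm: one crash, and no correct process ever terminates.
run_b = Run(procs, crashed=frozenset({0}), terminated=frozenset())
```

Both runs fail `satisfies_traditional`, which is why the standard notion cannot tell the two algorithms apart, while `satisfies_relaxed(run, 1)` holds for `run_a` only.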


Keywords: Fault tolerance, Crash failures, Shared memory, Message passing, Election, Test-and-set, Renaming, Consensus, Set-consensus, Stack, Swap, Fetch-and-add



I wish to thank the three anonymous referees for their constructive suggestions and corrections.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. The Interdisciplinary Center, Herzliya, Israel
