Distributed Computing, Volume 32, Issue 6, pp 535–564

Recoverable mutual exclusion

  • Wojciech Golab
  • Aditya Ramaraju


Mutex locks have traditionally been the most common mechanism for protecting shared data structures in concurrent programs. However, the robustness of such locks against process failures has not been studied thoroughly. The vast majority of mutex algorithms are designed around the assumption that processes are reliable, meaning that a process does not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their efficiency. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter efficiency guarantees, which we define in terms of remote memory references.
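The failure scenario described above can be made concrete with a small simulation. The following is a minimal sketch in Python and is not one of the algorithms from the paper: the class `OwnerTaggedLock`, its methods, and the integer process ids are all hypothetical names introduced for illustration. It assumes that the lock word survives a crash (as it would in non-volatile main memory) and that an atomic compare-and-swap is available, which is emulated here with `threading.Lock`.

```python
import threading


class OwnerTaggedLock:
    """Toy try-lock whose single state word records the owner's id (hypothetical)."""

    def __init__(self):
        self._guard = threading.Lock()  # stands in for a hardware compare-and-swap
        self.owner = None               # assumed to live in non-volatile memory

    def _cas(self, expected, new):
        # Atomically replace `expected` with `new`; report whether it happened.
        with self._guard:
            if self.owner == expected:
                self.owner = new
                return True
            return False

    def try_acquire(self, pid):
        return self._cas(None, pid)

    def release(self, pid):
        return self._cas(pid, None)

    def recover(self, pid):
        # After a restart, pid learns whether it crashed while holding the lock.
        return self.owner == pid


lock = OwnerTaggedLock()
entered = lock.try_acquire(1)        # process 1 enters the critical section
# Process 1 "crashes" here: its local variables are lost, lock.owner is not.
blocked = not lock.try_acquire(2)    # process 2 is shut out by the failed owner
if lock.recover(1):                  # process 1 restarts and sees it still owns the lock
    lock.release(1)                  # it repairs or finishes its work, then releases
acquired_after_recovery = lock.try_acquire(2)
```

Without the `recover` step, process 2 would stay blocked indefinitely, which is the liveness problem the abstract describes. The sketch is only a try-lock and makes no attempt at starvation freedom or at bounding remote memory references.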


Mutual exclusion · Fault tolerance · Recovery · Concurrency · Synchronization · Shared memory · Non-volatile main memory · Multi-core algorithms · Durable data structures



Sincere thanks to Peter Buhr, Patrick Lam, and the anonymous referees of PODC’16 and Distributed Computing for detailed feedback and helpful suggestions on earlier drafts of this work. We are grateful also to Vassos Hadzilacos, Danny Hendler, Prasad Jayanti, Gadi Taubenfeld, and Sam Toueg for stimulating technical discussions.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
