Abstract
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in concurrent programs. However, the robustness of such locks against process failures has not been studied thoroughly. The vast majority of mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their efficiency. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of efficiency, which we define in terms of remote memory references.
Notes
The term bounded in reference to a piece of code means that there exists a function f of the number of processes N such that the code performs at most f(N) shared memory operations in all executions of the algorithm instantiated for N processes.
As explained later on in the model near the discussion of First-Come-First-Served fairness, we assume that the doorway is well-defined and bounded only in a subset of execution histories that are relevant to our weaker notion of FCFS.
In a practical implementation, the code of and can be packaged in a single procedure for simplicity.
The term cleanup-concurrent defined in the conference version of this paper [20] is analogous to 1-failure-concurrent in this model.
The Bounded Recovery property defined in the conference version of this paper [20] is analogous to 1-BR in this model.
Despite the prevalence of cache-coherent architectures, the DSM model remains important in practice because of its inherent scalability. Intel’s Single-chip Cloud Computer, for example, sacrifices cache-coherence “to simplify the design, reduce power consumption and to encourage the exploration of datacenter distributed memory software models” [26].
The RMR complexity of is unbounded if F does not exist for a given history H.
The “\(\wedge \)” operator at line 94 should be interpreted like && in C++, meaning that the right operand is evaluated only if the left operand is true.
References
Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231–1274 (1995)
Anderson, J., Kim, Y.-J.: A new fast-path mechanism for mutual exclusion. Distrib. Comput. 14(1), 17–29 (2001)
Anderson, J., Kim, Y.-J.: An improved lower bound for the time complexity of mutual exclusion. Distrib. Comput. 15(4), 221–253 (2002)
Anderson, J., Kim, Y.-J., Herman, T.: Shared-memory mutual exclusion: major research trends since 1986. Distrib. Comput. 16(2–3), 75–110 (2003)
Anderson, T.: The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1(1), 6–16 (1990)
Attiya, H., Hendler, D., Woelfel, P.: Tight RMR lower bounds for mutual exclusion and other problems. In: Proceedings of the 40th ACM symposium on theory of computing (STOC), pp. 217–226 (2008)
Bender, M.A., Gilbert, S.: Mutual exclusion with \(O(\log ^{2}\log n)\) amortized work. In: Proceedings of the 52nd symposium on foundations of computer science (FOCS), pp. 728–737 (2011)
Bohannon, P., Lieuwen, D.F., Silberschatz, A.: Recovering scalable spin locks. In: Proceedings of the 8th IEEE symposium on parallel and distributed processing (SPDP), pp. 314–322 (1996)
Bohannon, P., Lieuwen, D.F., Silberschatz, A., Sudarshan, S., Gava, J.: Recoverable user-level mutual exclusion. In: Proceedings of the 7th IEEE symposium on parallel and distributed processing (SPDP), pp. 293–301 (1995)
Burns, J.E., Lynch, N.A.: Bounds on shared memory for mutual exclusion. Inf. Comput. 107(2), 171–184 (1993)
Cypher, R.: The communication requirements of mutual exclusion. In: Proceedings of the 7th ACM symposium on parallel algorithms and architectures (SPAA), pp. 147–156 (1995)
Dijkstra, E.W.: Solution of a problem in concurrent programming control. Commun. ACM 8(9), 569 (1965)
Dijkstra, E.W.: Self-stabilizing systems in spite of distributed control. Commun. ACM 17(11), 643–644 (1974)
Fan, R., Lynch, N.: An \(\Omega (n \log n)\) lower bound on the cost of mutual exclusion. In: Proceedings of the 25th ACM symposium on principles of distributed computing (PODC), pp. 275–284 (2006)
Giakkoupis, G., Woelfel, P.: Randomized mutual exclusion with constant amortized RMR complexity on the DSM. In: Proceedings of the 55th symposium on foundations of computer science (FOCS), pp. 504–513 (2014)
Gibbons, P.B.: How emerging memory technologies will have you rethinking algorithm design. In: Proceedings of the 35th ACM symposium on principles of distributed computing (PODC), p. 303 (2016)
Golab, W., Hadzilacos, V., Hendler, D., Woelfel, P.: RMR-efficient implementations of comparison primitives using read and write operations. Distrib. Comput. 25(2), 109–162 (2012)
Golab, W., Hendler, D.: Recoverable mutual exclusion in sub-logarithmic time. In: Proceedings of the 36th annual ACM symposium on principles of distributed computing (PODC), pp. 211–220 (2017)
Golab, W., Hendler, D.: Recoverable mutual exclusion under system-wide failures. In: Proceedings of the 37th annual ACM symposium on principles of distributed computing (PODC), pp. 17–26 (2018)
Golab, W., Ramaraju, A.: Recoverable mutual exclusion. In: Proceedings of the 35th ACM symposium on principles of distributed computing (PODC), pp. 65–74 (2016)
Graunke, G., Thakkar, S.: Synchronization algorithms for shared-memory multiprocessors. IEEE Comput. 23(6), 60–69 (1990)
Gray, J., Reuter, A.: Transaction processing: concepts and techniques. Morgan Kaufmann, Burlington (1993)
Hendler, D., Woelfel, P.: Randomized mutual exclusion with sub-logarithmic RMR-complexity. Distrib. Comput. 24(1), 3–19 (2011)
Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Lang. Syst. 13(1), 124–149 (1991)
Hoepman, J.-H., Papatriantafilou, M., Tsigas, P.: Self-stabilization of wait-free shared memory objects. In: Proceedings of the 9th international workshop on distributed algorithms (WDAG), pp. 273–287 (1995)
Intel Corporation. Single-chip cloud computer. http://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-single-chip-cloud-overview-paper.pdf. Accessed 31 Oct 2019
Jayanti, P.: F-arrays: implementation and applications. In: Proceedings of the 21st annual ACM symposium on principles of distributed computing (PODC), pp. 270–279 (2002)
Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451–500 (1998)
Jayanti, P., Joshi, A.: Recoverable FCFS mutual exclusion with wait-free recovery. In: Proceedings of the 31st international symposium on distributed computing (DISC), pp. 30:1–30:15 (2017)
Jayanti, P., Jayanti, S., Joshi, A.: A recoverable Mutex algorithm with sub-logarithmic RMR on both CC and DSM. In: Proceedings of the 38th annual ACM symposium on principles of distributed computing (PODC), pp. 177–186 (2019)
Johnen, C., Higham, L.: Fault-tolerant implementations of regular registers by safe registers with applications to networks. In: Proceedings of the 10th international conference on distributed computing and networking (ICDCN), pp. 337–348 (2009)
Kim, Y.-J., Anderson, J.H.: A space- and time-efficient local-spin spin lock. Inf. Process. Lett. 84(1), 47–55 (2002)
Kessels, J.: Arbitration without common modifiable variables. Acta Informatica 17, 135–141 (1982)
Lamport, L.: A new solution of Dijkstra’s concurrent programming problem. Commun. ACM 17(8), 453–455 (1974)
Lamport, L.: The mutual exclusion problem: part I—a theory of interprocess communication. J. ACM 33(2), 313–326 (1986)
Lamport, L.: The mutual exclusion problem: part II—statement and solutions. J. ACM 33(2), 327–348 (1986)
Lamport, L.: A fast mutual exclusion algorithm. ACM Trans. Comput. Syst. 5(1), 1–11 (1987)
Magnusson, P., Landin, A., Hagersten, E.: Queue locks on cache coherent multiprocessors. In: Proceedings of the 8th international parallel processing symposium (IPPS), pp. 165–171 (1994)
Mellor-Crummey, J., Scott, M.: Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9(1), 21–65 (1991)
Michael, M., Kim, Y.: Fault tolerant mutual exclusion locks for shared memory systems. US Patent (2009)
Mittal, S., Vetter, J.S.: A survey of software techniques for using non-volatile memories for storage and main memory systems. IEEE Trans. Parallel Distrib. Syst. 27(5), 1537–1550 (2016)
Mogul, J.C., Argollo, E., Shah, M.A., Faraboschi, P.: Operating system support for NVM + DRAM hybrid main memory. In: Proceedings of the 12th workshop on hot topics in operating systems (HotOS) (2009)
Moscibroda, T., Oshman, R.: Resilience of mutual exclusion algorithms to transient memory faults. In: Proceedings of the 30th ACM symposium on principles of distributed computing (PODC), pp. 69–78 (2011)
Narayanan, D., Hodson, O.: Whole-system persistence. In: Proceedings of the 17th international conference on architectural support for programming languages and operating systems (ASPLOS), pp. 401–410 (2012)
Ramaraju, A.: RGLock: Recoverable mutual exclusion for non-volatile main memory systems. Master’s thesis, University of Waterloo (2015). https://uwspace.uwaterloo.ca/handle/10012/9473. Accessed 31 Oct 2019
Raynal, M.: Algorithms for Mutual Exclusion. MIT Press, Cambridge (1986)
Scott, M., Scherer, W.: Scalable queue-based spin locks with timeout. In: Proceedings of the 8th ACM SIGPLAN symposium on principles and practices of parallel programming (PPoPP), pp. 44–52 (2001)
Taubenfeld, G.: Synchronization Algorithms and Concurrent Programming. Prentice Hall, Upper Saddle River (2006)
Yang, J.-H., Anderson, J.: A fast, scalable mutual exclusion algorithm. Distrib. Comput. 9(1), 51–60 (1995)
Acknowledgements
Sincere thanks to Peter Buhr, Patrick Lam, and the anonymous referees of PODC’16 and Distributed Computing for detailed feedback and helpful suggestions on earlier drafts of this work. We are grateful also to Vassos Hadzilacos, Danny Hendler, Prasad Jayanti, Gadi Taubenfeld, and Sam Toueg for stimulating technical discussions.
Additional information
This research is supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Discovery Grants Program; the Ontario Early Researcher Awards Program; and the Google Faculty Research Awards Program.
About this article
Cite this article
Golab, W., Ramaraju, A. Recoverable mutual exclusion. Distrib. Comput. 32, 535–564 (2019). https://doi.org/10.1007/s00446-019-00364-0
DOI: https://doi.org/10.1007/s00446-019-00364-0