Distributed Computing, Volume 23, Issue 4, pp 225–272

Rambo: a robust, reconfigurable atomic memory service for dynamic networks

  • Seth Gilbert
  • Nancy A. Lynch
  • Alexander A. Shvartsman

Abstract

In this paper, we present Rambo, an algorithm for emulating a read/write distributed shared memory in a dynamic, rapidly changing environment. Rambo provides a highly reliable, highly available service, even as participants join, leave, and fail. In fact, the entire set of participants may change during an execution, as the initial devices depart and are replaced by a new set of devices. Even so, Rambo ensures that data stored in the distributed shared memory remains available and consistent. There are two basic techniques used by Rambo to tolerate dynamic changes. Over short intervals of time, replication suffices to provide fault-tolerance. While some devices may fail and leave, the data remains available at other replicas. Over longer intervals of time, Rambo copes with changing participants via reconfiguration, which incorporates newly joined devices while excluding devices that have departed or failed. The main novelty of Rambo lies in the combination of an efficient reconfiguration mechanism with a quorum-based replication strategy for read/write shared memory. The Rambo algorithm can tolerate a wide variety of aberrant behavior, including lost and delayed messages, participants with unsynchronized clocks, and, more generally, arbitrary asynchrony. Despite such behavior, Rambo guarantees that its data is stored consistently. We analyze the performance of Rambo during periods when the system is relatively well-behaved: messages are delivered in a timely fashion, reconfiguration is not too frequent, etc. We show that in these circumstances, read and write operations are efficient, completing in at most eight message delays.
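
The quorum-based replication mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the Rambo algorithm: it assumes a fixed replica set with majority quorums, omits reconfiguration entirely, and replaces message passing with direct method calls. All names (`Replica`, `QuorumRegister`, `write`, `read`) are hypothetical and chosen for illustration only. It shows the general two-phase pattern common to quorum-based read/write registers, where an operation first queries a quorum for the latest tagged value and then propagates a value to a quorum.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One replica: a value plus the tag (sequence number, writer id) ordering it."""
    tag: tuple = (0, "")
    value: object = None
    alive: bool = True

    def query(self):
        return self.tag, self.value

    def update(self, tag, value):
        # Adopt the incoming pair only if it is newer than what is stored.
        if tag > self.tag:
            self.tag, self.value = tag, value


class QuorumRegister:
    """Majority-quorum register over a fixed replica set (assumption: no reconfiguration)."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.quorum_size = n // 2 + 1

    def _quorum(self):
        # Any majority of live replicas works; two majorities always intersect.
        live = [r for r in self.replicas if r.alive]
        if len(live) < self.quorum_size:
            raise RuntimeError("no quorum of live replicas available")
        return live[: self.quorum_size]

    def _query_phase(self):
        # Return the (tag, value) pair with the largest tag seen in a quorum.
        return max((r.query() for r in self._quorum()), key=lambda pair: pair[0])

    def _propagate_phase(self, tag, value):
        for r in self._quorum():
            r.update(tag, value)

    def write(self, writer_id, value):
        (seq, _), _ = self._query_phase()
        # A strictly larger tag orders this write after everything it observed.
        self._propagate_phase((seq + 1, writer_id), value)

    def read(self):
        tag, value = self._query_phase()
        # Write back before returning so later reads cannot see an older value.
        self._propagate_phase(tag, value)
        return value
```

In this sketch, a value written to replicas 0 to 2 of a five-replica register is still returned by a read after replicas 0 and 1 fail, because the read quorum must overlap the write quorum in at least one replica. Once fewer than a majority remain alive, operations can no longer complete, which is the situation reconfiguration is meant to address over longer intervals.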

Keywords

Dynamic distributed systems, Atomic register, Distributed shared memory, Fault-tolerance, Reconfigurable, Eventual synchrony

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Seth Gilbert (1)
  • Nancy A. Lynch (2)
  • Alexander A. Shvartsman (3)
  1. National University of Singapore, Singapore, Singapore
  2. MIT, Cambridge, USA
  3. University of Connecticut, Storrs, USA
