Part of the book series: Texts in Computer Science (TCS)

Abstract

The context established by Chap. 10 provides conceptual tools to develop an automated and highly dynamic membership tracking service, in which membership of a system varies as service instances are launched and join an active system, shut down and must leave it, or crash. We solve the problem in steps, first showing how a system can track its own membership, and then showing how the resulting group membership system (GMS) can be used to support the creation of protocols that, themselves, can largely ignore dynamic membership.
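To make the idea of tracking membership concrete, here is a minimal single-process sketch, not the chapter's protocol. It assumes a hypothetical `MembershipService` class in which every join, voluntary leave, or reported failure installs a new numbered view. A real GMS would also have to get its own replicas to agree on each view, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class MembershipService:
    """Toy membership tracker (hypothetical): each change installs a new view."""
    view_id: int = 0
    members: tuple = ()
    history: list = field(default_factory=list)

    def _install(self, members):
        # Every membership change produces the next numbered view.
        self.view_id += 1
        self.members = tuple(members)
        self.history.append((self.view_id, self.members))

    def join(self, pid):
        # A newly launched instance joins the active system.
        if pid not in self.members:
            self._install(self.members + (pid,))

    def leave(self, pid):
        # An instance shuts down deliberately and leaves.
        if pid in self.members:
            self._install(m for m in self.members if m != pid)

    def report_failure(self, pid):
        # A suspected crash is handled like a leave in this sketch.
        self.leave(pid)


gms = MembershipService()
gms.join("a")
gms.join("b")
gms.report_failure("a")
gms.join("c")
for view_id, members in gms.history:
    print(view_id, members)
```

Protocols layered on such a service can be written against one view at a time, which is the sense in which they can largely ignore dynamic membership.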

Notes

  1.

    Readers interested in learning more about that first step might want to look at Shlomi Dolev’s work on self-stabilization (Dolev 2000). A self-stabilizing bootstrap algorithm would work as follows. During periods when applications can find an active membership service, they would do so. But if an application is launched and cannot find the service, it would run Dolev’s self-stabilizing leader election protocol. That protocol can automatically handle various numbers of concurrent processes and is guaranteed to eventually converge to a state in which a single leader has been picked and every process knows who the leader is (the eventual convergence does require that any severe churn that may be happening settle down). At any rate, once this bootstrap step selects a leader, one would delay for a sufficient amount of time to have reasonable confidence that two leaders have not been picked (Dolev’s theory lets us calculate the needed delay). Finally, the leader could boot our dynamic membership protocol. Non-leaders, in contrast, simply wait until the leader is running the membership protocol, at which point they join the now-running system. In effect, we would run the self-stabilization protocol only while no copies of the membership service are active. Thus, the normal case becomes one in which the membership service is running, and some processes either wish to join, or are terminating deliberately, or seem to have failed (other processes are reporting timeouts). We would revert to the self-stabilization approach again if all copies of the membership service crash, but otherwise will not use it again.
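The control flow of the bootstrap described in this note can be sketched as follows. This is only an illustration under stated assumptions: `elect_leader` is a trivial stand-in (smallest identifier wins) and is not Dolev's self-stabilizing protocol, `settle_delay` stands for the delay that Dolev's theory would let one calculate, and the `find_service`, `start_service`, and `join_service` callbacks are hypothetical.

```python
import time


def elect_leader(candidates):
    # Stand-in election: smallest id wins. NOT Dolev's self-stabilizing protocol.
    return min(candidates)


def bootstrap(pid, candidates, find_service, start_service, join_service,
              settle_delay=0.0, sleep=time.sleep):
    """Join a running membership service, or help start one if none exists."""
    service = find_service()
    if service is not None:
        # Normal case: the membership service is already running.
        join_service(service, pid)
        return "joined"
    # No service found: fall back to leader election.
    if elect_leader(candidates) == pid:
        # Wait long enough to be reasonably confident no second leader exists.
        sleep(settle_delay)
        start_service(pid)
        return "started"
    # Non-leaders wait until the leader is running the membership protocol.
    while (service := find_service()) is None:
        sleep(settle_delay)
    join_service(service, pid)
    return "joined"


# Hypothetical in-memory registry standing in for service discovery.
registry = {"service": None, "members": []}


def find_service():
    return registry["service"]


def start_service(pid):
    registry["service"] = pid
    registry["members"].append(pid)


def join_service(service, pid):
    registry["members"].append(pid)


first = bootstrap("p1", ["p1", "p2"], find_service, start_service, join_service)
second = bootstrap("p2", ["p1", "p2"], find_service, start_service, join_service)
print(first, second, registry["members"])
```

Because the election branch runs only when `find_service` returns nothing, the same function also covers the case in which all copies of the membership service have crashed and the system must bootstrap again.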

References

  • Abraham, I., Malkhi, D.: Probabilistic quorums for dynamic systems. In: The 17th International Symposium on Distributed Computing (DISC 2003), Sorrento, Italy, October 2003

  • Agarwal, D.A.: Totem: A reliable ordered delivery protocol for interconnected local area networks. Ph.D. diss., Department of Electrical and Computer Engineering, University of California, Santa Barbara (1994)

  • Alvisi, L., Malkhi, D., Pierce, E., Reiter, M.: Fault detection for byzantine quorum systems. IEEE Trans. Parallel Distrib. Syst. 12(9), 996–1007 (2001b)

  • Amir, Y., Dolev, D., Kramer, S., Malkhi, D.: Membership algorithms in broadcast domains. In: Proceedings of the Sixth WDAG, Israel, June 1992. Lecture Notes in Computer Science, vol. 647, pp. 292–312. Springer, Berlin (1992a)

  • Amir, Y., Dolev, D., Kramer, S., Malkhi, D.: Transis: A communication subsystem for high availability. In: Proceedings of the Twenty-Second Symposium on Fault-Tolerant Computing Systems, Boston, July 1992, pp. 76–84. IEEE Computer Society Press, New York (1992b)

  • Anceaume, E., Charron-Bost, B., Minet, P., Toueg, S.: On the formal specification of group membership services. Technical Report 95-1534, Department of Computer Science, Cornell University, August (1995)

  • Babaoglu, O., Davoli, R., Giachini, L.A., Baker, M.B.: RELACS: A communications infrastructure for constructing reliable applications in large-scale distributed systems. BROADCAST Project Deliverable Report, Department of Computing Science, University of Newcastle upon Tyne, United Kingdom (1994)

  • Babaoglu, O., Davoli, R., Montresor, A.: Failure detectors, group membership, and view-synchronous communication in partitionable asynchronous systems. Technical Report UBLCS-95-19, Department of Computer Science, University of Bologna, November (1995)

  • Babaoglu, O., Bartoli, A., Dini, G.: Enriched view synchrony: A paradigm for programming dependable applications in partitionable asynchronous distributed systems. Technical Report, Department of Computer Science, University of Bologna, May (1996)

  • Ben-Or, M.: Fast asynchronous byzantine agreement. In: Proceedings of the Fourth ACM Symposium on Principles of Distributed Computing, Minaki, Canada, August 1985, pp. 149–151 (1985)

  • Birman, K.P., Glade, B.B.: Consistent failure reporting in reliable communications systems. IEEE Softw., Special Issue on Reliability (1995)

  • Birman, K.P., Joseph, T.A.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the Eleventh Symposium on Operating Systems Principles, Austin, November 1987, pp. 123–138. ACM Press, New York (1987a)

  • Birman, K.P., Joseph, T.A.: Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5(1), 47–76 (1987b)

  • Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)

  • Castro, M., Druschel, P., Hu, Y.C., Rowstron, A.: Topology-aware routing in structured peer-to-peer overlay networks. In: Future Directions in Distributed Computing 2003, pp. 103–107. Springer, Berlin (2003a)

  • Castro, M., Rodrigues, R., Liskov, B.: BASE: Using abstraction to improve fault tolerance. ACM Trans. Comput. Syst. 21(3), 236–269 (2003b)

  • Chandra, T., Toueg, S.: Unreliable failure detectors for asynchronous systems. J. ACM (in press). Previous version in ACM Symposium on Principles of Distributed Computing (Montreal, 1991), pp. 325–340

  • Chandra, T., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. In: ACM Symposium on Principles of Distributed Computing, August 1992, pp. 147–158 (1992)

  • Chandra, T., Hadzilacos, V., Toueg, S., Charron-Bost, B.: On the impossibility of group membership. In: Proceedings of the ACM Symposium on Principles of Distributed Computing, Vancouver, May 1996

  • Chockler, G., Keidar, I., Vitenberg, R.: Group communication specifications: A comprehensive study. ACM Comput. Surv. 33(4), 1–43 (2001)

  • Coan, B., Thomas, G.: Agreeing on a leader in real time. In: Proceedings of the Eleventh Real-Time Systems Symposium, December 1990, pp. 166–172 (1990)

  • Coan, B., Oki, B.M., Kolodner, E.K.: Limitations on database availability when networks partition. In: Proceedings of the Fifth ACM Symposium on Principles of Distributed Computing, Calgary, August 1986, pp. 187–194 (1986)

  • Cristian, F.: Reaching agreement on processor group membership in synchronous distributed systems. Distrib. Comput. 4(4), 175–187 (1991a)

  • Cristian, F.: Synchronous and asynchronous group communication. Commun. ACM 39(4), 88–97 (1996)

  • Cristian, F., Schmuck, F.: Agreeing on process group membership in asynchronous distributed systems. Technical Report CSE95-428, Department of Computer Science and Engineering, University of California, San Diego (1995)

  • Cristian, F., Aghili, H., Strong, R., Dolev, D.: Atomic broadcast: From simple message diffusion to byzantine agreement. In: Proceedings of the Fifteenth International Symposium on Fault-Tolerant Computing, pp. 200–206. IEEE Computer Society Press, New York (1985). Revised as IBM Technical Report RJ5244

  • Cristian, F., Dolev, D., Strong, R., Aghili, H.: Atomic broadcast in a real-time environment. In: Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol. 448, pp. 51–71. Springer, Berlin (1990)

  • Dolev, S.: Self-stabilization. MIT Press, Cambridge (2000)

  • Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171–216 (2001)

  • Fischer, M.J., Lynch, N.A., Merritt, M.: Easy impossibility proofs for distributed consensus problems. In: Proceedings of the Fourth Annual ACM Symposium on Principles of Distributed Computing, Minaki, Canada, August 1985. ACM Press, New York (1985a)

  • Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985b)

  • Friedman, R., Birman, K.P.: Using group communication technology to implement a reliable and scalable distributed IN coprocessor. In: TINA’96: The Convergence of Telecommunications and Distributed Computing Technologies, Heidelberg, September 1996, pp. 25–42. VDE-Verlag, Berlin (1996). Also Technical Report, Department of Computer Science, Cornell University, March (1996)

  • Friedman, R., Keidar, I., Malkhi, D., Birman, K.P., Dolev, D.: Deciding in partitionable networks. Technical Report 95-1554, Department of Computer Science, Cornell University, October (1995)

  • Golding, R.A.: Weak consistency group communication and membership. Ph.D. diss., Computer and Information Sciences Department, University of California, Santa Cruz (1992)

  • Guerraoui, R.: Revisiting the relationship between nonblocking atomic commitment and consensus. In: International Workshop on Distributed Algorithms, September 1995, pp. 87–100 (1995)

  • Guerraoui, R., Schiper, A.: Gamma-accurate failure detectors. Technical Report, Département d’Informatique, EPFL, Lausanne, Switzerland (1996)

  • Guerraoui, R., Knežević, N., Quéma, V., Vukolić, M.: The next 700 BFT protocols. In: Proceedings of EuroSys, Paris, France, April 2010, pp. 363–376 (2010)

  • Haeberlen, A., Kuznetsov, P., Druschel, P.: PeerReview: Practical accountability for distributed systems. In: Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP ’07), Stevenson, WA, October 2007

  • Kotla, R., Alvisi, L., Dahlin, M., Clement, A., Wong, E.: Zyzzyva: Speculative byzantine fault tolerance. ACM Trans. Comput. Syst. 27(4) (2009)

  • Lynch, N.: Distributed Algorithms. Morgan Kaufmann, San Mateo (1996)

  • Malkhi, D.: Multicast communication for high availability. Ph.D. diss., Hebrew University of Jerusalem (1994)

  • Malkhi, D., Reiter, M.K.: Byzantine quorum systems. Distrib. Comput. 11(4), 203–213 (1998)

  • Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large distributed systems. IEEE Trans. Knowl. Data Eng. 12(2), 187–202 (2000)

  • Malkhi, D., Reiter, M., Wool, A., Wright, R.: Probabilistic quorum systems. Inf. Comput. J. 170(2) (2001a)

  • Malkhi, D., Reiter, M.K., Tulone, D., Ziskind, E.: Persistent objects in the Fleet system. In: Proceedings of the 2nd DARPA Information Survivability Conference and Exposition (DISCEX II), June 2001, vol. II, pp. 126–136 (2001b)

  • Melliar-Smith, P.M., Moser, L.E., Agrawala, V.: Membership algorithms for asynchronous distributed systems. In: Proceedings of the IEEE Eleventh ICDCS, May 1991, pp. 480–488 (1991)

  • Mishra, S., Peterson, L.L., Schlichting, R.D.: A membership protocol based on partial order. In: Proceedings of the IEEE International Working Conference on Dependable Computing for Critical Applications, February 1991, pp. 137–145 (1991)

  • Moser, L.E., Amir, Y., Melliar-Smith, P.M., Agarwal, D.A.: Extended virtual synchrony. In: Proceedings of the Fourteenth International Conference on Distributed Computing Systems, June 1994, pp. 56–65. IEEE Computer Society Press, New York (1994a). Also Technical Report TR-93-22, Department of ECE, University of California, Santa Barbara, December (1993)

  • Moser, L.E., Melliar-Smith, P.M., Agarwal, U.: Processor membership in asynchronous distributed systems. IEEE Trans. Parallel Distrib. Syst. 5(5), 459–473 (1994b)

  • Moser, L.E., Melliar-Smith, P.M., Agarwal, D.A., Budhia, R.K., Lingley-Papadopoulos, C.A.: Totem: A fault-tolerant multicast group communication system. Commun. ACM 39(4), 54–63 (1996)

  • Neiger, G.: A new look at membership services. In: Proceedings of the Fifteenth ACM Symposium on Principles of Distributed Computing, Vancouver (1996). In press

  • Rabin, M.: Randomized Byzantine generals. In: Proceedings of the Twenty-Fourth Annual Symposium on Foundations of Computer Science, pp. 403–409. IEEE Computer Society Press, New York (1983)

  • Reiter, M.K.: A secure group membership protocol. In: Proceedings of the 1994 Symposium on Research in Security and Privacy, Oakland, May 1994, pp. 89–99. IEEE Computer Society Press, New York (1994b)

  • Ricciardi, A.M.: The group membership problem in asynchronous systems. Ph.D. diss., Cornell University, January (1993)

  • Ricciardi, A.: The impossibility of (repeated) reliable broadcast. Technical Report TR-PDS-1996-003, Department of Electrical and Computer Engineering, University of Texas, Austin, April (1996)

  • Ricciardi, A., Birman, K.P.: Using process groups to implement failure detection in asynchronous environments. In: Proceedings of the Eleventh ACM Symposium on Principles of Distributed Computing, Quebec, August 1991, pp. 341–351. ACM Press, New York (1991)

  • Ricciardi, A., Birman, K.P., Stephenson, P.: The cost of order in asynchronous systems. In: WDAG 1992. Lecture Notes in Computer Science, pp. 329–345. Springer, Berlin (1992)

  • Rodrigues, L., Verissimo, P., Rufino, J.: A low-level processor group membership protocol for LANs. In: Proceedings of the Thirteenth International Conference on Distributed Computing Systems, May 1993, pp. 541–550 (1993)

  • Rodrigues, L., Guo, K., Verissimo, P., Birman, K.P.: A dynamic light-weight group service. J. Parallel Distrib. Comput. 60, 1449–1479 (2000)

  • Sabel, L., Marzullo, K.: Simulating fail-stop in asynchronous distributed systems. In: Proceedings of the Thirteenth Symposium on Reliable Distributed Systems, Dana Point, CA, October 1994, pp. 138–147. IEEE Computer Society Press, New York (1994)

  • Schneider, F.B.: Byzantine generals in action: Implementing fail-stop processors. ACM Trans. Comput. Syst. 2(2), 145–154 (1984)

Copyright information

© 2012 Springer-Verlag London Limited

About this chapter

Cite this chapter

Birman, K.P. (2012). Dynamic Membership. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_11

  • DOI: https://doi.org/10.1007/978-1-4471-2416-0_11

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2415-3

  • Online ISBN: 978-1-4471-2416-0

  • eBook Packages: Computer Science, Computer Science (R0)
