Dynamic Membership

Birman, Kenneth P.

doi:10.1007/978-1-4471-2416-0_11


Part of the book series: Texts in Computer Science (TCS)


Abstract

The context established in Chap. 10 provides the conceptual tools needed to develop an automated and highly dynamic membership tracking service, in which the membership of a system varies as service instances are launched and join an active system, shut down and must leave it, or crash. We solve the problem in steps, first showing how a system can track its own membership, and then showing how the resulting group membership system (GMS) can be used to support the creation of protocols that, themselves, can largely ignore dynamic membership.
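One way to picture a membership that "varies" is as a sequence of numbered views, where each join, deliberate departure, or crash produces the next view. The Python sketch below is a hedged, single-process illustration of that idea under stated assumptions. The names `View`, `MembershipTracker`, `join`, and `remove` are hypothetical. A real GMS would also need every member to agree on the same sequence of views, and this sketch does not attempt that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """One membership view: a numbered, immutable, sorted list of members."""
    view_id: int
    members: tuple


class MembershipTracker:
    """Hypothetical single-node sketch of view tracking.

    A real group membership service would run an agreement protocol so
    that all members install the same sequence of views; this object
    simply applies events in the order it receives them.
    """

    def __init__(self, initial_members):
        self.history = [View(0, tuple(sorted(initial_members)))]

    @property
    def current(self):
        return self.history[-1]

    def _install(self, members):
        # Every membership change produces a new view with the next view_id.
        view = View(self.current.view_id + 1, tuple(sorted(members)))
        self.history.append(view)
        return view

    def join(self, process):
        if process in self.current.members:
            return self.current  # already a member: no new view
        return self._install(set(self.current.members) | {process})

    def remove(self, process):
        """Used both for a deliberate leave and for a suspected crash."""
        if process not in self.current.members:
            return self.current  # not a member: no new view
        return self._install(set(self.current.members) - {process})


gms = MembershipTracker(["a", "b"])
gms.join("c")       # view 1: a, b, c
gms.remove("b")     # view 2: a, c  (b shut down or was reported as failed)
print(gms.current)  # View(view_id=2, members=('a', 'c'))
```

Treating a leave and a crash identically at the view level is a simplification. The two events differ in how they are detected (an announcement versus a timeout), but both end with the process being dropped from the next view.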


Notes

1.
Readers interested in learning more about that first step might want to look at Shlomi Dolev’s work on self-stabilization (Dolev 2000). A self-stabilizing bootstrap algorithm would work as follows. During periods when applications can find an active membership service, they would simply use it. But if an application is launched and cannot find the service, it would run Dolev’s self-stabilizing leader election protocol. That protocol automatically handles varying numbers of concurrent processes and is guaranteed to eventually converge to a state in which a single leader has been picked and every process knows who the leader is (eventual convergence does require that any severe churn settle down). Once this bootstrap step selects a leader, one would delay long enough to have reasonable confidence that two leaders have not been picked (Dolev’s theory lets us calculate the needed delay). Finally, the leader would boot our dynamic membership protocol. Non-leaders, in contrast, simply wait until the leader is running the membership protocol, at which point they join the now-running system. In effect, we would run the self-stabilization protocol only while no copies of the membership service are active. Thus, the normal case becomes one in which the membership service is running, and some processes either wish to join, are terminating deliberately, or seem to have failed (other processes are reporting timeouts). We would revert to the self-stabilization approach if all copies of the membership service crash, but would not use it otherwise.
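The bootstrap decision described in this note can be sketched as below. This is a hypothetical Python sketch, not Dolev’s protocol. The names `elect_leader`, `bootstrap`, `service_running`, and `settle_delay` are illustrative. The smallest-reachable-ID rule is a simple stand-in for leader election, and it converges only under the assumption that every process can reach every other and that reachability information is accurate. The real delay would come from the bound Dolev’s theory provides; here it is just a parameter.

```python
import time
from typing import Callable, Set


def elect_leader(candidates: Set[int]) -> int:
    """Stand-in for Dolev's self-stabilizing leader election (not that protocol).

    Each process recomputes the leader from scratch as the smallest ID it can
    currently reach. Because no old state is reused, a corrupted value cannot
    persist; once churn stops and all processes see the same set, they agree.
    """
    return min(candidates)


def bootstrap(my_id: int,
              reachable: Set[int],
              service_running: Callable[[], bool],
              settle_delay: float = 0.0) -> str:
    """Role a newly launched process takes (hypothetical sketch)."""
    if service_running():
        return "join"                  # normal case: the service is already up
    leader = elect_leader(reachable | {my_id})
    if leader != my_id:
        return "wait_then_join"        # non-leaders wait for the leader's GMS
    time.sleep(settle_delay)           # placeholder for the delay Dolev's theory bounds
    return "start_membership_service"  # the leader boots the membership protocol


# Three processes launched concurrently while no membership service is running.
ids = {7, 3, 9}
roles = {p: bootstrap(p, ids - {p}, lambda: False) for p in sorted(ids)}
print(roles)  # {3: 'start_membership_service', 7: 'wait_then_join', 9: 'wait_then_join'}
```

Recomputing the leader from scratch on every call is what lets this stand-in recover from arbitrary prior state, the defining property of self-stabilization. A realistic protocol must also cope with partial connectivity and message loss, which this sketch ignores.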

References

Abraham, I., Malkhi, D.: Probabilistic quorums for dynamic systems. In: The 17th International Symposium on Distributed Computing (DISC 2003), Sorrento, Italy, October 2003
Agarwal, D.A.: Totem: A reliable ordered delivery protocol for interconnected local area networks. Ph.D. diss., Department of Electrical and Computer Engineering, University of California, Santa Barbara (1994)
Alvisi, L., Malkhi, D., Pierce, E., Reiter, M.: Fault detection for Byzantine quorum systems. IEEE Trans. Parallel Distrib. Syst. 12(9), 996–1007 (2001b)
Amir, Y., Dolev, D., Kramer, S., Malkhi, D.: Membership algorithms in broadcast domains. In: Proceedings of the Sixth WDAG, Israel, June 1992. Lecture Notes in Computer Science, vol. 647, pp. 292–312. Springer, Berlin (1992a)
Amir, Y., Dolev, D., Kramer, S., Malkhi, D.: Transis: A communication subsystem for high availability. In: Proceedings of the Twenty-Second Symposium on Fault-Tolerant Computing Systems, Boston, July 1992, pp. 76–84. IEEE Computer Society Press, New York (1992b)
Anceaume, E., Charron-Bost, B., Minet, P., Toueg, S.: On the formal specification of group membership services. Technical Report 95-1534, Department of Computer Science, Cornell University, August (1995)
Babaoglu, O., Davoli, R., Giachini, L.A., Baker, M.B.: RELACS: A communications infrastructure for constructing reliable applications in large-scale distributed systems. BROADCAST Project Deliverable Report, Department of Computing Science, University of Newcastle upon Tyne, United Kingdom (1994)
Babaoglu, O., Davoli, R., Montresor, A.: Failure detectors, group membership, and view-synchronous communication in partitionable asynchronous systems. Technical Report UBLCS-95-19, Department of Computer Science, University of Bologna, November (1995)
Babaoglu, O., Bartoli, A., Dini, G.: Enriched view synchrony: A paradigm for programming dependable applications in partitionable asynchronous distributed systems. Technical Report, Department of Computer Science, University of Bologna, May (1996)
Ben-Or, M.: Fast asynchronous Byzantine agreement. In: Proceedings of the Fourth ACM Symposium on Principles of Distributed Computing, Minaki, Canada, August 1985, pp. 149–151 (1985)
Birman, K.P., Glade, B.B.: Consistent failure reporting in reliable communications systems. IEEE Softw., Special Issue on Reliability (1995)
Birman, K.P., Joseph, T.A.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the Eleventh Symposium on Operating Systems Principles, Austin, November 1987, pp. 123–138. ACM Press, New York (1987a)
Birman, K.P., Joseph, T.A.: Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5(1), 47–76 (1987b)
Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)
Castro, M., Druschel, P., Hu, Y.C., Rowstron, A.: Topology-aware routing in structured peer-to-peer overlay networks. In: Future Directions in Distributed Computing 2003, pp. 103–107. Springer, Berlin (2003a)
Castro, M., Rodrigues, R., Liskov, B.: BASE: Using abstraction to improve fault tolerance. ACM Trans. Comput. Syst. 21(3), 236–269 (2003b)
Chandra, T., Toueg, S.: Unreliable failure detectors for asynchronous systems. J. ACM (in press). Previous version in ACM Symposium on Principles of Distributed Computing (Montreal, 1991), pp. 325–340
Chandra, T., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. In: ACM Symposium on Principles of Distributed Computing, August 1992, pp. 147–158 (1992)
Chandra, T., Hadzilacos, V., Toueg, S., Charron-Bost, B.: On the impossibility of group membership. In: Proceedings of the ACM Symposium on Principles of Distributed Computing, Vancouver, May 1996
Chockler, G., Keidar, I., Vitenberg, R.: Group communication specifications: A comprehensive study. ACM Comput. Surv. 33(4), 1–43 (2001)
Coan, B., Thomas, G.: Agreeing on a leader in real time. In: Proceedings of the Eleventh Real-Time Systems Symposium, December 1990, pp. 166–172 (1990)
Coan, B., Oki, B.M., Kolodner, E.K.: Limitations on database availability when networks partition. In: Proceedings of the Fifth ACM Symposium on Principles of Distributed Computing, Calgary, August 1986, pp. 187–194 (1986)
Cristian, F.: Reaching agreement on processor group membership in synchronous distributed systems. Distrib. Comput. 4(4), 175–187 (1991a)
Cristian, F.: Synchronous and asynchronous group communication. Commun. ACM 39(4), 88–97 (1996)
Cristian, F., Schmuck, F.: Agreeing on process group membership in asynchronous distributed systems. Technical Report CSE95-428, Department of Computer Science and Engineering, University of California, San Diego (1995)
Cristian, F., Aghili, H., Strong, R., Dolev, D.: Atomic broadcast: From simple message diffusion to Byzantine agreement. In: Proceedings of the Fifteenth International Symposium on Fault-Tolerant Computing, pp. 200–206. IEEE Computer Society Press, New York (1985). Revised as IBM Technical Report RJ5244
Cristian, F., Dolev, D., Strong, R., Aghili, H.: Atomic broadcast in a real-time environment. In: Fault-Tolerant Distributed Computing. Lecture Notes in Computer Science, vol. 448, pp. 51–71. Springer, Berlin (1990)
Dolev, S.: Self-stabilization. MIT Press, Cambridge (2000)
Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171–216 (2001)
Fischer, M.J., Lynch, N.A., Merritt, M.: Easy impossibility proofs for distributed consensus problems. In: Proceedings of the Fourth Annual ACM Symposium on Principles of Distributed Computing, Minaki, Canada, August 1985. ACM Press, New York (1985a)
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374–382 (1985b)
Friedman, R., Birman, K.P.: Using group communication technology to implement a reliable and scalable distributed IN coprocessor. In: TINA’96: The Convergence of Telecommunications and Distributed Computing Technologies, Heidelberg, September 1996, pp. 25–42. VDE-Verlag, Berlin (1996). Also Technical Report, Department of Computer Science, Cornell University, March (1996)
Friedman, R., Keidar, I., Malkhi, D., Birman, K.P., Dolev, D.: Deciding in partitionable networks. Technical Report 95-1554, Department of Computer Science, Cornell University, October (1995)
Golding, R.A.: Weak consistency group communication and membership. Ph.D. diss., Computer and Information Sciences Department, University of California, Santa Cruz (1992)
Guerraoui, R.: Revisiting the relationship between nonblocking atomic commitment and consensus. In: International Workshop on Distributed Algorithms, September 1995, pp. 87–100 (1995)
Guerraoui, R., Schiper, A.: Gamma-accurate failure detectors. Technical Report, Département d’Informatique, EPFL, Lausanne, Switzerland (1996)
Guerraoui, R., Knežević, N., Quéma, V., Vukolić, M.: The next 700 BFT protocols. In: Proceedings of EuroSys, Paris, France, April 2010, pp. 363–376 (2010)
Haeberlen, A., Kuznetsov, P., Druschel, P.: PeerReview: Practical accountability for distributed systems. In: Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP ’07), Stevenson, WA, October 2007
Kotla, R., Alvisi, L., Dahlin, M., Clement, A., Wong, E.: Zyzzyva: Speculative Byzantine fault tolerance. ACM Trans. Comput. Syst. 27(4) (2009)
Lynch, N.: Distributed Algorithms. Morgan Kaufmann, San Mateo (1996)
Malkhi, D.: Multicast communication for high availability. Ph.D. diss., Hebrew University of Jerusalem (1994)
Malkhi, D., Reiter, M.K.: Byzantine quorum systems. Distrib. Comput. 11(4), 203–213 (1998)
Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large distributed systems. IEEE Trans. Knowl. Data Eng. 12(2), 187–202 (2000)
Malkhi, D., Reiter, M., Wool, A., Wright, R.: Probabilistic quorum systems. Inf. Comput. J. 170(2) (2001a)
Malkhi, D., Reiter, M.K., Tulone, D., Ziskind, E.: Persistent objects in the Fleet system. In: Proceedings of the 2nd DARPA Information Survivability Conference and Exposition (DISCEX II), June 2001, vol. II, pp. 126–136 (2001b)
Melliar-Smith, P.M., Moser, L.E., Agrawala, V.: Membership algorithms for asynchronous distributed systems. In: Proceedings of the IEEE Eleventh ICDCS, May 1991, pp. 480–488 (1991)
Mishra, S., Peterson, L.L., Schlichting, R.D.: A membership protocol based on partial order. In: Proceedings of the IEEE International Working Conference on Dependable Computing for Critical Applications, February 1991, pp. 137–145 (1991)
Moser, L.E., Amir, Y., Melliar-Smith, P.M., Agarwal, D.A.: Extended virtual synchrony. In: Proceedings of the Fourteenth International Conference on Distributed Computing Systems, June 1994, pp. 56–65. IEEE Computer Society Press, New York (1994a). Also Technical Report TR-93-22, Department of ECE, University of California, Santa Barbara, December (1993)
Moser, L.E., Melliar-Smith, P.M., Agarwal, U.: Processor membership in asynchronous distributed systems. IEEE Trans. Parallel Distrib. Syst. 5(5), 459–473 (1994b)
Moser, L.E., Melliar-Smith, P.M., Agarwal, D.A., Budhia, R.K., Lingley-Papadopoulos, C.A.: Totem: A fault-tolerant multicast group communication system. Commun. ACM 39(4), 54–63 (1996)
Neiger, G.: A new look at membership services. In: Proceedings of the Fifteenth ACM Symposium on Principles of Distributed Computing, Vancouver (1996). In press
Rabin, M.: Randomized Byzantine generals. In: Proceedings of the Twenty-Fourth Annual Symposium on Foundations of Computer Science, pp. 403–409. IEEE Computer Society Press, New York (1983)
Reiter, M.K.: A secure group membership protocol. In: Proceedings of the 1994 Symposium on Research in Security and Privacy, Oakland, May 1994, pp. 89–99. IEEE Computer Society Press, New York (1994b)
Ricciardi, A.M.: The group membership problem in asynchronous systems. Ph.D. diss., Cornell University, January (1993)
Ricciardi, A.: The impossibility of (repeated) reliable broadcast. Technical Report TR-PDS-1996-003, Department of Electrical and Computer Engineering, University of Texas, Austin, April (1996)
Ricciardi, A., Birman, K.P.: Using process groups to implement failure detection in asynchronous environments. In: Proceedings of the Eleventh ACM Symposium on Principles of Distributed Computing, Quebec, August 1991, pp. 341–351. ACM Press, New York (1991)
Ricciardi, A., Birman, K.P., Stephenson, P.: The cost of order in asynchronous systems. In: WDAG 1992. Lecture Notes in Computer Science, pp. 329–345. Springer, Berlin (1992)
Rodrigues, L., Verissimo, P., Rufino, J.: A low-level processor group membership protocol for LANs. In: Proceedings of the Thirteenth International Conference on Distributed Computing Systems, May 1993, pp. 541–550 (1993)
Rodrigues, L., Guo, K., Verissimo, P., Birman, K.P.: A dynamic light-weight group service. J. Parallel Distrib. Comput. 60, 1449–1479 (2000)
Sabel, L., Marzullo, K.: Simulating fail-stop in asynchronous distributed systems. In: Proceedings of the Thirteenth Symposium on Reliable Distributed Systems, Dana Point, CA, October 1994, pp. 138–147. IEEE Computer Society Press, New York (1994)
Schneider, F.B.: Byzantine generals in action: Implementing fail-stop processors. ACM Trans. Comput. Syst. 2(2), 145–154 (1984)


Author information

Authors and Affiliations

Department of Computer Science, Cornell University, Ithaca, NY, USA
Kenneth P. Birman



About this chapter

Cite this chapter

Birman, K.P. (2012). Dynamic Membership. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_11


DOI: https://doi.org/10.1007/978-1-4471-2416-0_11
Publisher Name: Springer, London
Print ISBN: 978-1-4471-2415-3
Online ISBN: 978-1-4471-2416-0
eBook Packages: Computer Science, Computer Science (R0)
