Collaborative Group Membership

The Journal of Supercomputing

Abstract

In this paper we present a novel approach to fault-tolerant group membership, intended predominantly for collaborative computing environments. As an exemplar, we use the Collaborative Computing Transport Layer, which offers reliable atomic multicast capabilities for collaborative environments such as the Collaborative Computing Frameworks (CCF). Specific design goals of the approach are the elimination of processing overhead due to heartbeats, support for partial failures, and extensibility. These goals are met by an approach, termed Collaborative Group Membership (CGM), which uses a quiescent weak failure detector and two election-based algorithms to reach consensus on the membership of a failing group. Failure detection operates through a reliable multicast primitive and so eliminates the need for explicit keep-alive packets; thus, in a failure-free environment, CGM imposes no overhead.
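The idea of detecting failures through the multicast primitive itself, rather than through heartbeats, can be illustrated with a small sketch. It is not the CGM implementation: the class `QuiescentDetector`, the `deliver` callback standing in for the transport, and the retry limit are all hypothetical names and policies chosen for illustration. A member is suspected only when an application multicast to it goes unacknowledged, so a failure-free run generates no detector traffic.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class QuiescentDetector:
    """Suspects members only from missing multicast acknowledgements.

    Hypothetical sketch: the names and the retry policy are assumptions,
    not the interface described in the paper.
    """
    members: Set[str]
    max_retries: int = 3
    suspects: Set[str] = field(default_factory=set)
    extra_messages: int = 0  # traffic beyond the application's own multicasts

    def multicast(self, payload: str,
                  deliver: Callable[[str, str], bool]) -> Set[str]:
        """Send payload to every unsuspected member.

        deliver(member, payload) stands in for the transport and returns
        True when the member acknowledges the message. Returns the set of
        members newly suspected during this send.
        """
        newly_suspected = set()
        for member in sorted(self.members - self.suspects):
            acked = deliver(member, payload)
            retries = 0
            # Retransmit only when an acknowledgement is missing.
            while not acked and retries < self.max_retries:
                retries += 1
                self.extra_messages += 1
                acked = deliver(member, payload)
            if not acked:
                newly_suspected.add(member)
        self.suspects |= newly_suspected
        return newly_suspected


def make_transport(crashed: Set[str]) -> Callable[[str, str], bool]:
    """Toy transport: a member acknowledges unless it has crashed."""
    return lambda member, payload: member not in crashed


detector = QuiescentDetector(members={"a", "b", "c"})

# Failure-free send: no suspicion and no extra messages.
first = detector.multicast("m1", make_transport(set()))
overhead_when_healthy = detector.extra_messages

# Member "c" has crashed: it is suspected after the retries run out.
second = detector.multicast("m2", make_transport({"c"}))
```

In this sketch the suspicion set would be the input to the membership agreement step; the two election-based algorithms that CGM uses for that step are not detailed in the abstract and are not modelled here.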

About this article

Cite this article

Pascoe, J.S., Loader, R.J. & Sunderam, V.S. Collaborative Group Membership. The Journal of Supercomputing 22, 55–68 (2002). https://doi.org/10.1023/A:1014306520634

  • DOI: https://doi.org/10.1023/A:1014306520634
