
The design of the Transis system

  • Danny Dolev
  • Dalia Malki
Group Communication
Part of the Lecture Notes in Computer Science book series (LNCS, volume 938)

Abstract

Transis is a high-availability distributed system being developed at the Hebrew University. It supports reliable group communication for high-availability applications. The system provides enhanced services for information dissemination and replication in a dynamic environment, where machines may crash for arbitrarily long periods and later recover, and where the network may partition and re-merge. Transis contains novel protocols for reliable message delivery, optimizes performance for existing network hardware, and offers a variety of handles to upper-layer applications. The paper presents the experience gained in the design and implementation of the Transis communication subsystem.
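The abstract mentions reliable message delivery without describing a mechanism. As a generic illustration only, and not the Transis protocol, a receiver in a multicast group can number messages per sender, buffer out-of-order arrivals, and report gaps so that lost messages can be requested again with a negative acknowledgement. The class and method names below (`ReliableReceiver`, `receive`, `missing`) are hypothetical.

```python
from collections import defaultdict


class ReliableReceiver:
    """Hypothetical sketch: per-sender in-order delivery with gap detection.
    This is a generic technique, not the Transis delivery protocol."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # next expected sequence number per sender
        self.pending = defaultdict(dict)  # out-of-order messages buffered per sender
        self.delivered = []               # (sender, seq, payload) in delivery order

    def receive(self, sender, seq, payload):
        """Accept a message and return the sequence numbers still missing
        from this sender (candidates for a negative acknowledgement)."""
        if seq < self.next_seq[sender] or seq in self.pending[sender]:
            return self.missing(sender)  # duplicate: ignore it
        self.pending[sender][seq] = payload
        # Deliver every consecutive message starting at the expected number.
        while self.next_seq[sender] in self.pending[sender]:
            n = self.next_seq[sender]
            self.delivered.append((sender, n, self.pending[sender].pop(n)))
            self.next_seq[sender] = n + 1
        return self.missing(sender)

    def missing(self, sender):
        """Sequence numbers below the highest buffered one that have not arrived."""
        if not self.pending[sender]:
            return []
        highest = max(self.pending[sender])
        return [s for s in range(self.next_seq[sender], highest)
                if s not in self.pending[sender]]


if __name__ == "__main__":
    r = ReliableReceiver()
    print(r.receive("A", 0, "m0"))  # []
    print(r.receive("A", 2, "m2"))  # [1]: message 1 was lost or delayed
    print(r.receive("A", 1, "m1"))  # []: messages 1 and 2 are now delivered
    print(r.delivered)
```

Buffering rather than discarding out-of-order messages means a single retransmission fills the gap and releases everything queued behind it.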

Keywords

Multicast Group; Multicast Service; Safe Message; Message Loss; Membership Change

Copyright information

© Springer-Verlag Berlin Heidelberg 1995

Authors and Affiliations

  • Danny Dolev (1)
  • Dalia Malki (1)
  1. Computer Science Institute, Hebrew University, Jerusalem, Israel