The design of the Transis system

Dolev, Danny; Malki, Dalia

doi:10.1007/3-540-60042-6_6

Group Communication
Conference paper
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 938)

Abstract

Transis is a high-availability distributed system being developed at the Hebrew University. It supports reliable group communication for high-availability applications. The system provides enhanced services for information dissemination and replication in a dynamic environment where machines may crash for arbitrarily long periods and later recover, and where the network may partition and re-merge. Transis contains novel protocols for reliable message delivery, optimizes performance for existing network hardware, and offers a variety of handles to upper-level applications. The paper presents the experience gained in the design and implementation of the Transis communication subsystem.

This work was supported in part by GIF I-207-199.6/91.

Author information

Authors and Affiliations

Computer Science Institute, Hebrew University, Jerusalem, Israel
Danny Dolev & Dalia Malki

Editor information

Kenneth P. Birman, Friedemann Mattern, André Schiper

About this paper

Cite this paper

Dolev, D., Malki, D. (1995). The design of the Transis system. In: Birman, K.P., Mattern, F., Schiper, A. (eds) Theory and Practice in Distributed Systems. Lecture Notes in Computer Science, vol 938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60042-6_6

DOI: https://doi.org/10.1007/3-540-60042-6_6
Published: 05 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60042-8
Online ISBN: 978-3-540-49409-6
eBook Packages: Springer Book Archive
