Abstract
Transis is a high-availability distributed system being developed at the Hebrew University of Jerusalem. It supports reliable group communication for high-availability applications. The system provides enhanced services for information dissemination and replication in a dynamic environment in which machines may crash for arbitrarily long periods and later recover, and in which the network may partition and re-merge. Transis contains novel protocols for reliable message delivery, optimizes performance for existing network hardware, and offers a variety of handles to the applications layered above it. This paper presents the experience gained in the design and implementation of the Transis communication subsystem.
This work was supported in part by GIF I-207-199.6/91.
© 1995 Springer-Verlag Berlin Heidelberg
Cite this paper
Dolev, D., Malki, D. (1995). The design of the Transis system. In: Birman, K.P., Mattern, F., Schiper, A. (eds) Theory and Practice in Distributed Systems. Lecture Notes in Computer Science, vol 938. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60042-6_6
DOI: https://doi.org/10.1007/3-540-60042-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60042-8
Online ISBN: 978-3-540-49409-6