Dependable Systems

Schiper, André

doi:10.1007/11808107_2

André Schiper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 4028)

Abstract

Improving the dependability of computer systems is a critical task. In this context, the paper surveys techniques for achieving fault tolerance in distributed systems through replication. The main replication techniques are explained first. Group communication is then introduced as the communication infrastructure on which the different replication techniques can be implemented. Finally, the difficulty of implementing group communication is discussed, and the most important algorithms are presented.

Almost the same paper appears under the title "Group Communication: from practice to theory" in the Proceedings of SOFSEM 2006: Theory and Practice of Computer Science, Merin, Czech Republic, January 2006, Springer, LNCS 383, pages 117-137, 2006.

Author information

Authors and Affiliations

École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
André Schiper

Editor information

Editors and Affiliations

Department of Informatics, University of Fribourg, Bd. de Pérolles 90, CH-1700, Fribourg, Switzerland
Jürg Kohlas
Eiffel Software, USA
Bertrand Meyer
École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
André Schiper

About this chapter

Cite this chapter

Schiper, A. (2006). Dependable Systems. In: Kohlas, J., Meyer, B., Schiper, A. (eds) Dependable Systems: Software, Computing, Networks. Lecture Notes in Computer Science, vol 4028. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11808107_2

DOI: https://doi.org/10.1007/11808107_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36821-2
Online ISBN: 978-3-540-36823-6
eBook Packages: Computer Science, Computer Science (R0)
