Encyclopedia of Database Systems

Living Edition
Editors: Ling Liu, M. Tamer Özsu

Replication for Availability and Fault Tolerance

  • Bettina Kemme
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7993-3_80723-1



Replication is a common mechanism to increase the availability of a data service. The idea is to keep several copies of the database, each installed on a different site (a machine or set of machines). With replication, the data remains available as long as at least one site is running and accessible. Fault tolerance is related to availability, and the two terms are often used interchangeably. A system is considered fault tolerant if it continues to work correctly despite the failure of individual components. When data and processes are replicated over several sites, the failure of any individual site can be masked, since the tasks executed by the failed site can be transferred to one of the available sites. In its strict definition, a fault-tolerant system must behave exactly like a system whose components never fail. This requires making failures transparent to clients and typically means that all data copies have to be consistent at all...
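The masking of site failures described above can be illustrated with a small sketch. It is not taken from the entry: the names `Site`, `ReplicatedStore`, and `SiteUnavailable` are hypothetical, failures are simulated with a boolean flag, and the sketch deliberately ignores concurrency control and the resynchronization of copies that missed updates while their site was down.

```python
from dataclasses import dataclass, field


class SiteUnavailable(Exception):
    """Raised when a site has failed or cannot be reached (simulated)."""


@dataclass
class Site:
    """One copy of the database, kept in memory for illustration."""
    name: str
    data: dict = field(default_factory=dict)
    up: bool = True  # assumption: a failure is modeled by flipping this flag

    def read(self, key):
        if not self.up:
            raise SiteUnavailable(self.name)
        return self.data[key]

    def write(self, key, value):
        if not self.up:
            raise SiteUnavailable(self.name)
        self.data[key] = value


class ReplicatedStore:
    """Client-side view that hides individual site failures."""

    def __init__(self, sites):
        self.sites = list(sites)

    def write(self, key, value):
        """Apply the write to every reachable copy; return how many took it."""
        applied = 0
        for site in self.sites:
            try:
                site.write(key, value)
                applied += 1
            except SiteUnavailable:
                continue  # skip the failed site, keep going
        if applied == 0:
            raise SiteUnavailable("no site reachable")
        return applied

    def read(self, key):
        """Return the value from the first reachable copy."""
        for site in self.sites:
            try:
                return site.read(key)
            except SiteUnavailable:
                continue  # transfer the request to the next site
        raise SiteUnavailable("no site reachable")


if __name__ == "__main__":
    sites = [Site("A"), Site("B"), Site("C")]
    store = ReplicatedStore(sites)
    store.write("x", 1)
    sites[0].up = False
    sites[1].up = False
    print(store.read("x"))  # still answered, by site C
```

The read succeeds as long as one site is up, which is the availability property stated above. A site that comes back after missing writes would hold a stale copy in this sketch, which is why real protocols need additional machinery to keep copies consistent.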


Recommended Reading

  1. Bernstein PA, Goodman N. An algorithm for concurrency control and recovery in replicated distributed databases. ACM Trans Database Syst (TODS). 1984;9(4):596–615.
  2. Bernstein PA, Hadzilacos V, Goodman N. Concurrency control and recovery in database systems. Reading: Addison Wesley; 1987.
  3. Budhiraja N, Marzullo K, Schneider FB, Toueg S. The primary-backup approach. In: Mullender S, editor. Distributed systems. 2nd ed. Harlow/Munich: Addison Wesley; 1993. p. 199–216.
  4. Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh WC, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D. Spanner: Google’s globally distributed database. ACM Trans Comput Syst. 2013;31(3):8.
  5. DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W. Dynamo: Amazon’s highly available key-value store. In: ACM symposium on operating system principles (SOSP); 2007. p. 205–20.
  6. Ghemawat S, Gobioff H, Leung S. The Google file system. In: ACM symposium on operating system principles (SOSP); 2003. p. 29–43.
  7. Gilbert S, Lynch NA. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News. 2002;33(2):51–9.
  8. Gray J, Helland P, O’Neil P, Shasha D. The dangers of replication and a solution. In: Proceedings of the ACM SIGMOD international conference on management of data; 1996. p. 173–82.
  9. Hunt P, Konar M, Junqueira FP, Reed B. ZooKeeper: wait-free coordination for internet-scale systems. In: USENIX annual technical conference; 2010.
  10. Jiménez-Peris R, Patiño-Martínez M, Alonso G, Kemme B. Are quorums an alternative for data replication? ACM Trans Database Syst (TODS). 2003;28(3):257–94.
  11. Kemme B, Bartoli A, Babaoglu Ö. Online reconfiguration in replicated databases based on group communication. In: Proceedings of the IEEE international conference on dependable systems and networks (DSN); 2001. p. 117–30.
  12. Lakshman A, Malik P. Cassandra: a decentralized structured storage system. Oper Syst Rev. 2010;44(2):35–40.
  13. Lamport L. The part-time parliament. ACM Trans Comput Syst. 1998;16(2):133–69.
  14. Rao J, Shekita EJ, Tata S. Using Paxos to build a scalable, consistent, and highly available datastore. PVLDB. 2011;4(4):243–54.
  15. Satyanarayanan M, Kistler JJ, Kumar P, Okasaki ME, Siegel EH, Steere DC. Coda: a highly available file system for a distributed workstation environment. IEEE Trans Comput. 1990;39(4):447–59.
  16. Terry DB, Theimer M, Petersen K, Demers AJ, Spreitzer M, Hauser C. Managing update conflicts in Bayou, a weakly connected replicated storage system. In: ACM symposium on operating system principles (SOSP); 1995. p. 172–83.
  17. Thomas RH. A majority consensus approach to concurrency control for multiple copy databases. ACM Trans Database Syst (TODS). 1979;4(2):180–209.

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. School of Computer Science, McGill University, Montreal, Canada

Section editors and affiliations

  • Bettina Kemme (1)
  1. School of Computer Science, McGill University, Montreal, Canada