Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Replication for Availability and Fault Tolerance

  • Bettina Kemme
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80723


Backup mechanisms; Fault-tolerance


Replication is a common mechanism to increase the availability of a data service. The idea is to keep several copies of the database, each installed on a different site (a machine or set of machines). With replication, the data remains available as long as at least one site is running and accessible. Fault tolerance is related to availability, and the two terms are often used interchangeably. A system is considered fault tolerant if it continues to work correctly despite the failure of individual components. When data and processes are replicated over several sites, the failure of any individual site can be masked, since the tasks executed by the failed site can be transferred to one of the available sites. In its strict definition, a fault-tolerant system must behave exactly like a system whose components never fail. This requires making failures transparent to clients and typically means that all data copies have to be consistent at all...
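
The idea of masking a site failure can be illustrated with a minimal sketch. It is not taken from the entry: the class names Site and ReplicatedStore, the write-to-all-running-sites policy, and the availability helper are assumptions made for illustration only. Failed sites never recover in this toy model, so the question of bringing a stale copy up to date is left out. The helper computes the chance that at least one of n sites is up under the simplifying assumption that sites fail independently.

```python
class Site:
    """One site holding a full copy of the data (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    """Toy store: reads go to one running site, writes to all running sites."""

    def __init__(self, site_names):
        self.sites = [Site(name) for name in site_names]

    def _available(self):
        # Collect the sites that are still running.
        live = [site for site in self.sites if site.up]
        if not live:
            raise RuntimeError("no site is running: data unavailable")
        return live

    def write(self, key, value):
        # Apply the update to every running copy so the copies stay identical.
        for site in self._available():
            site.data[key] = value

    def read(self, key):
        # Any running site can serve the read; failed sites are skipped.
        return self._available()[0].data[key]

    def fail(self, name):
        # Simulate a crash of the named site (no recovery in this sketch).
        for site in self.sites:
            if site.name == name:
                site.up = False


def availability(p, n):
    """Probability that at least one of n sites is up.

    Assumes each site is up with probability p, independently of the others.
    """
    return 1 - (1 - p) ** n
```

In this sketch a client that writes a value, then loses one or two of three sites, still reads the same value from a surviving copy. Only when every site has failed does the store raise an error, which matches the statement that the data remains available as long as one site is running.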


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computer Science, McGill University, Montreal, Canada