Abstract
In this paper, we present a technique for building a high-availability (HA) database management system (DBMS). The proposed technique can be applied to any DBMS with little or no customization, and with reasonable performance overhead. Our approach is based on Remus, a commodity HA solution implemented in the virtualization layer, which uses asynchronous virtual machine state replication to provide transparent HA and failover capabilities. We show that while Remus and similar systems can protect a DBMS, database workloads incur a performance overhead of up to 32% compared to an unprotected DBMS. We identify the sources of this overhead and develop optimizations that mitigate the problems. We present an experimental evaluation using two popular database systems and industry-standard benchmarks, showing that for certain workloads our optimized approach provides fast failover (\(\le 3\) s of downtime) with low performance overhead compared to an unprotected DBMS. Our approach provides a practical means for existing, deployed database systems to be made more reliable with a minimum of risk, cost, and effort. Furthermore, this paper invites new discussion about whether the complexity of HA is best implemented within the DBMS or as a service provided by the infrastructure below it.
Cite this article
Minhas, U.F., Rajagopalan, S., Cully, B. et al. RemusDB: transparent high availability for database systems. The VLDB Journal 22, 29–45 (2013). https://doi.org/10.1007/s00778-012-0294-6
DOI: https://doi.org/10.1007/s00778-012-0294-6