Abstract
We describe a recovery protocol that boosts availability, fault tolerance and performance by enabling failed network nodes to resume an active role immediately after they start recovering. The protocol is designed to work in tandem with middleware-based eager update-everywhere strategies and related group communication systems. The latter provide view synchrony, i.e., knowledge about currently reachable nodes and about the status of messages delivered by faulty and alive nodes. This enables a fast replay of missed updates, which defines dynamic database recovery partitions. It thus speeds up the recovery of failed nodes, which, together with the rest of the network, may seamlessly continue to process transactions even before their recovery has completed. We specify the protocol in terms of the procedures executed for every message and event of interest, and we outline a correctness proof.
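The idea of a recovery partition derived from missed updates can be illustrated with a minimal sketch. This is not the protocol specified in the paper: the class `RecoveringNode` and its methods are hypothetical names introduced here, and the sketch assumes that the missed updates arrive as an ordered list of (item, value) pairs, ignoring group communication, view changes and concurrency control altogether.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveringNode:
    """Hypothetical model of a node that rejoins after a failure."""
    store: dict = field(default_factory=dict)
    # Missed updates still to be replayed, in delivery order.
    pending: list = field(default_factory=list)
    # Items that still have at least one missed update pending.
    recovery_partition: set = field(default_factory=set)

    def start_recovery(self, missed_updates):
        """Derive the recovery partition from the updates missed while down."""
        self.pending = list(missed_updates)
        self.recovery_partition = {item for item, _ in self.pending}

    def can_access(self, items):
        """A transaction may run if it touches no item still being recovered."""
        return self.recovery_partition.isdisjoint(items)

    def replay_next(self):
        """Apply one missed update; release the item once none remain for it."""
        item, value = self.pending.pop(0)
        self.store[item] = value
        if all(other != item for other, _ in self.pending):
            self.recovery_partition.discard(item)


node = RecoveringNode(store={"x": 1, "y": 1, "z": 1})
node.start_recovery([("x", 2), ("y", 5), ("x", 3)])
print(node.can_access({"z"}))       # z was never updated while the node was down
print(node.can_access({"x", "z"}))  # x is still inside the recovery partition
```

In this sketch the partition shrinks as updates are replayed, so transactions on untouched items can proceed at once, while those on stale items wait only until the corresponding updates have been applied.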
This work has been supported by the Spanish government research grant TIC2003-09420-C02.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Armendáriz-Iñigo, J.E., Muñoz-Escoí, F.D., Decker, H., Juárez-Rodríguez, J.R., de Mendívil, J.R.G. (2006). A Protocol for Reconciling Recovery and High-Availability in Replicated Databases. In: Levi, A., Savaş, E., Yenigün, H., Balcısoy, S., Saygın, Y. (eds) Computer and Information Sciences – ISCIS 2006. ISCIS 2006. Lecture Notes in Computer Science, vol 4263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11902140_67
DOI: https://doi.org/10.1007/11902140_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47242-1
Online ISBN: 978-3-540-47243-8
eBook Packages: Computer Science (R0)