Instant Restore After a Media Failure

Sauer, Caetano; Graefe, Goetz; Härder, Theo

doi:10.1007/978-3-319-66917-5_21


Caetano Sauer, Goetz Graefe & Theo Härder

Conference paper
First Online: 25 August 2017


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10509)

Abstract

Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore—even data not yet restored—by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds.
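To make the on-demand idea concrete, the following is a minimal Python sketch built on simplifying assumptions that are ours, not the paper's. It assumes that the failed device is modeled as a list of pages grouped into fixed-size segments, that a backup image and a log archive of per-page updates are available, and that each update is an `(lsn, value)` pair that simply overwrites the page. The class `InstantRestore` and its methods are hypothetical names. The point is only the control flow: any read or write first restores the segment containing the page, while a background step restores the remaining segments in order.

```python
"""Hypothetical sketch of segment-wise, on-demand ("instant") restore.

Simplifying assumptions (not from the paper): pages are strings, the log
archive maps page id -> list of (lsn, new_value) updates, and replaying an
update just overwrites the page image.
"""
from typing import Dict, List, Optional, Tuple


class InstantRestore:
    def __init__(self, backup: List[str],
                 log_archive: Dict[int, List[Tuple[int, str]]],
                 segment_size: int) -> None:
        self.backup = backup                # last full backup of the failed device
        self.log_archive = log_archive      # per-page updates since that backup
        self.segment_size = segment_size
        self.num_segments = (len(backup) + segment_size - 1) // segment_size
        # Replacement device: None means "not yet restored".
        self.replacement: List[Optional[str]] = [None] * len(backup)
        self.restored = [False] * self.num_segments
        self.restore_order: List[int] = []  # records which segment was restored when

    def _restore_segment(self, seg: int) -> None:
        """Copy one segment from the backup and replay its updates in LSN order."""
        if self.restored[seg]:
            return
        start = seg * self.segment_size
        end = min(start + self.segment_size, len(self.backup))
        for pid in range(start, end):
            page = self.backup[pid]
            for _lsn, value in sorted(self.log_archive.get(pid, [])):
                page = value                # redo: apply the logged after-image
            self.replacement[pid] = page
        self.restored[seg] = True
        self.restore_order.append(seg)

    def read(self, pid: int) -> Optional[str]:
        # On-demand restore: the application's access decides what is restored first.
        self._restore_segment(pid // self.segment_size)
        return self.replacement[pid]

    def write(self, pid: int, value: str) -> None:
        # A page must be brought up to date before new updates are applied to it.
        self._restore_segment(pid // self.segment_size)
        self.replacement[pid] = value

    def background_step(self) -> bool:
        """Restore the next unrestored segment; return False once all are done."""
        for seg in range(self.num_segments):
            if not self.restored[seg]:
                self._restore_segment(seg)
                return True
        return False


if __name__ == "__main__":
    r = InstantRestore(["a0", "a1", "a2", "a3", "a4", "a5"],
                       {4: [(10, "x4")], 1: [(7, "y1"), (3, "z1")]},
                       segment_size=2)
    print(r.read(4))          # restores segment 2 first, on demand
    while r.background_step():
        pass                  # background pass restores everything else
    print(r.restore_order)
```

In this sketch, segment granularity means the first access to any page pays for restoring its whole segment, while the background loop guarantees that restore eventually completes even for data no application touches.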

This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second. The net effect is that a few “nines” of availability are added to the system using simple and low-overhead software techniques.
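The claim about added "nines" can be checked with the standard steady-state availability formula, A = MTTF / (MTTF + MTTR), where the number of nines is -log10(1 - A). The sketch below uses purely hypothetical figures, not measurements from the paper: an MTTF of 10,000 hours for media failures, and an MTTR of 4 hours (conventional restore) versus 1 second (instant restore). It also assumes media failures are the only source of downtime.

```python
import math


def unavailability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is down: MTTR / (MTTF + MTTR)."""
    return mttr_hours / (mttf_hours + mttr_hours)


def nines(mttf_hours: float, mttr_hours: float) -> float:
    """Availability expressed as a number of nines: -log10(1 - A)."""
    return -math.log10(unavailability(mttf_hours, mttr_hours))


# Hypothetical inputs, chosen only for illustration.
MTTF = 10_000.0
print(f"{nines(MTTF, 4.0):.2f}")         # prints 3.40 (hours-long restore)
print(f"{nines(MTTF, 1 / 3600):.2f}")    # prints 7.56 (one-second restore)
```

Under these assumptions, cutting the perceived repair time from hours to about a second adds roughly four nines, which is the kind of effect the abstract describes.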


Notes

1. http://github.com/caetanosauer/zero.

References

Arulraj, J., Pavlo, A., Dulloor, S.: Let’s talk about storage & recovery methods for non-volatile memory database systems. In: Proceedings of SIGMOD, pp. 707–722 (2015)
Bitton, D., Gray, J.: Disk shadowing. In: Proceedings of VLDB, pp. 331–338 (1988)
Chen, P.M., et al.: RAID: high-performance, reliable secondary storage. ACM Comput. Surv. 26(2), 145–185 (1994)
Eich, M.H.: A classification and comparison of main memory database recovery techniques. In: Proceedings of ICDE, pp. 332–339 (1987)
GLIBC: The GNU C Library Reference Manual (2014), http://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html. Accessed 06 Oct 2014
Graefe, G., Guy, W., Sauer, C.: Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, Media Restore, and System Failover, 2nd edn. Synthesis Lectures on Data Management. Morgan & Claypool Publishers (2016)
Graefe, G., Kimura, H., Kuno, H.A.: Foster B-trees. ACM Trans. Database Syst. 37(3), 17 (2012)
Graefe, G., Kuno, H.A.: Definition, detection, and recovery of single-page failures, a fourth class of database failures. PVLDB 5(7), 646–655 (2012)
Graefe, G., Kuno, H.A., Seeger, B.: Self-diagnosing and self-healing indexes. In: Proceedings of DBTest, p. 8 (2012)
Gray, J.N.: Notes on data base operating systems. In: Bayer, R., Graham, R.M., Seegmüller, G. (eds.) Operating Systems. LNCS, vol. 60, pp. 393–481. Springer, Heidelberg (1978). doi:10.1007/3-540-08755-9_9
Gray, J.: Why do computers stop and what can be done about it? In: Symposium on Reliability in Distributed Software and Database Systems, pp. 3–12 (1986)
Gray, J.: What next?: a dozen information-technology research goals. J. ACM 50(1), 41–57 (2003)
Haderle, D.J., Majithia, T.: Fast log apply, US Patent 6,289,355, 11 September 2001
Härder, T., Reuter, A.: Principles of transaction-oriented database recovery. ACM Comput. Surv. 15(4), 287–317 (1983)
Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: Proceedings of EDBT, pp. 24–35 (2009)
Lehman, T.J., Carey, M.J.: A recovery algorithm for a high-performance memory-resident database system. In: Proceedings of SIGMOD, pp. 104–117 (1987)
Levy, E., Silberschatz, A.: Incremental recovery in main memory database systems. IEEE Trans. Knowl. Data Eng. 4(6), 529–540 (1992)
Malviya, N., Weisberg, A., Madden, S., Stonebraker, M.: Rethinking main memory OLTP recovery. In: Proceedings of ICDE, pp. 604–615 (2014)
Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. 17(1), 94–162 (1992)
Mohan, C., Narang, I.: An efficient and flexible method for archiving a data base. SIGMOD Rec. 22(2), 139–146 (1993)
Mohan, C., Treiber, K., Obermarck, R.: Algorithms for the management of remote backup data bases for disaster recovery. In: Proceedings of ICDE, pp. 511–518 (1993)
Oracle Corporation: RMAN Incremental Backups, Oracle Database Documentation 10g, Sect. 4.4 (2015)
Oukid, I., et al.: SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery. In: Proceedings of DaMoN, pp. 8:1–8:7 (2014)
Sauer, C., Graefe, G., Härder, T.: Single-pass restore after a media failure. In: Proceedings of BTW. LNI, vol. 241, pp. 217–236 (2015)


Author information

Authors and Affiliations

TU Kaiserslautern, Kaiserslautern, Germany
Caetano Sauer & Theo Härder
Google, Madison, WI, USA
Goetz Graefe


Corresponding author

Correspondence to Caetano Sauer.

Editor information

Editors and Affiliations

Riga Technical University, Riga, Latvia
Mārīte Kirikova
Norwegian University of Science and Technology, Trondheim, Norway
Kjetil Nørvåg
University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos


About this paper

Cite this paper

Sauer, C., Graefe, G., Härder, T. (2017). Instant Restore After a Media Failure. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_21


DOI: https://doi.org/10.1007/978-3-319-66917-5_21
Published: 25 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66916-8
Online ISBN: 978-3-319-66917-5
eBook Packages: Computer Science, Computer Science (R0)
