Instant Restore After a Media Failure

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10509)

Abstract

Media failures usually leave database systems unavailable for several hours until recovery is complete, especially in applications with large devices and high transaction volume. Previous work introduced a technique called single-pass restore, which increases restore bandwidth and thus substantially decreases time to repair. Instant restore goes further as it permits read/write access to any data on a device undergoing restore—even data not yet restored—by restoring individual data segments on demand. Thus, the restore process is guided primarily by the needs of applications, and the observed mean time to repair is effectively reduced from several hours to a few seconds.
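The on-demand mechanism described above can be pictured with a toy model. The Python sketch below is not the authors' implementation. It assumes a device split into fixed-size segments, a backup image, and a log archive of simple key/value "set" records; every class, method, and parameter name is hypothetical. Restoring a segment copies its backup pages and replays only that segment's log records. Any read or write that touches an unrestored segment restores it first, while a background loop restores the remaining segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical physical log record: set `key` to `value` on one page."""
    lsn: int
    page_id: int
    key: str
    value: str


class InstantRestore:
    """Toy model of on-demand, segment-wise restore (all names hypothetical).

    Omitted for brevity: latching/concurrency and logging of new writes,
    both of which a real write-ahead-logging system needs.
    """

    def __init__(self, backup, log_archive, pages_per_segment):
        # backup: page_id -> dict, the last full backup of the failed device.
        # Assumption: it holds every page from 0 to max(backup), and log
        # records refer only to those pages.
        self.backup = backup
        self.pages_per_segment = pages_per_segment
        num_pages = max(backup) + 1
        self.num_segments = -(-num_pages // pages_per_segment)  # ceiling division
        # Group the archived log by segment, each group in LSN order, so that
        # restoring one segment reads only the records relevant to it.
        self.log_by_segment = [[] for _ in range(self.num_segments)]
        for rec in sorted(log_archive, key=lambda r: r.lsn):
            self.log_by_segment[self.segment_of(rec.page_id)].append(rec)
        self.restored = [False] * self.num_segments  # restore bitmap
        self.replacement = {}  # replacement device: page_id -> dict

    def segment_of(self, page_id):
        return page_id // self.pages_per_segment

    def restore_segment(self, seg):
        """Copy the segment's backup pages, then redo its log records."""
        if self.restored[seg]:
            return
        first = seg * self.pages_per_segment
        for pid in range(first, first + self.pages_per_segment):
            if pid in self.backup:
                self.replacement[pid] = dict(self.backup[pid])
        for rec in self.log_by_segment[seg]:
            self.replacement[rec.page_id][rec.key] = rec.value
        self.restored[seg] = True

    def read(self, page_id, key):
        # A transaction touching an unrestored segment restores just that
        # segment first, instead of waiting for the whole device.
        self.restore_segment(self.segment_of(page_id))
        return self.replacement[page_id].get(key)

    def write(self, page_id, key, value):
        self.restore_segment(self.segment_of(page_id))
        self.replacement[page_id][key] = value

    def background_step(self):
        """Restore the next unrestored segment; return False when none remain."""
        for seg, done in enumerate(self.restored):
            if not done:
                self.restore_segment(seg)
                return True
        return False


backup = {0: {"a": "1"}, 1: {"b": "2"}, 2: {"c": "3"}, 3: {"d": "4"}}
log = [LogRecord(10, 2, "c", "30"), LogRecord(11, 0, "a", "10"),
       LogRecord(12, 2, "c", "300")]
r = InstantRestore(backup, log, pages_per_segment=2)
print(r.read(2, "c"))  # prints 300: segment 1 was restored on demand
print(r.restored)      # prints [False, True]: segment 0 is still pending
while r.background_step():
    pass
print(r.read(0, "a"))  # prints 10
```

The first read is served after restoring a single segment rather than the whole device, which is the property that lets the application, not a fixed sequential pass, drive the order of restore.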

This paper presents an implementation and evaluation of instant restore. The technique is incrementally implemented on a system starting with the traditional ARIES design for logging and recovery. Experiments show that the transaction latency perceived after a media failure can be cut down to less than a second. The net effect is that a few “nines” of availability are added to the system using simple and low-overhead software techniques.
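To see why cutting repair time adds “nines”, the textbook steady-state availability formula A = MTTF / (MTTF + MTTR) can be evaluated directly. The one-year MTTF and the 4-hour and 5-second repair times below are hypothetical inputs chosen for illustration, not measurements from the paper.

```python
import math


def availability(mttf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def nines(a):
    """Number of "nines" of availability, e.g. 0.999 -> about 3."""
    return -math.log10(1.0 - a)


MTTF = 365 * 24                          # hypothetical: one media failure per year
slow = availability(MTTF, 4.0)           # hypothetical 4-hour restore
fast = availability(MTTF, 5.0 / 3600)    # hypothetical 5-second perceived repair
print(f"{nines(slow):.2f} -> {nines(fast):.2f} nines")  # prints 3.34 -> 6.80 nines
```

When MTTR is much smaller than MTTF, unavailability is roughly MTTR / MTTF, so the nines gained are about log10(MTTR_old / MTTR_new), here log10(2880), or roughly 3.5, largely independent of the MTTF chosen.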

Notes

  1. http://github.com/caetanosauer/zero

References

  1. Arulraj, J., Pavlo, A., Dulloor, S.: Let’s talk about storage & recovery methods for non-volatile memory database systems. In: Proceedings of SIGMOD, pp. 707–722 (2015)

  2. Bitton, D., Gray, J.: Disk shadowing. In: Proceedings of VLDB, pp. 331–338 (1988)

  3. Chen, P.M., et al.: RAID: high-performance, reliable secondary storage. ACM Comput. Surv. 26(2), 145–185 (1994)

  4. Eich, M.H.: A classification and comparison of main memory database recovery techniques. In: Proceedings of ICDE, pp. 332–339 (1987)

  5. GLIBC: The GNU C Library Reference Manual (2014), http://www.gnu.org/software/libc/manual/html_node/Renaming-Files.html. Accessed 06 Oct 2014

  6. Graefe, G., Guy, W., Sauer, C.: Instant Recovery with Write-Ahead Logging: Page Repair, System Restart, Media Restore, and System Failover, 2nd edn. Synthesis Lectures on Data Management. Morgan & Claypool Publishers (2016)

  7. Graefe, G., Kimura, H., Kuno, H.A.: Foster B-trees. ACM Trans. Database Syst. 37(3), 17 (2012)

  8. Graefe, G., Kuno, H.A.: Definition, detection, and recovery of single-page failures, a fourth class of database failures. PVLDB 5(7), 646–655 (2012)

  9. Graefe, G., Kuno, H.A., Seeger, B.: Self-diagnosing and self-healing indexes. In: Proceedings of DBTest, p. 8 (2012)

  10. Gray, J.N.: Notes on data base operating systems. In: Bayer, R., Graham, R.M., Seegmüller, G. (eds.) Operating Systems. LNCS, vol. 60, pp. 393–481. Springer, Heidelberg (1978). doi:10.1007/3-540-08755-9_9

  11. Gray, J.: Why do computers stop and what can be done about it? In: Symposium on Reliability in Distributed Software and Database Systems, pp. 3–12 (1986)

  12. Gray, J.: What next?: a dozen information-technology research goals. J. ACM 50(1), 41–57 (2003)

  13. Haderle, D.J., Majithia, T.: Fast log apply, US Patent 6,289,355, 11 September 2001

  14. Härder, T., Reuter, A.: Principles of transaction-oriented database recovery. ACM Comput. Surv. 15(4), 287–317 (1983)

  15. Johnson, R., Pandis, I., Hardavellas, N., Ailamaki, A., Falsafi, B.: Shore-MT: a scalable storage manager for the multicore era. In: Proceedings of EDBT, pp. 24–35 (2009)

  16. Lehman, T.J., Carey, M.J.: A recovery algorithm for a high-performance memory-resident database system. In: Proceedings of SIGMOD, pp. 104–117 (1987)

  17. Levy, E., Silberschatz, A.: Incremental recovery in main memory database systems. IEEE Trans. Knowl. Data Eng. 4(6), 529–540 (1992)

  18. Malviya, N., Weisberg, A., Madden, S., Stonebraker, M.: Rethinking main memory OLTP recovery. In: Proceedings of ICDE, pp. 604–615 (2014)

  19. Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P.: ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Trans. Database Syst. 17(1), 94–162 (1992)

  20. Mohan, C., Narang, I.: An efficient and flexible method for archiving a data base. SIGMOD Rec. 22(2), 139–146 (1993)

  21. Mohan, C., Treiber, K., Obermarck, R.: Algorithms for the management of remote backup data bases for disaster recovery. In: Proceedings of ICDE, pp. 511–518 (1993)

  22. Oracle Corporation: RMAN Incremental Backups, Oracle Database Documentation 10g, Sect. 4.4 (2015)

  23. Oukid, I., et al.: SOFORT: a hybrid SCM-DRAM storage engine for fast data recovery. In: Proceedings of DaMoN, pp. 8:1–8:7 (2014)

  24. Sauer, C., Graefe, G., Härder, T.: Single-pass restore after a media failure. In: Proceedings of BTW. LNI, vol. 241, pp. 217–236 (2015)

Author information

Correspondence to Caetano Sauer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sauer, C., Graefe, G., Härder, T. (2017). Instant Restore After a Media Failure. In: Kirikova, M., Nørvåg, K., Papadopoulos, G. (eds) Advances in Databases and Information Systems. ADBIS 2017. Lecture Notes in Computer Science, vol 10509. Springer, Cham. https://doi.org/10.1007/978-3-319-66917-5_21

  • DOI: https://doi.org/10.1007/978-3-319-66917-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66916-8

  • Online ISBN: 978-3-319-66917-5

  • eBook Packages: Computer Science, Computer Science (R0)
