DARM: A Deduplication-Aware Redundancy Management Approach for Reliable-Enhanced Storage Systems

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11335)

Abstract

Chunk-based deduplication is widely used in storage systems to save space. However, deduplication impairs data reliability because chunks are shared across files: losing a single shared chunk makes every file that references it inaccessible. We also find that inter-file and highly-referenced chunks, which need stronger reliability guarantees, occupy only a small fraction of physical storage. Traditional deduplication systems rely on erasure coding or replication to ensure data reliability. As the number of shared chunks grows, raising the reliability of an erasure-coded system incurs a large I/O cost because erasure codes scale poorly. Replication scales easily but incurs a larger storage overhead. In this paper, we present DARM, a Deduplication-Aware Redundancy Management approach that exploits deduplication semantics (e.g., inter-/intra-file duplicates, chunk size, and reference count) to improve data reliability with low overhead. DARM uses erasure coding to store unique and low-referenced chunks, which improves both storage reliability and space efficiency, and employs Selective and Dynamic Chunk-based Replication (SDCR) to maintain inter-file and highly-referenced chunks, which further enhances storage reliability. Experimental results on real-world datasets show that DARM reduces storage overhead by up to 43.4% and improves reliability by up to 12.7% over state-of-the-art schemes.
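
The split between erasure coding and replication described in the abstract can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the abstract does not give the reference-count threshold, the replica counts, or the erasure-code parameters, so `REF_THRESHOLD`, `BASE_REPLICAS`, `MAX_REPLICAS`, the logarithmic replica rule, and the RS(6, 3) example are all assumptions. The sketch also assumes that a chunk qualifies for replication if it is either inter-file or highly referenced.

```python
from dataclasses import dataclass

# Hypothetical parameters; the abstract does not specify any of these values.
REF_THRESHOLD = 4   # assumed cutoff for "highly referenced"
BASE_REPLICAS = 2   # assumed minimum copies for a replicated chunk
MAX_REPLICAS = 4    # assumed cap on copies


@dataclass(frozen=True)
class Chunk:
    fingerprint: str
    size: int        # chunk size in bytes
    ref_count: int   # total number of references to this chunk
    file_count: int  # number of distinct files that reference it


def choose_redundancy(chunk: Chunk) -> tuple[str, int]:
    """Return (scheme, replicas) for a chunk.

    Inter-file or highly-referenced chunks are replicated, with a replica
    count that grows with the reference count (an assumed "dynamic" rule).
    All other chunks go to erasure coding, reported with 0 replicas.
    """
    inter_file = chunk.file_count > 1
    if inter_file or chunk.ref_count >= REF_THRESHOLD:
        # floor(log2(ref_count)) computed without floating point
        growth = max(chunk.ref_count, 1).bit_length() - 1
        return ("replicate", min(BASE_REPLICAS + growth, MAX_REPLICAS))
    return ("erasure_code", 0)


def erasure_overhead(k: int, m: int) -> float:
    """Storage blow-up of a (k data + m parity) erasure code: (k + m) / k."""
    return (k + m) / k


if __name__ == "__main__":
    unique = Chunk("a1", 4096, ref_count=1, file_count=1)
    shared = Chunk("b2", 4096, ref_count=2, file_count=2)
    hot = Chunk("c3", 4096, ref_count=64, file_count=1)
    for c in (unique, shared, hot):
        print(c.fingerprint, choose_redundancy(c))
    print("RS(6, 3) overhead:", erasure_overhead(6, 3))
```

The overhead helper shows why the split can pay off: a (6, 3) erasure code stores 1.5 bytes per logical byte, whereas each replicated chunk costs one full copy per replica, so reserving replication for the small fraction of heavily shared chunks keeps the total overhead close to the erasure-coded figure.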

Acknowledgment

The authors are grateful to the anonymous reviewers. This work was partly supported by the National Natural Science Foundation of China under Grants No. U1705261, No. 61772222, and No. 61502190, and by the Shenzhen Research Funding of Science and Technology - Fundamental Research (Free exploration) under Grant JCYJ20170307172447622. This work was also supported by the Engineering Research Center of data storage systems and Technology, Ministry of Education, China.

Corresponding author

Correspondence to Dan Feng.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, Y., Feng, D., Xia, W., Fu, M., Xiao, Y. (2018). DARM: A Deduplication-Aware Redundancy Management Approach for Reliable-Enhanced Storage Systems. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11335. Springer, Cham. https://doi.org/10.1007/978-3-030-05054-2_35

  • DOI: https://doi.org/10.1007/978-3-030-05054-2_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05053-5

  • Online ISBN: 978-3-030-05054-2

  • eBook Packages: Computer Science, Computer Science (R0)
