A Near-Exact Defragmentation Scheme to Improve Restore Performance for Cloud Backup Systems

  • Rongyu Lai
  • Yu Hua
  • Dan Feng
  • Wen Xia
  • Min Fu
  • Yifan Yang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8630)

Abstract

Cloud backup systems leverage data deduplication to remove duplicate chunks shared across many backup versions: instead of being uploaded to the cloud again, duplicate chunks are replaced with references to previously stored chunks. As a result, chunks that are consecutive in a backup stream end up scattered across multiple segments (the storage unit in the cloud), which causes fragmentation during restore. When users restore the chunks of the latest version, every referenced segment must be downloaded from the cloud, together with chunks in those segments that are not referenced at all, which hurts restore performance. To address this problem, we propose a near-exact defragmentation scheme, called NED, for deduplication-based cloud backups. The idea behind NED is to compute, for each segment, the ratio of the total length of chunks referenced by the current data stream to the segment length. If this ratio falls below a threshold, the chunks in the data stream that refer to that segment are labeled as fragments and written to new segments. By efficiently identifying fragmented chunks, NED significantly reduces the number of segments that must be downloaded during restore, with only a slight decrease in deduplication ratio. Experimental results on real-world datasets demonstrate that NED improves restore performance by 6% to 105% at the cost of a 0.1% to 6.5% decrease in deduplication ratio.
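The fragment-identification rule described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function name, the chunk representation as `(chunk_id, length, segment_id)` tuples, and the threshold value are all hypothetical.

```python
# Hypothetical sketch of NED-style fragment identification (all names assumed).
# For each segment referenced by the current backup stream, compute the ratio
# of referenced bytes to segment length; chunks referring to a segment whose
# ratio is below the threshold are labeled as fragments and would be rewritten
# into new segments instead of being deduplicated.

DEFAULT_THRESHOLD = 0.5  # assumed utilization threshold, not from the paper


def identify_fragments(stream_chunks, segment_lengths, threshold=DEFAULT_THRESHOLD):
    """stream_chunks: list of (chunk_id, chunk_length, segment_id) for the
    duplicate chunks of the current stream; segment_lengths: dict mapping
    segment_id -> total segment length in bytes. Returns the set of chunk ids
    labeled as fragments."""
    # Sum the bytes each segment contributes to the current stream.
    referenced_bytes = {}
    for _, length, seg in stream_chunks:
        referenced_bytes[seg] = referenced_bytes.get(seg, 0) + length

    # A chunk is a fragment if its segment is poorly utilized by this stream.
    fragments = set()
    for chunk_id, _, seg in stream_chunks:
        utilization = referenced_bytes[seg] / segment_lengths[seg]
        if utilization < threshold:
            fragments.add(chunk_id)  # rewrite to a new segment, do not reference
    return fragments
```

Raising the threshold labels more chunks as fragments, which reduces the number of segments needed at restore time but lowers the deduplication ratio; the 0.1% to 6.5% cost reported above reflects this trade-off.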



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Rongyu Lai¹
  • Yu Hua¹
  • Dan Feng¹
  • Wen Xia¹
  • Min Fu¹
  • Yifan Yang¹
  1. Wuhan National Laboratory for Optoelectronics (WNLO), Huazhong University of Science and Technology, Wuhan, China
