Using multi-threads to hide deduplication I/O latency with low synchronization overhead

Abstract

Data deduplication, as a compression technique, has been widely used in backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up explodes, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since the CPU-intensive work has been widely parallelized and accelerated by multi-core and many-core processors, I/O latency is becoming the bottleneck of data deduplication. To alleviate the I/O latency challenge on multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array was designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves a 3–5 times performance improvement when incorporated with the locality-based ChunkStash and the local-similarity-based SiLo methods. Moreover, Multi-Dedup dramatically decreases the synchronization overhead and achieves a 1.5–2 times performance improvement compared with traditional lock-based synchronization methods.
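
To make the idea concrete, the following Go sketch shows one plausible reading of a prefix-partitioned concurrent index: the leading byte of each chunk fingerprint deterministically routes the chunk to a single worker thread that exclusively owns that index partition, so lookups and inserts need no shared locks. The worker count, the SHA-1 fingerprints, the channel-based dispatch and all identifiers are illustrative assumptions made for this sketch only; they are not the paper's actual implementation.

// Hypothetical sketch of a prefix-partitioned deduplication index.
// Each worker goroutine exclusively owns the fingerprints routed to it,
// so the per-partition hash map needs no mutex; the only coordination
// is the channel hand-off and a final tally over a results channel.
package main

import (
	"crypto/sha1"
	"fmt"
	"sync"
)

const numWorkers = 4 // assumed number of deduplication threads

type chunk struct {
	data []byte
}

// worker owns one index partition; because only this goroutine ever
// touches its map, lookups and inserts are lock-free.
func worker(in <-chan chunk, results chan<- uint64, wg *sync.WaitGroup) {
	defer wg.Done()
	index := make(map[[sha1.Size]byte]struct{})
	var dup uint64
	for c := range in {
		fp := sha1.Sum(c.data)
		if _, seen := index[fp]; seen {
			dup++ // duplicate chunk: a real backup would suppress its data
		} else {
			index[fp] = struct{}{}
		}
	}
	results <- dup
}

func main() {
	chans := make([]chan chunk, numWorkers)
	results := make(chan uint64, numWorkers)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan chunk, 64)
		wg.Add(1)
		go worker(chans[i], results, &wg)
	}

	// Toy input stream with a repeated chunk so one duplicate is found.
	stream := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("aaaa"), []byte("cccc")}
	for _, d := range stream {
		fp := sha1.Sum(d)
		// Prefix-based routing: the leading fingerprint byte selects the
		// partition, so a given fingerprint always reaches the same worker.
		chans[fp[0]%numWorkers] <- chunk{data: d}
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
	close(results)

	var duplicates uint64
	for d := range results {
		duplicates += d
	}
	fmt.Println("duplicate chunks found:", duplicates)
}

Because a fingerprint's prefix always selects the same owner thread, two threads never contend for the same index partition, which is where the low synchronization overhead would come from in this sketch; a per-thread collisionless cache array would play an analogous role for preserving locality and similarity.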

References

  1. BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

  2. RHEA S, COX R, PESTEREV A. Fast, inexpensive content-addressed storage in Foundation [C]// Proceedings of the 2008 USENIX Annual Technical Conference (USENIX ATC’08). Boston: USENIX Association, 2008: 143–156.

  3. POLICRONIADES C, PRATT I. Alternatives for detecting redundancy in storage systems data [C]// Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ATC’04). Boston: USENIX Association, 2004: 73–86.

  4. MANBER U. Finding similar files in a large file system [C]// Proceedings of the USENIX Winter 1994 Technical Conference. San Francisco: USENIX Association, 1994: 17–21.

  5. ZIV J, LEMPEL A. A universal algorithm for sequential data compression [J]. IEEE Transactions on Information Theory, 1977, 23(3): 337–343.

  6. BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

  7. ZHU B, LI Kai, PATTERSON H. Avoiding the disk bottleneck in the Data Domain deduplication file system [C]// Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST’08). San Jose: USENIX Association, 2008: 1–14.

  8. BHAGWAT D, ESHGHI K, LONG D D E, LILLIBRIDGE M. Extreme binning: Scalable, parallel deduplication for chunk-based file backup [C]// Proceedings of the 17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS’09). London: IEEE Press, 2009: 1–9.

  9. XIA Wen, JIANG Hong, FENG Dan, HUA Yu. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 26–38.

  10. DEBNATH B, SENGUPTA S, LI Jin. ChunkStash: Speeding up inline storage deduplication using flash memory [C]// Proceedings of the 2010 USENIX Annual Technical Conference (USENIX ATC’10). Boston: USENIX Association, 2010: 16–16.

  11. EMC white paper. EMC Data Domain SISL scaling architecture: A detailed review [EB/OL]. [2012-02-12]. http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sisl-sclg-arch-wp.pdf.

  12. LIU Chuan-yi, XUE Yi-bo, JU Da-peng, WANG Dong-sheng. A novel optimization method to improve de-duplication storage system performance [C]// 2009 15th International Conference on Parallel and Distributed Systems (ICPADS’09). Hong Kong: IEEE Press, 2009: 228–235.

  13. GUO Fang-lu, EFSTATHOPOULOS P. Building a high-performance deduplication system [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 1–14.

  14. XIA Wen, JIANG Hong, FENG Dan, TIAN Lei. Accelerating data deduplication by exploiting pipelining and parallelism with multicore or manycore processors [C]// Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST’12). San Jose: USENIX Association, 2012: 1–2.

  15. GHARAIBEH A, AL-KISWANY S, GOPALAKRISHNAN S, RIPEANU M. A GPU accelerated storage system [C]// Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC’10). Chicago: ACM, 2010: 167–178.

  16. BHATOTIA P, RODRIGUES R, VERMA A. Shredder: GPU-accelerated incremental storage and computation [C]// Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST’12). San Jose: USENIX Association, 2012: 157–172.

  17. TRIPLETT J, MCKENNEY P E, WALPOLE J. Resizable, scalable, concurrent hash tables via relativistic programming [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 102–116.

  18. MCKENNEY P E, SLINGWINE J D. Read-copy update: Using execution history to solve concurrency problems [C]// 1998 Parallel and Distributed Computing and Systems. Las Vegas: ACTA Press, 1998: 509–518.

  19. TRIPLETT J, MCKENNEY P E, WALPOLE J. Scalable concurrent hash tables via relativistic programming [J]. ACM SIGOPS Operating Systems Review, 2010, 44(3): 102–109.

  20. DUBNICKI C, GRYZ L, HELDT L, KACZMARCZYK M, KILIAN W, STRZELCZAK P, SZCZEPKOWSKI J, UNGUREANU C, WELNICKI M. HYDRAstor: A scalable secondary storage [C]// Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST’09). San Francisco: USENIX Association, 2009: 197–210.

  21. MUTHITACHAROEN A, CHEN Ben-jie, MAZIERES D. A low-bandwidth network file system [J]. ACM SIGOPS Operating Systems Review, 2001, 35(4): 174–187.

  22. QUINLAN S, DORWARD S. Venti: A new approach to archival data storage [C]// Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST’02). Monterey: USENIX Association, 2002: 4–4.

  23. LILLIBRIDGE M, ESHGHI K, BHAGWAT D, DEOLALIKAR V, TREZISE G, CAMBLE P. Sparse indexing: Large scale, inline deduplication using sampling and locality [C]// Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST’09). San Francisco: USENIX Association, 2009: 111–123.

  24. FAN Li, CAO Pei, ALMEIDA J, BRODER A Z. Summary cache: A scalable wide-area web cache sharing protocol [J]. IEEE/ACM Transactions on Networking, 2000, 8(3): 281–293.

  25. CHEN Song-qiao, HUANG Jin-gui, CHEN Jian-er. Approximation algorithm for multiprocessor parallel job scheduling [J]. Journal of Central South University of Technology (English Edition), 2002, 9(4): 267–272.

Author information

Corresponding author

Correspondence to Lei-hua Qin (秦磊华).

Additional information

Foundation item: Project(IRT0725) supported by the Changjiang Innovative Group of Ministry of Education, China

Cite this article

Zhu, R., Qin, Lh., Zhou, Jl. et al. Using multi-threads to hide deduplication I/O latency with low synchronization overhead. J. Cent. South Univ. 20, 1582–1591 (2013). https://doi.org/10.1007/s11771-013-1650-4
