Abstract
Data deduplication, as a compression method, is widely used in backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up has exploded, two main challenges in data deduplication have emerged: the CPU-intensive chunking and hashing work, and the I/O-intensive disk-index access latency. The CPU-intensive work has been extensively parallelized and accelerated by multi-core and many-core processors, so I/O latency is likely becoming the bottleneck in data deduplication. To alleviate the I/O latency problem in multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture is proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index is designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array is designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves a 3–5 times performance improvement when combined with the locality-based ChunkStash and the local-similarity-based SiLo methods. Multi-Dedup also dramatically decreases the synchronization overhead and achieves a 1.5–2 times performance improvement compared with traditional lock-based synchronization methods.
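The abstract does not spell out how the prefix-based concurrent index is built, so the following is only a minimal sketch of the general idea under stated assumptions: the index is split into shards selected by the leading bits of each chunk fingerprint, and each shard has its own lock, so threads that handle fingerprints with different prefixes do not contend. The names `PrefixShardedIndex`, `insert_if_absent`, `dedup`, and the parameter `prefix_bits` are hypothetical, SHA-1 is assumed as the fingerprint function, and an in-memory dictionary stands in for the on-disk index.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class PrefixShardedIndex:
    """Hypothetical fingerprint index sharded by fingerprint prefix.

    This is an illustration, not the index described in the paper.
    """

    def __init__(self, prefix_bits=4):
        # prefix_bits must be between 1 and 8: the shard is chosen from
        # the top bits of the first fingerprint byte.
        self.prefix_bits = prefix_bits
        shard_count = 1 << prefix_bits
        self.shards = [dict() for _ in range(shard_count)]
        self.locks = [threading.Lock() for _ in range(shard_count)]

    def _shard_of(self, fingerprint):
        # Use the leading bits of the fingerprint as the shard number.
        return fingerprint[0] >> (8 - self.prefix_bits)

    def insert_if_absent(self, fingerprint, location):
        """Return True if the fingerprint was new, False if a duplicate."""
        i = self._shard_of(fingerprint)
        # Only the shard that owns this prefix is locked, so threads
        # working on other prefixes proceed without waiting.
        with self.locks[i]:
            if fingerprint in self.shards[i]:
                return False
            self.shards[i][fingerprint] = location
            return True


def dedup(chunks, index, workers=4):
    """Fingerprint chunks in parallel; return the number of unique chunks."""

    def process(item):
        position, chunk = item
        fingerprint = hashlib.sha1(chunk).digest()
        return index.insert_if_absent(fingerprint, position)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, enumerate(chunks)))
    return sum(results)
```

For example, `dedup([b"a", b"b", b"a"], PrefixShardedIndex())` counts two unique chunks. Per-shard locking is a simple stand-in chosen for clarity; the lower-overhead synchronization scheme that the paper compares against lock-based methods is not reproduced here.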
Additional information
Foundation item: Project(IRT0725) supported by the Changjiang Innovative Group of Ministry of Education, China
Cite this article
Zhu, R., Qin, Lh., Zhou, Jl. et al. Using multi-threads to hide deduplication I/O latency with low synchronization overhead. J. Cent. South Univ. 20, 1582–1591 (2013). https://doi.org/10.1007/s11771-013-1650-4
DOI: https://doi.org/10.1007/s11771-013-1650-4