Using multi-threads to hide deduplication I/O latency with low synchronization overhead

Abstract

Data deduplication, as a compression technique, has been widely used in backup systems to improve bandwidth and space efficiency. As the volume of data to be backed up explodes, the two main challenges in data deduplication are the CPU-intensive chunking and hashing work and the I/O-intensive disk-index access latency. Since the CPU-intensive work has been widely parallelized and accelerated by multi-core and many-core processors, I/O latency is becoming the bottleneck of data deduplication. To alleviate the I/O latency challenge on multi-core systems, a multi-threaded deduplication (Multi-Dedup) architecture was proposed. The main idea of Multi-Dedup is to use parallel deduplication threads to hide the I/O latency. A prefix-based concurrent index was designed to maintain the internal consistency of the deduplication index with low synchronization overhead. In addition, a collisionless cache array was designed to preserve locality and similarity within the parallel threads. In experiments on various real-world datasets, Multi-Dedup achieves a 3–5 times performance improvement when incorporated with the locality-based ChunkStash and the local-similarity-based SiLo methods. Moreover, Multi-Dedup dramatically decreases the synchronization overhead and achieves a 1.5–2 times performance improvement compared with traditional lock-based synchronization methods.
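
To make the idea concrete, the following Go sketch shows one plausible reading of a prefix-partitioned concurrent index: the leading byte of each chunk fingerprint deterministically routes the chunk to a single worker thread that exclusively owns that index partition, so lookups and inserts need no shared locks. The worker count, the SHA-1 fingerprints, the channel-based dispatch and all identifiers are illustrative assumptions made for this sketch only; they are not the paper's actual implementation.

// Hypothetical sketch of a prefix-partitioned deduplication index.
// Each worker goroutine exclusively owns the fingerprints routed to it,
// so the per-partition hash map needs no mutex; the only coordination
// is the channel hand-off and a final tally over a results channel.
package main

import (
	"crypto/sha1"
	"fmt"
	"sync"
)

const numWorkers = 4 // assumed number of deduplication threads

type chunk struct {
	data []byte
}

// worker owns one index partition; because only this goroutine ever
// touches its map, lookups and inserts are lock-free.
func worker(in <-chan chunk, results chan<- uint64, wg *sync.WaitGroup) {
	defer wg.Done()
	index := make(map[[sha1.Size]byte]struct{})
	var dup uint64
	for c := range in {
		fp := sha1.Sum(c.data)
		if _, seen := index[fp]; seen {
			dup++ // duplicate chunk: a real backup would suppress its data
		} else {
			index[fp] = struct{}{}
		}
	}
	results <- dup
}

func main() {
	chans := make([]chan chunk, numWorkers)
	results := make(chan uint64, numWorkers)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan chunk, 64)
		wg.Add(1)
		go worker(chans[i], results, &wg)
	}

	// Toy input stream with a repeated chunk so one duplicate is found.
	stream := [][]byte{[]byte("aaaa"), []byte("bbbb"), []byte("aaaa"), []byte("cccc")}
	for _, d := range stream {
		fp := sha1.Sum(d)
		// Prefix-based routing: the leading fingerprint byte selects the
		// partition, so a given fingerprint always reaches the same worker.
		chans[fp[0]%numWorkers] <- chunk{data: d}
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
	close(results)

	var duplicates uint64
	for d := range results {
		duplicates += d
	}
	fmt.Println("duplicate chunks found:", duplicates)
}

Because a fingerprint's prefix always selects the same owner thread, two threads never contend for the same index partition, which is where the low synchronization overhead would come from in this sketch; a per-thread collisionless cache array would play an analogous role for preserving locality and similarity.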

References

  1. BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

  2. RHEA S, COX R, PESTEREV A. Fast, inexpensive content-addressed storage in Foundation [C]// Proceedings of the 2008 USENIX Annual Technical Conference (USENIX ATC’08). Boston: USENIX Association, 2008: 143–156.

  3. POLICRONIADES C, PRATT I. Alternatives for detecting redundancy in storage systems data [C]// Proceedings of the 2004 USENIX Annual Technical Conference (USENIX ATC’04). Boston: USENIX Association, 2004: 73–86.

  4. MANBER U. Finding similar files in a large file system [C]// Proceedings of the USENIX Winter 1994 Technical Conference. San Francisco: USENIX Association, 1994: 17–21.

  5. ZIV J, LEMPEL A. A universal algorithm for sequential data compression [J]. IEEE Transactions on Information Theory, 1977, 23(3): 337–343.

  6. BIGGAR H. Experiencing data de-duplication: Improving efficiency and reducing capacity requirements [R]. MA: The Enterprise Strategy Group, 2007.

  7. ZHU B, LI Kai, PATTERSON H. Avoiding the disk bottleneck in the Data Domain deduplication file system [C]// Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST’08). San Jose: USENIX Association, 2008: 1–14.

  8. BHAGWAT D, ESHGHI K, LONG D D E, LILLIBRIDGE M. Extreme binning: Scalable, parallel deduplication for chunk-based file backup [C]// Proceedings of the 17th IEEE/ACM International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS’09). London: IEEE Press, 2009: 1–9.

  9. XIA Wen, JIANG Hong, FENG Dan, HUA Yu. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 26–38.

  10. DEBNATH B, SENGUPTA S, LI Jin. ChunkStash: Speeding up inline storage deduplication using flash memory [C]// Proceedings of the 2010 USENIX Annual Technical Conference (USENIX ATC’10). Boston: USENIX Association, 2010: 16–16.

  11. EMC white paper. EMC Data Domain SISL scaling architecture: A detailed review [EB/OL]. [2012-02-12]. http://www.emc.com/collateral/hardware/white-papers/h7221-data-domain-sisl-sclg-arch-wp.pdf.

  12. LIU Chuan-yi, XUE Yi-bo, JU Da-peng, WANG Dong-sheng. A novel optimization method to improve de-duplication storage system performance [C]// 2009 15th International Conference on Parallel and Distributed Systems (ICPADS’09). Hong Kong: IEEE Press, 2009: 228–235.

  13. GUO Fang-lu, EFSTATHOPOULOS P. Building a high-performance deduplication system [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 1–14.

  14. XIA Wen, JIANG Hong, FENG Dan, TIAN Lei. Accelerating data deduplication by exploiting pipelining and parallelism with multicore or manycore processors [C]// Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST’12). San Jose: USENIX Association, 2012: 1–2.

  15. GHARAIBEH A, AL-KISWANY S, GOPALAKRISHNAN S, RIPEANU M. A GPU accelerated storage system [C]// Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC’10). Chicago: ACM, 2010: 167–178.

  16. BHATOTIA P, RODRIGUES R, VERMA A. Shredder: GPU-accelerated incremental storage and computation [C]// Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST’12). San Jose: USENIX Association, 2012: 157–172.

  17. TRIPLETT J, MCKENNEY P E, WALPOLE J. Resizable, scalable, concurrent hash tables via relativistic programming [C]// Proceedings of the 2011 USENIX Annual Technical Conference (USENIX ATC’11). Portland: USENIX Association, 2011: 102–116.

  18. MCKENNEY P E, SLINGWINE J D. Read-copy update: Using execution history to solve concurrency problems [C]// 1998 Parallel and Distributed Computing and Systems. Las Vegas: ACTA Press, 1998: 509–518.

  19. TRIPLETT J, MCKENNEY P E, WALPOLE J. Scalable concurrent hash tables via relativistic programming [J]. ACM SIGOPS Operating Systems Review, 2010, 44(3): 102–109.

  20. DUBNICKI C, GRYZ L, HELDT L, KACZMARCZYK M, KILIAN W, STRZELCZAK P, SZCZEPKOWSKI J, UNGUREANU C, WELNICKI M. HYDRAstor: A scalable secondary storage [C]// Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST’09). San Francisco: USENIX Association, 2009: 197–210.

  21. MUTHITACHAROEN A, CHEN Ben-jie, MAZIERES D. A low-bandwidth network file system [J]. ACM SIGOPS Operating Systems Review, 2001, 35(4): 174–187.

  22. QUINLAN S, DORWARD S. Venti: A new approach to archival data storage [C]// Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST’02). Monterey: USENIX Association, 2002: 4–4.

  23. LILLIBRIDGE M, ESHGHI K, BHAGWAT D, DEOLALIKAR V, TREZISE G, CAMBLE P. Sparse indexing: Large scale, inline deduplication using sampling and locality [C]// Proceedings of the 7th USENIX Conference on File and Storage Technologies (FAST’09). San Francisco: USENIX Association, 2009: 111–123.

  24. FAN Li, CAO Pei, ALMEIDA J, BRODER A Z. Summary cache: A scalable wide-area web cache sharing protocol [J]. IEEE/ACM Transactions on Networking, 2000, 8(3): 281–293.

  25. CHEN Song-qiao, HUANG Jin-gui, CHEN Jian-er. Approximation algorithm for multiprocessor parallel job scheduling [J]. Journal of Central South University of Technology (English Edition), 2002, 9(4): 267–272.

Author information

Corresponding author

Correspondence to Lei-hua Qin (秦磊华).

Additional information

Foundation item: Project(IRT0725) supported by the Changjiang Innovative Group of Ministry of Education, China

Cite this article

Zhu, R., Qin, Lh., Zhou, Jl. et al. Using multi-threads to hide deduplication I/O latency with low synchronization overhead. J. Cent. South Univ. 20, 1582–1591 (2013). https://doi.org/10.1007/s11771-013-1650-4
