A Self-Matching Sliding Block Algorithm Applied to Deduplication in Distributed Storage System

  • Chuiyi Xie
  • Ying Huo
  • Sihan Qing
  • Shoushan Luo
  • Lingli Hu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9543)


Deduplication technology can significantly reduce the amount of data stored in data centers, thereby saving network bandwidth and lowering construction and maintenance costs. Inspired by the sliding-block method of the Sliding Block (SB) algorithm and the independent block-dividing idea of the Content Defined Chunking (CDC) algorithm, a Self-Matching Sliding Block (SMSB) algorithm for deduplication is proposed. By communicating with the metadata node, the storage system client builds a matching table in local memory that contains fingerprints and checksums, and uses this table to perform sliding-block self-matching and thereby detect duplicate blocks. The experimental results show that the deduplication rate and the disk space utilization rate of the SMSB algorithm are 2.03 times and 1.28 times those of the CDC algorithm, respectively, and that its data processing speed is 0.83 times that of the CDC algorithm. The SMSB algorithm is suitable for distributed storage systems.
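The matching-table idea in the abstract can be illustrated with a minimal sketch. It is not the authors' SMSB implementation, whose details are not given here. It only shows the general sliding-block pattern of a cheap checksum lookup followed by a stronger fingerprint check. The names `build_matching_table`, `sliding_block_dedup`, and `BLOCK_SIZE` are hypothetical, the block size is kept tiny for demonstration, and SHA-1 is swapped in for the Rabin fingerprint named in the keywords.

```python
import hashlib
import zlib

BLOCK_SIZE = 8  # hypothetical block size, kept tiny for demonstration


def strong_fingerprint(block: bytes) -> str:
    # Assumption: SHA-1 stands in for the Rabin fingerprint.
    return hashlib.sha1(block).hexdigest()


def build_matching_table(known_blocks):
    """Map Adler-32 checksum -> set of strong fingerprints of known blocks."""
    table = {}
    for block in known_blocks:
        table.setdefault(zlib.adler32(block), set()).add(strong_fingerprint(block))
    return table


def sliding_block_dedup(data: bytes, table, block_size: int = BLOCK_SIZE):
    """Split data into ("dup", block) and ("new", bytes) segments."""
    segments = []
    pending = bytearray()  # bytes that have not matched any known block
    i = 0
    while i + block_size <= len(data):
        window = data[i:i + block_size]
        # Weak check first. The checksum is recomputed per window for
        # clarity; a real implementation would update it incrementally.
        candidates = table.get(zlib.adler32(window))
        if candidates and strong_fingerprint(window) in candidates:
            if pending:
                segments.append(("new", bytes(pending)))
                pending.clear()
            segments.append(("dup", window))
            i += block_size  # jump past the matched block
        else:
            pending.append(data[i])
            i += 1  # slide the window forward by one byte
    pending.extend(data[i:])  # tail shorter than one block
    if pending:
        segments.append(("new", bytes(pending)))
    return segments


table = build_matching_table([b"AAAAAAAA", b"BBBBBBBB"])
result = sliding_block_dedup(b"xyAAAAAAAAzBBBBBBBBq", table)
```

In this sketch the known blocks are found even though inserted bytes shift them off fixed block boundaries, which is the property that motivates sliding-block approaches. In the described system the table would be built from metadata-node responses rather than from a local list.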


Distributed storage, Deduplication, Sliding block algorithm, Rabin fingerprint, Adler-32 checksum



This study is supported by the National Natural Science Foundation of China (61170282), the Guangdong Laboratory Research Foundation (GDJ2014081), the Shaoguan Innovation Foundation (2012CX/K123), the Scientific Research Project of Shaoguan University (201216), the Discipline Construction Project of Guangdong Province (2013KJCX0168), and the Guangdong Natural Science Foundation (2014A030307029).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Chuiyi Xie (1, 2; email author)
  • Ying Huo (2)
  • Sihan Qing (3, 4, 5)
  • Shoushan Luo (1)
  • Lingli Hu (2)

  1. National Engineering Laboratory for Disaster Backup and Recovery, Beijing University of Posts and Telecommunications, Beijing, China
  2. Department of Information and Computing Science, Shaoguan University, Shaoguan, China
  3. Institute of Software, Chinese Academy of Sciences, Beijing, China
  4. Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
  5. School of Software and Microelectronics, Peking University, Beijing, China
