
A Deduplication Algorithm Based on Data Similarity and Delta Encoding

  • Conference paper
Geo-Spatial Knowledge and Intelligence (GRMSE 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 699)


Abstract

Satellite applications such as remote sensing are overwhelmed with vast quantities of data, yet on-board storage resources are so limited that they must be used more efficiently. Remote sensing data are highly similar, but the dissimilar parts of the data are distributed irregularly. When a traditional deduplication algorithm splits a file into chunks, many chunks are nearly identical but not exactly the same, so deduplication performs poorly. We propose a deduplication algorithm based on data similarity and delta encoding to reduce storage usage: data similarity analysis identifies similar data, and delta encoding reduces the storage they consume by keeping only their differences. In experiments on remote sensing application data, we achieved deduplication ratios of up to 30:1, and we analyze how the chunk size affects the experimental results.
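The abstract does not describe the algorithm's internals, so the following is only a minimal Python sketch, under our own assumptions, of the general pipeline it outlines. Each chunk is first checked for an exact duplicate by fingerprint; otherwise it is compared with previously stored chunks and, if similar enough, stored as a delta. The specific choices are stand-ins, not the authors' method: fixed-size chunking, SHA-1 fingerprints, Jaccard similarity over 8-byte shingles as the "similarity analysis", and a copy/insert delta built with Python's `difflib`. The constants, the storage cost model, and names such as `DedupStore` are hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

# Hypothetical parameters chosen for illustration, not taken from the paper.
CHUNK_SIZE = 64       # fixed chunk size in bytes
SHINGLE = 8           # length of the byte windows used as similarity features
SIM_THRESHOLD = 0.5   # minimum Jaccard similarity to delta-encode against a base

def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Fixed-size chunking, the simplest 'traditional' splitting strategy."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def features(chunk: bytes) -> set:
    """Set of overlapping SHINGLE-byte windows describing the chunk's content."""
    return {chunk[i:i + SHINGLE] for i in range(max(1, len(chunk) - SHINGLE + 1))}

def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def delta_encode(base: bytes, target: bytes) -> list:
    """Describe target as ('copy', i, j) ranges of base plus ('insert', bytes) literals."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # 'replace' or 'insert': emit the target's new bytes
            ops.append(("insert", target[j1:j2]))
        # 'delete' needs no instruction: those base bytes are simply skipped
    return ops

def delta_decode(base: bytes, ops: list) -> bytes:
    """Rebuild the target by replaying copy/insert instructions."""
    return b"".join(base[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)

class DedupStore:
    """Toy store: exact dedup by SHA-1, then delta encoding against similar bases."""

    def __init__(self):
        self.objects = []   # ("base", bytes) or ("delta", base_oid, ops)
        self.by_hash = {}   # SHA-1 digest -> object id (exact-duplicate index)
        self.bases = []     # (object id, feature set) for fully stored chunks
        self.recipe = []    # object id for each input chunk, in order

    def write(self, data: bytes) -> None:
        for chunk in split_chunks(data):
            digest = hashlib.sha1(chunk).digest()
            if digest in self.by_hash:            # exact duplicate: keep a reference only
                self.recipe.append(self.by_hash[digest])
                continue
            feats = features(chunk)
            best, best_sim = None, 0.0
            for oid, base_feats in self.bases:    # linear scan; a real system would index features
                sim = jaccard(feats, base_feats)
                if sim > best_sim:
                    best, best_sim = oid, sim
            oid = len(self.objects)
            if best is not None and best_sim >= SIM_THRESHOLD:
                self.objects.append(("delta", best, delta_encode(self.objects[best][1], chunk)))
            else:
                self.objects.append(("base", chunk))
                self.bases.append((oid, feats))
            self.by_hash[digest] = oid
            self.recipe.append(oid)

    def _load(self, oid: int) -> bytes:
        obj = self.objects[oid]
        if obj[0] == "base":
            return obj[1]
        return delta_decode(self.objects[obj[1]][1], obj[2])

    def read(self) -> bytes:
        return b"".join(self._load(oid) for oid in self.recipe)

    def stored_bytes(self) -> int:
        """Hypothetical cost model: 8 bytes per copy, 4 bytes + payload per insert."""
        total = 0
        for obj in self.objects:
            if obj[0] == "base":
                total += len(obj[1])
            else:
                total += sum(8 if op[0] == "copy" else 4 + len(op[1]) for op in obj[2])
        return total

# Usage: two synthetic "scenes" that differ in only two bytes.
scene1 = bytes(range(256)) * 4
scene2 = bytearray(scene1)
scene2[10] ^= 0xFF
scene2[300] ^= 0xFF
data = scene1 + bytes(scene2)

store = DedupStore()
store.write(data)
print("dedup ratio: %.2f" % (len(data) / store.stored_bytes()))
```

In this toy input, the two modified chunks of the second scene fail the exact-hash check, so SHA-1 deduplication alone would store them in full. The similarity step instead stores them as small deltas against an existing chunk, which mirrors the problem the abstract describes. Varying `CHUNK_SIZE` is a quick way to see the chunk-size trade-off the authors study, though this sketch makes no claim about their measured results.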



Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61370059, the National Natural Science Foundation of China under Grant No. 61232009, Beijing Natural Science Foundation under Grant No. 4152030, the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2016ZX-13, the Open Research Fund of The Academy of Satellite Application under Grant No. Y20A-E03 and the Open Project Program of National Engineering Research Center for Science & Technology Resources Sharing Service (Beihang University).

Author information


Corresponding author

Correspondence to Limin Xiao.



Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Song, B., Xiao, L., Qin, G., Ruan, L., Qiu, S. (2017). A Deduplication Algorithm Based on Data Similarity and Delta Encoding. In: Yuan, H., Geng, J., Bian, F. (eds) Geo-Spatial Knowledge and Intelligence. GRMSE 2016. Communications in Computer and Information Science, vol 699. Springer, Singapore. https://doi.org/10.1007/978-981-10-3969-0_28


  • DOI: https://doi.org/10.1007/978-981-10-3969-0_28


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3968-3

  • Online ISBN: 978-981-10-3969-0

  • eBook Packages: Computer Science, Computer Science (R0)
