Skip to main content

A Two-Phase Differential Synchronization Algorithm for Remote Files

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 6081))

Abstract

This paper presents a two-phase synchronization algorithm—tpsync, which combines content-defined chunking (CDC) with sliding block duplicated data detection methods. tpsync firstly partitions synchronized files into variable-sized chunks in coarse-grained scale with CDC method, locates the unmatched chunks of synchronized files using the edit distance algorithm, and finally generates the fine-grained delta data with fixed-sized sliding block duplicated data detection method. At the first-phase, tpsync can quickly locate the partial changed chunks between two files through similar files’ fingerprint characteristics. On the basis of the first phase’s results, small fixed-sized sliding block duplicated data detection method can produce better fine-grained delta data between the corresponding unmatched data chunks further. Extensive experiments on ASCII, binary and database files demonstrate that tpsync can achieve a higher performance on synchronization time and total transferred data compared to traditional fixed-sized sliding block method—rsync. Compared to rsync, tpsync reduces synchronization time by 12% and bandwidth by 18.9% on average if optimized parameters are applied on both. With signature cached synchronization method adopted, tpsync can yield a better performance.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Ajtai, M., Burns, R., Fagin, R., Long, D., Stockmeyer, L.: Compactly encoding unstructured inputs with differential compression. Journal of the ACM (JACM) 49, 318–367 (2002)

    Article  MathSciNet  Google Scholar 

  2. Whitten, A.: Scalable Document Fingerprinting. In: The USENIX Workshop on E-Commerce (1996)

    Google Scholar 

  3. Hunt, J.W., Szymanski, T.G.: A fast algorithm for computing longest common subsequences. Communications of the ACM 20, 350–353 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  4. Korn, D., Vo, K.: Engineering a differencing and compression data format, pp. 219–228 (2002)

    Google Scholar 

  5. MacDonald, J.: File system support for delta compression. Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, Master thesis (May 2000)

    Google Scholar 

  6. Percival, C.: Naive differences of executable code, Draft Paper, http://www.daemonology.net/bsdiff

  7. Trendafilov, D., Memon, N., Suel, T.: zdelta: An efficient delta compression tool. Department of Computer and Information Science, Polytechnic University Technical Report (2002)

    Google Scholar 

  8. Tridgell, A.: Efficient algorithms for sorting and synchronization. PhD thesis, Australian National University (1999)

    Google Scholar 

  9. Meunier, P., Nystrom, S., Kamara, S., Yost, S., Alexander, K., Noland, D., Crane, J.: ActiveSync, TCP/IP and 802.11 b Wireless Vulnerabilities of WinCE-based PDAs. In: Proceedings of Eleventh IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, WET ICE 2002, pp. 145–150 (2002)

    Google Scholar 

  10. Whitepapers, P.: Invasion of the data snatchers (1999), http://www.pumatech.com/enterprise/wp-1.html

  11. Palm: Palm developer knowledge base manuals (1999), http://palmos.com/dev/support/docs/palmos/ReferenceTOC.html

  12. Muthitacharoen, A., Chen, B., Mazieres, D.: A low-bandwidth network file system. In: Proceedings of the eighteenth ACM symposium on Operating systems principles, pp. 174–187. ACM, New York (2001)

    Chapter  Google Scholar 

  13. Teodosiu, D., Bjorner, N., Gurevich, Y., Manasse, M., Porkka, J.: Optimizing file replication over limited bandwidth networks using remote differential compression. Technical report, Microsoft Corporation (2006)

    Google Scholar 

  14. Grune, D.: Concurrent Versions System, a method for independent cooperation. Report IR-114, Vrije University, Amsterdam (1986)

    Google Scholar 

  15. Collins-Sussman, B., Pilato, C., Pilato, C., Fitzpatrick, B.: Version control with subversion. O’Reilly Media, Inc., Sebastopol (2008)

    Google Scholar 

  16. Policroniades, C., Pratt, I.: Alternatives for detecting redundancy in storage systems data. In: Proceedings of the 2004 USENIX Annual Technical Conference, pp. 73–86 (2004)

    Google Scholar 

  17. Jain, N., Dahlin, M., Tewari, R.: Taper: Tiered approach for eliminating redundancy in replica synchronization. In: Proceedings of the 4th Usenix Conference on File and Storage Technologies (FAST 2005) (2005)

    Google Scholar 

  18. Denehy, T., Hsu, W.: Duplicate management for reference data. Research Report RJ10305, IBM (2003)

    Google Scholar 

  19. Kulkarni, P., Douglis, F., LaVoie, J., Tracey, J.: Redundancy elimination within large collections of files. In: The USENIX Annual Technical Conference, General Track, 59–72 (2004)

    Google Scholar 

  20. Quinlan, S., Dorward, S.: Venti: a new approach to archival storage. In: Proceedings of the FAST 2002 Conference on File and Storage Technologies, vol. 4 (2002)

    Google Scholar 

  21. Bindel, D., Chen, Y., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Weimer, W., Wells, C., et al.: Oceanstore: An extremely wide-area storage system. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, Citeseer, pp. 190–201 (2000)

    Google Scholar 

  22. Rabin, M.: Fingerprinting by random polynomials. Technical report, Technical Report TR-15-81, Center for Research in Computing Technology, Harvard University (1981)

    Google Scholar 

  23. Bobbarjung, D., Jagannathan, S., Dubnicki, C.: Improving duplicate elimination in storage systems. ACM Transactions on Storage (TOS) 2, 424–448 (2006)

    Article  Google Scholar 

  24. Levenshteiti, V.: Binary codes capable of correcting deletions, insertions, and reversals. In: Soviet Physics-Doklady, vol. 10 (1966)

    Google Scholar 

  25. Martin Pool, D.B.: librsync, http://librsync.sourceforge.net

  26. Council, T.: TPC BenchmarkTM C Standard Specification (2005)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sheng, Y., Xu, D., Wang, D. (2010). A Two-Phase Differential Synchronization Algorithm for Remote Files. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-13119-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13118-9

  • Online ISBN: 978-3-642-13119-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics