A Two-Phase Differential Synchronization Algorithm for Remote Files

  • Yonghong Sheng
  • Dan Xu
  • Dongsheng Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6081)

Abstract

This paper presents tpsync, a two-phase synchronization algorithm that combines content-defined chunking (CDC) with sliding-block duplicate data detection. tpsync first partitions the files being synchronized into coarse-grained, variable-sized chunks using CDC, then locates the unmatched chunks using an edit distance algorithm, and finally generates fine-grained delta data using fixed-size sliding-block duplicate data detection. In the first phase, tpsync quickly locates the partially changed chunks between two files using the fingerprint characteristics of similar files. In the second phase, sliding-block duplicate data detection with a small fixed block size builds on the first phase's results to produce finer-grained delta data between the corresponding unmatched chunks. Extensive experiments on ASCII, binary, and database files show that tpsync outperforms rsync, the traditional fixed-size sliding-block method, in both synchronization time and total data transferred. With optimized parameters applied to both, tpsync reduces synchronization time by 12% and bandwidth by 18.9% on average compared to rsync. With signature-cached synchronization adopted, tpsync performs better still.
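The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the names `cdc_chunks`, `block_delta`, `tpsync_delta`, and `apply_delta` and the parameters `WINDOW`, `MASK`, and `BLOCK` are hypothetical; an MD5 hash of a short trailing window stands in for a rolling (Rabin-style) fingerprint; `difflib.SequenceMatcher` stands in for the edit distance step; and blocks are indexed by their raw bytes rather than by rsync-style weak and strong checksums. Both files are held in one process, so no network exchange is modeled.

```python
import hashlib
from difflib import SequenceMatcher

WINDOW = 16          # assumed: bytes hashed to decide a chunk boundary
MASK = (1 << 6) - 1  # assumed: boundary when hash & MASK == 0
BLOCK = 8            # assumed: fixed block size for phase two


def cdc_chunks(data):
    """Phase 1a: split data into variable-sized, content-defined chunks."""
    chunks, start = [], 0
    for i in range(len(data)):
        if i + 1 - start < WINDOW:
            continue  # enforce a minimum chunk size of WINDOW bytes
        window = data[i + 1 - WINDOW:i + 1]
        h = int.from_bytes(hashlib.md5(window).digest()[:4], "big")
        if h & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def block_delta(old, new, block=BLOCK):
    """Phase 2: slide over `new`, matching fixed-size blocks of `old`."""
    index = {old[i:i + block]: i
             for i in range(0, len(old) - block + 1, block)}
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        offset = index.get(new[i:i + block])
        if offset is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", offset, block))
            i += block
        else:
            literal.append(new[i])
            i += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def tpsync_delta(old, new):
    """Return ops that rebuild `new` from `old` plus literal data."""
    old_chunks, new_chunks = cdc_chunks(old), cdc_chunks(new)
    old_fp = [hashlib.sha1(c).digest() for c in old_chunks]
    new_fp = [hashlib.sha1(c).digest() for c in new_chunks]
    old_off = [0]  # byte offset of each old chunk, plus the end offset
    for c in old_chunks:
        old_off.append(old_off[-1] + len(c))
    ops = []
    # Phase 1b: align the two fingerprint sequences (stand-in for edit distance).
    matcher = SequenceMatcher(None, old_fp, new_fp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", old_off[i1], old_off[i2] - old_off[i1]))
            continue
        # Unmatched chunks: refine with the fine-grained sliding-block pass.
        old_region = old[old_off[i1]:old_off[i2]]
        new_region = b"".join(new_chunks[j1:j2])
        for op in block_delta(old_region, new_region):
            if op[0] == "copy":
                ops.append(("copy", old_off[i1] + op[1], op[2]))
            else:
                ops.append(op)
    return ops


def apply_delta(old, ops):
    """Rebuild the new file from the old file and the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

In this sketch, `apply_delta(old, tpsync_delta(old, new))` reconstructs `new`, and only the `("data", ...)` ops carry bytes that would need to be transferred. Restricting the byte-by-byte sliding pass to the unmatched regions is the point of the two-phase design: the coarse chunk comparison rules out most of the file cheaply.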

Keywords

Block Size, Synchronization Time, Synchronization Algorithm, Bandwidth Saving, Chunk Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yonghong Sheng (1)
  • Dan Xu (2)
  • Dongsheng Wang (1, 3)
  1. Department of Computer Science and Technology, Tsinghua University, Beijing, P.R. China
  2. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing, P.R. China
  3. Tsinghua National Laboratory for Information Science and Technology, Beijing, P.R. China