X-Tree Diff+: Efficient Change Detection Algorithm in XML Documents

  • Suk Kyoon Lee
  • Dong Ah Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4096)


As web documents proliferate, the need for real-time computation of changes (edit scripts) between web documents increases. Although fast heuristic algorithms have been proposed recently, the quality of the edit scripts they produce is not satisfactory. In this paper, we propose X-tree Diff+, which produces higher-quality edit scripts by introducing a tuning step based on the notion of consistency of matching. We also add a copy operation for the convenience of users. Tuning and the copy operation increase the matching ratio drastically. X-tree Diff+ produces higher-quality edit scripts while running with a time complexity equivalent to that of the fastest heuristic algorithms.
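
The abstract does not spell out the matching procedure. As a rough illustration of the general idea behind hash-based heuristic tree differencing (not the authors' X-tree Diff+ algorithm, and without its tuning step or copy operation), the sketch below hashes every subtree of two XML documents and pairs subtrees whose digest occurs exactly once in each document. The function names, the choice of SHA-256, and the uniqueness rule are assumptions made for this example only.

```python
import hashlib
import xml.etree.ElementTree as ET


def subtree_hash(node, table):
    """Hash a node's tag, attributes, text, and child digests (hypothetical helper).

    Every digest is recorded in `table`, mapping digest -> list of nodes.
    Tail text is ignored to keep the sketch short.
    """
    h = hashlib.sha256()
    h.update(node.tag.encode())
    for key in sorted(node.attrib):
        h.update(f"|{key}={node.attrib[key]}".encode())
    h.update(b"|" + (node.text or "").strip().encode())
    for child in node:
        # Child order matters: reordered children yield a different digest.
        h.update(b"|" + subtree_hash(child, table).encode())
    digest = h.hexdigest()
    table.setdefault(digest, []).append(node)
    return digest


def match_identical_subtrees(old_root, new_root):
    """Pair subtrees whose digest is unique in both documents (assumed rule)."""
    old_table, new_table = {}, {}
    subtree_hash(old_root, old_table)
    subtree_hash(new_root, new_table)
    matches = []
    for digest, old_nodes in old_table.items():
        new_nodes = new_table.get(digest, [])
        if len(old_nodes) == 1 and len(new_nodes) == 1:
            matches.append((old_nodes[0], new_nodes[0]))
    return matches


old_doc = ET.fromstring("<a><b>x</b><c>y</c></a>")
new_doc = ET.fromstring("<a><c>y</c><b>z</b></a>")
pairs = match_identical_subtrees(old_doc, new_doc)
print([(o.tag, n.tag) for o, n in pairs])
```

Nodes left unmatched by such a pass (here the changed `b` element and the root) are the ones a differencing algorithm would still have to resolve into edit operations.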


Hash Table, Edit Operation, Matched Node, Current Match, Sibling Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Suk Kyoon Lee (1)
  • Dong Ah Kim (1)
  1. Division of Information and Computer Science, Dankook University, Seoul, Korea
