Schema-Less, Semantics-Based Change Detection for XML Documents

  • Shuohao Zhang
  • Curtis Dyreson
  • Richard T. Snodgrass
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3306)


Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in a historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions, but techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
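The two steps named above (isolate identifiers, then associate elements across versions) can be illustrated with a minimal Python sketch. The abstract does not spell out how identifiers are discovered, so the heuristic here is an assumption made only for illustration: a child tag counts as an identifier for a parent tag if every element with that parent tag has such a leaf child and no two of them share its text. The function names (`leaf_text`, `find_identifiers`, `index_by_identifier`, `associate`) and the sample documents are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def leaf_text(elem, child_tag):
    """Return the non-empty text of the first leaf child with this tag, else None."""
    child = elem.find(child_tag)
    if child is None or len(child) > 0 or child.text is None:
        return None
    text = child.text.strip()
    return text or None


def find_identifiers(root):
    """Map each element tag to a child tag whose text is unique per element.

    Assumed heuristic, not the paper's procedure: the first child tag
    (in sorted order) whose values are present and pairwise distinct wins.
    """
    by_tag = defaultdict(list)
    for elem in root.iter():
        by_tag[elem.tag].append(elem)

    identifiers = {}
    for tag, elems in by_tag.items():
        candidates = sorted({child.tag for e in elems for child in e})
        for cand in candidates:
            values = [leaf_text(e, cand) for e in elems]
            if None not in values and len(set(values)) == len(values):
                identifiers[tag] = cand
                break
    return identifiers


def index_by_identifier(root, identifiers):
    """Index elements by (tag, identifier value), ignoring their position."""
    index = {}
    for elem in root.iter():
        id_tag = identifiers.get(elem.tag)
        if id_tag is not None:
            value = leaf_text(elem, id_tag)
            if value is not None:
                index[(elem.tag, value)] = elem
    return index


def associate(old_xml, new_xml):
    """Return (matched, deleted, inserted) keys between two versions."""
    old_root = ET.fromstring(old_xml)
    new_root = ET.fromstring(new_xml)
    identifiers = find_identifiers(old_root)
    old_index = index_by_identifier(old_root, identifiers)
    new_index = index_by_identifier(new_root, identifiers)
    matched = sorted(old_index.keys() & new_index.keys())
    deleted = sorted(old_index.keys() - new_index.keys())
    inserted = sorted(new_index.keys() - old_index.keys())
    return matched, deleted, inserted


# Hypothetical sample versions: book 222 moves under a new <shelf> element
# and its title changes, book 111 disappears, and book 333 is added.
OLD = """<library>
  <book><isbn>111</isbn><title>A</title></book>
  <book><isbn>222</isbn><title>B</title></book>
</library>"""

NEW = """<library>
  <shelf><book><title>B2</title><isbn>222</isbn></book></shelf>
  <book><isbn>333</isbn><title>C</title></book>
</library>"""

matched, deleted, inserted = associate(OLD, NEW)
print("matched:", matched)
print("deleted:", deleted)
print("inserted:", inserted)
```

In this sketch, book 222 is still associated across the two versions even though it was moved under a new parent and its child order changed, which is the kind of significant structural change where a purely structure-based comparison would struggle. A fuller treatment would need to handle identifiers built from several values and elements that have no identifying information at all.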


Change Detection, Edit Distance, Successive Version, XPath Query, XPath Expression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Shuohao Zhang (1)
  • Curtis Dyreson (1)
  • Richard T. Snodgrass (2)
  1. Washington State University, Pullman, U.S.A.
  2. The University of Arizona, Tucson, U.S.A.
