Similarity of DTDs Based on Edit Distance and Semantics

  • Aleš Wojnar
  • Irena Mlýnková
  • Jiří Dokulil
Part of the Studies in Computational Intelligence book series (SCI, volume 162)


In this paper we propose a technique for evaluating the similarity of XML schema fragments. Unlike existing works, we focus on the structural level in combination with the semantic similarity of the data. For this purpose we apply the idea of edit distance to the constructs of DTDs, which enables us to express the structural differences of the given data more precisely. In combination with semantic similarity, it also yields more realistic results. Using various experiments, we show the behavior and advantages of the proposed approach.
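The abstract does not spell out the algorithm. The following Python sketch only illustrates how an edit distance over DTD element trees can be combined with semantic similarity. It uses a top-down (Selkow-style) tree edit distance in which relabeling a node costs `1 - semantic_sim(a, b)`, and inserting or deleting a subtree costs its size. The `Node` model, the hypothetical `SYNONYMS` table (a stand-in for a real thesaurus) and the unit costs are all assumptions, not the authors' method. DTD operators and cardinality constraints are ignored here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """Hypothetical model: a DTD element with its child elements (operators/cardinality omitted)."""
    name: str
    children: tuple = ()

def size(t: Node) -> int:
    # Number of nodes in the subtree; used as the cost of inserting/deleting it.
    return 1 + sum(size(c) for c in t.children)

def levenshtein(a: str, b: str) -> int:
    # Classic string edit distance (insert, delete, substitute, each cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Hypothetical synonym table standing in for a thesaurus lookup.
SYNONYMS = {frozenset({"author", "writer"}), frozenset({"title", "name"})}

def semantic_sim(a: str, b: str) -> float:
    # 1.0 for equal names or listed synonyms, else normalized string similarity.
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def tree_dist(t1: Node, t2: Node) -> float:
    # Roots are always matched; the relabel cost reflects semantic dissimilarity.
    relabel = 1.0 - semantic_sim(t1.name, t2.name)
    c1, c2 = t1.children, t2.children
    # Sequence edit distance over child subtrees (delete/insert whole subtrees).
    d = [[0.0] * (len(c2) + 1) for _ in range(len(c1) + 1)]
    for i in range(1, len(c1) + 1):
        d[i][0] = d[i - 1][0] + size(c1[i - 1])
    for j in range(1, len(c2) + 1):
        d[0][j] = d[0][j - 1] + size(c2[j - 1])
    for i in range(1, len(c1) + 1):
        for j in range(1, len(c2) + 1):
            d[i][j] = min(d[i - 1][j] + size(c1[i - 1]),
                          d[i][j - 1] + size(c2[j - 1]),
                          d[i - 1][j - 1] + tree_dist(c1[i - 1], c2[j - 1]))
    return relabel + d[-1][-1]

def similarity(t1: Node, t2: Node) -> float:
    # The distance is below size(t1) + size(t2), so the result lies in (0, 1].
    return 1.0 - tree_dist(t1, t2) / (size(t1) + size(t2))

t1 = Node("book", (Node("title"), Node("author")))
t2 = Node("book", (Node("name"), Node("writer"), Node("year")))
print(tree_dist(t1, t2), similarity(t1, t2))
```

For the two `book` declarations above, the synonym pairs match at zero cost and only `year` must be inserted, so the distance is 1.0 and the similarity is 1 - 1/7. Without the semantic component, `title`/`name` and `author`/`writer` would each incur a relabeling cost. This is the kind of difference the combined approach is meant to capture.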







Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Aleš Wojnar, Charles University in Prague, Czech Republic
  • Irena Mlýnková, Charles University in Prague, Czech Republic
  • Jiří Dokulil, Charles University in Prague, Czech Republic
