Extensible User-Based XML Grammar Matching

  • Joe Tekli
  • Richard Chbeir
  • Kokou Yetongnon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5829)


XML grammar matching has attracted considerable interest recently due to the growing number of heterogeneous XML documents on the web and the increasing need to integrate, and consequently search and retrieve, XML data originating from different data sources. In this paper, we provide an approach for automatic XML grammar matching and comparison that aims to minimize the amount of user effort required to perform the match task. We propose an open framework based on the concept of tree edit distance, integrating different matching criteria so as to capture XML grammar element semantic and syntactic similarities, cardinality and alternativeness constraints, as well as data-type correspondences and relative ordering. The framework is flexible: it enables the user to choose the mapping cardinality (1:1, 1:n, n:1, n:n), in contrast to existing static methods (constrained to 1:1), and it considers user feedback to adjust matching results to the user’s perception of correct matches. Conducted experiments demonstrate the efficiency of our approach in comparison with alternative methods.
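To make the notion of tree edit distance mentioned in the abstract concrete, the following is a minimal sketch of a simplified top-down edit distance between two ordered, labeled trees, in the spirit of Selkow's restricted variant. It is not the framework proposed in the paper: the `Node` class, the unit costs, and the `label_cost` function are assumptions chosen for illustration. In a fuller matcher, `label_cost` is where semantic or syntactic label similarity could plausibly be plugged in.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical grammar-tree node: an element label plus ordered children."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t (cost of inserting/deleting it)."""
    return 1 + sum(size(c) for c in t.children)


def label_cost(a: str, b: str) -> float:
    """Assumed relabel cost: 0 for identical labels, 1 otherwise."""
    return 0.0 if a == b else 1.0


def tree_distance(a: Node, b: Node) -> float:
    """Top-down edit distance: relabel roots, then align the child sequences
    with a string-edit-style dynamic program over whole subtrees."""
    m, n = len(a.children), len(b.children)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    d[0][0] = label_cost(a.label, b.label)
    for i in range(1, m + 1):  # delete the first i subtrees of a
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):  # insert the first j subtrees of b
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),  # delete subtree
                d[i][j - 1] + size(b.children[j - 1]),  # insert subtree
                d[i - 1][j - 1]
                + tree_distance(a.children[i - 1], b.children[j - 1]),  # match
            )
    return d[m][n]


# Two toy grammar trees (invented for this example).
g1 = Node("book", [Node("title"), Node("author")])
g2 = Node("book", [Node("title"), Node("writer"), Node("year")])
print(tree_distance(g1, g2))  # one relabel (author -> writer) plus one insert (year)
```

With a binary `label_cost`, `author` and `writer` count as a full mismatch; replacing it with a graded similarity measure would lower the cost of that relabel without changing the rest of the computation.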


XML and Semi-structured data, XML grammar, schema matching, structural similarity, tree edit distance, vector space model





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Joe Tekli (1)
  • Richard Chbeir (1)
  • Kokou Yetongnon (1)
  1. LE2I Laboratory UMR-CNRS, University of Bourgogne, Dijon, France
