Adaptive Similarity of XML Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8841)


In this work we explore application of XML schema similarity mapping in the area of conceptual modeling of XML schemas. We expand upon our previous efforts to map XML schemas to a common platform-independent schema using similarity evaluation based on exploitation of a decision tree. In particular, in this paper a more versatile method is implemented and the decision tree is trained using a large set of user-annotated mapping decision samples. Several variations of training that could improve the mapping results are proposed. The approach is implemented within a modeling and evolution management framework called eXolutio and its variations are evaluated using a wide range of experiments.


XML schema matching PSM-to-PIM mapping model driven architecture 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Do, H.H., Rahm, E.: COMA – A system for flexible combination of schema matching approaches. In: Proceedings of the 28th International Conference on Very Large Data Bases, Pages, pp. 610–621. VLDB Endowment, Hong Kong (2002)Google Scholar
  2. 2.
    Aumueller, D., Do, H.H., Massmann, S., Rahm, E.: Schema and Ontology Matching with COMA++. In: Proceeding SIGMOD 2005 Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 906–908 (2005) ISBN:1-59593-060-4Google Scholar
  3. 3.
    Duchateau, F., Bellahsene, Z., Coletta, R.: A flexible approach for planning schema matching algorithms. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part I. LNCS, vol. 5331, pp. 249–264. Springer, Heidelberg (2008)Google Scholar
  4. 4.
    Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity Flooding: A Versatile Graph Matching Algorithm. In: Proceeding ICDE 2002 Proceedings of the 18th International Conference on Data Engineering, p. 117. IEEE Computer Society, Washington, DC (2002)CrossRefGoogle Scholar
  5. 5.
    Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F.: Extensible Markup Language (XML) 1.0, 5th edn. W3C Recommendation (November 26, 2008),
  6. 6.
    Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1(1), 81–106 (1986)Google Scholar
  7. 7.
    Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc, San Francisco (1993) ISBN:1-55860-238-0Google Scholar
  8. 8.
    Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Chapman & Hall, New York (1984)Google Scholar
  9. 9.
    Hunt, E. B., Marin, J., Stone, P. T.: Experiments in Induction. Academic Press, New York (1966)Google Scholar
  10. 10.
    Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A fast scalable classier for data mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 3–540. Springer, Heidelberg (1996)Google Scholar
  11. 11.
    Stárka, J.: Similarity of XML Data. Master’s thesis, Charles University in Prague (2010),
  12. 12.
    Shasha, D., Zhang, K.: Approximate Tree Pattern Matching. Pattern Matching Algorithms, pp. 341–371. Oxford University Press (1997)Google Scholar
  13. 13.
    Nierman, A., Jagadish, H.V.: Evaluating Structural Similarity in XML Documents. In: Proceedings of the Fifth International Workshop on the Web and Databases, pp. 61–66 (2002)Google Scholar
  14. 14.
    Li, W., Clifton, C.: SemInt: a tool for identifying attribute correspondences in heterogeneous databases using neural network. Data & Knowledge Engineering 33(1), 169–123 (2000) ISSN 0169-023XGoogle Scholar
  15. 15.
    Chen, P.: The Entity-Relationship Model – Toward a Unified View of Data. ACM Transactions on Database Systems, 9–36 (March 1976)Google Scholar
  16. 16.
  17. 17.
    Stárka, J., Mlýnková, I., Klímek, J., Nečaský, M.: Integration of web service interfaces via decision trees. In: Proceedings of the 7th International Symposium on Innovations in Information Technology, pp. 47–52. IEEE Computer Society, Abu Dhabi (2011) ISBN: 978-1-4577-0311-9Google Scholar
  18. 18.
    Klímek, J., Mlýnková, I., Nečaský, M.: eXolutio: Tool for XML and Data Management. In: CEUR Workshop Proceedings, pp. 1613–1673 (2012) ISSN: 1613-0073Google Scholar
  19. 19.
    Miller, J., Mukerji, J.: MDA Guide Version 1.0.1. Object Management Group (2003),
  20. 20.
    Nečaský, M., Mlýnková, I., Klímek, J., Malý, J.: When conceptual model meets grammar: A dual approach to XML data modeling. International Journal on Data & Knowledge Engineering 72, 1–30 (2012) ISBN:3-642-17615-1, 978-3-642-17615-9Google Scholar
  21. 21.
    Jílková, E.: Adaptive Similarity of XML Data. Master’s thesis, Charles University in Prague (2013),
  22. 22.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explorations 11(1) (2009)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. 1.Department of Software EngineeringCharles University in PragueCzech Republic

Personalised recommendations