Matcher Composition Methods for Automatic Schema Matching

  • Daniel Nikovski
  • Alan Esenther
  • Xiang Ye
  • Mitsuteru Shiba
  • Shigenobu Takayama
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 141)

Abstract

We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and the careful selection of optimization switches can improve matching accuracy even further.
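The overconfidence mentioned above can be illustrated with a minimal sketch. It is not the network used in the paper: the structure (a match variable M influencing two binarized matcher outputs s1 and s2, with s2 also depending directly on s1) and every probability below are hypothetical values chosen for illustration. The sketch compares a combination that models the dependency between the matchers with a naive combination that treats them as independent given M.

```python
# Hypothetical numbers for illustration only; none come from the paper.
PRIOR_MATCH = 0.1                           # P(M = match)
P_S1_GIVEN_M = {True: 0.8, False: 0.2}      # P(s1 = high | M)
P_S2_GIVEN_S1 = {True: 0.95, False: 0.05}   # P(s2 = high | s1): s2 nearly copies s1


def prior(m):
    return PRIOR_MATCH if m else 1.0 - PRIOR_MATCH


def p_s1(s1, m):
    p = P_S1_GIVEN_M[m]
    return p if s1 else 1.0 - p


def p_s2_given_s1(s2, s1):
    p = P_S2_GIVEN_S1[s1]
    return p if s2 else 1.0 - p


def p_s2_given_m(s2, m):
    # Marginal P(s2 | M), obtained by summing out s1. This is what a naive
    # combiner would estimate for the second matcher in isolation.
    return sum(p_s1(s1, m) * p_s2_given_s1(s2, s1) for s1 in (True, False))


def posterior_network(s1, s2):
    # Models the dependency: P(M, s1, s2) = P(M) P(s1 | M) P(s2 | s1).
    joint = {m: prior(m) * p_s1(s1, m) * p_s2_given_s1(s2, s1)
             for m in (True, False)}
    return joint[True] / (joint[True] + joint[False])


def posterior_naive(s1, s2):
    # Ignores the dependency: P(M, s1, s2) ~ P(M) P(s1 | M) P(s2 | M).
    joint = {m: prior(m) * p_s1(s1, m) * p_s2_given_m(s2, m)
             for m in (True, False)}
    return joint[True] / (joint[True] + joint[False])


if __name__ == "__main__":
    print(round(posterior_network(True, True), 3))  # 0.308
    print(round(posterior_naive(True, True), 3))    # 0.598
```

With these assumed numbers, the second matcher adds no information beyond the first, so the dependency-aware posterior stays at about 0.31, while the naive combination counts the same evidence twice and reports about 0.60.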

Keywords

Data integration, Virtual databases, Uncertain schema matching

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Daniel Nikovski (1)
  • Alan Esenther (1)
  • Xiang Ye (1)
  • Mitsuteru Shiba (2)
  • Shigenobu Takayama (3)

  1. Mitsubishi Electric Research Laboratories, Cambridge, USA
  2. Mitsubishi Electric Corporation, Kanagawa, Japan
  3. Mitsubishi Electric Information Systems Corporation, Kanagawa, Japan
