Matcher Composition Methods for Automatic Schema Matching

  • Conference paper
Enterprise Information Systems

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 141)

Abstract

We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and the careful selection of optimization switches can improve matching accuracy even further.
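
The overconfidence mentioned in the abstract can be illustrated with a small sketch. It is not the paper's method: it only shows what happens when two matchers that use the same information are combined as if they were conditionally independent. The function name `posterior`, the prior of 0.1, and the likelihood ratio of 4.0 are all hypothetical values chosen for illustration.

```python
def posterior(prior, likelihood_ratios):
    """Posterior match probability from a prior and per-matcher likelihood
    ratios, assuming the matchers are conditionally independent given the
    match / non-match state (the assumption that fails for similar matchers)."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each matcher multiplies the odds by its likelihood ratio
    return odds / (1.0 + odds)


prior = 0.1      # hypothetical prior probability that two elements match
lr_name = 4.0    # hypothetical ratio P(score | match) / P(score | no match)

# One matcher contributes its evidence once.
single = posterior(prior, [lr_name])

# Two matchers that look at the same information (here: exact copies),
# combined naively as if independent, count the same evidence twice.
naive_double = posterior(prior, [lr_name, lr_name])

print(f"single matcher:        {single:.3f}")
print(f"naive double counting: {naive_double:.3f}")
```

In this toy setting the second matcher adds no new information, yet the naive combination reports a noticeably higher match probability. A composite matcher that models the dependency between the two scores would be expected to stay close to the single-matcher estimate.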



Author information

Corresponding author

Correspondence to Daniel Nikovski.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nikovski, D., Esenther, A., Ye, X., Shiba, M., Takayama, S. (2013). Matcher Composition Methods for Automatic Schema Matching. In: Cordeiro, J., Maciaszek, L.A., Filipe, J. (eds) Enterprise Information Systems. Lecture Notes in Business Information Processing, vol 141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40654-6_7

  • DOI: https://doi.org/10.1007/978-3-642-40654-6_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40653-9

  • Online ISBN: 978-3-642-40654-6

  • eBook Packages: Computer Science, Computer Science (R0)
