Managing Uncertainty in Schema Matching with Top-K Schema Mappings

  • Conference paper
Journal on Data Semantics VI

Part of the book series: Lecture Notes in Computer Science (JODS, volume 4090)

Abstract

In this paper, we propose to extend current practice in schema matching with the simultaneous use of top-K schema mappings rather than a single best mapping. This is a natural extension of existing methods (which can be considered to fall into the top-1 category), taking into account the imprecision inherent in the schema matching process. The essence of this method is the simultaneous generation and examination of the K best schema mappings to identify useful mappings. The paper discusses efficient methods for generating top-K mappings and proposes a generic methodology for the simultaneous utilization of top-K mappings. We also propose a concrete heuristic that aims at improving precision at the cost of recall. We have tested the heuristic on real as well as synthetic data and analyze the empirical results.

The novelty of this paper lies in the robust extension of existing methods for schema matching, one that can gracefully accommodate less-than-perfect scenarios in which the exact mapping cannot be identified in a single iteration. Our proposal represents a step forward in achieving fully automated schema matching, which is currently semi-automated at best.
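
As an illustration of the idea of examining K best mappings together, the following minimal sketch ranks one-to-one mappings over a small similarity matrix and keeps only the attribute correspondences shared by all of the top K. Everything in it is an assumption made for illustration: the matrix values, the function names `top_k_mappings` and `stable_correspondences`, the additive score, and the intersection rule itself, which is one plausible way to trade recall for precision and is not confirmed by the abstract as the paper's heuristic. Brute-force enumeration stands in for the efficient generation methods the paper discusses.

```python
from itertools import permutations


def top_k_mappings(sim, k):
    """Return the k highest-scoring one-to-one mappings as (score, pairs).

    sim[i][j] is a hypothetical similarity between source attribute i and
    target attribute j. All n! mappings are enumerated, so this is only
    practical for very small schemas.
    """
    n = len(sim)
    scored = []
    for perm in permutations(range(n)):
        pairs = tuple((i, perm[i]) for i in range(n))
        score = sum(sim[i][j] for i, j in pairs)
        scored.append((score, pairs))
    # Highest total similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def stable_correspondences(mappings):
    """Keep only the attribute pairs that occur in every given mapping."""
    common = set(mappings[0][1])
    for _, pairs in mappings[1:]:
        common &= set(pairs)
    return common


# Hypothetical similarity matrix for two three-attribute schemas.
sim = [
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.5],
    [0.1, 0.5, 0.6],
]

best = top_k_mappings(sim, k=2)
kept = stable_correspondences(best)
```

With this matrix the two best mappings agree only on the pair (0, 0), so `kept` contains that single correspondence. The ambiguous pairs are dropped, which is the sense in which such a rule can raise precision while lowering recall.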

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gal, A. (2006). Managing Uncertainty in Schema Matching with Top-K Schema Mappings. In: Spaccapietra, S., Aberer, K., Cudré-Mauroux, P. (eds) Journal on Data Semantics VI. Lecture Notes in Computer Science, vol 4090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11803034_5

  • DOI: https://doi.org/10.1007/11803034_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36712-3

  • Online ISBN: 978-3-540-36871-7

  • eBook Packages: Computer Science, Computer Science (R0)
