Abstract
In this paper, we propose to extend current practice in schema matching with the simultaneous use of top-K schema mappings rather than a single best mapping. This is a natural extension of existing methods (which can be considered to fall into the top-1 category), taking into account the imprecision inherent in the schema matching process. The essence of this method is the simultaneous generation and examination of the K best schema mappings to identify useful mappings. The paper discusses efficient methods for generating top-K mappings and proposes a generic methodology for the simultaneous utilization of top-K mappings. We also propose a concrete heuristic that aims at improving precision at the cost of recall. We have tested the heuristic on real as well as synthetic data and analyze the empirical results.
The novelty of this paper lies in the robust extension of existing methods for schema matching, one that can gracefully accommodate less-than-perfect scenarios in which the exact mapping cannot be identified in a single iteration. Our proposal represents a step forward in achieving fully automated schema matching, which is currently semi-automated at best.
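To make the idea of examining the K best mappings together more concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a small square similarity matrix (the values in `SIM` are invented), enumerates all 1:1 mappings by brute force, keeps the K highest-scoring ones, and then retains only the attribute correspondences shared by all of them. That intersection step is one possible way to trade recall for precision and is an assumption here, since the abstract does not spell out the heuristic. The names `top_k_mappings` and `stable_correspondences` are hypothetical.

```python
from itertools import permutations


def top_k_mappings(sim, k):
    """Return the k highest-scoring 1:1 mappings as (score, mapping) pairs.

    sim[i][j] is an assumed similarity between attribute i of one schema
    and attribute j of the other. Brute force, so only for tiny schemas.
    """
    n = len(sim)
    scored = []
    for perm in permutations(range(n)):
        mapping = frozenset((i, perm[i]) for i in range(n))
        score = sum(sim[i][perm[i]] for i in range(n))
        scored.append((score, mapping))
    # Highest total similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def stable_correspondences(ranked):
    """Keep only the attribute pairs that appear in every ranked mapping."""
    mappings = [mapping for _, mapping in ranked]
    return frozenset.intersection(*mappings) if mappings else frozenset()


# Invented similarity values for two three-attribute schemas.
SIM = [
    [0.9, 0.1, 0.0],
    [0.1, 0.6, 0.5],
    [0.0, 0.5, 0.6],
]

top2 = top_k_mappings(SIM, 2)
stable = stable_correspondences(top2)
```

In this toy input the two best mappings agree only on the first attribute pair, so the intersection returns fewer correspondences than the single best mapping, but each one is supported by both candidates. For schemas of realistic size, the brute-force enumeration would have to be replaced by a dedicated algorithm for ranking assignments.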
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gal, A. (2006). Managing Uncertainty in Schema Matching with Top-K Schema Mappings. In: Spaccapietra, S., Aberer, K., Cudré-Mauroux, P. (eds) Journal on Data Semantics VI. Lecture Notes in Computer Science, vol 4090. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11803034_5
DOI: https://doi.org/10.1007/11803034_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36712-3
Online ISBN: 978-3-540-36871-7
eBook Packages: Computer Science (R0)