Abstract.
The introduction of the Semantic Web vision and the shift toward machine-understandable Web resources have highlighted the importance of automatic semantic reconciliation. Consequently, new tools for automating the process have been proposed. In this work we present a formal model of semantic reconciliation and systematically analyze the properties of the process outcome, primarily the inherent uncertainty of the matching process and how it is reflected in the resulting mappings. An important feature of this research is the identification and analysis of factors that affect the effectiveness of algorithms for automatic semantic reconciliation, leading, it is hoped, to the design of better algorithms by reducing the uncertainty of existing ones. Against this background we empirically study the aptitude of two algorithms to correctly match concepts. This research is both timely and practical in light of recent attempts to develop and utilize methods for automatic semantic reconciliation.
Additional information
Received: 6 December 2002, Accepted: 15 September 2003, Published online: 19 December 2003
Edited by: V. Atluri.
Cite this article
Gal, A., Anaby-Tavor, A., Trombetta, A. et al. A framework for modeling and evaluating automatic semantic reconciliation. The VLDB Journal 14, 50–67 (2005). https://doi.org/10.1007/s00778-003-0115-z
DOI: https://doi.org/10.1007/s00778-003-0115-z