A framework for modeling and evaluating automatic semantic reconciliation

  • Regular Paper
The VLDB Journal

Abstract.

The introduction of the Semantic Web vision and the shift toward machine understandable Web resources has unearthed the importance of automatic semantic reconciliation. Consequently, new tools for automating the process were proposed. In this work we present a formal model of semantic reconciliation and analyze in a systematic manner the properties of the process outcome, primarily the inherent uncertainty of the matching process and how it reflects on the resulting mappings. An important feature of this research is the identification and analysis of factors that impact the effectiveness of algorithms for automatic semantic reconciliation, leading, it is hoped, to the design of better algorithms by reducing the uncertainty of existing algorithms. Against this background we empirically study the aptitude of two algorithms to correctly match concepts. This research is both timely and practical in light of recent attempts to develop and utilize methods for automatic semantic reconciliation.

Author information

Corresponding author

Correspondence to Avigdor Gal.

Additional information

Received: 6 December 2002, Accepted: 15 September 2003, Published online: 19 December 2003

Edited by: V. Atluri.

About this article

Cite this article

Gal, A., Anaby-Tavor, A., Trombetta, A. et al. A framework for modeling and evaluating automatic semantic reconciliation. The VLDB Journal 14, 50–67 (2005). https://doi.org/10.1007/s00778-003-0115-z

  • DOI: https://doi.org/10.1007/s00778-003-0115-z
