The VLDB Journal, Volume 14, Issue 1, pp 50–67

A framework for modeling and evaluating automatic semantic reconciliation

  • Avigdor Gal
  • Ateret Anaby-Tavor
  • Alberto Trombetta
  • Danilo Montesi
Regular Paper

Abstract.

The introduction of the Semantic Web vision and the shift toward machine-understandable Web resources have highlighted the importance of automatic semantic reconciliation, and new tools for automating the process have been proposed. In this work we present a formal model of semantic reconciliation and systematically analyze the properties of the process outcome, primarily the inherent uncertainty of the matching process and how it is reflected in the resulting mappings. An important feature of this research is the identification and analysis of factors that affect the effectiveness of algorithms for automatic semantic reconciliation, which we hope will lead to the design of better algorithms that reduce the uncertainty of existing ones. Against this background we empirically study the ability of two algorithms to correctly match concepts. This research is both timely and practical in light of recent attempts to develop and utilize methods for automatic semantic reconciliation.
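The abstract describes empirically measuring how well matching algorithms identify correct correspondences between concepts. As a rough illustration only, the sketch below assumes that matching quality is scored with precision and recall against a known exact mapping, and it uses a toy string-similarity matcher built on Python's `difflib`. The helper names `name_similarity`, `best_mapping` and `precision_recall`, the 0.5 threshold, and the sample concept names are all hypothetical; this is not the model or either of the algorithms evaluated in the paper.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Hypothetical similarity measure: case-insensitive string similarity
    of two concept names in [0, 1] (an assumption, not the paper's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_mapping(source, target, threshold=0.5):
    """Map each source concept to its most similar target concept,
    keeping the pair only if its similarity reaches the (assumed) threshold."""
    mapping = set()
    if not target:
        return mapping
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            mapping.add((s, best))
    return mapping


def precision_recall(proposed, exact):
    """Precision: share of proposed pairs that are correct.
    Recall: share of correct pairs that were proposed.
    Both are taken as 1.0 (vacuously) when the relevant set is empty."""
    correct = proposed & exact
    precision = len(correct) / len(proposed) if proposed else 1.0
    recall = len(correct) / len(exact) if exact else 1.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical concepts from two hotel-reservation ontologies.
    source = ["checkInDate", "checkOutDate", "numGuests"]
    target = ["arrivalDate", "departureDate", "guests"]
    exact = {("checkInDate", "arrivalDate"),
             ("checkOutDate", "departureDate"),
             ("numGuests", "guests")}
    proposed = best_mapping(source, target)
    print(proposed, precision_recall(proposed, exact))
```

Purely syntactic similarity of this kind can miss synonymous concepts such as `checkInDate` and `arrivalDate`, which is one simple way to see why the outcome of automatic matching carries uncertainty.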

Keywords:

Semantic interoperability · Ontology versioning · Mapping


Copyright information

© Springer-Verlag Berlin/Heidelberg 2005

Authors and Affiliations

  • Avigdor Gal (1)
  • Ateret Anaby-Tavor (1)
  • Alberto Trombetta (2)
  • Danilo Montesi (3)
  1. Technion - Israel Institute of Technology, Haifa, Israel
  2. University of Insubria, Varese, Italy
  3. University of Camerino, Camerino, Italy
