Basic Similarity Measures

  • Jérôme Euzenat
  • Pavel Shvaiko

Abstract

The goal of ontology matching is to find the relations between entities expressed in different ontologies. Very often, these relations are equivalence relations, discovered by measuring the similarity between these entities. However, more elaborate methods may directly find more precise relations, such as subsumption, without going through a similarity measure.
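As a minimal sketch of how equivalences can be discovered by measuring similarity, the code below assumes that each entity is represented only by its label, that a normalized Levenshtein similarity is an adequate measure, and that pairs at or above a threshold of 0.75 count as equivalent. The function names, the threshold and the toy labels are hypothetical illustrations, not the chapter's own method.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1 minus the edit distance
    normalized by the length of the longer label."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def match_equivalences(labels1, labels2, threshold=0.75):
    """Propose an equivalence ('=') correspondence for every pair of labels
    whose similarity reaches the (hypothetical) threshold."""
    alignment = []
    for e1, e2 in product(labels1, labels2):
        score = label_similarity(e1, e2)
        if score >= threshold:
            alignment.append((e1, e2, "=", round(score, 2)))
    return alignment


if __name__ == "__main__":
    onto1 = ["Book", "Author", "Publisher"]
    onto2 = ["Books", "Writer", "Publishers"]
    print(match_equivalences(onto1, onto2))
    # [('Book', 'Books', '=', 0.8), ('Publisher', 'Publishers', '=', 0.9)]
```

Note that Author and Writer are not paired: a purely string-based measure misses synonyms, which is one reason matchers also rely on linguistic resources such as WordNet.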

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jérôme Euzenat: INRIA and LIG, Grenoble, France
  • Pavel Shvaiko: Informatica Trentina SpA, while at Department of Engineering and Computer Science (DISI), University of Trento, while at Web of Data, Bruno Kessler Foundation - IRST, Trento, Italy