Basic Similarity Measures

Chapter in: Ontology Matching

Abstract

The goal of ontology matching is to find relations between entities expressed in different ontologies. Very often, these are equivalence relations, discovered by measuring the similarity between the entities. However, more elaborate methods may directly find more precise relations.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Euzenat, J., Shvaiko, P. (2013). Basic Similarity Measures. In: Ontology Matching. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38721-0_5

  • DOI: https://doi.org/10.1007/978-3-642-38721-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38720-3

  • Online ISBN: 978-3-642-38721-0

  • eBook Packages: Computer Science (R0)
