Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content

  • Giuseppe Pirró
  • Nuno Seco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5332)


Computing semantic similarity between words is an important issue in many research fields, including Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence. In this paper we present a new semantic similarity metric that builds on notions from early work on the feature-based theory of similarity and translates them into the information-theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. To evaluate this metric, we conducted an online experiment asking the community of researchers to rank a list of 65 word pairs. The experiment’s web setup allowed us to collect 101 similarity ratings and to differentiate between native and non-native English speakers. Such a large and diverse dataset makes it possible to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.
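
To make the notion of intrinsic IC concrete, the following is a minimal sketch, not the implementation from the Java WordNet Similarity Library. It assumes the commonly cited intrinsic formulation IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of concepts below c and N is the total number of concepts in the taxonomy. The toy taxonomy and all function names are hypothetical. As an example of an existing IC-based metric (not the new metric proposed in the paper), it also computes a Resnik-style similarity, taken as the IC of the most informative common ancestor.

```python
import math

# Hypothetical toy taxonomy (child -> parent); illustrative only, not WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def ancestors(concept):
    """Return the concept itself followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def descendant_count(concept):
    """Count the concepts strictly below `concept` in the taxonomy."""
    return sum(1 for node in PARENT if concept in ancestors(node)[1:])


def intrinsic_ic(concept):
    """Assumed intrinsic IC: 1 - log(hypo(c) + 1) / log(N).

    Leaves get IC 1 (maximally specific); the root gets IC 0.
    """
    hypo = descendant_count(concept)
    return 1.0 - math.log(hypo + 1) / math.log(len(PARENT))


def resnik_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(intrinsic_ic(c) for c in common)


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, round(resnik_similarity(*pair), 3))
```

In this sketch IC depends only on the shape of the taxonomy, so no corpus frequency counts are needed. That is the property that lets an intrinsic formulation be plugged into other IC-based metrics.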


Keywords: Semantic Similarity; Intrinsic Information Content; Feature Based Similarity; Java WordNet Similarity Library





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Giuseppe Pirró (1)
  • Nuno Seco (2)
  1. D.E.I.S, University of Calabria, Italy
  2. DEI-CISUC, University of Coimbra, Portugal
