Abstract
Computing semantic similarity between words is an important problem in many research fields, including Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence. In this paper we present a new semantic similarity metric that builds on early work on the feature-based theory of similarity and translates it into the information-theoretic domain, where it leverages the notion of Information Content (IC). In particular, the proposed metric exploits intrinsic IC, which quantifies IC values by examining how concepts are arranged in an ontological structure. To evaluate the metric, we conducted an online experiment in which the research community was asked to assess the similarity of 65 word pairs. The experiment's web setup allowed us to collect 101 similarity ratings and to distinguish between native and non-native English speakers. Such a large and diverse dataset makes it possible to evaluate similarity metrics with confidence by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with intrinsic IC, improves on the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.
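The abstract describes the metric only in words, so the following is an illustrative sketch under stated assumptions, not the authors' Java WordNet Similarity Library code. It computes intrinsic IC with the formula of Seco et al. (ECAI 2004), IC(c) = 1 - log(hypo(c) + 1) / log(max_wn), where hypo(c) is the number of hyponyms of c and max_wn is the number of concepts in the taxonomy. It then maps Tversky's feature-based contrast model onto IC by treating the IC of the most specific common ancestor (msca) as the shared features and IC(c) - IC(msca) as each concept's distinctive features, with all three contrast-model weights set to 1 (our assumption; the full paper should be consulted for the exact formulation). The same intrinsic IC is also plugged into Lin's metric, and a metric is scored by its Pearson correlation with human ratings. The toy taxonomy, the concept names, the helper functions (`intrinsic_ic`, `msca`, `sim_features_ic`, `sim_lin`, `pearson`), and the ratings are all hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical toy is-a taxonomy (child -> parent). A real evaluation would use
# WordNet, where concepts may have several parents; this sketch assumes a tree.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Invert the parent map so each concept knows its direct hyponyms.
CHILDREN: Dict[str, List[str]] = {c: [] for c in PARENT}
for child, parent in PARENT.items():
    if parent is not None:
        CHILDREN[parent].append(child)

MAX_CONCEPTS = len(PARENT)  # max_wn: total number of concepts in the taxonomy


def hyponym_count(c: str) -> int:
    """Number of concepts strictly below c (direct and indirect hyponyms)."""
    return sum(1 + hyponym_count(k) for k in CHILDREN[c])


def intrinsic_ic(c: str) -> float:
    """Intrinsic IC after Seco et al.: 1 - log(hypo(c) + 1) / log(max_wn).

    Leaves get IC = 1 and the root gets IC = 0; no corpus counts are needed.
    """
    return 1.0 - math.log(hyponym_count(c) + 1) / math.log(MAX_CONCEPTS)


def ancestors(c: str) -> List[str]:
    """c itself followed by every concept above it, up to the root."""
    chain: List[str] = []
    node: Optional[str] = c
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def msca(c1: str, c2: str) -> str:
    """Most specific common ancestor: the shared ancestor with the highest IC."""
    shared = set(ancestors(c1)) & set(ancestors(c2))
    return max(shared, key=intrinsic_ic)


def sim_features_ic(c1: str, c2: str) -> float:
    """Contrast model with all weights = 1 (an assumption): common features are
    IC(msca), distinctive features are IC(c) - IC(msca).
    Algebraically this equals 3 * IC(msca) - IC(c1) - IC(c2)."""
    if c1 == c2:
        return 1.0
    common = intrinsic_ic(msca(c1, c2))
    return common - (intrinsic_ic(c1) - common) - (intrinsic_ic(c2) - common)


def sim_lin(c1: str, c2: str) -> float:
    """Lin's IC-based metric, here fed with intrinsic rather than corpus IC."""
    total = intrinsic_ic(c1) + intrinsic_ic(c2)
    return 2.0 * intrinsic_ic(msca(c1, c2)) / total if total > 0 else 1.0


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between metric scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    pairs = [("dog", "cat"), ("animal", "artifact"), ("dog", "car")]
    human = [3.6, 1.2, 0.4]  # hypothetical ratings, not from the paper's dataset
    scores = [sim_features_ic(a, b) for a, b in pairs]
    print("scores:", [round(s, 3) for s in scores])
    print("Pearson r:", round(pearson(scores, human), 3))
```

With all weights set to 1 the contrast score is not bounded to [0, 1]: on the toy taxonomy dog/cat scores higher than dog/car, yet both scores are negative, which does not affect a correlation-based evaluation. Applying the sketch to WordNet's noun hierarchy would require counting each concept's distinct descendants over a graph with multiple inheritance, since the recursive sum above would double count shared hyponyms.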
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pirró, G., Seco, N. (2008). Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88873-4_25
DOI: https://doi.org/10.1007/978-3-540-88873-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88872-7
Online ISBN: 978-3-540-88873-4
eBook Packages: Computer Science, Computer Science (R0)