Abstract
In our previous work we proposed two methods for evaluating the semantic similarity or dissimilarity of nouns based on their modifier sets as registered in the Oxford Collocations Dictionary for Students of English. In this paper we provide further experimental support for these methods and discuss them in more detail. Given two nouns, the first method measures their similarity by the relative size of the intersection of their modifier sets, that is, the share of modifiers applicable to both nouns. The second method measures dissimilarity as the difference between two mean cohesion values for a noun: its mean cohesion with its own modifiers and its mean cohesion with the modifiers of the other noun. Here, the cohesion between words is measured via Web statistics on word co-occurrences. The two proposed measures prove to be approximately inversely related. Our experiments show that Web-based weighting (the second method) gives better results.
Work done under partial support of the Mexican Government (CONACyT, SNI, SIP-IPN, COTEPABE-IPN). The authors thank the anonymous reviewers for their valuable comments.
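The two measures can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not fix: the intersection is normalized by the geometric mean of the two set sizes, the cohesion function is passed in as a parameter rather than computed from Web counts, and the modifier sets and cohesion scores below are invented toy data, not values from the dictionary or from the experiments.

```python
from math import sqrt
from statistics import mean
from typing import Callable, Set


def intersection_similarity(mods_a: Set[str], mods_b: Set[str]) -> float:
    """Relative size of the shared modifier set.

    Assumption: the intersection is normalized by the geometric mean
    of the two set sizes; the abstract only says "relative size".
    """
    if not mods_a or not mods_b:
        return 0.0
    return len(mods_a & mods_b) / sqrt(len(mods_a) * len(mods_b))


def cohesion_dissimilarity(
    noun: str,
    own_mods: Set[str],
    other_mods: Set[str],
    cohesion: Callable[[str, str], float],
) -> float:
    """Mean cohesion of `noun` with its own modifiers minus its mean
    cohesion with the other noun's modifiers.

    `cohesion(modifier, noun)` is a hypothetical stand-in for a score
    derived from Web co-occurrence statistics.
    """
    own = mean(cohesion(m, noun) for m in own_mods)
    other = mean(cohesion(m, noun) for m in other_mods)
    return own - other


# Toy modifier sets (hypothetical, for illustration only).
MODIFIERS = {
    "tea": {"hot", "strong", "weak", "herbal"},
    "coffee": {"hot", "strong", "weak", "black"},
}

# Toy cohesion scores (hypothetical); unseen pairs default to 0.0.
TOY_COHESION = {
    ("hot", "tea"): 2.0,
    ("strong", "tea"): 2.0,
    ("weak", "tea"): 1.0,
    ("herbal", "tea"): 3.0,
    ("black", "tea"): 1.0,
}


def toy_cohesion(modifier: str, noun: str) -> float:
    return TOY_COHESION.get((modifier, noun), 0.0)


sim = intersection_similarity(MODIFIERS["tea"], MODIFIERS["coffee"])
dsim = cohesion_dissimilarity(
    "tea", MODIFIERS["tea"], MODIFIERS["coffee"], toy_cohesion
)
print(sim, dsim)
```

In this toy setting the two nouns share three of four modifiers, so the similarity is high, while the dissimilarity stays small because "tea" is almost as cohesive with the modifiers of "coffee" as with its own. This mirrors the approximately inverse relation between the two measures noted above.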
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bolshakov, I.A., Gelbukh, A. (2007). Distribution-Based Semantic Similarity of Nouns. In: Rueda, L., Mery, D., Kittler, J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2007. Lecture Notes in Computer Science, vol 4756. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76725-1_73
DOI: https://doi.org/10.1007/978-3-540-76725-1_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76724-4
Online ISBN: 978-3-540-76725-1
eBook Packages: Computer Science (R0)