Two Methods of Evaluation of Semantic Similarity of Nouns Based on Their Modifier Sets

  • Igor A. Bolshakov
  • Alexander Gelbukh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4592)


Two methods of evaluating the semantic similarity/dissimilarity of English nouns are proposed, based on their modifier sets taken from the Oxford Collocations Dictionary for Students of English. The first method measures similarity by the portion of modifiers commonly applicable to both nouns under evaluation. The second method measures dissimilarity by the change in the mean value of cohesion between a noun and a set of modifiers, either its own or those of the contrasted noun. Cohesion between words is measured by the Stable Connection Index (SCI), which is based on raw Web statistics for occurrences and co-occurrences of words. It is shown that the two proposed measures are related by an approximately inverse monotonic dependency, while the Web evaluations confer a higher resolution.
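The two measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the shared portion is normalised, so division by the geometric mean of the two set sizes is an assumption, and the SCI formula is not given here, so cohesion is passed in as a caller-supplied function. The function names, the modifier sets, and the cohesion scores are all hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean
from typing import Callable, Set


def modifier_overlap_similarity(mods_a: Set[str], mods_b: Set[str]) -> float:
    """Portion of modifiers applicable to both nouns.

    Assumption: the shared count is divided by the geometric mean of the
    two set sizes; the abstract does not specify the normalisation.
    """
    if not mods_a or not mods_b:
        return 0.0
    return len(mods_a & mods_b) / sqrt(len(mods_a) * len(mods_b))


def cohesion_drop_dissimilarity(
    noun: str,
    own_mods: Set[str],
    other_mods: Set[str],
    cohesion: Callable[[str, str], float],
) -> float:
    """Change in mean cohesion when a noun's own modifiers are replaced
    by the modifiers of the contrasted noun.

    `cohesion` stands in for a word-pair score such as SCI; its formula
    is not given in the abstract, so it is supplied by the caller.
    """
    if not own_mods or not other_mods:
        raise ValueError("both modifier sets must be non-empty")
    own_mean = mean(cohesion(noun, m) for m in own_mods)
    other_mean = mean(cohesion(noun, m) for m in other_mods)
    return own_mean - other_mean


# Toy data, invented for illustration (not taken from the dictionary or the Web).
tea_mods = {"strong", "hot", "green", "sweet"}
coffee_mods = {"strong", "hot", "black", "instant"}

toy_scores = {
    ("tea", "strong"): 8.0,
    ("tea", "hot"): 8.0,
    ("tea", "green"): 10.0,
    ("tea", "sweet"): 6.0,
    ("tea", "black"): 7.0,
    ("tea", "instant"): 1.0,
}


def toy_cohesion(noun: str, modifier: str) -> float:
    # Unknown pairs get a score of 0.0 in this toy table.
    return toy_scores.get((noun, modifier), 0.0)


similarity = modifier_overlap_similarity(tea_mods, coffee_mods)
dissimilarity = cohesion_drop_dissimilarity("tea", tea_mods, coffee_mods, toy_cohesion)
```

With these toy inputs, two of the four modifiers are shared, and the mean cohesion of "tea" falls when its own modifiers are swapped for those of "coffee". This mirrors the inverse relation between the two measures that the abstract reports.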


Keywords: Semantic Similarity; Word Sense; Word Sense Disambiguation; Noun Pair; British National Corpus
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Alexander Gelbukh (1)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
