Partial Measure of Semantic Relatedness Based on the Local Feature Selection

  • Maciej Piasecki
  • Michał Wendelberger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)


A corpus-based Measure of Semantic Relatedness can be calculated for every pair of words occurring in the corpus, but it can produce erroneous results for many word pairs due to accidental associations derived on the basis of several context features. We propose a novel idea of a partial measure that assigns relatedness values only to word pairs well enough supported by corpus data. Three simple implementations of this idea are presented and evaluated on large corpora and wordnets for two languages. Partial Measures of Semantic Relatedness are shown to perform better in tasks focused on wordnet development than a state-of-the-art ‘full’ Measure of Semantic Relatedness. A comparison of the partial measure with a globally filtered measure is also presented.
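
The abstract does not specify how the three implementations work, so the following is only an illustrative sketch of the general idea of a partial measure, not the authors' method. It assumes words are represented as sparse vectors of weighted context features and uses cosine similarity as a stand-in relatedness function. The name `partial_relatedness`, the `min_shared` threshold and the toy vectors are all hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

# Assumed representation: context feature -> weight (e.g. a co-occurrence score).
Vector = Dict[str, float]


def partial_relatedness(a: Vector, b: Vector, min_shared: int = 3) -> Optional[float]:
    """Cosine relatedness, or None when the pair shares too few features.

    Returning None (instead of a low score) is what makes the measure partial:
    weakly supported pairs get no value at all.
    """
    shared = set(a) & set(b)
    if len(shared) < min_shared:
        return None  # hypothetical support criterion: too little shared evidence
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return dot / (norm_a * norm_b)


# Toy vectors, invented purely for illustration.
cat = {"purr": 2.0, "fur": 1.0, "pet": 3.0}
dog = {"bark": 2.0, "fur": 1.0, "pet": 3.0}
car = {"wheel": 4.0, "pet": 0.5}

print(partial_relatedness(cat, dog, min_shared=2))  # two shared features: a score
print(partial_relatedness(cat, car, min_shared=2))  # one shared feature: None
```

A 'full' measure would return a number for the second pair as well; here the single shared feature is treated as insufficient support, so the pair is left without a value.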


Word Pair, Semantic Relatedness, Computational Linguistics, Partial Measure, Semantic Class
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Maciej Piasecki (1)
  • Michał Wendelberger (1)
  1. Institute of Informatics, Wrocław University of Technology, Poland
