Classification-Based Filtering of Semantic Relatedness in Hypernymy Extraction

  • Maciej Piasecki
  • Stanisław Szpakowicz
  • Michał Marcińczuk
  • Bartosz Broda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5221)


Manual construction of a wordnet can be facilitated by a system that suggests semantic relations acquired from corpora. Such systems tend to produce many wrong suggestions. We propose a method of filtering a raw list of noun pairs potentially linked by hypernymy, and test it on Polish. The method aims for good recall and sufficient precision. The classifiers work with complex features that give clues on the relation between the nouns. We apply a corpus-based measure of semantic relatedness enhanced with a Rank Weight Function. The evaluation is based on the data in Polish WordNet. The results compare favourably with similar methods applied to English, despite the small size of Polish WordNet.


lexical-semantic relations, measures of semantic relatedness, wordnet construction, Polish WordNet, nouns, hypernymy extraction, supervised Machine Learning, classifiers, Rank Weight Function, filtering





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Maciej Piasecki (1)
  • Stanisław Szpakowicz (2, 3)
  • Michał Marcińczuk (1)
  • Bartosz Broda (1)
  1. Institute of Applied Informatics, Wrocław University of Technology, Poland
  2. School of Information Technology and Engineering, University of Ottawa, Canada
  3. Institute of Computer Science, Polish Academy of Sciences, Poland
