Extraction of Hypernymy Information from Text

  • Erik Tjong Kim Sang
  • Katja Hofmann
  • Maarten de Rijke
Part of the Theory and Applications of Natural Language Processing book series (NLP)


This chapter presents the results of three studies on extracting hypernymy information from text. The first compares a method based on a single extraction pattern applied to the web with a set of patterns applied to a large corpus. The second examines how relation extraction can be performed reliably from text without access to a word sense tagger. The third examines the effect of elaborate syntactic information on the extraction process. Both using more data and removing ambiguities from the training data are found to be beneficial for the extraction process. But it is surprising to find a positive effect of additional syntactic information.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Erik Tjong Kim Sang (1)
  • Katja Hofmann (2)
  • Maarten de Rijke (2)
  1. Alfa-informatica, University of Groningen, Groningen, The Netherlands
  2. ISLA, University of Amsterdam, Amsterdam, The Netherlands
