Using Noun Phrase Heads to Extract Document Keyphrases

  • Ken Barker
  • Nadia Cornacchia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1822)


Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores.

This paper describes a simple system for choosing noun phrases from a document as keyphrases. A noun phrase is chosen based on its length, its frequency and the frequency of its head noun. Noun phrases are extracted from a text using a base noun phrase skimmer and an off-the-shelf online dictionary.
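The selection criteria above can be illustrated with a minimal sketch. The abstract names only the three signals (phrase length, phrase frequency, head noun frequency), so everything else here is an assumption made for illustration: the function names `head_of` and `rank_keyphrases`, the `top_heads` and `top_phrases` cutoffs, the rule that the head of a base noun phrase is its last word, and the frequency-times-length score. The noun phrases are assumed to have been extracted already, one non-empty lowercase string per occurrence.

```python
from collections import Counter


def head_of(phrase):
    """Return the last word, assumed here to be the head noun of a base noun phrase."""
    return phrase.split()[-1]


def rank_keyphrases(noun_phrases, top_heads=2, top_phrases=3):
    """Rank candidate noun phrases using length, frequency and head noun frequency.

    noun_phrases: list of non-empty strings, one entry per occurrence in the text.
    The cutoffs and the scoring formula are illustrative assumptions.
    """
    phrase_freq = Counter(noun_phrases)
    head_freq = Counter(head_of(p) for p in noun_phrases)

    # Keep only phrases whose head noun is among the most frequent heads.
    kept_heads = {head for head, _ in head_freq.most_common(top_heads)}

    # Score each surviving phrase by frequency times length in words.
    scored = [
        (freq * len(phrase.split()), phrase)
        for phrase, freq in phrase_freq.items()
        if head_of(phrase) in kept_heads
    ]

    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top_phrases]]


candidates = [
    "keyphrase extraction", "keyphrase extraction", "extraction",
    "noun phrase", "head noun", "noun phrase", "phrase", "dictionary",
]
print(rank_keyphrases(candidates))
```

Multiplying frequency by length favours longer, more specific phrases over their bare head nouns, while the head noun filter keeps the candidates tied to the most frequent nouns in the document.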

Experiments involving human judges reveal several interesting results: the simple noun phrase-based system performs roughly as well as a state-of-the-art, corpus-trained keyphrase extractor; ratings for individual keyphrases do not necessarily correlate with ratings for sets of keyphrases for a document; agreement among unbiased judges on the keyphrase rating task is poor.
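One common way to quantify agreement between two judges is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration of that statistic; the abstract does not state which agreement measure was used, and the function name `cohen_kappa` and the "good"/"bad" rating labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two judges rating the same items with categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both judges must rate the same non-empty set of items")

    n = len(ratings_a)
    # Proportion of items on which the two judges gave the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement from each judge's own label distribution.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both judges used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


judge_1 = ["good", "good", "bad", "bad"]
judge_2 = ["good", "bad", "good", "bad"]
print(cohen_kappa(judge_1, judge_2))
```

A value near 1 indicates strong agreement, while a value near 0 means the judges agree no more often than chance would predict.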


Keywords: Noun Phrase, Natural Language Processing, Head Noun, Candidate Phrase, Human Judge
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Ken Barker (1)
  • Nadia Cornacchia (1)
  1. School of Information Technology and Engineering, University of Ottawa, Ottawa, Canada
