Using Noun Phrase Heads to Extract Document Keyphrases
Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores.
This paper describes a simple system for choosing noun phrases from a document as keyphrases. A noun phrase is chosen based on its length, its frequency and the frequency of its head noun. Noun phrases are extracted from a text using a base noun phrase skimmer and an off-the-shelf online dictionary.
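The selection criteria named above (phrase length, phrase frequency, head noun frequency) can be illustrated with a minimal sketch. It is not the paper's actual formula, which the abstract does not state. It assumes that candidate noun phrases have already been extracted, that the head of a base noun phrase is its last word, and that a phrase is scored as frequency times length in words after keeping only the most frequent heads. The names `head_of`, `rank_keyphrases`, `top_heads` and `top_phrases` are hypothetical.

```python
from collections import Counter


def head_of(phrase):
    # Assumption: in an English base noun phrase the head noun is the last word.
    return phrase.split()[-1]


def rank_keyphrases(noun_phrases, top_heads=2, top_phrases=3):
    """Rank candidate noun phrases (illustrative scoring, not the paper's)."""
    # Normalize case and drop empty candidates.
    phrases = [p.lower().strip() for p in noun_phrases if p.strip()]
    phrase_freq = Counter(phrases)
    head_freq = Counter(head_of(p) for p in phrases)

    # Keep only phrases whose head is among the most frequent heads.
    kept_heads = {h for h, _ in head_freq.most_common(top_heads)}

    # Assumed score: phrase frequency times phrase length in words.
    scored = [
        (freq * len(p.split()), head_freq[head_of(p)], p)
        for p, freq in phrase_freq.items()
        if head_of(p) in kept_heads
    ]
    # Sort by score, then head frequency, then alphabetically for stability.
    scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
    return [p for _, _, p in scored[:top_phrases]]


candidates = [
    "keyphrase extraction", "noun phrase", "head noun", "noun phrase",
    "base noun phrase", "document", "keyphrase extraction",
    "human judge", "noun phrase",
]
print(rank_keyphrases(candidates))
```

In this toy input, "phrase" and "extraction" are the two most frequent heads, so only phrases ending in those words are scored. A real system would replace the last-word heuristic with the output of the noun phrase skimmer.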
Experiments involving human judges reveal several interesting results: the simple noun phrase-based system performs roughly as well as a state-of-the-art, corpus-trained keyphrase extractor; ratings for individual keyphrases do not necessarily correlate with ratings for sets of keyphrases for a document; agreement among unbiased judges on the keyphrase rating task is poor.
Keywords: Noun Phrase; Natural Language Processing; Head Noun; Candidate Phrase; Human Judge