Keyphrase Extraction in Scientific Publications

Nguyen, Thuy Dung; Kan, Min-Yen

doi:10.1007/978-3-540-77094-7_41

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

We present a keyphrase extraction algorithm for scientific publications. Unlike previous work, we introduce features that capture the positions of phrases in a document with respect to the logical sections found in scientific discourse. We also introduce features that capture salient morphological phenomena found in scientific keyphrases, such as whether a candidate keyphrase is an acronym or uses specific terminologically productive suffixes. We have implemented these features on top of a baseline feature set used by Kea [1]. In our evaluation using a corpus of 120 scientific publications multiply annotated for keyphrases, our system significantly outperformed Kea at the p < .05 level. As we know of no other existing multiply annotated keyphrase document collections, we have also made our evaluation corpus publicly available. We hope that this contribution will spur future comparative research.
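
To make the two feature families in the abstract concrete, the following is a minimal sketch of how a candidate phrase might be described by section-position flags and morphological flags. It is an illustration only, not the authors' implementation: the function name `candidate_features`, the section labels in `SECTIONS`, and the suffix list in `SUFFIXES` are all assumptions, and the abstract does not specify any of them.

```python
import re

# Hypothetical section labels and suffix list, chosen for illustration only;
# the abstract does not say which sections or suffixes the system uses.
SECTIONS = ("title", "abstract", "introduction", "body", "conclusion")
SUFFIXES = ("ion", "ics", "ment", "ity")


def candidate_features(phrase, sections):
    """Describe one candidate phrase.

    `sections` maps a section label to the text of that section.
    """
    needle = phrase.lower()
    # One binary flag per logical section: does the phrase occur there?
    occurs_in = [int(needle in sections.get(name, "").lower()) for name in SECTIONS]
    words = phrase.split()
    # Acronym flag: a single token made of two or more capital letters.
    is_acronym = int(len(words) == 1 and re.fullmatch(r"[A-Z]{2,}", phrase) is not None)
    # Suffix flag: the last word ends with one of the listed suffixes.
    has_suffix = int(bool(words) and words[-1].lower().endswith(SUFFIXES))
    return {"occurs_in": occurs_in, "is_acronym": is_acronym, "has_suffix": has_suffix}
```

In a Kea-style pipeline, flags like these would be appended to the baseline feature vector of each candidate before a classifier is trained; how the actual system encodes and weights them is not stated in the abstract.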

References

1. Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., Nevill-Manning, C.G.: Domain specific keyphrase extraction. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence, pp. 668–673 (1999)
2. Kim, W., Wilbur, W.J.: Corpus-based statistical screening for content-bearing terms. J. Am. Soc. Inf. Sci. Technol. 52, 247–259 (2001)
3. Tomokiyo, T., Hurst, M.: A language model approach to keyphrase extraction. In: Proceedings of ACL Workshop on Multiword Expressions (2003)
4. Barker, K., Cornacchia, N.: Using noun phrase heads to extract document keyphrases. In: Proc. of the 13th Biennial Conf. of the Canadian Society on Computational Studies of Intelligence, pp. 40–52. Springer, Heidelberg (2000)
5. Turney, P.D.: Learning to extract keyphrases from text. Technical Report ERB-1057, National Research Council, Institute for Information Technology (1999)
6. Turney, P.D.: Coherent keyphrase extraction via web mining. In: IJCAI 2003. Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, pp. 434–439 (2003)
7. Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Landauer, T., McNamara, D., Dennis, S., Kintsch, W. (eds.) Latent Semantic Analysis: A Road to Meaning, Lawrence Erlbaum, Mahwah (2005)
8. Dumais, S.T., Platt, J., Heckerman, D., Sahami, M.: Inductive learning algorithms and representations for text categorization. In: CIKM. Proc. of 7th International Conference on Information and Knowledge Management, pp. 148–155 (1998)
9. Pouliquen, B., Steinberger, R., Ignat, C.: Automatic annotation of multilingual text collections with a conceptual thesaurus. In: BUG (2003)
10. Medelyan, O., Witten, I.H.: Thesaurus based automatic keyphrase indexing. In: Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, pp. 296–297. ACM Press, New York (2006)
11. Ratnaparkhi, A.: A maximum entropy part of speech tagger. In: Proc. ACL-SIGDAT Conference on Empirical Methods in Natural Language Processing, Philadelphia (1996)
12. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
13. Nguyen, T.D.: Automatic keyphrase generation. Technical report, National University of Singapore (2007)
14. Jones, S., Paynter, G.W.: Human evaluation of Kea, an automatic keyphrasing system. In: ACM/IEEE Joint Conference on Digital Libraries, pp. 148–156 (2001)


Author information

Authors and Affiliations

Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Thuy Dung Nguyen & Min-Yen Kan

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

Cite this paper

Nguyen, T.D., Kan, M.-Y. (2007). Keyphrase Extraction in Scientific Publications. In: Goh, D.H.-L., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_41

DOI: https://doi.org/10.1007/978-3-540-77094-7_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77093-0
Online ISBN: 978-3-540-77094-7
eBook Packages: Computer Science, Computer Science (R0)
