Language Resources and Evaluation, Volume 47, Issue 3, pp 723–742

Automatic keyphrase extraction from scientific articles

  • Su Nam Kim
  • Olena Medelyan
  • Min-Yen Kan
  • Timothy Baldwin
Original Paper

Abstract

This paper describes the organization and results of the automatic keyphrase extraction task held at the Workshop on Semantic Evaluation 2010 (SemEval-2010). The keyphrase extraction task was specifically geared towards scientific articles. Systems were evaluated automatically by matching their extracted keyphrases against those assigned to the same documents by both authors and readers. We outline the task, present the overall ranking of the submitted systems, and discuss the improvements to the state of the art in keyphrase extraction.
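
The matching-based evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes exact matching after lowercasing and whitespace normalization, pools author- and reader-assigned keyphrases into a single gold set, and reports precision, recall and F-score. The function names, the normalization step and the sample phrases are hypothetical and are not taken from the task definition.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace (an assumed, simplified normalization)."""
    return " ".join(phrase.lower().split())


def exact_match_scores(
    extracted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for exact matches after normalization."""
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    matched = len(extracted_set & gold_set)

    precision = matched / len(extracted_set) if extracted_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical gold keyphrases for one document, pooled from two sources.
author_assigned = ["keyphrase extraction", "shared task"]
reader_assigned = ["Scientific articles", "keyphrase extraction"]
gold = author_assigned + reader_assigned

# Hypothetical system output for the same document.
extracted = [
    "Keyphrase Extraction",
    "scientific articles",
    "neural networks",
    "evaluation",
]

precision, recall, f_score = exact_match_scores(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F={f_score:.3f}")
```

Sets are used so that a phrase listed by both an author and a reader counts once in the gold standard. A real evaluation may normalize more aggressively (for example by stemming) and may score only the top-ranked candidates.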

Keywords

Keyphrase extraction · Scientific document processing · SemEval-2010 · Shared task

Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  • Su Nam Kim (1)
  • Olena Medelyan (2)
  • Min-Yen Kan (3)
  • Timothy Baldwin (1)

  1. Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
  2. Pingar, Auckland, New Zealand
  3. School of Computing, National University of Singapore, Singapore, Singapore
