
Creation and evaluation of large keyphrase extraction collections with multiple opinions

  • Original Paper
Language Resources and Evaluation

Abstract

While several automatic keyphrase extraction (AKE) techniques have been developed and analyzed, there is little consensus on the definition of the task, and no clear overview of how effective the different techniques are. Proper evaluation of keyphrase extraction requires large test collections with multiple opinions, which are currently not available for research. In this paper, we (i) present a set of test collections derived from various sources with multiple annotations (which we also refer to as opinions in the remainder of the paper) for each document, (ii) systematically evaluate keyphrase extraction using several supervised and unsupervised AKE techniques, and (iii) experimentally analyze the effects of disagreement on AKE evaluation. Our newly created set of test collections spans different types of topical content from general news and magazines, and each article is annotated multiple times by a large annotator panel. Our annotator study shows that annotators disagree substantially on the preferred keyphrases for a given document, suggesting the need for multiple opinions per document. A first systematic evaluation of ranking and classification of keyphrases using both unsupervised and supervised AKE techniques on the test collections shows that supervised models are more effective, even for a low annotation effort and with basic positional and frequency features, and highlights the importance of a suitable keyphrase candidate generation approach. We also study the influence of multiple opinions, training data and document length on the evaluation of keyphrase extraction. Our new test collection for keyphrase extraction is one of the largest of its kind and will be made available to stimulate future work on the reliable evaluation of new keyphrase extractors.
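As a rough illustration of what evaluating against multiple opinions can look like, the sketch below scores one predicted keyphrase list against several annotators' keyphrase sets and reports a simple disagreement indicator. It is not the evaluation protocol of the paper: the function names, the exact-match criterion, the averaging of F1 over annotators, the use of pairwise Jaccard overlap, and the example keyphrases are all assumptions made for illustration.

```python
from itertools import combinations


def prf(predicted, gold):
    """Precision, recall and F1 of a predicted keyphrase set against one gold set.

    Assumption: keyphrases match only when the strings are identical.
    """
    predicted, gold = set(predicted), set(gold)
    overlap = len(predicted & gold)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


def mean_f1_over_opinions(predicted, opinions):
    """Average the F1 of the prediction over every annotator's keyphrase set."""
    return sum(prf(predicted, gold)[2] for gold in opinions) / len(opinions)


def mean_pairwise_jaccard(opinions):
    """Average Jaccard overlap over all annotator pairs (1.0 means full agreement).

    Assumption: at least two annotators, each with a non-empty keyphrase set.
    """
    pairs = list(combinations(opinions, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical opinions of three annotators for one news article.
opinions = [
    {"election", "prime minister", "coalition"},
    {"election", "coalition talks"},
    {"prime minister", "election", "budget"},
]
predicted = ["election", "prime minister", "parliament"]

print(round(mean_f1_over_opinions(predicted, opinions), 3))
print(round(mean_pairwise_jaccard(opinions), 3))
```

Scoring against each opinion separately, rather than against a single merged gold set, keeps the effect of annotator disagreement visible in the final number; other aggregation choices are equally possible.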

Notes

  1. http://www.dnb.de/DE/Erwerbung/Inhaltserschliessung/rswk.html.

  2. For information regarding acquiring the test collections, please contact the paper’s first author.

  3. http://www.deredactie.be.

  4. http://www.sporza.be.

  5. http://www.belga.be.

  6. Our multi-label classifier is based on methods from top submissions in the “Greek Media Monitoring Multilabel Classification” (https://www.kaggle.com/c/wise-2014) and “Large Scale Hierarchical Text Classification” (https://www.kaggle.com/c/lshtc) hosted by Kaggle.

  7. https://iptc.org/standards/media-topics/.

  8. https://www.iminds.be/en/succeed-with-digital-research/go-to-market-testing/proeftuinonderzoek.

  9. POS tag definitions used here: Adj = adjective; N = noun (singular or plural); IN, Van = preposition or subordinating conjunction; Num = quantity expression.

  10. Due to copyright restrictions, the data cannot be released publicly: researchers can obtain the data (including annotations and candidate keyphrases) only after contacting the authors and signing a non-disclosure agreement.


Acknowledgements

The research presented in this article relates to STEAMER (http://www.iminds.be/en/projects/2014/07/12/steamer), a MiX-ICON project facilitated by iMinds Media and funded by IWT (now known as Flanders Innovation and Entrepreneurship) and Innoviris.

Author information

Corresponding author

Correspondence to Lucas Sterckx.

About this article

Cite this article

Sterckx, L., Demeester, T., Deleu, J. et al. Creation and evaluation of large keyphrase extraction collections with multiple opinions. Lang Resources & Evaluation 52, 503–532 (2018). https://doi.org/10.1007/s10579-017-9395-6

  • DOI: https://doi.org/10.1007/s10579-017-9395-6
