A Supervised Keyphrase Extraction System Based on Graph Representation Learning

  • Corina Florescu
  • Wei Jin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)

Abstract

Current supervised approaches to keyphrase extraction represent each candidate phrase with a set of hand-crafted features, and machine learning algorithms are then trained to discriminate keyphrases from non-keyphrases. Although manually designed features have been shown to work well in practice, feature engineering is a labor-intensive process that requires expert knowledge and normally does not generalize well. To address this, we present SurfKE, an approach that represents the document as a word graph and exploits its structure to reveal underlying explanatory factors hidden in the data that may distinguish keyphrases from non-keyphrases. Experimental results show that SurfKE, which uses its self-discovered features in a supervised probabilistic framework, obtains remarkable improvements in performance over previous supervised and unsupervised keyphrase extraction systems.
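
As an illustration of the word-graph representation mentioned above, the following minimal sketch builds an undirected, weighted co-occurrence graph from raw text. The abstract does not specify how SurfKE constructs its graph, so the sliding-window co-occurrence rule, the regular-expression tokenizer, and the names `build_word_graph` and `window` are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def build_word_graph(text, window=2):
    """Build an undirected weighted word graph as {word: {neighbor: count}}.

    Assumption (not taken from the paper): two words are linked when they
    occur within `window` consecutive tokens, and the edge weight counts
    how often that happens.
    """
    # Hypothetical tokenizer: lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        # Link the current token to the tokens that follow it in the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {w: dict(neighbors) for w, neighbors in graph.items()}


graph = build_word_graph(
    "keyphrase extraction with graph representation learning", window=2
)
```

With `window=2` only adjacent words are linked. A graph of this kind could then be passed to a node-embedding method to obtain learned word features, but that step is outside this sketch.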

Keywords

Keyphrase extraction, Feature learning, Phrase embeddings, Graph representation learning

Notes

Acknowledgments

This research is supported by the National Science Foundation award IIS-1739095.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Science and Engineering, University of North Texas, Denton, USA
