Unsupervised KeyPhrase Extraction Based on Multi-granular Semantics Feature Fusion

  • Conference paper
Rough Sets (IJCRS 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14481)

Abstract

In Unsupervised Keyphrase Extraction (UKE) tasks, candidate phrases are ranked based on their similarity to the document embedding. However, this method assumes that every document focuses on only one topic. As a result, it can be difficult to distinguish the significance of potential keyphrases among different topics. Hence, a method for acquiring diversified topic information is needed to obtain accurate keyphrases. In this paper, we propose a new unsupervised keyphrase extraction method (MSFFUKE) that utilizes multi-granularity semantic feature fusion. We first group phrases into clusters through granulation, calculate the semantic similarity between each phrase and each cluster, and take the mean to obtain the semantic features of topic granularity. Then, we obtain semantic features of phrase granularity based on the degree centrality of candidate phrases in the graph structure. Finally, we integrate the semantic features of different granularities to rank candidate phrases. We evaluate our model on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare it with current state-of-the-art models. The results demonstrate that our model performs better than most models and generalizes well across input documents from various domains and of different lengths. An ablation study further indicates that both topic-granularity and phrase-granularity semantic features are crucial for unsupervised keyphrase extraction tasks.
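
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the granulation algorithm, how the phrase graph is built, or the fusion rule. The sketch therefore assumes that cluster labels are already available, that a cluster is represented by the mean of its phrase embeddings, that two phrases share a graph edge when their cosine similarity reaches a threshold, and that the two features are fused by a weighted sum. All function names and the parameters `alpha` and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def topic_scores(embeddings, labels):
    """Topic-granularity feature: mean similarity of a phrase to every cluster.

    Assumption: each cluster is summarised by the centroid of its members.
    """
    clusters = {}
    for vector, label in zip(embeddings, labels):
        clusters.setdefault(label, []).append(vector)
    centroids = [mean_vector(members) for members in clusters.values()]
    return [
        sum(cosine(vector, c) for c in centroids) / len(centroids)
        for vector in embeddings
    ]


def degree_centrality(embeddings, threshold=0.5):
    """Phrase-granularity feature: normalised degree in a similarity graph.

    Assumption: an edge links two phrases whose cosine similarity
    is at least `threshold`.
    """
    n = len(embeddings)
    scores = []
    for i in range(n):
        degree = sum(
            1
            for j in range(n)
            if j != i and cosine(embeddings[i], embeddings[j]) >= threshold
        )
        scores.append(degree / (n - 1) if n > 1 else 0.0)
    return scores


def rank_phrases(phrases, embeddings, labels, alpha=0.5, threshold=0.5):
    """Fuse both features with a weighted sum (an assumed rule) and sort."""
    topic = topic_scores(embeddings, labels)
    central = degree_centrality(embeddings, threshold)
    fused = [alpha * t + (1 - alpha) * c for t, c in zip(topic, central)]
    order = sorted(range(len(phrases)), key=lambda i: fused[i], reverse=True)
    return [(phrases[i], fused[i]) for i in order]


# Toy usage with made-up 2-D embeddings and hand-assigned cluster labels.
toy_phrases = ["neural network", "deep learning", "stock market", "share price"]
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_labels = [0, 0, 1, 1]
for phrase, score in rank_phrases(toy_phrases, toy_embeddings, toy_labels):
    print(f"{phrase}: {score:.3f}")
```

In practice the embeddings would come from a pre-trained language model and the labels from whichever clustering procedure is chosen; the toy vectors above only show the data flow.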

Acknowledgements

This work was supported by the Major Program of the National Natural Science Foundation of China (Grant Nos. 61876001, 61876157), the National Social Science Foundation of China (Grant No. 18ZDA032), and the Natural Science Foundation for the Higher Education Institutions of Anhui Province of China (KJ2021A0039).

Author information

Corresponding author

Correspondence to Shu Zhao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, J., Hu, H., Zhao, S., Zhang, Y. (2023). Unsupervised KeyPhrase Extraction Based on Multi-granular Semantics Feature Fusion. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_21

  • DOI: https://doi.org/10.1007/978-3-031-50959-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-50958-2

  • Online ISBN: 978-3-031-50959-9

  • eBook Packages: Computer Science, Computer Science (R0)
