Unsupervised KeyPhrase Extraction Based on Multi-granular Semantics Feature Fusion

Chen, Jie; Hu, Hainan; Zhao, Shu; Zhang, Yanping

doi:10.1007/978-3-031-50959-9_21


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14481)

Included in the following conference series:

International Joint Conference on Rough Sets


Abstract

In Unsupervised Keyphrase Extraction (UKE) tasks, candidate phrases are commonly ranked by their similarity to the document embedding. However, this approach assumes that every document focuses on a single topic, which makes it difficult to distinguish the importance of candidate keyphrases across different topics. A method that captures diverse topic information is therefore needed to extract accurate keyphrases. In this paper, we propose MSFFUKE, a new unsupervised keyphrase extraction method based on multi-granularity semantic feature fusion. We first group phrases into clusters through granulation, compute the semantic similarity between each phrase and each cluster, and take the mean to obtain topic-granularity semantic features. We then derive phrase-granularity semantic features from the degree centrality of candidate phrases in a graph structure. Finally, we fuse the semantic features of both granularities to rank the candidate phrases. We evaluate our model on three public benchmarks (Inspec, DUC 2001, and SemEval 2010) and compare it with state-of-the-art models. The results show that our model outperforms most of these models and generalizes well to input documents from various domains and of different lengths. An ablation study further indicates that both topic-granularity and phrase-granularity semantic features are crucial for unsupervised keyphrase extraction.
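The three-stage pipeline described in the abstract (topic-granularity features from clusters, phrase-granularity features from degree centrality, then fusion for ranking) can be sketched roughly as follows. This is a hypothetical illustration under stated assumptions, not the authors' MSFFUKE implementation: candidate phrase embeddings are assumed to be given (toy 2-D vectors stand in for pre-trained language model embeddings), a cosine-assignment k-means replaces the paper's granulation step, each cluster is represented by its centroid, the phrase graph links pairs whose cosine similarity reaches a hypothetical `threshold`, and the two min-max-normalized features are combined with a hypothetical weight `alpha`.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_centroids(vectors: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """ASSUMPTION: stand-in for granulation. k-means that assigns each vector to
    the most cosine-similar centroid, initialised with the first k vectors."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            sims = [cosine(v, c) for c in centroids]
            groups[sims.index(max(sims))].append(v)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def topic_granularity(vectors: List[Vector], k: int) -> List[float]:
    """Mean similarity of each phrase to every cluster (centroid as representative: ASSUMPTION)."""
    centroids = cluster_centroids(vectors, k)
    return [sum(cosine(v, c) for c in centroids) / len(centroids) for v in vectors]


def phrase_granularity(vectors: List[Vector], threshold: float) -> List[float]:
    """Weighted degree centrality in a graph linking phrase pairs whose cosine
    similarity is >= threshold (graph construction is an ASSUMPTION)."""
    degree = [0.0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            w = cosine(vectors[i], vectors[j])
            if w >= threshold:
                degree[i] += w
                degree[j] += w
    return degree


def min_max(xs: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; constant inputs map to 0.0."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def rank_phrases(phrases: List[str], vectors: List[Vector], k: int = 2,
                 alpha: float = 0.5, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Fuse both granularities with hypothetical weight alpha; sort descending."""
    if not phrases:
        return []
    topic = min_max(topic_granularity(vectors, k))
    local = min_max(phrase_granularity(vectors, threshold))
    fused = [alpha * t + (1 - alpha) * d for t, d in zip(topic, local)]
    return sorted(zip(phrases, fused), key=lambda pair: pair[1], reverse=True)


# Toy 2-D "embeddings" standing in for real phrase embeddings.
phrases = ["keyphrase extraction", "phrase ranking", "graph centrality", "weather today"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.8, 0.3], [0.0, 1.0]]
ranked = rank_phrases(phrases, vectors, k=2)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

On this toy input, "weather today" ends up last: none of its pairwise similarities reach the 0.5 threshold, so its phrase-granularity score is zero. The weighted linear fusion is only one simple way to combine the two feature types; the abstract does not specify which combination MSFFUKE uses.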

Acknowledgements

This work was supported by the Major Program of the National Natural Science Foundation of China (Grant Nos. 61876001 and 61876157), the National Social Science Foundation of China (Grant No. 18ZDA032), and the Natural Science Foundation for the Higher Education Institutions of Anhui Province of China (Grant No. KJ2021A0039).

Author information

Authors and Affiliations

Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Hefei, 230601, Anhui, People’s Republic of China
Jie Chen, Hainan Hu, Shu Zhao & Yanping Zhang
School of Computer Science and Technology, Anhui University, Hefei, 230601, Anhui, People’s Republic of China
Jie Chen, Hainan Hu, Shu Zhao & Yanping Zhang
Information Materials and Intelligent Sensing Laboratory of Anhui Province, Hefei, 230601, Anhui, People’s Republic of China
Jie Chen, Hainan Hu, Shu Zhao & Yanping Zhang

Corresponding author

Correspondence to Shu Zhao.

Editor information

Editors and Affiliations

IRCCS Istituto Ortopedico Galeazzi, Milano, Italy
Andrea Campagner
Ghent University, Ghent, Belgium
Oliver Urs Lenz
Chongqing University of Posts and Telecommunications, Chongqing, China
Shuyin Xia
University of Warsaw, Warsaw, Poland
Dominik Ślęzak
AGH University of Science and Technology, Kraków, Poland
Jarosław Wąs
University of Regina, Regina, SK, Canada
JingTao Yao

About this paper

Cite this paper

Chen, J., Hu, H., Zhao, S., Zhang, Y. (2023). Unsupervised KeyPhrase Extraction Based on Multi-granular Semantics Feature Fusion. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_21

DOI: https://doi.org/10.1007/978-3-031-50959-9_21
Published: 31 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50958-2
Online ISBN: 978-3-031-50959-9
eBook Packages: Computer Science, Computer Science (R0)
