TopicLPRank: a keyphrase extraction method based on improved TopicRank

Liao, Shengbin; Yang, Zongkai; Liao, Qingzhou; Zheng, Zhangxiong

doi:10.1007/s11227-022-05022-0

Published: 07 January 2023

Volume 79, pages 9073–9092, (2023)

The Journal of Supercomputing

Shengbin Liao¹ (ORCID: 0000-0001-6431-7566), Zongkai Yang², Qingzhou Liao³ & Zhangxiong Zheng¹

Abstract

We present TopicLPRank, a keyphrase extraction algorithm that improves on TopicRank. Whereas TopicRank uses only the relative distance information of the text, we argue that the length and absolute position of candidate keyphrases also influence the quality of the extracted keyphrases. TopicLPRank therefore incorporates these two factors on top of TopicRank. Experimental results show that adding the position information and the length information of candidate keyphrases increases the F-score of the model by around 2.7 and 1.7 percentage points, respectively, which corresponds to relative increases of 19.6% and 12.3% over TopicRank. Fusing the length and position information of candidate keyphrases increases the F-score by around 3.5 percentage points, a relative increase of 25.21% over TopicRank on the NUS dataset.
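
The abstract reports each gain twice: once in absolute percentage points and once as a relative increase over TopicRank. The two are linked through the TopicRank baseline F-score, which the abstract does not state. The short Python sketch below is an illustration, not part of the paper. It divides each absolute gain by its relative gain to recover the implied baseline. The names `reported` and `implied_baseline` are hypothetical, and the inputs are the rounded figures quoted above.

```python
# Gains as reported in the abstract:
# (absolute gain in percentage points, relative gain in percent).
reported = {
    "position": (2.7, 19.6),
    "length": (1.7, 12.3),
    "position+length": (3.5, 25.21),
}


def implied_baseline(abs_gain_points, rel_gain_percent):
    """Baseline F-score (in points) implied by an absolute and a relative gain.

    relative gain = absolute gain / baseline, so baseline = absolute / relative.
    """
    return abs_gain_points / (rel_gain_percent / 100.0)


# Assumption: all three gains are measured against the same TopicRank baseline.
baselines = {name: implied_baseline(a, r) for name, (a, r) in reported.items()}

for name, base in baselines.items():
    print(f"{name}: implied TopicRank F-score of about {base:.1f} points")
```

All three pairs imply a baseline between roughly 13.8 and 13.9 points, so the absolute and relative figures in the abstract are mutually consistent up to rounding.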

Data availability

All data included in this study are available upon request from the corresponding author.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant numbers: 62077023 and 61937001) and the National Key R&D Program of China titled the Large-Scale Longitudinal and Cross-Sectional Study of Student Development (grant number: 2021YFC3340800).

Author information

Authors and Affiliations

National Engineering Research Center for E-Learning, Huazhong Normal University, Wuhan, China
Shengbin Liao & Zhangxiong Zheng
National Engineering Laboratory for Educational Big Data, Huazhong Normal University, Wuhan, China
Zongkai Yang
Wuhan Vocational College of Software and Engineering, Wuhan, China
Qingzhou Liao

Corresponding author

Correspondence to Shengbin Liao.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liao, S., Yang, Z., Liao, Q. et al. TopicLPRank: a keyphrase extraction method based on improved TopicRank. J Supercomput 79, 9073–9092 (2023). https://doi.org/10.1007/s11227-022-05022-0

Accepted: 24 December 2022
Published: 07 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s11227-022-05022-0
