TopicLPRank: a keyphrase extraction method based on improved TopicRank

The Journal of Supercomputing

Abstract

This paper presents TopicLPRank, a keyphrase extraction algorithm that improves on TopicRank. Whereas TopicRank uses only the relative distance information in the text, we argue that the length and absolute position of candidate keyphrases also influence the quality of the extracted keyphrases, so TopicLPRank incorporates both factors on top of TopicRank. The experimental results show that adding the position information and the length information of candidate keyphrases increases the F-score by around 2.7 and 1.7 percentage points, respectively, which corresponds to relative improvements of 19.6% and 12.3% over TopicRank. Combining the length and position information of the candidate keyphrases increases the F-score by around 3.5 percentage points, a relative improvement of 25.21% over TopicRank on the NUS dataset.
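The abstract reports each gain twice, once in absolute percentage points and once relative to TopicRank. As a rough consistency check, the Python sketch below divides each absolute gain by its relative gain to recover the TopicRank baseline F-score that the pair implies. The function name `implied_baseline` and the dictionary labels are illustrative assumptions; the baseline itself is not stated in the abstract, the reported figures are rounded ("around"), and the sketch assumes all three pairs refer to the same baseline.

```python
def implied_baseline(absolute_gain_points: float, relative_gain_percent: float) -> float:
    """Baseline F-score (in points) implied by an absolute and a relative gain.

    relative_gain = absolute_gain / baseline, so baseline = absolute_gain / relative_gain.
    """
    return absolute_gain_points / (relative_gain_percent / 100.0)


# (absolute gain in percentage points, relative gain in %) as quoted in the abstract.
# The labels are hypothetical names chosen for this sketch.
reported = {
    "position": (2.7, 19.6),
    "length": (1.7, 12.3),
    "position+length": (3.5, 25.21),
}

baselines = {name: implied_baseline(a, r) for name, (a, r) in reported.items()}

for name, baseline in baselines.items():
    print(f"{name}: implied TopicRank F-score of about {baseline:.1f} points")
```

All three pairs point to a baseline of roughly 13.8 to 13.9 points, so the absolute and relative figures in the abstract are mutually consistent under that assumption.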

Data availability

All data included in this study are available from the corresponding author upon request.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (grant numbers 62077023 and 61937001) and by the National Key R&D Program of China project "Large-Scale Longitudinal and Cross-Sectional Study of Student Development" (grant number 2021YFC3340800).

Author information

Corresponding author

Correspondence to Shengbin Liao.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liao, S., Yang, Z., Liao, Q. et al. TopicLPRank: a keyphrase extraction method based on improved TopicRank. J Supercomput 79, 9073–9092 (2023). https://doi.org/10.1007/s11227-022-05022-0

  • DOI: https://doi.org/10.1007/s11227-022-05022-0
