Keyphrase extraction using graph-based statistical approach with NLP patterns

Sādhanā

Abstract

Keyphrase extraction plays a vital role in natural language processing: it focuses on recognizing and retrieving the significant phrases that summarize the essential information in a document. This paper introduces a novel keyphrase extraction approach that combines graph-based statistical measures (degree centrality, TextRank, closeness, and betweenness) with natural language processing patterns. The approach constructs a graph representation of the document, identifies the most important nodes in that graph, and leverages natural language processing patterns to improve the accuracy and relevance of the extracted keyphrases. The proposed model is evaluated on a standard dataset, and its results are compared with state-of-the-art keyphrase extraction methods. It achieves a precision, recall, and F-measure of 0.5263, 0.5498, and 0.5323, respectively, which shows that it outperforms the existing models. The principal novelty of the methodology lies in combining graph-based statistical techniques with natural language processing patterns, which enables detection of the most pertinent nodes and the most significant keyphrases. The approach is generalizable to a wide range of domains and text types, making it promising for applications such as content analysis, document classification, and search engine optimization. In conclusion, it offers a robust and scalable solution for identifying keyphrases that capture the essential information of a document. Future research can build on this approach to improve the efficiency and effectiveness of automated text analysis.
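
To make the graph-based idea in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a word co-occurrence graph with a fixed window, a small hand-written stopword list, and an unweighted average of max-normalized scores; the abstract specifies none of these choices. Betweenness centrality and the natural language processing patterns are omitted for brevity, and all function names (`build_graph`, `rank_words`, and so on) are hypothetical.

```python
import re
from collections import defaultdict, deque

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "for", "to", "is", "are", "that", "with", "by"}

def build_graph(text, window=2):
    """Undirected co-occurrence graph: words at most `window` positions apart are linked."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in graph.items()}

def closeness_centrality(graph):
    """Inverse average shortest-path distance, scaled by the share of reachable nodes."""
    n = len(graph)
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total = sum(dist.values())
        reach = len(dist) - 1
        scores[source] = (reach / total) * (reach / (n - 1)) if total else 0.0
    return scores

def textrank(graph, damping=0.85, iterations=50):
    """Unweighted TextRank: S(v) = (1 - d) + d * sum over neighbours u of S(u) / deg(u)."""
    scores = {v: 1.0 for v in graph}
    for _ in range(iterations):
        scores = {
            v: (1 - damping) + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return scores

def rank_words(text, top_k=5):
    """Average the max-normalized measures and return the top_k words (ties broken alphabetically)."""
    graph = build_graph(text)
    measures = [degree_centrality(graph), closeness_centrality(graph), textrank(graph)]
    combined = defaultdict(float)
    for measure in measures:
        top = max(measure.values(), default=0.0) or 1.0
        for v, score in measure.items():
            combined[v] += score / top / len(measures)
    return sorted(combined, key=lambda v: (-combined[v], v))[:top_k]
```

Calling `rank_words(document_text)` returns candidate words ordered by the combined score. A fuller pipeline would presumably add betweenness centrality and then filter or merge candidates with part-of-speech patterns to form multi-word keyphrases, but how the paper does this is not stated in the abstract.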

Funding

The work has not received financial support from any funding agency.

Author information

Corresponding author

Correspondence to Siddhesh Mehta.

Ethics declarations

Conflict of Interest

The authors do not have any conflict of interest for this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mehta, S., Karwa, R., Chavan, R. et al. Keyphrase extraction using graph-based statistical approach with NLP patterns. Sādhanā 49, 170 (2024). https://doi.org/10.1007/s12046-024-02494-z

  • DOI: https://doi.org/10.1007/s12046-024-02494-z
