Keyphrase extraction using graph-based statistical approach with NLP patterns

Mehta, Siddhesh; Karwa, Rushikesh; Chavan, Rahul; Khatavkar, Vaibhav; Joshi, Amit

doi:10.1007/s12046-024-02494-z


Published: 05 May 2024

Volume 49, article number 170 (2024)

Sādhanā



Abstract

Keyphrase extraction is an important task in natural language processing that focuses on recognizing and retrieving the significant phrases that summarize the essential information in a document. This paper introduces a novel keyphrase extraction approach that combines graph-based statistical measures (degree centrality, TextRank, closeness, and betweenness) with natural language processing patterns. The approach constructs a graph representation of the document, identifies the most important nodes in that graph, and leverages natural language processing patterns to improve the accuracy and relevance of the extracted keyphrases. The proposed model is evaluated on a standard dataset, and its results are compared with those of state-of-the-art keyphrase extraction methods. The proposed model achieves a precision, recall, and F-measure of 0.5263, 0.5498, and 0.5323, respectively, which shows that it outperforms the existing models it is compared with. The principal novelty of this methodology lies in combining graph-based statistical techniques with natural language processing patterns, which enables the detection of the most pertinent nodes and the most significant keyphrases. The proposed approach is generalizable to a wide range of domains and text types, making it a promising approach for keyphrase extraction in applications such as content analysis, document classification, and search engine optimization. In conclusion, the proposed approach offers a robust and scalable solution for identifying keyphrases that capture the essential information of a document. Future research can build upon this approach to improve the efficiency and effectiveness of automated text analysis.
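The abstract names the ingredients (a word graph, four centrality measures, NLP patterns, and precision/recall/F-measure) but not how they are combined. The following is a minimal, self-contained Python sketch under stated assumptions, not the authors' implementation: the input is assumed to be a list of (word, Penn Treebank tag) pairs from any POS tagger; the NLP pattern is assumed to be adjective/noun runs ending in a noun; edges are assumed to link adjacent candidate words; and the four measures are assumed to be max-normalized and averaged with equal weights. All function names (`candidate_phrases`, `cooccurrence_graph`, `centralities`, `textrank`, `keyphrases`, `precision_recall_f1`) are hypothetical. The centralities are computed in pure Python (breadth-first search, plus Brandes' algorithm for betweenness) to avoid a graph-library dependency.

```python
from collections import deque

# Candidate tags: Penn Treebank adjective (JJ*) and noun (NN*) prefixes (assumed tagset).
CANDIDATE_TAGS = ("JJ", "NN")


def candidate_phrases(tagged):
    """Hypothetical NLP pattern (JJ|NN)* NN: maximal adjective/noun runs ending in a noun."""
    phrases, run = [], []
    for word, tag in tagged + [("", ".")]:  # sentinel token flushes the final run
        if tag.startswith(CANDIDATE_TAGS):
            run.append((word.lower(), tag))
            continue
        while run and not run[-1][1].startswith("NN"):
            run.pop()  # drop trailing adjectives: a phrase must end in a noun
        if run:
            phrases.append(tuple(w for w, _ in run))
        run = []
    return phrases


def cooccurrence_graph(tagged, window=2):
    """Undirected graph linking candidate words that fall within `window` of each other."""
    words = [w.lower() for w, t in tagged if t.startswith(CANDIDATE_TAGS)]
    graph = {w: set() for w in words}
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window]:
            if v != w:  # no self-loops for repeated words
                graph[w].add(v)
                graph[v].add(w)
    return graph


def centralities(graph):
    """Normalized degree, closeness, and betweenness (Brandes) for an unweighted graph."""
    nodes, n = list(graph), len(graph)
    degree, closeness = {}, {}
    betweenness = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        degree[s] = len(graph[s]) / (n - 1) if n > 1 else 0.0
        # BFS from s, recording shortest-path counts (sigma) and predecessors.
        dist, sigma = {s: 0}, dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds, order, queue = {v: [] for v in nodes}, [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        reach, total = len(dist) - 1, sum(dist.values())
        # Closeness, scaled by the reachable fraction so disconnected graphs stay comparable.
        closeness[s] = (reach / total) * (reach / (n - 1)) if total else 0.0
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so divide by (n-1)(n-2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return degree, closeness, {v: b * scale for v, b in betweenness.items()}


def textrank(graph, d=0.85, iters=50):
    """TextRank/PageRank iteration: S(v) = (1 - d) + d * sum(S(u) / deg(u)) over neighbors u."""
    score = dict.fromkeys(graph, 1.0)
    for _ in range(iters):
        score = {v: (1 - d) + d * sum(score[u] / len(graph[u]) for u in graph[v]) for v in graph}
    return score


def keyphrases(tagged, k=5):
    """Rank candidate phrases by the summed fused score of their words (hypothetical fusion)."""
    graph = cooccurrence_graph(tagged)
    measures = [*centralities(graph), textrank(graph)]
    peaks = [max(m.values()) or 1.0 for m in measures]
    # Assumed fusion: unweighted mean of the four max-normalized measures.
    fused = {v: sum(m[v] / peak for m, peak in zip(measures, peaks)) / len(measures) for v in graph}
    scored = {ph: sum(fused[w] for w in ph) for ph in set(candidate_phrases(tagged))}
    return sorted(scored, key=lambda ph: (-scored[ph], ph))[:k]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 (harmonic mean) for one document."""
    tp = len(set(predicted) & set(gold))
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)
```

For example, `keyphrases(tagged, k=10)` on a tagged document returns up to ten phrase tuples, which `precision_recall_f1` can then score against the gold keyphrases. Summing word scores favors longer phrases; averaging them would instead favor short, central phrases, so the aggregation rule is a design choice the abstract leaves open. As a check on the reported figures, the harmonic mean of the reported precision and recall is 2 × 0.5263 × 0.5498 / (0.5263 + 0.5498) ≈ 0.5378, slightly above the reported 0.5323. This gap is consistent with an F-measure averaged per document rather than computed from the averaged precision and recall, although the abstract does not state which convention is used.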


Similar content being viewed by others

Creating Automatic Connections for Personal Knowledge Management (Article, 08 May 2024)

Recent automatic text summarization techniques: a survey (Article, 29 March 2016)

Privacy BERT-LSTM: a novel NLP algorithm for sensitive information detection in textual documents (Article, 16 May 2024)


Funding

The work has not received financial support from any funding agency.

Author information

Authors and Affiliations

Department of Computer Engineering and IT, CoEP Technological University, Pune, India
Siddhesh Mehta, Rushikesh Karwa, Rahul Chavan, Vaibhav Khatavkar & Amit Joshi


Corresponding author

Correspondence to Siddhesh Mehta.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest for this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Mehta, S., Karwa, R., Chavan, R. et al. Keyphrase extraction using graph-based statistical approach with NLP patterns. Sādhanā 49, 170 (2024). https://doi.org/10.1007/s12046-024-02494-z


Received: 10 April 2023
Revised: 12 July 2023
Accepted: 11 February 2024
Published: 05 May 2024
DOI: https://doi.org/10.1007/s12046-024-02494-z
