Abstract
Keyphrase extraction plays a vital role in natural language processing: it identifies and retrieves the significant phrases that summarize the essential information in a document. This paper introduces a graph-based statistical approach to keyphrase extraction that combines degree centrality, TextRank, closeness, and betweenness measures with natural language processing patterns. The approach constructs a graph representation of the document, identifies the most important nodes in the graph, and leverages natural language processing patterns to improve the accuracy and relevance of the extracted keyphrases. The proposed model is evaluated on a standard dataset and compared with state-of-the-art keyphrase extraction methods. It achieves a precision, recall, and F-measure of 0.5263, 0.5498, and 0.5323, respectively, which shows that the proposed model outperforms existing models. The principal novelty of the methodology lies in combining graph-based statistical techniques with natural language processing patterns, which enables detection of the most pertinent nodes and the most significant keyphrases. The approach is generalizable to a wide range of domains and text types, making it promising for applications such as content analysis, document classification, and search engine optimization. In conclusion, the proposed approach offers a robust and scalable solution for identifying keyphrases that capture the essential information of a document. Future research can build on it to improve the efficiency and effectiveness of automated text analysis.
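To make the graph-based ranking idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a word co-occurrence graph built with a sliding window, a tiny illustrative stopword list, and a simple average of max-normalized scores as the combination rule; none of these choices are specified in the abstract. It computes degree centrality, closeness centrality, and a normalized PageRank-style score in the spirit of TextRank. Betweenness centrality and the natural language processing pattern filter are omitted for brevity. The names `build_graph`, `rank_words`, and `STOPWORDS` are hypothetical.

```python
from collections import defaultdict, deque

# Illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is", "for", "on", "that", "with"}


def build_graph(text, window=2):
    """Undirected co-occurrence graph: words at most `window` positions apart are linked."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != w:
                graph[w].add(other)
                graph[other].add(w)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {v: (len(nbrs) / (n - 1) if n > 1 else 0.0) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Inverse mean shortest-path distance, scaled by the share of nodes reached."""
    n = len(graph)
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        reached = len(dist) - 1
        total = sum(dist.values())
        scores[source] = (reached / total) * (reached / (n - 1)) if total else 0.0
    return scores


def pagerank(graph, damping=0.85, iterations=50):
    """Normalized PageRank on an undirected graph (scores sum to 1)."""
    n = len(graph)
    score = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return score


def rank_words(text, window=2):
    """Rank words by the mean of max-normalized centrality scores (assumed rule)."""
    graph = build_graph(text, window)
    measures = [degree_centrality(graph), closeness_centrality(graph), pagerank(graph)]
    combined = {
        v: sum(m[v] / max(m.values()) for m in measures) / len(measures)
        for v in graph
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_words(document_text)[:10]` would return the ten highest-scoring candidate words with their combined scores. In a fuller pipeline, adjacent top-ranked words would typically be merged into multi-word phrases and filtered by part-of-speech patterns, which this sketch does not attempt.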
Funding
The work has not received financial support from any funding agency.
Ethics declarations
Conflict of Interest
The authors do not have any conflict of interest for this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mehta, S., Karwa, R., Chavan, R. et al. Keyphrase extraction using graph-based statistical approach with NLP patterns. Sādhanā 49, 170 (2024). https://doi.org/10.1007/s12046-024-02494-z
DOI: https://doi.org/10.1007/s12046-024-02494-z