Abstract
In information retrieval, users locate relevant web pages by entering keywords. If the keywords are inaccurate, or if they do not appear on the intended page, retrieval effectiveness drops significantly. Keywords therefore remain central to text processing. In complex contexts in particular, manual analysis by readers is time-consuming and often infeasible. Most existing methods achieve limited accuracy, which leads to high error rates and weak training performance. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. The system comprises several stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. Its effectiveness is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing covers stop-word removal, Part-of-Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is tokenized into individual words and fed into the Deep-KeywordNet model. A Word2Vec (W2V) Skip-gram embedding layer produces distributed vector representations of the tokens. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with a softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87% with reduced time complexity.
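As a rough illustration of the pre-processing, tokenization, and ranking stages named above, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-written stop-word list, regex-based sentence segmentation, and plain smoothed TF-IDF as a stand-in for TF-IADF, whose formula is not given in the abstract. PoS tagging, stemming, the Word2Vec embedding, and the Attn Bi-GCNN classifier are omitted. All function names (`split_sentences`, `tokenize`, `preprocess`, `rank_keywords`) are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (assumption: the abstract does not give one).
STOP_WORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def preprocess(text):
    """Segment into sentences, then tokenize each sentence."""
    tokens = []
    for sentence in split_sentences(text):
        tokens.extend(tokenize(sentence))
    return tokens


def rank_keywords(docs, doc_index, top_k=3):
    """Rank one document's tokens by smoothed TF-IDF (a stand-in, not TF-IADF)."""
    tokenized = [preprocess(d) for d in docs]
    target = tokenized[doc_index]
    if not target:
        return []
    # Document frequency: in how many documents each token occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n_docs = len(tokenized)
    scores = {}
    for word, count in Counter(target).items():
        tf = count / len(target)
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
        scores[word] = tf * idf
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


if __name__ == "__main__":
    corpus = [
        "Keyword extraction helps retrieval. Keyword ranking matters.",
        "Deep learning helps retrieval of documents.",
        "The ranking of documents is important.",
    ]
    for word, score in rank_keywords(corpus, 0):
        print(f"{word}: {score:.3f}")
```

In this sketch a token scores higher when it is frequent in the target document and rare across the corpus. A learned classifier such as the one described in the abstract would replace this purely statistical scoring with predicted keyword labels before ranking.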
Data availability
Data sharing is not applicable to this article.
Author information
Contributions
All authors read and approved the final manuscript.
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khatun, R., Sarkar, A. Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18110-5
DOI: https://doi.org/10.1007/s11042-024-18110-5