
Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking

Multimedia Tools and Applications

Abstract

During information retrieval, users locate relevant web pages by entering keywords. If those keywords are inaccurate, or are absent from the intended page, retrieval effectiveness drops significantly. Keywords therefore remain central to text processing. In complex contexts, manual keyword analysis by readers is time-consuming and often infeasible. Most existing methods achieve limited accuracy, with high error rates and weak training capability. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several key stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word removal, Part-of-Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is fed into the Deep-KeywordNet model, where it is tokenized into individual words. A Word2Vec (W2V) Skip-gram embedding layer converts the tokens into distributed vector representations for classification. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with a softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87% with reduced time complexity.
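
The front end of such a pipeline can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The stop-word list, the function names, and the regex-based sentence splitter are all assumptions made for illustration. PoS tagging, stemming, and the Attn Bi-GCNN classifier are omitted. Because the abstract does not give the TF-IADF formula, the ranking step below uses plain TF-IDF as a stand-in.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or",
              "is", "are", "to", "for", "by", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(text):
    """Lower-case, tokenize, and drop stop words; one token list per sentence."""
    sentences = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        sentences.append([t for t in tokens if t not in STOP_WORDS])
    return sentences


def skipgram_pairs(tokens, window=2):
    """Build the (center, context) pairs that a Skip-gram model is trained on."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


def tfidf_rank(documents, doc_index, top_k=3):
    """Rank the words of one document by plain TF-IDF (stand-in for TF-IADF)."""
    doc_tokens = [[t for sent in preprocess(d) for t in sent] for d in documents]
    n_docs = len(doc_tokens)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(t for tokens in doc_tokens for t in set(tokens))
    counts = Counter(doc_tokens[doc_index])
    total = sum(counts.values())
    scores = {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Calling `preprocess` on a document yields per-sentence token lists, `skipgram_pairs` turns one token list into embedding training pairs, and `tfidf_rank(documents, doc_index)` returns the top-scoring words of the chosen document. In the paper's pipeline a trained classifier would sit between the embedding and ranking steps.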

Data availability

Data sharing is not applicable to this article.

Author information

Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rubaya Khatun.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Khatun, R., Sarkar, A. Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18110-5

  • DOI: https://doi.org/10.1007/s11042-024-18110-5
