Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking

Khatun, Rubaya; Sarkar, Arup

doi:10.1007/s11042-024-18110-5


Published: 30 January 2024


Multimedia Tools and Applications


Abstract

In information retrieval, users locate relevant web pages by entering keywords. If those keywords are inaccurate or absent from the intended page, retrieval effectiveness drops significantly, so keywords remain central to text processing. In intricate contexts, manual keyword analysis by readers is time-intensive and often infeasible. Most existing methods achieve limited accuracy, with elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word removal, Parts of Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is fed into the Deep-KeywordNet model, where it is tokenized into individual words. A Word2Vec (W2V) Skip-gram embedding layer provides the distributed vector representations used for classification. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with a softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87% with reduced time complexity.
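The pre-processing and ranking stages named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the stop-word list, the function names, and the toy documents are hypothetical, PoS tagging and stemming are omitted, and classic TF-IDF is used as a stand-in for the ranking step because the abstract does not define TF-IADF.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def split_sentences(text):
    """Naive sentence segmentation on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def rank_keywords(docs, doc_index, top_k=3):
    """Rank the tokens of one document by classic TF-IDF (a stand-in, not TF-IADF)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy corpus, used only to exercise the functions above.
docs = [
    "Keyword extraction ranks words. Keyword ranking uses term statistics.",
    "Neural networks classify words in documents.",
    "Documents are segmented into sentences and words.",
]
sentences = split_sentences(docs[0])
top_keywords = rank_keywords(docs, doc_index=0)
```

In this toy corpus, a term that occurs in every document (such as "words") receives a score of zero, while a term repeated within a single document rises to the top. In the described system, a learned classifier selects candidate keywords before ranking; the sketch skips that step entirely.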


Data availability

Data sharing is not applicable to this article.


Author information

Authors and Affiliations

Department of Computer and Information Science, Raiganj University, University Road, College Para, Raiganj, West Bengal, 733134, India
Rubaya Khatun & Arup Sarkar


Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Rubaya Khatun.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Khatun, R., Sarkar, A. Deep-KeywordNet: automated english keyword extraction in documents using deep keyword network based ranking. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18110-5


Received: 27 April 2023
Revised: 09 September 2023
Accepted: 01 January 2024
Published: 30 January 2024
DOI: https://doi.org/10.1007/s11042-024-18110-5
