A Novel Graph-Based Ensemble Token Classification Model for Keyword Extraction

Kılıç, Hüma; Çetin, Aydın

doi:10.1007/s13369-023-07721-z

Research Article: Computer Engineering and Computer Science
Published: 04 March 2023

Arabian Journal for Science and Engineering, Volume 48, pages 10673–10680 (2023)

Abstract

Keyword extraction is a fundamental problem in natural language processing applications. Many graph-based models in the literature address it by constructing a word co-occurrence graph from the input text. These models use graph-based features such as Betweenness Centrality, Closeness Centrality, Eigenvector Centrality, Degree, PageRank, Clustering Coefficient, Eccentricity, Structural Hole and Coreness. In this paper, we propose a novel graph-based token classification model built on these commonly used graph-based features. We used extra trees, lasso, genetic algorithm and wrapper methods to select the most informative subset of the features. The token classification module of the model uses the Random Forest ensemble classification algorithm. The model was evaluated on the commonly used Inspec, SemEval-2017 and 500N-KPCrowd datasets, as well as on the newly collected TRDizinEn and DergiParkEn datasets. The highest \(F_1\)-scores achieved on SemEval-2017, 500N-KPCrowd, DergiParkEn and TRDizinEn were 0.641, 0.694, 0.707 and 0.766, respectively.
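The abstract does not specify how the co-occurrence graph is built or how each feature is normalised, so the following is only a minimal, standard-library Python sketch under stated assumptions: an undirected, unweighted graph that links tokens appearing within a sliding window of two consecutive tokens, no stopword or part-of-speech filtering, closeness computed within each node's connected component, and PageRank by plain power iteration with damping 0.85. It computes six of the nine features named above (Degree, Closeness, Eccentricity, Clustering Coefficient, PageRank, Coreness); Betweenness, Eigenvector Centrality and Structural Hole are omitted for brevity. All function names are hypothetical and this is not the authors' implementation.

```python
from collections import deque


def build_cooccurrence_graph(tokens, window=2):
    """Undirected co-occurrence graph (assumed construction): each token is
    linked to the tokens that follow it within `window` consecutive tokens."""
    graph = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())
        for other in tokens[i + 1 : i + window]:
            if other != word:  # no self-loops
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def pagerank(graph, damping=0.85, iterations=50):
    """PageRank by power iteration; on an undirected graph each node splits its
    score evenly among its neighbours. Isolated nodes spread theirs uniformly."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in graph if not graph[v])
        rank = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(graph[u]) for u in graph[v]) + dangling / n)
            for v in graph
        }
    return rank


def core_numbers(graph):
    """k-core number (coreness) of each node by repeatedly peeling a
    minimum-degree node; the core number never decreases during peeling."""
    degree = {v: len(nbs) for v, nbs in graph.items()}
    remaining = set(graph)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in graph[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def node_features(graph):
    """Per-token feature dict covering six of the nine features in the abstract."""
    pr = pagerank(graph)
    core = core_numbers(graph)
    features = {}
    for v, nbs in graph.items():
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        k = len(nbs)
        # Edges among v's neighbours, each unordered pair counted once.
        links = sum(1 for a in nbs for b in nbs if a < b and b in graph[a])
        features[v] = {
            "degree": k,
            # Closeness within v's connected component (assumed normalisation).
            "closeness": (len(dist) - 1) / total if total else 0.0,
            "eccentricity": max(dist.values()),
            "clustering": 2 * links / (k * (k - 1)) if k > 1 else 0.0,
            "pagerank": pr[v],
            "coreness": core[v],
        }
    return features


tokens = ["graph", "based", "keyword", "extraction", "uses", "graph", "features"]
features = node_features(build_cooccurrence_graph(tokens))
for word, row in sorted(features.items()):
    print(word, row)
```

In a token classification setup like the one described, each token's feature vector would then be labelled as keyword or non-keyword, reduced by a feature selection step, and passed to a Random Forest classifier (for example scikit-learn's `RandomForestClassifier`); that stage is not shown here.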

Acknowledgements

We thank TUBITAK Ulakbim for providing the TRDizinEn dataset for this study. The DergiParkEn dataset is publicly available at https://github.com/humakilicunlu/DergiParkEn.

Author information

Authors and Affiliations

Computer Engineering Department, Faculty of Technology, Gazi University, Ankara, 06500, Turkey
Hüma Kılıç & Aydın Çetin

Corresponding author

Correspondence to Hüma Kılıç.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kılıç, H., Çetin, A. A Novel Graph-Based Ensemble Token Classification Model for Keyword Extraction. Arab J Sci Eng 48, 10673–10680 (2023). https://doi.org/10.1007/s13369-023-07721-z

Received: 28 September 2022
Accepted: 15 February 2023
Published: 04 March 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s13369-023-07721-z
