Abstract
Over the last decade, high-dimensional data has proliferated in document mining fields such as text summarization, text clustering, and text classification. The curse of dimensionality degrades the performance of classification models, and feature selection is an effective strategy for mitigating it. In this work, we present the Tom and Jerry Optimization (TJO) technique for feature subset selection. The proposed work measures a candidate's fitness using the classifier error rate and the ratio of selected features. The performance of the proposed scheme is evaluated on two popular benchmark text corpora and compared with five metaheuristic approaches. The best success rate obtained by the proposed scheme is 95.77%, with a best precision of 0.9509, recall of 0.9577, and F1-score of 0.9541. The comparison results show that the proposed feature subset selection scheme outperforms the compared strategies.
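The abstract states that candidate fitness combines the classifier error rate with the rate of features chosen. A minimal sketch of such a wrapper-style fitness function is shown below; the weight `alpha` and the exact combination formula are assumptions for illustration, not taken from the paper.

```python
def fitness(error_rate, feature_mask, alpha=0.99):
    """Score a candidate feature subset (lower is better).

    error_rate   -- classification error of a model trained on the subset
    feature_mask -- binary list; 1 marks a selected feature
    alpha        -- assumed trade-off weight between accuracy and subset size
    """
    # Fraction of features the candidate keeps.
    selected_ratio = sum(feature_mask) / len(feature_mask)
    # Weighted sum: favor low error first, then smaller subsets.
    return alpha * error_rate + (1 - alpha) * selected_ratio
```

A candidate that keeps half the features with a 10% error rate would score `0.99 * 0.10 + 0.01 * 0.5 = 0.104`; the optimizer would prefer any candidate with a lower combined score.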
Data Availability Statement
The datasets used in this study are available in a public repository.
Cite this article
Thirumoorthy, K., Britto, J.J.J. A feature selection model for document classification using Tom and Jerry Optimization algorithm. Multimed Tools Appl 83, 10273–10295 (2024). https://doi.org/10.1007/s11042-023-15828-6