Filter feature selection methods for text classification: a review

Multimedia Tools and Applications

Abstract

Filter feature selection methods select discriminative terms from high-dimensional text data to improve text classification performance and reduce computational cost. This paper provides a systematic review of existing filter feature selection methods for text classification. First, we briefly discuss text classification based on filter feature selection. Second, we discuss in detail the mathematical design, effectiveness, and complexity of existing filter feature selection methods, grouped by methodology (supervised, unsupervised, and hybrid). We also discuss a number of benchmark datasets used to evaluate filter feature selection methods in text classification. Finally, we outline future directions in filter feature selection and conclude.
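As a rough illustration of what a supervised filter method does, the sketch below scores each term with the chi-square statistic against one target class and keeps the top-scoring terms. The abstract does not name a specific metric, so the choice of chi-square, the function names `chi_square_scores` and `select_top_k`, and the toy corpus are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

def chi_square_scores(docs, labels, positive):
    """Score each term by its chi-square statistic against one class.

    docs: list of token lists; labels: one class label per document;
    positive: the label treated as the target category.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))  # document frequency, not term frequency
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # target-class documents containing the term
        b = df_neg[term]   # other documents containing the term
        c = n_pos - a      # target-class documents without the term
        d = n_neg - b      # other documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Hypothetical toy corpus, for illustration only.
docs = [
    ["cheap", "pills", "buy"],
    ["buy", "cheap", "now"],
    ["meeting", "schedule", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels, positive="spam")
selected = select_top_k(scores, 3)
```

The scoring step never trains a classifier, which is what separates a filter method from a wrapper method: terms are ranked once from corpus statistics, and the reduced vocabulary is then passed to whatever classifier is used downstream. In this toy corpus, "now" occurs equally often in both classes and therefore receives a score of zero.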

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Funding

This research was supported by a project of the National Natural Science Foundation of China, Grant No. 71731006; the Fundamental Research Funds for Guangdong Natural Science Foundation, Grant No. 2022A1515011848; Guangzhou Philosophy and Social Science, Grant No. 2020GZYB04; and Guangdong Philosophy and Social Science, Grant No. GD22YYJ15.

Author information

Corresponding author

Correspondence to Wang Heyong.

Ethics declarations

Conflict of Interests

The authors declare that there are no conflicts of interest with this research article.

About this article

Cite this article

Ming, H., Heyong, W. Filter feature selection methods for text classification: a review. Multimed Tools Appl 83, 2053–2091 (2024). https://doi.org/10.1007/s11042-023-15675-5
