Filter feature selection methods for text classification: a review

Ming, Hong; Heyong, Wang

doi:10.1007/s11042-023-15675-5

Published: 11 May 2023

Volume 83, pages 2053–2091, (2024)

Multimedia Tools and Applications

Hong Ming¹ & Wang Heyong¹

Abstract

Filter feature selection methods are used to select discriminative terms from high-dimensional text data in order to improve text classification performance and reduce computational cost. This paper aims to provide a comprehensive, systematic review of existing filter feature selection methods for text classification. First, we briefly discuss text classification based on filter feature selection. Second, we present a detailed discussion of the mathematical design, effectiveness and complexity of existing filter feature selection methods across different methodologies (supervised, unsupervised and hybrid methods). In addition, several benchmark datasets for evaluating the performance of filter feature selection methods in text classification are discussed. Finally, we provide future directions for filter feature selection, along with a conclusion.
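
To make the idea of a filter method concrete, the following is a minimal sketch of one possible supervised filter: each term is scored independently of any classifier, and only the top-scoring terms are kept. The chi-square statistic is used here purely for illustration; the abstract does not single out any particular scoring function. The function names (`chi_square_scores`, `select_top_k`) and the toy corpus are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score every term by its chi-square association with class `target`.

    docs   : list of token lists (one list per document)
    labels : list of class labels, aligned with docs
    target : the class treated as "positive" (illustrative one-vs-rest setup)
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        counter = df_pos if y == target else df_neg
        counter.update(set(tokens))

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, contains term
        b = df_neg[term]   # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for this example only.
docs = [
    ["cheap", "pills", "offer"],
    ["cheap", "offer", "now"],
    ["meeting", "agenda", "now"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels, "spam")
top = select_top_k(scores, 3)
print(top)
```

In this toy setup, terms that occur in only one class score highest, while a term such as "now" that is spread evenly across both classes scores zero and would be filtered out. Because the scoring never trains a classifier, its cost grows with corpus and vocabulary size only, which is the usual motivation for filter methods on high-dimensional text.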

Data availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Funding

This research was supported by a project of the National Natural Science Foundation of China, Grant No. 71731006; the Fundamental Research Funds for Guangdong Natural Science Foundation, Grant No. 2022A1515011848; Guangzhou Philosophy and Social Science, Grant No. 2020GZYB04; and Guangdong Philosophy and Social Science, Grant No. GD22YYJ15.

Author information

Authors and Affiliations

Department of Electronic Business, South China University of Technology, Guangzhou, 510006, China
Hong Ming & Wang Heyong

Corresponding author

Correspondence to Wang Heyong.

Ethics declarations

Conflict of Interests

The authors declare that there are no conflicts of interest with this research article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ming, H., Heyong, W. Filter feature selection methods for text classification: a review. Multimed Tools Appl 83, 2053–2091 (2024). https://doi.org/10.1007/s11042-023-15675-5

Received: 26 May 2022
Revised: 29 August 2022
Accepted: 22 April 2023
Published: 11 May 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15675-5
