Abstract
This paper studies alternative models for ensembling feature selection methods for text classification. It presents an analytical study of three models, each combined with various rank aggregation techniques: the homogeneous ensemble, the heterogeneous ensemble and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices. A common feature evaluation function (FEF) is applied to each of the smaller training matrices to obtain multiple ranks for each feature, and a final score for each feature is then computed by applying a suitable rank aggregation method. In the heterogeneous ensemble, the training matrix is not partitioned; instead, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature, and a final score for each feature is again computed by a suitable rank aggregation method. The hybrid ensemble combines the ranks obtained from multiple homogeneous ensembles built using multiple FEFs. Experiments on two benchmark text collections show that, in most cases, the proposed ensembling methods perform better than any one of the feature selection methods applied individually.
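The three ensembling schemes described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not name the FEFs or the rank aggregation methods used, so the two FEFs here (`document_frequency`, `class_mean_spread`) and the mean-rank aggregator are hypothetical stand-ins. The sketch also assumes that partitioning is done over documents (rows) and that the hybrid model aggregates one homogeneous result per FEF.

```python
import numpy as np

def document_frequency(X, y):
    """Hypothetical FEF: number of documents that contain each term."""
    return (X > 0).sum(axis=0).astype(float)

def class_mean_spread(X, y):
    """Hypothetical FEF: variance of the per-class mean term frequency."""
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    return means.var(axis=0)

def scores_to_ranks(scores):
    """Rank 1 = highest score; ties are broken by feature index."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def mean_rank(rank_lists):
    """Assumed aggregator: average rank per feature (lower is better)."""
    return np.mean(rank_lists, axis=0)

def homogeneous(X, y, fef, n_parts, rng):
    """One FEF applied to random, near-equal partitions of the documents."""
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    return mean_rank([scores_to_ranks(fef(X[p], y[p])) for p in parts])

def heterogeneous(X, y, fefs):
    """Several FEFs applied to the same, unpartitioned training matrix."""
    return mean_rank([scores_to_ranks(f(X, y)) for f in fefs])

def hybrid(X, y, fefs, n_parts, rng):
    """One homogeneous ensemble per FEF, then a second aggregation step."""
    per_fef = [homogeneous(X, y, f, n_parts, rng) for f in fefs]
    # Negate so that the lowest aggregated rank becomes rank 1 again.
    return mean_rank([scores_to_ranks(-s) for s in per_fef])

# Toy term-frequency matrix: 8 documents, 3 terms, 2 classes.
X = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
              [5, 1, 0], [5, 0, 0], [5, 0, 0], [5, 0, 1]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fefs = [document_frequency, class_mean_spread]

het = heterogeneous(X, y, fefs)
hom = homogeneous(X, y, document_frequency, 2, np.random.default_rng(0))
hyb = hybrid(X, y, fefs, 2, np.random.default_rng(0))
print(het, hom, hyb)
```

In each case the returned vector holds one aggregated rank per feature, and the features with the lowest values would be retained. Any other rank aggregation method could be substituted for `mean_rank` without changing the structure of the three functions.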
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Guru, D.S., Suhil, M., Pavithra, S.K., Priya, G.R. (2018). Ensemble of Feature Selection Methods for Text Classification: An Analytical Study. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_33
DOI: https://doi.org/10.1007/978-3-319-76348-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)