
Ensemble of Feature Selection Methods for Text Classification: An Analytical Study

  • D. S. Guru
  • Mahamad Suhil
  • S. K. Pavithra
  • G. R. Priya
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)

Abstract

This paper studies alternative models for ensembling feature selection methods for text classification. Three models, each combined with various rank aggregation techniques, are analysed: the homogeneous ensemble, the heterogeneous ensemble and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices. A common feature evaluation function (FEF) is applied to each of the smaller training matrices to obtain multiple ranks for each feature, and a final score for each feature is then computed by applying a suitable rank aggregation method. In the heterogeneous ensemble, instead of partitioning the training matrix, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature, which are aggregated into a final score in the same way. The hybrid ensemble combines the ranks obtained from multiple homogeneous ensembles built with multiple FEFs. Experiments on two benchmark text collections show that, in most cases, the proposed ensembling methods perform better than any of the individual feature selection methods applied alone.
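The three ensembling models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the two FEFs (`document_frequency` and `class_mean_gap`) are simple stand-ins, the partition is taken over documents (rows), mean rank stands in for whichever rank aggregation techniques the paper evaluates, and the toy document-term matrix is invented.

```python
import random
from statistics import mean

def document_frequency(rows, labels):
    """Stand-in FEF (assumption): number of documents containing each term."""
    return [sum(1 for r in rows if r[j] > 0) for j in range(len(rows[0]))]

def class_mean_gap(rows, labels):
    """Stand-in FEF (assumption): spread of the class-wise mean term frequency."""
    scores = []
    for j in range(len(rows[0])):
        by_class = {}
        for r, c in zip(rows, labels):
            by_class.setdefault(c, []).append(r[j])
        means = [mean(v) for v in by_class.values()]
        scores.append(max(means) - min(means))
    return scores

def to_ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def mean_rank(rank_lists):
    """Simple rank aggregation (assumption): average rank, lower is better."""
    return [mean(col) for col in zip(*rank_lists)]

def homogeneous(rows, labels, fef, n_parts, seed=0):
    """One FEF applied to random, equal-sized partitions of the documents."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    size = len(idx) // n_parts  # leftover documents are dropped
    rank_lists = []
    for p in range(n_parts):
        part = idx[p * size:(p + 1) * size]
        part_scores = fef([rows[i] for i in part], [labels[i] for i in part])
        rank_lists.append(to_ranks(part_scores))
    return mean_rank(rank_lists)

def heterogeneous(rows, labels, fefs):
    """Several FEFs applied to the same, unpartitioned training matrix."""
    return mean_rank([to_ranks(f(rows, labels)) for f in fefs])

def hybrid(rows, labels, fefs, n_parts, seed=0):
    """Aggregate the rankings produced by one homogeneous ensemble per FEF."""
    per_fef = [homogeneous(rows, labels, f, n_parts, seed) for f in fefs]
    # Mean ranks are "lower is better", so negate them before re-ranking.
    return mean_rank([to_ranks([-s for s in scores]) for scores in per_fef])

# Toy document-term matrix: 4 documents, 3 terms, 2 classes (invented data).
rows = [[3, 0, 1], [2, 0, 1], [0, 1, 1], [0, 2, 1]]
labels = ["a", "a", "b", "b"]
fefs = [document_frequency, class_mean_gap]

het_scores = heterogeneous(rows, labels, fefs)
hom_scores = homogeneous(rows, labels, class_mean_gap, n_parts=2)
hyb_scores = hybrid(rows, labels, fefs, n_parts=2)
print(het_scores, hom_scores, hyb_scores)
```

In each case the result is one aggregated score per term, and the terms with the lowest mean rank would be kept. Swapping `mean_rank` for another aggregation function changes the ensemble without touching the rest of the sketch.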

Keywords

Feature selection; Ranking aggregation; Feature evaluation function; Text classification; Document term matrix

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • D. S. Guru (1)
  • Mahamad Suhil (1)
  • S. K. Pavithra (1)
  • G. R. Priya (1)
  1. Department of Studies in Computer Science, University of Mysore, Mysore, India
