Ensemble of Feature Selection Methods for Text Classification: An Analytical Study

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2017)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Abstract

This paper studies alternative models for ensembling feature selection methods in text classification. Three models are analysed, each with several rank aggregation techniques: the homogeneous ensemble, the heterogeneous ensemble and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices, and a common feature evaluation function (FEF) is applied to each of the smaller matrices to obtain multiple ranks for every feature. In the heterogeneous ensemble, the training matrix is not partitioned; instead, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature. In both models, a final score for each feature is then computed by applying a suitable rank aggregation method. The hybrid ensemble combines the ranks obtained from multiple homogeneous ensembles built with multiple FEFs. Experiments on two benchmark text collections show that, in most cases, the proposed ensembling methods perform better than any one of the feature selection methods applied individually.
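The three models described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the FEFs nor the rank aggregation methods, so the two toy FEFs (`document_frequency`, `class_mean_gap`), the mean-rank aggregation, the row-wise (per-document) partitioning and all function names below are assumptions chosen for illustration only.

```python
import numpy as np

def document_frequency(X, y):
    """Toy FEF (assumed): number of documents in which each term occurs."""
    return (X > 0).sum(axis=0).astype(float)

def class_mean_gap(X, y):
    """Toy FEF (assumed): largest gap between a class-wise mean and the overall mean."""
    overall = X.mean(axis=0)
    gaps = [np.abs(X[y == c].mean(axis=0) - overall) for c in np.unique(y)]
    return np.max(gaps, axis=0)

def scores_to_ranks(scores):
    """Rank 1 is the highest score; ties are broken by feature index."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def aggregate_mean_rank(rank_lists):
    """Mean-rank aggregation (assumed): a lower final score means a better feature."""
    return np.mean(rank_lists, axis=0)

def homogeneous_ensemble(X, y, fef, n_parts, rng):
    """One FEF applied to n_parts random, equal-sized subsets of the documents."""
    perm = rng.permutation(X.shape[0])
    usable = (len(perm) // n_parts) * n_parts  # drop leftovers so parts are equal sized
    parts = np.split(perm[:usable], n_parts)
    return aggregate_mean_rank([scores_to_ranks(fef(X[p], y[p])) for p in parts])

def heterogeneous_ensemble(X, y, fefs):
    """Several FEFs applied to the same, unpartitioned training matrix."""
    return aggregate_mean_rank([scores_to_ranks(f(X, y)) for f in fefs])

def hybrid_ensemble(X, y, fefs, n_parts, rng):
    """One homogeneous ensemble per FEF, then aggregation across FEFs."""
    per_fef = [homogeneous_ensemble(X, y, f, n_parts, rng) for f in fefs]
    # Negate so that the lowest aggregated rank becomes rank 1 again.
    return aggregate_mean_rank([scores_to_ranks(-s) for s in per_fef])

# Made-up document-term matrix: 6 documents, 4 terms, 2 classes.
X = np.array([[3, 0, 1, 0],
              [2, 0, 0, 1],
              [4, 1, 1, 0],
              [0, 3, 1, 0],
              [0, 2, 0, 1],
              [1, 4, 1, 0]])
y = np.array([0, 0, 0, 1, 1, 1])
fefs = [document_frequency, class_mean_gap]
rng = np.random.default_rng(0)

homo = homogeneous_ensemble(X, y, class_mean_gap, n_parts=2, rng=rng)
hetero = heterogeneous_ensemble(X, y, fefs)
hybrid = hybrid_ensemble(X, y, fefs, n_parts=2, rng=rng)
print("homogeneous:", homo)
print("heterogeneous:", hetero)
print("hybrid:", hybrid)
```

Each function returns one aggregated rank per term, so selecting the top k features amounts to taking the k smallest values, for example with `np.argsort(hetero)[:k]`. Mean rank is only one possible aggregator; any other rank aggregation method can replace `aggregate_mean_rank` without changing the rest of the sketch.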



Author information

Corresponding author

Correspondence to S. K. Pavithra.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Guru, D.S., Suhil, M., Pavithra, S.K., Priya, G.R. (2018). Ensemble of Feature Selection Methods for Text Classification: An Analytical Study. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_33

  • DOI: https://doi.org/10.1007/978-3-319-76348-4_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76347-7

  • Online ISBN: 978-3-319-76348-4

  • eBook Packages: Engineering, Engineering (R0)
