Ensemble of Feature Selection Methods for Text Classification: An Analytical Study

Guru, D. S.; Suhil, Mahamad; Pavithra, S. K.; Priya, G. R.

doi:10.1007/978-3-319-76348-4_33

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 736)

Included in the following conference series:

International Conference on Intelligent Systems Design and Applications

Abstract

In this paper, alternative models for ensembling feature selection methods for text classification are studied. An analytical study of three models, each combined with various rank aggregation techniques, is presented. The three proposed models are the homogeneous ensemble, the heterogeneous ensemble and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices. A common feature evaluation function (FEF) is applied to each of the smaller training matrices to obtain multiple ranks for each feature, and a final score for each feature is then computed by applying a suitable rank aggregation method. In the heterogeneous ensemble, instead of partitioning the training matrix, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature, which are aggregated in the same way. The hybrid ensemble combines the ranks obtained from multiple homogeneous ensembles, each built with a different FEF. Experiments on two benchmark text collections show that, in most cases, the proposed ensembling methods perform better than any of the individual feature selection methods applied alone.
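
The three ensemble models described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the FEFs or the rank aggregation techniques used, so the two FEFs (`fef_class_mean_gap`, `fef_doc_freq`), the mean-rank aggregation, the row-wise (document-wise) partitioning and all function names here are assumptions chosen only to keep the example small.

```python
import numpy as np

def to_ranks(scores):
    """Turn scores into ranks, where rank 0 is the highest-scoring feature."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(len(scores))
    return ranks

def mean_rank(rank_lists):
    """Assumed aggregation: average the ranks of each feature (lower is better)."""
    return np.mean(np.vstack(rank_lists), axis=0)

def fef_class_mean_gap(X, y):
    """Hypothetical FEF: largest gap between a class mean and the overall mean."""
    overall = X.mean(axis=0)
    gaps = [np.abs(X[y == c].mean(axis=0) - overall) for c in np.unique(y)]
    return np.max(gaps, axis=0)

def fef_doc_freq(X, y):
    """Hypothetical FEF: number of documents in which the term occurs."""
    return (X > 0).sum(axis=0).astype(float)

def homogeneous(X, y, fef, n_parts=2, seed=0):
    """One FEF applied to random, near-equal row partitions of the training matrix."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    return mean_rank([to_ranks(fef(X[idx], y[idx])) for idx in parts])

def heterogeneous(X, y, fefs):
    """Several FEFs applied to the same, unpartitioned training matrix."""
    return mean_rank([to_ranks(f(X, y)) for f in fefs])

def hybrid(X, y, fefs, n_parts=2, seed=0):
    """One homogeneous ensemble per FEF, then a second aggregation across FEFs."""
    # homogeneous() returns mean ranks (lower is better), so negate before re-ranking.
    per_fef = [to_ranks(-homogeneous(X, y, f, n_parts, seed)) for f in fefs]
    return mean_rank(per_fef)

# Toy term-document matrix: 8 documents, 3 terms, 2 classes (made-up data).
X = np.array([[3, 1, 0], [3, 1, 0], [3, 1, 1], [3, 1, 0],
              [0, 1, 0], [0, 1, 1], [0, 1, 0], [0, 1, 0]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
fefs = [fef_class_mean_gap, fef_doc_freq]

print("homogeneous  :", homogeneous(X, y, fef_doc_freq))
print("heterogeneous:", heterogeneous(X, y, fefs))
print("hybrid       :", hybrid(X, y, fefs))
```

Each function returns one aggregated rank per feature, with lower values indicating features that would be kept first. Swapping `mean_rank` for another aggregation rule changes only that one helper.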

Author information

Authors and Affiliations

Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysore, 570006, India
D. S. Guru, Mahamad Suhil, S. K. Pavithra & G. R. Priya

Corresponding author

Correspondence to S. K. Pavithra.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, Washington, USA
Ajith Abraham
Department of Computer Science, South Asian University, Chanakyapuri, Delhi, India
Pranab Kr. Muhuri
Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Durian Tunggal, Melaka, Malaysia
Azah Kamilah Muda
Machine Intelligence Research Labs, Auburn, Washington, USA
Niketa Gandhi

About this paper

Cite this paper

Guru, D.S., Suhil, M., Pavithra, S.K., Priya, G.R. (2018). Ensemble of Feature Selection Methods for Text Classification: An Analytical Study. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2017. Advances in Intelligent Systems and Computing, vol 736. Springer, Cham. https://doi.org/10.1007/978-3-319-76348-4_33

DOI: https://doi.org/10.1007/978-3-319-76348-4_33
Published: 22 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76347-7
Online ISBN: 978-3-319-76348-4
eBook Packages: Engineering, Engineering (R0)
