Ensemble of Feature Selection Methods for Text Classification: An Analytical Study
This paper studies alternative models for ensembling feature selection methods for text classification. Three models are analyzed with various rank aggregation techniques: the homogeneous ensemble, the heterogeneous ensemble, and the hybrid ensemble. In the homogeneous ensemble, the training feature matrix is randomly partitioned into multiple equal-sized training matrices, and a common feature evaluation function (FEF) is applied to each of the smaller matrices to obtain multiple ranks for each feature. A final score for each feature is then computed by applying a suitable rank aggregation method. In the heterogeneous ensemble, the training matrix is not partitioned; instead, multiple FEFs are applied to the same training matrix to obtain multiple rankings for every feature, and the final score for each feature is again computed by a suitable rank aggregation method. The hybrid ensemble combines the ranks obtained from multiple homogeneous ensembles, each built with a different FEF. Experiments on two benchmark text collections show that, in most cases, the proposed ensembling methods perform better than any of the individual feature selection methods applied alone.
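The three ensembling models described above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`chi2_fef`, `df_fef`, `to_ranks`, `aggregate`, `homogeneous`, `heterogeneous`, `hybrid`) are hypothetical, chi-square and document frequency stand in for the FEFs, mean rank stands in for the rank aggregation method, and labels are assumed to be binary.

```python
import numpy as np

def chi2_fef(X, y):
    """Assumed FEF: chi-square of term presence against a binary label."""
    present = (X > 0).astype(float)
    pos = (y == 1).astype(float)
    n = len(y)
    a = present.T @ pos            # term present, class 1
    b = present.T @ (1 - pos)      # term present, class 0
    c = pos.sum() - a              # term absent, class 1
    d = (1 - pos).sum() - b        # term absent, class 0
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    # Terms with a degenerate contingency table get a score of 0.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

def df_fef(X, y):
    """Assumed FEF: document frequency (the label is ignored)."""
    return (X > 0).sum(axis=0).astype(float)

def to_ranks(scores):
    """Rank 1 = highest score; ties are broken by feature index."""
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

def aggregate(rank_lists):
    """Assumed aggregation: mean rank per feature (lower is better)."""
    return np.mean(rank_lists, axis=0)

def homogeneous(X, y, fef, n_parts, rng):
    """One FEF applied to random, near-equal row partitions of X."""
    parts = np.array_split(rng.permutation(X.shape[0]), n_parts)
    return aggregate([to_ranks(fef(X[idx], y[idx])) for idx in parts])

def heterogeneous(X, y, fefs):
    """Several FEFs applied to the same, unpartitioned X."""
    return aggregate([to_ranks(f(X, y)) for f in fefs])

def hybrid(X, y, fefs, n_parts, rng):
    """One homogeneous ensemble per FEF, re-ranked and aggregated."""
    # Negate because homogeneous() returns mean ranks (lower is better),
    # while to_ranks() treats higher values as better.
    per_fef = [to_ranks(-homogeneous(X, y, f, n_parts, rng)) for f in fefs]
    return aggregate(per_fef)

# Toy document-term matrix: 8 documents, 4 terms, binary labels.
X = np.array([[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0],
              [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
rng = np.random.default_rng(0)
final_scores = hybrid(X, y, [chi2_fef, df_fef], n_parts=2, rng=rng)
top_k = np.argsort(final_scores)[:2]  # indices of the 2 best-ranked terms
```

In this sketch, a lower aggregated value means a better feature, so selecting the top k features amounts to taking the k smallest final scores. Other aggregation rules (for example minimum or median rank) could be swapped into `aggregate` without changing the rest of the structure.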
Keywords: Feature selection; Ranking aggregation; Feature evaluation function; Text classification; Document term matrix