Ensemble-Based Feature Ranking for Semi-supervised Classification
In this paper, we propose three feature ranking scores (Symbolic, Genie3, and Random Forest) for the task of semi-supervised classification. In this task, a dataset contains only a few labeled examples and many unlabeled ones. The task is highly relevant: unlabeled examples are increasingly easy to obtain, whereas labeling examples is often expensive and tedious. Each of the proposed feature ranking scores can be computed with any of three approaches to learning ensembles of predictive clustering trees (bagging, random forests, and extra trees). We extensively evaluate the proposed scores on 8 benchmark datasets. The evaluation identifies the most suitable ensemble method for each score, shows that taking unlabeled examples into account improves the quality of a feature ranking, and demonstrates that the proposed scores outperform SEFR, a state-of-the-art semi-supervised feature ranking method. Finally, we identify the best-performing pair of feature ranking score and ensemble method.
Keywords: Semi-supervised learning, Feature ranking, Ensembles
We acknowledge the financial support of the Slovenian Research Agency via grant P2-0103 and a young researcher grant to MP. SD and DK acknowledge the support of the Slovenian Research Agency (via grants J2-9230 and N2-0056) and the European Commission (project LANDMARK and The Human Brain Project SGA2). The computational experiments presented here were executed on computing infrastructure from the Slovenian Grid (SLING) initiative.