Ensemble-Based Feature Ranking for Semi-supervised Classification

  • Matej Petković
  • Sašo Džeroski
  • Dragi Kocev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11828)

Abstract

In this paper, we propose three feature ranking scores (Symbolic, Genie3, and Random Forest) for the task of semi-supervised classification. In this task, a dataset contains only a few labeled examples and many unlabeled ones. The task is highly relevant, since unlabeled examples are increasingly easy to obtain, while labeling examples is often expensive and tedious. Each of the proposed scores can be computed with any of three methods for learning ensembles of predictive clustering trees (bagging, random forests, and extra trees). We extensively evaluate the proposed scores on 8 benchmark datasets. The evaluation identifies the most suitable ensemble method for each score, shows that taking unlabeled examples into account improves the quality of a feature ranking, and demonstrates that the proposed scores outperform SEFR, a state-of-the-art semi-supervised feature ranking method. Finally, we identify the best-performing pair of feature ranking score and ensemble method.
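
To make the idea of an ensemble-based ranking that also uses unlabeled examples more concrete, the following is a minimal sketch and not the method evaluated in the paper. It makes several assumptions: the ensemble is a bag of one-split trees (stumps) rather than full predictive clustering trees, the split criterion is an assumed mix of Gini impurity on the labeled examples and normalized feature variance on all examples, and the score simply sums the impurity reductions per feature, loosely in the style of a Genie3 score. The names `gini`, `spread`, `gain`, `best_stump`, `rank_features` and the weight `w` are hypothetical.

```python
import random
from statistics import pvariance


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def spread(rows, total_var):
    """Mean feature variance of rows, each feature scaled by its global variance."""
    if not rows:
        return 0.0
    ratios = [pvariance([x[j] for x, _ in rows]) / tv
              for j, tv in enumerate(total_var) if tv > 0]
    return sum(ratios) / len(total_var)


def gain(rows, parts, w, total_var):
    """Assumed criterion: label term on labeled rows, feature term on all rows."""
    def labels(part):
        return [y for _, y in part if y is not None]

    n_lab = len(labels(rows))
    g_children = 0.0
    if n_lab:
        g_children = sum(len(labels(p)) * gini(labels(p)) for p in parts) / n_lab
    s_children = sum(len(p) * spread(p, total_var) for p in parts) / len(rows)
    return (w * (gini(labels(rows)) - g_children)
            + (1 - w) * (spread(rows, total_var) - s_children))


def best_stump(rows, w, total_var):
    """Return (feature index, impurity reduction) of the best split, or None."""
    best = None
    for j in range(len(total_var)):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between neighboring values
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            g = gain(rows, (left, right), w, total_var)
            if best is None or g > best[1]:
                best = (j, g)
    return best


def rank_features(rows, n_trees=25, w=0.5, seed=0):
    """Average, over bagged stumps, the reduction credited to each split feature."""
    rng = random.Random(seed)
    n_feat = len(rows[0][0])
    total_var = [pvariance([x[j] for x, _ in rows]) for j in range(n_feat)]
    scores = [0.0] * n_feat
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap replicate
        split = best_stump(sample, w, total_var)
        if split is not None:
            scores[split[0]] += split[1] / n_trees
    return scores


# Toy data: feature 0 separates the two classes, feature 1 is unrelated noise.
# Only every fifth example is labeled; the rest carry the label None.
rows = []
for i in range(40):
    cluster = i % 2
    x = (10.0 * cluster, ((i // 2) % 10) / 10)
    y = ("a", "b")[cluster] if i % 5 == 0 else None
    rows.append((x, y))

scores = rank_features(rows)
print(scores)
```

On this toy data the informative feature 0 is expected to receive the higher score. Setting `w=1.0` ignores the unlabeled examples in the criterion, which gives a simple way to compare a supervised and a semi-supervised variant of the same sketch.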

Keywords

Semi-supervised learning, Feature ranking, Ensembles

Notes

Acknowledgements

We acknowledge the financial support of the Slovenian Research Agency via the grant P2-0103 and a young researcher grant to MP. SD and DK acknowledge the support of the Slovenian Research Agency (via grants J2-9230 and N2-0056) and the European Commission (project LANDMARK and The Human Brain Project SGA2). The computational experiments presented here were executed on a computing infrastructure from the Slovenian Grid (SLING) initiative.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Matej Petković (1, 2), corresponding author
  • Sašo Džeroski (1, 2)
  • Dragi Kocev (2)
  1. Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
  2. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia
