
Auto-CES: An Automatic Pruning Method Through Clustering Ensemble Selection

  • Mojtaba Amiri Maskouni
  • Saeid Hosseini
  • Hadi Mohammadzadeh Abachi
  • Mohammadreza Kangavari
  • Xiaofang Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10837)

Abstract

Ensemble learning is a machine learning approach in which multiple learners are trained to solve the same problem. Random Forest is an ensemble learning algorithm that comprises numerous decision trees and nominates a class through majority voting for classification and by averaging for regression. Prior research affirms that the learning time of the Random Forest algorithm increases linearly with the number of trees in the forest. A large number of decision trees in the Random Forest causes two challenges: first, it enlarges the model complexity, and second, it degrades efficiency on large-scale datasets. Hence, ensemble pruning methods (e.g. Clustering Ensemble Selection (CES)) are devised to select a subset of decision trees out of the forest. The main challenge is that prior CES models require the number of clusters as input. To solve this problem, we devise an automatic CES pruning model (Auto-CES) for Random Forest that finds the proper number of clusters automatically. Our proposed model obtains an optimal subset of trees that provides the same or even better effectiveness compared to the original set. Auto-CES has two components: clustering and selection. First, our algorithm utilizes a new clustering technique to group homogeneous trees. In the selection part, it takes both the accuracy and the diversity of the trees into consideration to choose the best tree from each cluster.

Extensive experiments are conducted on five datasets. The results show that our algorithm can perform the classification task more effectively than the state-of-the-art rivals.
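
The abstract outlines the general CES recipe: cluster the trees of a trained forest into homogeneous groups, then keep one representative per group based on accuracy and diversity. The snippet below is a minimal, illustrative sketch of that general idea in Python with scikit-learn; it is not the authors' Auto-CES implementation. In particular, MeanShift is an assumed stand-in for the clustering step, chosen only because, like Auto-CES, it does not require the number of clusters as input; the paper's own clustering technique and selection criteria are described in the full text.

# Illustrative sketch only (not the paper's Auto-CES): prune a Random Forest
# by clustering its trees on their validation-set predictions and keeping the
# most accurate tree of each cluster.
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Describe each tree by its prediction vector on a validation set, so trees
# that vote alike land close together. (Sub-estimators emit encoded class
# indices, so map them back through forest.classes_.)
pred = np.array([forest.classes_[t.predict(X_val).astype(int)]
                 for t in forest.estimators_])

# Cluster the trees without specifying how many clusters to form.
labels = MeanShift().fit_predict(pred)

# Selection: keep the single most accurate tree per cluster; diversity across
# the kept trees comes from the clustering itself.
kept = []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    accuracies = [(pred[i] == y_val).mean() for i in members]
    kept.append(members[int(np.argmax(accuracies))])

# Majority vote over the pruned sub-forest.
votes = pred[kept]
final = np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
print(f"kept {len(kept)} of {len(forest.estimators_)} trees, "
      f"validation accuracy {(final == y_val).mean():.3f}")

Representing each tree by its prediction vector is one common way to quantify how homogeneous two trees are; the representation, clustering technique, and accuracy/diversity trade-off actually used by Auto-CES may differ from this sketch.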

Keywords

Machine learning · Ensemble method · Random Forest · Decision tree · Clustering Ensemble Selection · Pruning of Random Forest

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Mojtaba Amiri Maskouni (1)
  • Saeid Hosseini (1, 2)
  • Hadi Mohammadzadeh Abachi (1)
  • Mohammadreza Kangavari (1)
  • Xiaofang Zhou (2)

  1. Iran University of Science and Technology, Tehran, Iran
  2. The University of Queensland, Brisbane, Australia
