Trading-Off Diversity and Accuracy for Optimal Ensemble Tree Selection in Random Forests

Part of the Studies in Computational Intelligence book series (SCI, volume 373)


We discuss an effective method for optimal ensemble tree selection in Random Forests that trades off the diversity and the accuracy of the ensemble during the selection process. Because the chances of overfitting increase dramatically with the size of the ensemble, we wrap cross-validation around the ensemble selection to maximize the amount of validation data: each fold is considered, in turn, as the validation fold from which the trees are selected. The aim is to increase performance by reducing the variance of the tree ensemble selection process. We demonstrate the effectiveness of our approach on several UCI and real-world data sets.
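The abstract does not spell out the selection criterion, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes binary labels, a greedy forward selection that scores a candidate sub-ensemble by `alpha * accuracy + (1 - alpha) * diversity`, and mean pairwise disagreement as the diversity measure. The names `select_trees`, `cv_select`, `alpha`, and the `train_forest` callable are all hypothetical, and `toy_forest` is a stand-in for a real Random Forest learner.

```python
def majority_vote(rows):
    """Majority vote over per-tree prediction lists (labels 0/1; ties give 0)."""
    n_trees = len(rows)
    return [1 if 2 * sum(r[i] for r in rows) > n_trees else 0
            for i in range(len(rows[0]))]


def accuracy(pred, y):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def disagreement(rows):
    """Mean pairwise disagreement between trees (0.0 for a single tree)."""
    m = len(rows)
    if m < 2:
        return 0.0
    total, pairs = 0.0, 0
    for a in range(m):
        for b in range(a + 1, m):
            total += sum(u != v for u, v in zip(rows[a], rows[b])) / len(rows[a])
            pairs += 1
    return total / pairs


def select_trees(preds, y, size, alpha=0.8):
    """Greedily pick `size` tree indices using validation predictions `preds`.

    Assumed criterion: alpha * accuracy + (1 - alpha) * diversity.
    """
    chosen, remaining = [], list(range(len(preds)))
    while remaining and len(chosen) < size:
        def score(j):
            rows = [preds[i] for i in chosen + [j]]
            return (alpha * accuracy(majority_vote(rows), y)
                    + (1 - alpha) * disagreement(rows))
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def cv_select(X, y, train_forest, n_folds, size, alpha=0.8):
    """Use each fold in turn as the validation fold and pool the selected trees.

    `train_forest(X_train, y_train)` is a hypothetical callable that returns
    a list of trees, each itself a callable mapping one sample to a 0/1 label.
    """
    selected = []
    for f in range(n_folds):
        val = [i for i in range(len(y)) if i % n_folds == f]
        trn = [i for i in range(len(y)) if i % n_folds != f]
        trees = train_forest([X[i] for i in trn], [y[i] for i in trn])
        preds = [[tree(X[i]) for i in val] for tree in trees]
        keep = select_trees(preds, [y[i] for i in val], size, alpha)
        selected.extend(trees[j] for j in keep)
    return selected


def toy_forest(X_train, y_train):
    """Stand-in learner: fixed threshold stumps on feature 0 (ignores the data)."""
    return [lambda x, t=t: 1 if x[0] > t else 0 for t in (0.2, 0.4, 0.5, 0.6, 0.8)]


X = [[i / 20] for i in range(20)]
y = [1 if x[0] > 0.5 else 0 for x in X]
ensemble = cv_select(X, y, toy_forest, n_folds=4, size=2)
```

Pooling the per-fold selections means every training example serves once as validation data, which is the motivation the abstract gives for wrapping cross-validation around the selection. In practice `toy_forest` would be replaced by a real forest learner, and `alpha` would need tuning.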


Random Forest, Ensemble Member, Ensemble Method, Random Forest Model, Machine Learn Research
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Université de Lyon, Lyon, France
  2. Université de Lyon 1, France
  3. Laboratoire GAMA, Villeurbanne, France
