
Learning with Ensembles of Randomized Trees: New Insights

  • Vincent Pisetta
  • Pierre-Emmanuel Jouve
  • Djamel A. Zighed
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6323)

Abstract

Ensembles of randomized trees such as Random Forests are among the most popular tools used in machine learning and data mining. Such algorithms work by introducing randomness into the induction of several decision trees and then employing a voting scheme to predict unseen instances. In this paper, randomized tree ensembles are studied from the point of view of the basis functions they induce. We point out a connection with kernel target alignment, a measure of kernel quality, which suggests that randomization is a way to obtain high alignment, and hence possibly low generalization error. The connection also suggests post-processing ensembles with sophisticated linear separators such as Support Vector Machines (SVM). Interestingly, such post-processing experimentally gives better performance than classical majority voting. We conclude by comparing these results with an approximate infinite ensemble classifier very similar to the one introduced by Lin and Li. This methodology also shows strong learning ability, comparable to ensemble post-processing.
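The sketch below illustrates the idea described in the abstract, not the authors' exact experimental pipeline: each tree leaf is treated as a binary basis function, the kernel target alignment of the induced linear kernel is measured, and the ensemble is post-processed with a linear SVM instead of a majority vote. It assumes scikit-learn and NumPy; the synthetic dataset, ensemble size, and SVM regularization constant C are placeholder choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Randomized-tree ensemble with the usual majority-voting prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("majority vote accuracy:", forest.score(X_te, y_te))

# 2. Basis functions induced by the ensemble: one binary indicator per leaf.
#    forest.apply() returns, for each instance, the leaf index in each tree.
enc = OneHotEncoder(handle_unknown="ignore")
Z_tr = enc.fit_transform(forest.apply(X_tr))   # shape: (n_samples, total number of leaves)
Z_te = enc.transform(forest.apply(X_te))

# 3. Kernel target alignment of the linear kernel on these basis functions:
#    A(K, yy^T) = <K, yy^T>_F / (||K||_F * ||yy^T||_F), with labels in {-1, +1}.
K = (Z_tr @ Z_tr.T).toarray()
t = np.where(y_tr == 1, 1.0, -1.0)
yyT = np.outer(t, t)
alignment = np.sum(K * yyT) / (np.linalg.norm(K) * np.linalg.norm(yyT))
print("kernel target alignment:", alignment)

# 4. Post-processing: a linear SVM trained on the leaf-indicator features
#    replaces the majority vote.
svm = LinearSVC(C=1.0, max_iter=10000).fit(Z_tr, y_tr)
print("SVM post-processing accuracy:", svm.score(Z_te, y_te))
```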

Keywords

Ensemble Learning · Kernel Target Alignment · Randomized Trees Ensembles · Infinite Ensembles

References

  1. Biau, G., Devroye, L., Lugosi, G.: Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research 9, 2015–2033 (2008)
  2. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
  3. Breiman, L.: Bias, variance and arcing classifiers (1996), http://www.sasenterpriseminer.com/documents/arcing.pdf
  4. Breiman, L.: Some infinity theory for predictor ensembles (2000), http://www.stat.berkeley.edu
  5. Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
  6. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth and Brooks (1984)
  7. Buntine, W., Niblett, T.: A further comparison of splitting rules for decision tree induction. Machine Learning 8, 75–85 (1992)
  8. Chang, C.-C., Lin, C.-J.: LIBSVM: A library for support vector machines (2001), software available at http://www.csie.ntu.edu.tw/cjlin/libsvm
  9. Cristianini, N., Kandola, J., Elisseeff, A., Shawe-Taylor, J.: On kernel-target alignment. In: Holmes, D., Jain, L. (eds.) Innovations in Machine Learning: Theory and Application, pp. 205–255 (2006)
  10. Cutler, A., Zhao, G.: PERT - perfect random tree ensembles. Computer Science and Statistics (2001)
  11. Demiriz, A., Bennett, K., Shawe-Taylor, J.: Linear programming boosting via column generation. Machine Learning 46, 225–254 (2002)
  12. Dietterich, T.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40, 139–157 (2000)
  13. Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: ICML, pp. 148–156 (1996)
  14. Freund, Y., Schapire, R.: A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence 14, 771–780 (1999)
  15. Friedman, J., Hastie, T., Tibshirani, R.: Additive logistic regression: a statistical view of boosting. The Annals of Statistics 38, 95–118 (2000)
  16. Friedman, J., Popescu, B.: Predictive learning via rule ensembles. The Annals of Applied Statistics 2, 916–954 (2008)
  17. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learning 63, 3–42 (2006)
  18. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 17, pp. 545–552. MIT Press, Cambridge (2005)
  19. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, Heidelberg (2009)
  20. Hsu, C.-W., Chang, C.-C., Lin, C.-J.: A practical guide to support vector classification (2003), http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
  21. Kleinberg, E.: On the algorithmic implementation of stochastic discrimination. IEEE Trans. Pattern Anal. Mach. Intell. 22, 473–490 (2000)
  22. Kuncheva, L., Whitaker, C.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 181–207 (2003)
  23. Lin, H.-T., Li, L.: Support vector machinery for infinite ensemble learning. Journal of Machine Learning Research 9, 941–973 (2008)
  24. Liu, F.-T., Ting, K.-M., Yu, Y., Zhou, Z.-H.: Spectrum of variable-random trees. Journal of Artificial Intelligence Research 32, 355–384 (2008)
  25. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  26. Rätsch, G., Onoda, T., Müller, K.-R.: Soft margins for AdaBoost. Machine Learning 42, 287–320 (2001)
  27. Rosset, S., Zhu, J., Hastie, T.: Boosting as a regularized path to a maximum margin classifier. Journal of Machine Learning Research 5, 941–973 (2004)
  28. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2001)
  29. Utgoff, P., Clouse, J.: A Kolmogorov-Smirnov metric for decision tree induction (1996), http://www.ics.uci.edu/~mlearn/MLRepository.html
  30. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
  31. Wolpert, D.: Stacked generalization. Neural Networks 5, 241–259 (1992)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Vincent Pisetta (1)
  • Pierre-Emmanuel Jouve (2)
  • Djamel A. Zighed (3)
  1. Rithme, Lyon
  2. Fenics, Lyon
  3. ERIC Laboratory, Bron
