
Variable Randomness in Decision Tree Ensembles

  • Fei Tony Liu
  • Kai Ming Ting
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3918)

Abstract

In this paper, we propose Max-diverse.α, which has a mechanism to control the degree of randomness in decision tree ensembles. This control gives an ensemble the means to balance the two conflicting functions of a random ensemble, i.e., the abilities to model non-axis-parallel boundaries and to eliminate irrelevant features. We find that this control is more sensitive than the one provided by Random Forests. Using progressive training errors, we are able to estimate an appropriate degree of randomness for any given data set prior to any predictive task. Experimental results show that Max-diverse.α is significantly better than Random Forests and Max-diverse Ensemble, and it is comparable to the state-of-the-art C5 boosting.
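
To make the mechanism concrete, the following is a minimal Python/NumPy sketch of a per-split randomness control of this kind: at each node, a deterministic, gain-based split is chosen with probability α and a completely random split with probability 1 − α, so that α = 0 behaves like a complete-random ensemble and α = 1 like fully deterministic trees. The function names (choose_split, info_gain) and this exact interpretation of α are assumptions for illustration only, not the authors' implementation.

    # Minimal sketch (not the authors' code) of an alpha-controlled split choice:
    # with probability alpha the split is picked deterministically by information
    # gain; with probability 1 - alpha it is picked completely at random.
    import numpy as np

    def entropy(y):
        # Shannon entropy of a class-label vector.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def info_gain(X, y, feature, threshold):
        # Information gain of splitting on X[:, feature] <= threshold.
        left = X[:, feature] <= threshold
        if left.all() or (~left).all():
            return 0.0
        n = len(y)
        return entropy(y) - (left.sum() / n) * entropy(y[left]) \
                          - ((~left).sum() / n) * entropy(y[~left])

    def choose_split(X, y, alpha, rng):
        # Deterministic, gain-based split with probability alpha ...
        n_features = X.shape[1]
        if rng.random() < alpha:
            # Exhaustive search for the best (feature, threshold) by gain.
            return max(
                ((f, t) for f in range(n_features) for t in np.unique(X[:, f])),
                key=lambda ft: info_gain(X, y, *ft),
            )
        # ... otherwise a completely random feature and cut point.
        f = int(rng.integers(n_features))
        return f, rng.uniform(X[:, f].min(), X[:, f].max())

    # Toy usage: alpha = 0.0 always takes a random split,
    # alpha = 1.0 always takes the best-gain split.
    rng = np.random.default_rng(0)
    X = rng.random((100, 4))
    y = (X[:, 0] + X[:, 1] > 1).astype(int)
    print(choose_split(X, y, alpha=0.5, rng=rng))

In the same spirit, the abstract's use of progressive training errors can be read as evaluating candidate values of α against training error and keeping the most promising one before any prediction is made; the sketch above covers only the split-selection step.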

Keywords

Feature Selection · Random Forest · Variable Randomness · Decision Boundary · Training Error


References

  1. Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
  2. Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Computation 9, 1545–1588 (1997)
  3. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 832–844 (1998)
  4. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40, 139–157 (2000)
  5. Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
  6. Fan, W., Wang, H., Yu, P.S., Ma, S.: Is random model better? On its accuracy and efficiency. In: Third IEEE International Conference on Data Mining, pp. 51–58 (2003)
  7. Liu, F.T., Ting, K.M., Fan, W.: Maximizing tree diversity by building complete-random decision trees. In: Ho, T.-B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS, vol. 3518, pp. 605–610. Springer, Heidelberg (2005)
  8. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 181–207 (2003)
  9. Ji, C., Ma, S.: Combinations of weak classifiers. IEEE Transactions on Neural Networks 8, 494–500 (1997)
  10. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993). The latest version of C5 is available from http://www.rulequest.com
  11. Buttrey, S., Kobayashi, I.: On strength and correlation in random forests. In: Proceedings of the 2003 Joint Statistical Meetings (2003)
  12. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
  13. Holte, R.C., Acker, L., Porter, B.W.: Concept learning and the problem of small disjuncts. In: IJCAI, pp. 813–818 (1989)
  14. Blake, C., Merz, C.: UCI Repository of Machine Learning Databases (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Fei Tony Liu ¹
  • Kai Ming Ting ¹

  1. Gippsland School of Information Technology, Monash University, Churchill, Australia
