Abstract
In this paper, we propose Max-diverse.α, which has a mechanism to control the degree of randomness in decision tree ensembles. This control gives an ensemble the means to balance the two conflicting functions of a random ensemble, i.e., the ability to model non-axis-parallel boundaries and the ability to eliminate irrelevant features. We find that this control is more sensitive than the one provided by Random Forests. Using progressive training errors, we are able to estimate an appropriate level of randomness for any given data set prior to any predictive task. Experimental results show that Max-diverse.α is significantly better than Random Forests and Max-diverse Ensemble, and that it is comparable to the state-of-the-art C5 boosting.
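As a rough illustration of the kind of mechanism the abstract describes, the sketch below mixes deterministic and completely random split selection under a single probability parameter α. This is a minimal reading of the abstract, not the paper's actual algorithm: the function names (choose_split, info_gain), the use of information gain as the deterministic criterion, and the uniform cut-point choice are all assumptions made for the example.

```python
from math import log2
import random

def info_gain(X, y, f, cut):
    """Entropy reduction from splitting the rows of X on feature f at cut."""
    def entropy(labels):
        n = len(labels)
        counts = (labels.count(c) for c in set(labels))
        return -sum((k / n) * log2(k / n) for k in counts)
    left = [yi for xi, yi in zip(X, y) if xi[f] < cut]
    right = [yi for xi, yi in zip(X, y) if xi[f] >= cut]
    if not left or not right:          # degenerate split: no gain
        return 0.0
    n = len(y)
    return (entropy(y)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

def choose_split(X, y, alpha):
    """Pick a (feature, cut-point) split for one tree node.

    With probability alpha, use the deterministic gain-based choice, as
    in an ordinary greedy decision tree; otherwise pick the feature and
    cut-point completely at random, ignoring the labels, as in
    Max-diverse Ensemble. Assumes at least two rows at the node.
    """
    features = range(len(X[0]))
    if random.random() < alpha:
        # Deterministic branch: exhaustive search over midpoints of
        # adjacent sorted values for every feature.
        candidates = []
        for f in features:
            vals = sorted(row[f] for row in X)
            candidates += [(f, (a + b) / 2) for a, b in zip(vals, vals[1:])]
        return max(candidates, key=lambda s: info_gain(X, y, *s))
    # Random branch: uniform feature and cut-point choice.
    f = random.choice(list(features))
    vals = [row[f] for row in X]
    return f, random.uniform(min(vals), max(vals))

# Example: one split decision on a toy two-feature data set.
X = [[2.0, 1.5], [1.0, 3.0], [3.5, 0.5], [0.5, 2.5]]
y = ["a", "b", "a", "b"]
print(choose_split(X, y, alpha=0.5))
```

Under this reading, α = 0 recovers the complete-random trees of Max-diverse Ensemble and α = 1 approaches a conventional greedy tree; the abstract's progressive-training-error procedure would presumably amount to scanning candidate values of α and keeping the one with the lowest estimated error.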
References
Breiman, L.: Bagging predictors. Machine Learning 24, 123–140 (1996)
Amit, Y., Geman, D.: Shape quantization and recognition with randomized trees. Neural Computation 9, 1545–1588 (1997)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 832–844 (1998)
Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning 40, 139–157 (2000)
Breiman, L.: Random forests. Machine Learning 45, 5–32 (2001)
Fan, W., Wang, H., Yu, P.S., Ma, S.: Is random model better? On its accuracy and efficiency. In: Third IEEE International Conference on Data Mining, pp. 51–58 (2003)
Liu, F.T., Ting, K.M., Fan, W.: Maximizing tree diversity by building complete-random decision trees. In: Ho, T.-B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS, vol. 3518, pp. 605–610. Springer, Heidelberg (2005)
Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51, 181–207 (2003)
Ji, C., Ma, S.: Combinations of weak classifiers. IEEE Transactions on Neural Networks 8, 494–500 (1997)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993). The latest version of C5 is available from http://www.rulequest.com
Buttrey, S., Kobayashi, I.: On strength and correlation in random forests. In: Proceedings of the 2003 Joint Statistical Meetings (2003)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2001)
Holte, R.C., Acker, L., Porter, B.W.: Concept learning and the problem of small disjuncts. In: IJCAI, pp. 813–818 (1989)
Blake, C., Merz, C.: UCI repository of machine learning databases (1998)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Liu, F.T., Ting, K.M. (2006). Variable Randomness in Decision Tree Ensembles. In: Ng, W.K., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_12
DOI: https://doi.org/10.1007/11731139_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)
