Out of Bootstrap Estimation of Generalization Error Curves in Bagging Ensembles
The dependence of the classification error of a bagging ensemble on its size can be modeled within the framework of Monte Carlo theory for ensemble learning. These error curves are parametrized in terms of the probability that a given instance is misclassified by one of the predictors in the ensemble. Out-of-bootstrap estimates of these probabilities can be used to model generalization error curves using only information from the training data. Since these estimates are obtained from a finite number of hypotheses, they exhibit fluctuations. As a consequence, the modeled curves are biased and tend to overestimate the true generalization error. This bias becomes negligible as the number of hypotheses used in the estimator grows sufficiently large. Experiments are carried out to analyze the consistency of the proposed estimator.
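The procedure outlined in the abstract can be sketched as follows: estimate each training instance's misclassification probability from the out-of-bootstrap votes of the bagged predictors, then model the error curve as a function of ensemble size via the binomial majority-vote formula from Monte Carlo theory. This is a minimal illustrative sketch, not the paper's implementation; the toy one-dimensional data, the threshold-stump base learner, and all function names are assumptions made for the example.

```python
# Illustrative sketch (assumed names and toy data, not the paper's code):
# out-of-bootstrap estimation of per-instance misclassification
# probabilities, then the modeled error curve for varying ensemble size.
import math
import random

random.seed(0)

# --- toy 1-D binary data: class = sign of (x + label noise) ---
n = 200
X = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if (x + random.gauss(0.0, 0.5)) > 0 else 0 for x in X]

def train_stump(xs, ys):
    """Pick the threshold (and orientation) minimizing training error."""
    best = (float("inf"), 0.0, 1)
    for t in xs:
        for sign in (1, -1):
            err = sum((1 if sign * (x - t) > 0 else 0) != yy
                      for x, yy in zip(xs, ys))
            if err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) > 0 else 0

# --- bagging with B bootstrap replicates, tracking out-of-bootstrap votes ---
B = 50
oob_wrong = [0] * n   # misclassified OOB votes on instance i
oob_total = [0] * n   # number of OOB votes on instance i
for _ in range(B):
    idx = [random.randrange(n) for _ in range(n)]   # bootstrap sample
    in_bag = set(idx)
    h = train_stump([X[i] for i in idx], [y[i] for i in idx])
    for i in range(n):
        if i not in in_bag:                         # i is out of bootstrap
            oob_total[i] += 1
            oob_wrong[i] += (h(X[i]) != y[i])

# p[i]: out-of-bootstrap estimate of the probability that a single
# bagged predictor misclassifies instance i
p = [w / t if t > 0 else 0.5 for w, t in zip(oob_wrong, oob_total)]

def majority_error(pi, T):
    """P(majority of T independent predictors err, each with prob pi)."""
    return sum(math.comb(T, k) * pi**k * (1 - pi)**(T - k)
               for k in range(T // 2 + 1, T + 1))

# modeled generalization-error curve, averaged over training instances
# (odd ensemble sizes avoid majority-vote ties)
curve = {T: sum(majority_error(pi, T) for pi in p) / n
         for T in (1, 5, 11, 21, 51)}
for T, e in curve.items():
    print(f"T={T:3d}  modeled error={e:.3f}")
```

For T = 1 the formula reduces to the mean of the estimated probabilities themselves; as T grows, instances with p below 1/2 contribute less and less to the modeled error, which is the mechanism behind the error curves the paper studies.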
Keywords: Monte Carlo, Bootstrap Sample, Generalization Error, Monte Carlo Algorithm, Ensemble Size