Out of Bootstrap Estimation of Generalization Error Curves in Bagging Ensembles

  • Daniel Hernández-Lobato
  • Gonzalo Martínez-Muñoz
  • Alberto Suárez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4881)

Abstract

The dependence of the classification error on the size of a bagging ensemble can be modeled within the framework of Monte Carlo theory for ensemble learning. These error curves are parametrized in terms of the probability that a given instance is misclassified by one of the predictors in the ensemble. Out of bootstrap estimates of these probabilities can be used to model generalization error curves using only information from the training data. Since these estimates are obtained using a finite number of hypotheses, they exhibit fluctuations. This implies that the modeled curves are biased and tend to overestimate the true generalization error. This bias becomes negligible as the number of hypotheses used in the estimator becomes sufficiently large. Experiments are carried out to analyze the consistency of the proposed estimator.
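
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on simplifying assumptions that the abstract does not state: binary classification, an odd ensemble size so that majority votes cannot tie, and votes on a given instance that are independent draws with a fixed per-instance misclassification probability. All function names and the `wrong` / `in_bag` input layout are hypothetical, chosen only for this example.

```python
from math import comb


def majority_vote_error(p, T):
    """Probability that a majority of T independent votes is wrong,
    when each vote is wrong with probability p (assumes T is odd)."""
    if T % 2 == 0:
        raise ValueError("T must be odd to avoid ties")
    return sum(comb(T, k) * p**k * (1 - p) ** (T - k)
               for k in range(T // 2 + 1, T + 1))


def oob_misclassification_rates(wrong, in_bag):
    """Per-instance out-of-bootstrap estimate of the misclassification
    probability.

    wrong[t][i]  : True if hypothesis t misclassifies training instance i
    in_bag[t][i] : True if instance i is in the bootstrap sample of hypothesis t

    Only hypotheses that did NOT train on instance i contribute to its
    estimate. Instances that are never out of bag are skipped.
    """
    n_instances = len(wrong[0])
    rates = []
    for i in range(n_instances):
        votes = [wrong[t][i] for t in range(len(wrong)) if not in_bag[t][i]]
        if votes:
            rates.append(sum(votes) / len(votes))
    return rates


def error_curve(p_hat, sizes):
    """Modeled ensemble error for each size in `sizes`: the majority-vote
    error averaged over the per-instance probabilities in p_hat."""
    return [sum(majority_vote_error(p, T) for p in p_hat) / len(p_hat)
            for T in sizes]


# Toy input: 3 hypotheses, 2 training instances (made-up values).
wrong = [[True, False], [False, False], [True, True]]
in_bag = [[False, True], [True, False], [False, False]]
p_hat = oob_misclassification_rates(wrong, in_bag)
curve = error_curve(p_hat, sizes=[1, 3, 5])
```

Because the majority-vote error is a nonlinear function of the per-instance probability, noise in rates estimated from only a few out-of-bag hypotheses does not simply average out in the modeled curve. This is in line with the bias the abstract describes for estimates built from a finite number of hypotheses.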

Keywords

Monte Carlo; Bootstrap Sample; Generalization Error; Monte Carlo Algorithm; Ensemble Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Daniel Hernández-Lobato (1)
  • Gonzalo Martínez-Muñoz (1)
  • Alberto Suárez (1)
  1. Escuela Politécnica Superior, Universidad Autónoma de Madrid, C/ Francisco Tomás y Valiente, 11, Madrid 28049, Spain
