Machine Learning, Volume 48, Issue 1–3, pp 287–297

Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates

  • Tom Bylander


For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-bag votes.
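The out-of-bag procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. The base learner is a hypothetical `train_stump` decision stump on one-dimensional inputs, standing in for whatever learner is bagged. Labels are assumed to be 0 and 1, tied votes go to class 0, and examples that are never out of bag are skipped. The bias correction based on the distribution of out-of-bag votes is not shown.

```python
import random


def train_stump(xs, ys):
    """Hypothetical stand-in base learner: a decision stump on 1-D inputs, labels in {0, 1}."""
    best = None
    for t in sorted(set(xs)):
        for sign in (0, 1):
            # Predict `sign` when x <= t, otherwise the other class; count training errors.
            errors = sum((sign if x <= t else 1 - sign) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: sign if x <= t else 1 - sign


def out_of_bag_estimate(xs, ys, n_predictors=50, seed=0):
    """Bag `n_predictors` stumps and estimate error from out-of-bag votes only."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # votes[i][c]: out-of-bag votes for class c on example i
    for _ in range(n_predictors):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        h = train_stump([xs[i] for i in idx], [ys[i] for i in idx])
        # Record this predictor's vote only on examples omitted from its sample.
        for i in range(n):
            if i not in in_bag:
                votes[i][h(xs[i])] += 1
    errors = counted = 0
    for i, (v0, v1) in enumerate(votes):
        if v0 + v1 == 0:
            continue  # assumption: examples never out of bag are skipped
        pred = 1 if v1 > v0 else 0  # majority vote; ties go to class 0 (assumption)
        errors += pred != ys[i]
        counted += 1
    if counted == 0:
        raise ValueError("no example was ever out of bag")
    return errors / counted, votes
```

No extra predictors are trained to obtain the estimate: each example is left out of a given bootstrap sample with probability (1 - 1/n)^n, roughly 37% for large n, so it receives votes from about a third of the bag. The returned `votes` tally is the per-example distribution of out-of-bag votes on which the abstract's correction is based.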

Keywords: bagging, cross-validation, generalization error



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Tom Bylander, Division of Computer Science, University of Texas at San Antonio, San Antonio, USA
