Machine Learning, Volume 35, Issue 1, pp 41–55

An Efficient Method To Estimate Bagging's Generalization Error

  • David H. Wolpert
  • William G. Macready


Bagging (Breiman, 1994a) is a technique that tries to improve a learning algorithm's performance by using bootstrap replicates of the training set (Efron & Tibshirani, 1993; Efron, 1979). The computational requirements for estimating the resultant generalization error on a test set by means of cross-validation are often prohibitive: for leave-one-out cross-validation, one needs to train the underlying algorithm on the order of mν times, where m is the size of the training set and ν is the number of replicates. This paper presents several techniques for estimating the generalization error of a bagged learning algorithm without invoking yet more training of the underlying learning algorithm (beyond that of the bagging itself), as is required by cross-validation-based estimation. These techniques all exploit the bias-variance decomposition (Geman, Bienenstock & Doursat, 1992; Wolpert, 1996). The best of our estimators also exploits stacking (Wolpert, 1992). In a set of experiments reported here, this estimator was found to be more accurate than both the alternative cross-validation-based estimator of the bagged algorithm's error and the cross-validation-based estimator of the underlying algorithm's error. This improvement was particularly pronounced for small test sets. This suggests a novel justification for using bagging: more accurate estimation of the generalization error than is possible without bagging.
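To make the cost argument concrete, the following is a minimal sketch of the cross-validation baseline that the abstract describes as prohibitive. It is not the estimator proposed in the paper. The toy base learner `fit_mean`, the helper names, the squared-error loss, and the synthetic data are all assumptions introduced purely for illustration. The sketch counts how often the underlying algorithm is trained when a bagged predictor is evaluated by leave-one-out cross-validation.

```python
import random


def bootstrap_replicate(data, rng):
    """Draw len(data) items from data with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]


def fit_mean(train):
    """Hypothetical toy base learner: always predicts the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def bagged_fit(train, nu, rng, counter, fit=fit_mean):
    """Train `fit` on nu bootstrap replicates and average the predictions."""
    models = []
    for _ in range(nu):
        models.append(fit(bootstrap_replicate(train, rng)))
        counter[0] += 1  # one more training run of the underlying algorithm
    return lambda x: sum(model(x) for model in models) / len(models)


def loo_cv_error(data, nu, rng, counter):
    """Leave-one-out estimate of the bagged predictor's mean squared error."""
    total = 0.0
    for i, (held_x, held_y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        predictor = bagged_fit(rest, nu, rng, counter)
        total += (predictor(held_x) - held_y) ** 2
    return total / len(data)


rng = random.Random(0)
# Synthetic data (an assumption): m = 10 noisy points on a line.
data = [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in range(10)]
counter = [0]
err = loo_cv_error(data, nu=5, rng=rng, counter=counter)
training_runs = counter[0]  # m * nu = 10 * 5 = 50 training runs
```

With m = 10 training points and ν = 5 replicates, the loop trains the base learner 50 times, which is the mν scaling noted in the abstract. The paper's estimators aim to avoid these extra runs by reusing only the ν models already trained for bagging.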

Bagging, cross-validation, stacking, generalization error, bootstrap


  1. Breiman, L. (1994a). Bagging predictors. University of California, Dept. of Statistics, TR 421.
  2. Breiman, L. (1994b). Heuristics of instability and stabilization in model selection. University of California, Dept. of Statistics, TR 416.
  3. Breiman, L. (1996). Out-of-bag estimation. University of California, Dept. of Statistics.
  4. Efron, B. (1979). Computers and the theory of statistics: thinking the unthinkable. SIAM Review, 21: 460.
  5. Efron, B. & Tibshirani, R. (1993). An introduction to the bootstrap. Chapman and Hall.
  6. Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4: 1–58.
  7. Tibshirani, R. (1996). Bias, variance and prediction error for classification rules. University of Toronto Statistics Department Technical Report.
  8. Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5: 241–249.
  9. Wolpert, D. H. (1996). The bootstrap is inconsistent with probability theory. “Maximum Entropy and Bayesian Methods”, K. Hanson and R. Silver (Eds), pages 69–76.
  10. Wolpert, D. H. (1996). On bias plus variance. Neural Computation, in press.
  11. Wolpert, D. H. & Macready, W. G. (1996). Combining stacking with bagging to improve a learning algorithm. Submitted.

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • David H. Wolpert (1)
  • William G. Macready (2)

  1. Caelum Research, NASA Ames Research Center, Moffett Field
  2. Bios Group, LP, Santa Fe
