Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates

  • Published: July 2002
  • Volume 48, pages 287–297 (2002)
  • Tom Bylander

Abstract

For two-class datasets, we provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error on two-class datasets. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have similar performance, but are both biased. We can eliminate most of the bias in the out-of-bag estimate and increase accuracy by incorporating a correction based on the distribution of the out-of-bag votes.
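The out-of-bag procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 1-nearest-neighbour base learner, the toy dataset, and the `oob_error` helper are all assumptions introduced here, and the vote-distribution bias correction that the paper proposes is not reproduced.

```python
import random
from collections import Counter, defaultdict

def nn_predict(sample, x):
    """Toy base learner: 1-nearest-neighbour on one-dimensional inputs."""
    return min(sample, key=lambda pt: abs(pt[0] - x))[1]

def oob_error(data, n_predictors=25, seed=0):
    """Bag n_predictors learners; estimate error from out-of-bag votes."""
    rng = random.Random(seed)
    n = len(data)
    votes = defaultdict(list)  # example index -> votes from predictors that omitted it
    for _ in range(n_predictors):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        sample = [data[i] for i in idx]
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # record a vote only on examples omitted from this sample
                votes[i].append(nn_predict(sample, data[i][0]))
    # plain (uncorrected) OOB estimate: fraction of examples whose
    # OOB majority vote disagrees with the true label
    wrong = sum(1 for i, v in votes.items()
                if Counter(v).most_common(1)[0][0] != data[i][1])
    return wrong / max(len(votes), 1)

# two well-separated one-dimensional clusters; the OOB error should be near zero
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(100, 110)]
print(oob_error(data))
```

Note that no predictors beyond the original bag are trained: every example's error estimate reuses the votes of the predictors that happened to omit it, which is why the OOB estimate is considerably cheaper than 10-fold cross-validation.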



Author information

Authors and Affiliations

  1. Division of Computer Science, University of Texas at San Antonio, San Antonio, Texas, 78249-0667, USA

    Tom Bylander



About this article

Cite this article

Bylander, T. Estimating Generalization Error on Two-Class Datasets Using Out-of-Bag Estimates. Machine Learning 48, 287–297 (2002). https://doi.org/10.1023/A:1013964023376


Keywords

  • bagging
  • cross-validation
  • generalization error