Machine Learning, Volume 45, Issue 3, pp 261–277

Using Iterated Bagging to Debias Regressions

  • Leo Breiman

Abstract

Breiman (Machine Learning, 26(2), 123–140) showed that bagging could effectively reduce the variance of regression predictors while leaving the bias relatively unchanged. A new form of bagging, which we call iterated bagging, is effective in reducing both bias and variance. The procedure works in stages: the first stage is bagging; based on the outcomes of the first stage, the output values are altered, and a second stage of bagging is carried out using the altered output values. This is repeated until a simple rule stops the process. The method is tested using both trees and nearest-neighbor regression methods. Accuracy on the Boston Housing data benchmark is comparable to the best of the results obtained using highly tuned and compute-intensive Support Vector Regression Machines. Some heuristic theory is given to clarify what is going on. Application to two-class classification data gives interesting results.

Keywords: regression, bagging, out-of-bag, unbiased residuals
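The abstract gives the procedure only in outline. The sketch below is a minimal, hedged illustration of one plausible reading of it, using Python with scikit-learn trees and NumPy arrays: each stage is ordinary bagging, the out-of-bag predictions are subtracted from the current outputs to form the altered outputs for the next stage, and the process stops when the mean squared altered output stops improving. The number of bootstrap replicates, the 1.1 stopping factor, and the helper names (`bagged_stage`, `iterated_bagging`, `predict`) are assumptions for illustration, not taken from the paper.

```python
# Sketch of iterated bagging for regression (illustrative, not Breiman's exact procedure).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_stage(X, y, n_bags, rng):
    """One bagging stage: fit n_bags trees on bootstrap samples and
    return the trees plus out-of-bag predictions for every case."""
    n = len(X)
    trees, oob_sum, oob_cnt = [], np.zeros(n), np.zeros(n)
    for _ in range(n_bags):
        idx = rng.integers(0, n, n)                # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), idx)      # cases left out of this bootstrap
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        trees.append(tree)
        if len(oob) > 0:
            oob_sum[oob] += tree.predict(X[oob])
            oob_cnt[oob] += 1
    # Average over the trees for which each case was out-of-bag;
    # cases never out-of-bag (rare for moderate n_bags) get 0.
    oob_pred = np.divide(oob_sum, oob_cnt, out=np.zeros(n), where=oob_cnt > 0)
    return trees, oob_pred

def iterated_bagging(X, y, n_bags=25, max_stages=10, seed=0):
    """Run bagging stages on successively altered outputs (out-of-bag residuals)
    until the mean squared altered output stops improving."""
    rng = np.random.default_rng(seed)
    stages = []
    targets = np.asarray(y, dtype=float).copy()
    best_mss = np.inf
    for _ in range(max_stages):
        trees, oob_pred = bagged_stage(X, targets, n_bags, rng)
        residuals = targets - oob_pred             # altered outputs for the next stage
        mss = float(np.mean(residuals ** 2))
        if mss > 1.1 * best_mss:                   # simple stopping rule; 1.1 is an assumed factor
            break
        stages.append(trees)
        best_mss = min(best_mss, mss)
        targets = residuals
    return stages

def predict(stages, X):
    """Final prediction: sum over stages of each stage's bagged (averaged) prediction."""
    return sum(np.mean([t.predict(X) for t in trees], axis=0) for trees in stages)
```

With `X_train`, `y_train`, `X_test` as NumPy arrays, `predict(iterated_bagging(X_train, y_train), X_test)` gives the combined prediction; summing the stage predictions is what lets the later stages correct the bias left by the first round of bagging.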

References

  1. Breiman, L. (1993). Hinging hyperplanes for regression, classification and noiseless function approximation. IEEE Transactions on Information Theory, 39, 999–1013.
  2. Breiman, L. (1996). Bagging predictors. Machine Learning, 26(2), 123–140.
  3. Breiman, L. (1997a). Arcing the edge. Technical Report, Statistics Department, University of California.
  4. Breiman, L. (1997b). Out-of-bag estimation. Technical Report, Statistics Department, University of California.
  5. Breiman, L. (1998). Arcing classifiers (discussion paper). Annals of Statistics, 26, 801–824.
  6. Breiman, L. (1998a). Half and half bagging and hard boundary points. Technical Report, Statistics Department, University of California.
  7. Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and Regression Trees. Wadsworth.
  8. Drucker, H. (1997). Improving regressors using boosting techniques. In Proceedings of the International Conference on Machine Learning (pp. 107–115).
  9. Drucker, H. (1999). Combining Artificial Neural Nets (pp. 51–77). Berlin: Springer.
  10. Drucker, H., Burges, C., Kaufman, K., Smola, A., & Vapnik, V. (1997). Support vector regression machines. Advances in Neural Information Processing Systems, 9, 155–161.
  11. Freund, Y., & Schapire, R. (1996). Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, July 1996.
  12. Friedman, J. (1991). Multivariate adaptive regression splines. Annals of Statistics, 1991.
  13. Friedman, J. (1997). On bias, variance, 0/1 loss, and the curse of dimensionality. Data Mining and Knowledge Discovery, 1, 55.
  14. Friedman, J. (1999a). Greedy Function Approximation: A Gradient Boosting Method. Available at http://www-stat.stanford.edu/~jhf/.
  15. Friedman, J. (1999b). Stochastic Gradient Boosting. Available at http://www-stat.stanford.edu/~jhf/.
  16. Friedman, J., Hastie, T., & Tibshirani, R. (1998). Additive logistic regression: A statistical view of boosting. Technical Report, Statistics Department, Stanford University.
  17. Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1–58.
  18. Scholkopf, B., Bartlett, P., Smola, A., & Williamson, R. (1999). Shrinking the tube: A new support vector regression algorithm. Advances in Neural Information Processing Systems, 11, 330–336.
  19. Stitson, M., Gammerman, A., Vapnik, V., Vovk, V., Watkins, C., & Weston, J. (1999). Support vector regression with ANOVA decomposition kernels. In Advances in Kernel Methods—Support Vector Learning (pp. 285–291). Cambridge, MA: MIT Press.
  20. Tibshirani, R. (1996). Bias, variance, and prediction error for classification rules. Technical Report, Statistics Department, University of Toronto.
  21. Vapnik, V. (1998). Statistical Learning Theory. New York: Wiley.
  22. Wolpert, D. H., & Macready, W. G. (1999). An efficient method to estimate bagging's generalization error. Machine Learning, 35(1), 41–55.

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Leo Breiman
  1. Statistics Department, University of California at Berkeley, Berkeley, USA
