Machine Learning, Volume 47, Issue 2–3, pp 153–200

Boosting Methods for Regression

  • Nigel Duffy
  • David Helmbold


In this paper we examine ensemble methods for regression that leverage or “boost” base regressors by iteratively calling them on modified samples. The most successful leveraging algorithm for classification is AdaBoost, an algorithm that requires only modest assumptions on the base learning method for its strong theoretical guarantees. We present several gradient descent leveraging algorithms for regression and prove AdaBoost-style bounds on their sample errors using intuitive assumptions on the base learners. We bound the complexity of the regression functions produced in order to derive PAC-style bounds on their generalization errors. Experiments validate our theoretical results.
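The idea of calling a base regressor repeatedly on modified samples can be illustrated with a minimal sketch. This is a generic residual-fitting scheme under squared loss, not one of the specific algorithms analysed in the paper. The stump base learner, the function names (`fit_stump`, `leverage`, `mse`), the step size, and the toy data are all assumptions introduced here for illustration.

```python
# Illustrative sketch only: generic residual-fitting leveraging under squared
# loss. It is NOT one of the specific algorithms analysed in the paper.

def fit_stump(xs, residuals):
    """Hypothetical base regressor: least-squares single-threshold step function."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        # When t is the largest input, nothing lies to the right; predict 0 there.
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if x <= t else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def leverage(xs, ys, rounds=50, step=0.5):
    """Repeatedly call the base learner on a modified sample (the residuals)."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        # Modified sample: same inputs, labels replaced by the current residuals,
        # i.e. the negative gradient of 0.5 * squared error w.r.t. the predictions.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + step * h(x) for x, p in zip(xs, preds)]
    # The master regressor is a scaled sum of the base regressors.
    return lambda x: sum(step * h(x) for h in stumps)


def mse(f, xs, ys):
    """Mean squared sample error of predictor f."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (assumed for illustration): y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
model = leverage(xs, ys)
```

Because each stump is a least-squares fit to the current residuals and the step size lies between 0 and 1, the sample squared error in this sketch cannot increase from one round to the next. Comparing `mse(model, xs, ys)` with the error of a constant predictor gives a quick sanity check on the toy data.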

Keywords: learning, boosting, arcing, ensemble methods, regression, gradient descent



Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Nigel Duffy (1)
  • David Helmbold (1)

  1. Computer Science Department, University of California, Santa Cruz, Santa Cruz, USA
