Machine Learning, Volume 24, Issue 1, pp 49–64

Stacked regressions

  • Leo Breiman


Stacking regressions is a method for forming linear combinations of different predictors to give improved prediction accuracy. The idea is to use cross-validation data and least squares under non-negativity constraints to determine the coefficients in the combination. Its effectiveness is demonstrated in stacking regression trees of different sizes and in a simulation stacking linear subset and ridge regressions. Reasons why this method works are explored. The idea of stacking originated with Wolpert (1992).
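The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation. The three toy level-0 predictors (`fit_mean`, `fit_line`, `fit_nearest`), the fold count, the synthetic data, and the coordinate-descent solver are all assumptions chosen so the example runs with only the standard library. The paper itself stacks regression trees, subset regressions and ridge regressions. The two ingredients taken from the abstract are the out-of-fold (cross-validation) predictions and the least-squares fit of the combination coefficients under non-negativity constraints.

```python
import random

def fit_mean(xs, ys):
    """Hypothetical level-0 predictor A: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical level-0 predictor B: least-squares line in one variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """Hypothetical level-0 predictor C: 1-nearest-neighbour."""
    pts = list(zip(xs, ys))
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]

def cv_predictions(fitters, xs, ys, n_folds=5):
    """Row i holds each predictor's guess for point i, made by a model
    that was fitted without point i (out-of-fold prediction)."""
    n = len(xs)
    Z = [[0.0] * len(fitters) for _ in range(n)]
    for f in range(n_folds):
        test = range(f, n, n_folds)
        held_out = set(test)
        tr_x = [xs[i] for i in range(n) if i not in held_out]
        tr_y = [ys[i] for i in range(n) if i not in held_out]
        for k, fit in enumerate(fitters):
            model = fit(tr_x, tr_y)
            for i in test:
                Z[i][k] = model(xs[i])
    return Z

def nnls_coordinate_descent(Z, y, sweeps=500):
    """Minimise sum_i (y_i - sum_k a_k * Z[i][k])**2 subject to a_k >= 0.
    Cyclic coordinate descent is used here as a simple stand-in for a
    dedicated non-negative least-squares solver."""
    n, K = len(Z), len(Z[0])
    a = [0.0] * K
    resid = list(y)  # residual y - Z a; equals y because a starts at zero
    for _ in range(sweeps):
        for k in range(K):
            col = [Z[i][k] for i in range(n)]
            norm = sum(c * c for c in col)
            if norm == 0.0:
                continue
            # Best value of a_k with the other coefficients fixed, clipped at 0.
            new = max(0.0, a[k] + sum(c * r for c, r in zip(col, resid)) / norm)
            delta = new - a[k]
            if delta:
                resid = [r - delta * c for r, c in zip(resid, col)]
                a[k] = new
    return a

def stack(fitters, xs, ys, n_folds=5):
    """Return the non-negative weights and the stacked predictor."""
    Z = cv_predictions(fitters, xs, ys, n_folds)
    weights = nnls_coordinate_descent(Z, ys)
    models = [fit(xs, ys) for fit in fitters]  # refit each predictor on all data
    return weights, (lambda x: sum(w * m(x) for w, m in zip(weights, models)))

# Synthetic data, purely illustrative: a line with a step at x = 0 plus noise.
random.seed(0)
xs = [random.uniform(-3, 3) for _ in range(60)]
ys = [0.5 * x + (x > 0) * 2.0 + random.gauss(0, 0.3) for x in xs]
weights, predict = stack([fit_mean, fit_line, fit_nearest], xs, ys)
print(weights)
```

The weights are fitted to out-of-fold predictions rather than to in-sample fits, so a predictor that merely memorises the training data (such as the nearest-neighbour rule here) is not rewarded for doing so. In practice a dedicated non-negative least-squares routine would normally replace the coordinate-descent loop.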

Key words

Stacking, Non-negativity, Trees, Subset regression, Combinations


References

  1. Belsley, D.A., Kuh, E. and Welsch, R., “Regression Diagnostics,” 1980, John Wiley and Sons, New York.
  2. Berger, J.O. and Bock, M.E., “Combining independent normal mean estimation problems with unknown variances,” Ann. Statist. 4, 1976, pp. 642–648.
  3. Breiman, L., Friedman, J., Olshen, R. and Stone, C., “Classification and Regression Trees,” 1984, Wadsworth, California.
  4. Breiman, L. and Friedman, J.H., “Estimating Optimal Transformations in Multiple Regression and Correlation (with discussion),” J. Amer. Statist. Assoc., 80, 1985, pp. 580–619.
  5. Breiman, L. and Spector, P., “Submodel Selection and Evaluation-X Random Case,” International Statistical Review, 3, 1992, pp. 291–319.
  6. Efron, B. and Morris, C., “Combining possibly related estimation problems (with discussion),” J. Roy. Statist. Soc. Ser. B, 35, 1973, pp. 379–421.
  7. Green, E.J. and Strawderman, W.E., “A James-Stein type estimator for combining unbiased and possibly biased estimators,” J. Amer. Statist. Assoc., 86, 1991, pp. 1001–1006.
  8. Hoerl, A.E. and Kennard, R.W., “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, 12, 1970, pp. 55–67.
  9. Lawson, C. and Hanson, R., “Solving Least Squares Problems,” 1974, Prentice-Hall, New Jersey.
  10. Luenberger, D., “Linear and Nonlinear Programming,” 1984, Addison-Wesley Publishing Co.
  11. LeBlanc, M. and Tibshirani, R., “Combining Estimates in Regression and Classification,” Technical Report 9318, 1993, Dept. of Statistics, University of Toronto.
  12. Perrone, M.P., “General Averaging Results for Convex Optimization,” Proceedings of the 1993 Connectionist Models Summer School, Erlbaum Associates, 1994, pp. 364–371.
  13. Rao, J.N.K. and Subrahmaniam, K., “Combining independent estimators and estimation in linear regression with unequal variances,” Biometrics, 27, 1971, pp. 971–990.
  14. Rubin, D.B. and Weisberg, S., “The variance of a linear combination of independent estimators using estimated weights,” Biometrika, 62, 1975, pp. 708–709.
  15. Wolpert, D., “Stacked Generalization,” Neural Networks, Vol. 5, 1992, pp. 241–259.

Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Leo Breiman (1)
  1. Statistics Department, University of California, Berkeley
