Stacked regressions

Abstract

Stacking regressions is a method for forming linear combinations of different predictors to give improved prediction accuracy. The idea is to use cross-validation data and least squares under non-negativity constraints to determine the coefficients in the combination. Its effectiveness is demonstrated in stacking regression trees of different sizes and in a simulation stacking linear subset and ridge regressions. Reasons why this method works are explored. The idea of stacking originated with Wolpert (1992).
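
As a minimal sketch of the procedure the abstract describes, the snippet below forms a stacked predictor from cross-validated predictions of several base regressions and fits the combination weights by least squares under non-negativity constraints. It assumes NumPy, SciPy, and scikit-learn are available; the particular base learners (two regression trees of different sizes and a ridge regression), the synthetic data, and the 10-fold cross-validation are illustrative choices, not the paper's exact experimental setup.

    # Stacked regression sketch (assumed libraries: numpy, scipy, scikit-learn).
    # Level-one predictions come from cross-validation; the stacking weights
    # are obtained by non-negative least squares, as outlined in the abstract.
    import numpy as np
    from scipy.optimize import nnls
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_predict
    from sklearn.tree import DecisionTreeRegressor

    # Illustrative data and base learners (not the paper's setup).
    X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
    base_models = [
        DecisionTreeRegressor(max_depth=3, random_state=0),
        DecisionTreeRegressor(max_depth=6, random_state=0),
        Ridge(alpha=1.0),
    ]

    # Cross-validated predictions of each base model form the stacking design matrix.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=10) for m in base_models])

    # Combination weights alpha_k >= 0 from non-negative least squares.
    alpha, _ = nnls(Z, y)

    # Refit each base model on all the data; the stacked predictor is
    # sum_k alpha_k * f_k(x).
    for m in base_models:
        m.fit(X, y)

    def stacked_predict(X_new):
        preds = np.column_stack([m.predict(X_new) for m in base_models])
        return preds @ alpha

    print("stacking weights:", alpha)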

References

  1. Belsley, D.A., Kuh, E. and Welsch, R., “Regression Diagnostics,” 1980, John Wiley and Sons, New York.

  2. Berger, J.O. and Bock, M.E., “Combining independent normal mean estimation problems with unknown variances,” Ann. Statist. 4, 1976, pp. 642–648.

  3. Breiman, L., Friedman, J., Olshen, R. and Stone, C., “Classification and Regression Trees,” 1984, Wadsworth, California.

  4. Breiman, L. and Friedman, J.H., “Estimating Optimal Transformations in Multiple Regression and Correlation (with discussion),” J. Amer. Statist. Assoc., 80, 1985, pp. 580–619.

  5. Breiman, L. and Spector, P., “Submodel Selection and Evaluation in Regression: The X-Random Case,” International Statistical Review, 60, 1992, pp. 291–319.

  6. Efron, B. and Morris, C., “Combining possibly related estimation problems (with discussion),” J. Roy. Statist. Soc. Ser. B, 35, 1973, pp. 379–421.

  7. Green, E.J. and Strawderman, W.E., “A James-Stein type estimator for combining unbiased and possibly biased estimators,” J. Amer. Statist. Assoc., 86, 1991, pp. 1001–1006.

  8. Hoerl, A.E. and Kennard, R.W., “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, 12, 1970, pp. 55–67.

  9. Lawson, C. and Hanson, R., “Solving Least Squares Problems,” 1974, Prentice-Hall, New Jersey.

  10. Luenberger, D., “Linear and Nonlinear Programming,” 1984, Addison-Wesley Publishing Co.

  11. LeBlanc, M. and Tibshirani, R., “Combining Estimates in Regression and Classification,” Technical Report 9318, 1993, Dept. of Statistics, University of Toronto.

  12. Perrone, M.P., “General Averaging Results for Convex Optimization,” Proceedings of the 1993 Connectionist Models Summer School, Erlbaum Associates, 1994, pp. 364–371.

  13. Rao, J.N.K. and Subrahmaniam, K., “Combining independent estimators and estimation in linear regression with unequal variances,” Biometrics, 27, 1971, pp. 971–990.

  14. Rubin, D.B. and Weisberg, S., “The variance of a linear combination of independent estimators using estimated weights,” Biometrika, 62, 1975, pp. 708–709.

  15. Wolpert, D., “Stacked Generalization,” Neural Networks, Vol. 5, 1992, pp. 241–259.

Cite this article

Breiman, L. Stacked regressions. Mach Learn 24, 49–64 (1996). https://doi.org/10.1007/BF00117832

Key words

  • Stacking
  • Non-negativity
  • Trees
  • Subset regression
  • Combinations