
Regularized Linear Models in Stacked Generalization

  • Sam Reid
  • Greg Grudic
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5519)

Abstract

Stacked generalization is a flexible method for multiple classifier combination; however, it tends to overfit unless the combiner function is sufficiently smooth. Previous studies attempt to avoid overfitting by using a linear function at the combiner level. This paper demonstrates experimentally that even with a linear combination function, regularization is necessary to reduce overfitting and increase predictive accuracy. Standard linear least squares regression can be regularized with an L2 penalty (ridge regression), an L1 penalty (lasso regression), or a combination of the two (elastic net regression). In multi-class classification, sparse linear models select and combine individual predicted probabilities instead of using complete probability distributions, allowing base classifiers to specialize in subproblems corresponding to different classes.
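
To make the setup concrete, below is a minimal sketch of stacked generalization with a regularized linear combiner. It assumes scikit-learn; the dataset, base learners, and penalty strength are illustrative choices, not the authors' experimental pipeline. Swapping Lasso for Ridge or ElasticNet compares the L1, L2, and combined penalties discussed above.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso  # swap for Ridge or ElasticNet
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_learners = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GaussianNB(),
    DecisionTreeClassifier(max_depth=1, random_state=0),  # decision stump
]

# Level-1 features: out-of-fold predicted class probabilities from each
# base classifier, concatenated so the combiner sees every individual
# predicted probability from every model.
meta_tr = np.hstack([
    cross_val_predict(clf, X_tr, y_tr, cv=5, method="predict_proba")
    for clf in base_learners
])
for clf in base_learners:
    clf.fit(X_tr, y_tr)
meta_te = np.hstack([clf.predict_proba(X_te) for clf in base_learners])

# Regularized linear combiner: one penalized regression per class on a
# 0/1 class indicator; predict the argmax over classes. With an L1
# penalty the combiner is sparse, selecting only some of the individual
# predicted probabilities.
n_classes = len(np.unique(y_tr))
scores = np.zeros((len(X_te), n_classes))
for c in range(n_classes):
    combiner = Lasso(alpha=0.01)  # alpha is an illustrative value
    combiner.fit(meta_tr, (y_tr == c).astype(float))
    scores[:, c] = combiner.predict(meta_te)

accuracy = (scores.argmax(axis=1) == y_te).mean()
print(f"stacked accuracy: {accuracy:.3f}")

In this sketch, inspecting each fitted combiner's coef_ shows which base-classifier probabilities were selected for each class, which is one way to observe the specialization behavior described above.
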

Keywords

Ridge Regression · Combiner Function · Linear Regression Problem · Lasso Regression · Decision Stump



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Sam Reid (1)
  • Greg Grudic (1)
  1. University of Colorado at Boulder, Boulder, USA
