
Stacked generalization: an introduction to super learning

  • METHODS
  • Published in: European Journal of Epidemiology

Abstract

Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has been refined several times, giving rise to a family of related approaches that includes the “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although the method is relatively simple in nature, its use by epidemiologists has been hampered by limited understanding of its conceptual and technical details. We work step by step through two examples to illustrate concepts and address common concerns.
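The procedure the abstract describes — generate out-of-fold predictions for each candidate algorithm via V-fold cross-validation, then choose the convex combination of candidates that minimizes a cross-validated loss — can be sketched in a few lines. The following toy Python sketch is illustrative only, not the authors' implementation: the two candidate learners (a marginal-mean predictor and a simple least-squares line), the squared-error loss, and the grid search over weights are all hypothetical simplifications chosen for readability.

```python
# Minimal sketch of stacked generalization ("super learning") with V = 2 folds.
# Candidate library: (a) the marginal mean of y, (b) a least-squares line.
# The convex weight on each candidate is chosen by grid search to minimize
# the V-fold cross-validated mean squared error.

def fit_mean(x, y):
    # Candidate (a): predict the training-set mean everywhere.
    m = sum(y) / len(y)
    return lambda x_new: [m] * len(x_new)

def fit_line(x, y):
    # Candidate (b): ordinary least-squares simple linear regression.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    return lambda x_new: [a + b * xi for xi in x_new]

def cv_predictions(x, y, fitters, V=2):
    """Out-of-fold predictions for each candidate algorithm (V-fold CV)."""
    n = len(x)
    folds = [list(range(v, n, V)) for v in range(V)]
    preds = [[None] * n for _ in fitters]
    for holdout in folds:
        train = [i for i in range(n) if i not in holdout]
        for k, fit in enumerate(fitters):
            f = fit([x[i] for i in train], [y[i] for i in train])
            for i, p in zip(holdout, f([x[i] for i in holdout])):
                preds[k][i] = p
    return preds

def super_learner_weight(y, preds, step=0.01):
    """Grid-search the convex weight on learner 0 minimizing CV MSE."""
    best_w, best_mse = 0.0, float("inf")
    steps = int(round(1.0 / step))
    for s in range(steps + 1):
        w = s * step
        mse = sum((yi - (w * p0 + (1 - w) * p1)) ** 2
                  for yi, p0, p1 in zip(y, preds[0], preds[1])) / len(y)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.2, 8.0]   # nearly linear toy data
preds = cv_predictions(x, y, [fit_mean, fit_line])
w_mean = super_learner_weight(y, preds)
print(f"weight on mean learner: {w_mean:.2f}")  # near 0: data favor the line
```

A real analysis would use many candidate algorithms and dedicated software rather than a two-learner grid search, but the moving parts are the same: out-of-fold predictions, a loss function, and an optimization over the combination weights.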


Fig. 1
Fig. 2



Acknowledgements

We thank Susan Gruber and Mark J van der Laan for expert advice.

Funding

NIH Grant Number UL1TR001857 and R37AI051164.

Author information

Correspondence to Ashley I. Naimi.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflicts of interest.


About this article


Cite this article

Naimi, A.I., Balzer, L.B. Stacked generalization: an introduction to super learning. Eur J Epidemiol 33, 459–464 (2018). https://doi.org/10.1007/s10654-018-0390-z

