# When and when not to use optimal model averaging

## Abstract

Traditionally, model averaging has been viewed as an alternative to model selection, with the ultimate goal of incorporating the uncertainty associated with the model selection process into standard errors and confidence intervals by using a weighted combination of candidate models. In recent years, a new class of model averaging estimators has emerged in the literature; these estimators combine models such that the squared risk, or another risk function, is minimized. We argue that, contrary to popular belief, these estimators do not necessarily address the challenges induced by model selection uncertainty, but should be regarded as attractive complements to the machine learning and forecasting literature, as well as tools to identify causal parameters. We illustrate our point by means of several targeted simulation studies.
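
To make the idea of risk-minimizing weights concrete, the following is a minimal sketch of one common instance, a Mallows-type criterion for least squares candidate models. The notation is an assumption introduced here for illustration and is not taken from the abstract: $y$ is the response vector, $\hat{\mu}_m$ the fitted values of candidate model $m$ with $k_m$ parameters, $\hat{\sigma}^2$ an estimate of the error variance, and $w$ a weight vector restricted to the unit simplex.

```latex
\hat{w} \;=\; \arg\min_{w \in \mathcal{H}}
  \left\{ \Bigl\lVert\, y - \sum_{m=1}^{M} w_m \hat{\mu}_m \Bigr\rVert^{2}
  \;+\; 2\,\hat{\sigma}^{2} \sum_{m=1}^{M} w_m k_m \right\},
\qquad
\mathcal{H} = \Bigl\{ w \in [0,1]^{M} : \sum_{m=1}^{M} w_m = 1 \Bigr\}
```

The first term rewards fit and the second penalizes the weighted number of parameters, so the weights are chosen to approximate minimal squared risk of the combined prediction rather than to quantify uncertainty about which model is selected.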

## Keywords

Model selection · Model averaging · Prediction · Machine learning · Causal inference