Bootstrapping multiple linear regression after variable selection

  • Lasanthi C. R. Pelawa Watagoda
  • David J. Olive
Regular Article


This paper suggests a method for bootstrapping the multiple linear regression model \(Y = \beta _1 + \beta _2 x_2 + \cdots + \beta _p x_p + e\) after variable selection. We develop asymptotic theory for some common least squares variable selection estimators, such as forward selection with \(C_p\). Hypothesis tests are then performed using three bootstrap confidence regions, one of which is new. Theory suggests that, for a large enough sample size, all three confidence regions tend to have coverage at least as high as the nominal level.
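The general idea can be illustrated with a minimal sketch: select variables by greedy forward selection with Mallows' \(C_p\), then residual-bootstrap the whole selection-plus-fit procedure and form a percentile interval for one coefficient. This is an illustrative assumption-laden toy, not the authors' exact algorithm or confidence regions; the simulated data, the helper names `fit` and `forward_selection_cp`, and the choice of a residual bootstrap are all this sketch's own.

```python
# Illustrative sketch only: residual bootstrap of forward selection with Cp.
# The selection rule, data, and helper functions are this example's assumptions,
# not the paper's exact method.
import numpy as np

def fit(X, y):
    """Least-squares fit; return coefficients and residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid @ resid

def forward_selection_cp(X, y):
    """Greedy forward selection; return the column set minimising
    Mallows' Cp = RSS/sigma2 - n + 2k, with sigma2 from the full model."""
    n, p = X.shape
    _, full_rss = fit(X, y)
    sigma2 = full_rss / (n - p)
    current, remaining = [0], list(range(1, p))   # column 0 = intercept, kept
    best_cp, best_set = np.inf, [0]
    while remaining:
        # at a fixed subset size, minimising Cp = minimising RSS
        rss, j = min((fit(X[:, current + [j]], y)[1], j) for j in remaining)
        current = current + [j]
        remaining.remove(j)
        cp = rss / sigma2 - n + 2 * len(current)
        if cp < best_cp:
            best_cp, best_set = cp, current
    return sorted(best_set)

# toy data: only columns 1 and 4 are active
rng = np.random.default_rng(0)
n, p = 100, 5
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([1.0, 2.0, 0.0, 0.0, 1.5])
y = X @ beta_true + rng.standard_normal(n)

sel = forward_selection_cp(X, y)
b, _ = fit(X[:, sel], y)
beta_hat = np.zeros(p)
beta_hat[sel] = b                     # unselected coefficients are set to zero

# residual bootstrap: resample residuals from the full model and
# rerun selection + refit on each bootstrap sample
fitted = X @ fit(X, y)[0]
resid = y - fitted
B = 200
boot = np.zeros((B, p))
for i in range(B):
    ystar = fitted + rng.choice(resid, size=n, replace=True)
    s = forward_selection_cp(X, ystar)
    bstar, _ = fit(X[:, s], ystar)
    boot[i, s] = bstar

# percentile interval for beta_2 (column 1)
lo, hi = np.percentile(boot[:, 1], [2.5, 97.5])
```

Because each bootstrap replicate reruns the selection step, the bootstrap distribution reflects selection uncertainty: rows of `boot` carry exact zeros whenever a variable was not selected, which is why naive normal-theory intervals after selection can undercover while percentile-type regions can remain conservative.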


Keywords: Bagging · Confidence region · Forward selection



The authors thank the Editor and two referees for their work.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematical Sciences, Appalachian State University, Boone, USA
  2. Department of Mathematics, Southern Illinois University, Carbondale, USA
