Bootstrapping multiple linear regression after variable selection
This paper proposes a method for bootstrapping the multiple linear regression model \(Y = \beta _1 + \beta _2 x_2 + \cdots + \beta _p x_p + e\) after variable selection. We develop asymptotic theory for some common least squares variable selection estimators, such as forward selection with \(C_p\). Hypothesis testing is then carried out with three confidence regions, one of which is new. Theory suggests that all three confidence regions tend to have coverage at least as high as the nominal coverage when the sample size is large enough.
Keywords: Bagging · Confidence region · Forward selection
The authors thank the Editor and two referees for their work.
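The bootstrap-after-selection idea in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the nonparametric (rows) bootstrap, and the \(C_p\) form \(C_p(I) = SSE(I)/\hat{\sigma}^2 + 2k - n\) (with \(\hat{\sigma}^2\) from the full model) are assumptions based on standard practice. Each resample reruns forward selection, and unselected coefficients are recorded as zero, so the bootstrap distribution reflects the selection step as well as the fit.

```python
import numpy as np

def ols_sse(X, y):
    """Least squares fit; returns (coefficients, sum of squared errors)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def forward_cp(X, y):
    """Forward selection minimizing Mallows' Cp.

    Column 0 (the intercept) is always kept; Cp(I) = SSE(I)/sigma2 + 2k - n,
    where sigma2 is estimated from the full model.
    """
    n, p = X.shape
    _, sse_full = ols_sse(X, y)
    sigma2 = sse_full / (n - p)

    def cp(cols):
        _, sse = ols_sse(X[:, cols], y)
        return sse / sigma2 + 2 * len(cols) - n

    selected = [0]
    current = cp(selected)
    while True:
        cands = [j for j in range(1, p) if j not in selected]
        if not cands:
            break
        score, j = min((cp(selected + [j]), j) for j in cands)
        if score < current:          # add the best candidate only if Cp drops
            selected.append(j)
            current = score
        else:
            break
    return sorted(selected)

def bootstrap_selection(X, y, B=200, seed=0):
    """Nonparametric (rows) bootstrap of the forward-selection estimator.

    Selection is rerun on every resample; coefficients of unselected
    predictors are recorded as 0, as when bagging the selection estimator.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    betas = np.zeros((B, p))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        Xb, yb = X[idx], y[idx]
        cols = forward_cp(Xb, yb)
        beta, _ = ols_sse(Xb[:, cols], yb)
        betas[b, cols] = beta
    return betas

# Toy data: y depends on x2 only; x3 and x4 are noise predictors.
rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

sel = forward_cp(X, y)
betas = bootstrap_selection(X, y, B=100)
ci_lo, ci_hi = np.percentile(betas, [2.5, 97.5], axis=0)
```

The per-coefficient percentile intervals above are only the simplest use of the bootstrap sample; the paper's confidence regions are joint regions over the coefficient vector, for which the matrix `betas` of bootstrap replicates is the raw input.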