Abstract
This paper analyzes estimation by bootstrap variable-selection in a simple Gaussian model where the dimension of the unknown parameter may exceed that of the data. A naive use of the bootstrap in this problem produces risk estimators for candidate variable-selections that have a strong upward bias. Resampling from a less overfitted model removes the bias and leads to bootstrap variable-selections that minimize risk asymptotically. A related bootstrap technique generates confidence sets that are centered at the best bootstrap variable-selection and have two further properties: the asymptotic coverage probability for the unknown parameter is as desired and the confidence set is geometrically smaller than a classical competitor. The results suggest a possible approach to confidence sets in other inverse problems where a regularization technique is used.
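The bias phenomenon described above can be illustrated with a toy simulation. This is only a sketch, not the paper's construction: the Gaussian means model, the signal pattern (10 nonzero coordinates), the nested keep-first-k selections, and the restricted center k0 = 10 are all assumptions made for illustration. The naive bootstrap resamples around the fully fitted data vector, so its risk estimate for the selection keeping k coordinates overshoots the true risk by roughly (n - k)·sigma²; resampling around a less overfitted (restricted) fit removes most of that bias.

```python
import numpy as np

# Toy Gaussian means model: y_i = theta_i + eps_i, eps ~ N(0, sigma^2).
# (Assumed setup for illustration; not the paper's exact construction.)
rng = np.random.default_rng(0)
n, sigma, B = 50, 1.0, 200
theta = np.where(np.arange(n) < 10, 2.0, 0.0)   # assumed signal: 10 nonzero coords

y = theta + rng.normal(0.0, sigma, n)

def select_k(x, k):
    """Candidate variable-selection: keep the first k coordinates, zero the rest."""
    out = np.zeros_like(x)
    out[:k] = x[:k]
    return out

ks = list(range(0, n + 1, 10))

# True risk of each candidate selection, computable here because theta is known:
# E||theta_hat_k - theta||^2 = sum_{i>k} theta_i^2 + k * sigma^2.
true_risk = [np.sum(theta[k:] ** 2) + k * sigma ** 2 for k in ks]

# Naive bootstrap: resample around the fully fitted (overfitted) vector y itself.
naive = [np.mean([np.sum((select_k(y + rng.normal(0.0, sigma, n), k) - y) ** 2)
                  for _ in range(B)]) for k in ks]

# Less overfitted bootstrap: resample around a restricted fit (k0 = 10 assumed known
# here; in practice it would itself be chosen by a preliminary rule).
center = select_k(y, 10)
better = [np.mean([np.sum((select_k(center + rng.normal(0.0, sigma, n), k) - center) ** 2)
                   for _ in range(B)]) for k in ks]
```

Comparing `naive` and `better` against `true_risk` shows the pattern the abstract reports: the naive estimates are strongly biased upward for heavily selective candidates (small k), while the estimates resampled from the restricted fit track the true risk closely, so minimizing them selects a nearly risk-optimal k.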
© 1996 Springer-Verlag New York, Inc.
Beran, R. (1996). Bootstrap Variable-Selection and Confidence Sets. In: Rieder, H. (ed.) Robust Statistics, Data Analysis, and Computer Intensive Methods. Lecture Notes in Statistics, vol. 109. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2380-1_1
Print ISBN: 978-0-387-94660-3
Online ISBN: 978-1-4612-2380-1