Abstract
The problem of selecting one model from a family of linear models to describe a normally distributed observed data vector is considered. The notion of the model of given dimension nearest to the observation vector is introduced, and methods of estimating the risk associated with such a nearest model are discussed. This leads to new model selection criteria, one of which, called the "partial bootstrap", seems particularly promising. The methods are illustrated by specializing to the problem of estimating the non-zero components of a parameter vector on which noisy observations are available.
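As a rough illustration of the "nearest model of given dimension" idea in the special case mentioned last, the sketch below assumes that each candidate model simply sets some subset of the components of the parameter vector to zero. Under that assumption, the model of dimension k nearest to the observation vector (in Euclidean distance) keeps the k observations of largest absolute value. The function names `nearest_model` and `residual_ss` are hypothetical, and the sketch does not implement the risk estimates or the partial bootstrap criterion discussed in the article.

```python
def nearest_model(y, k):
    """Fit of the k-dimensional coordinate model closest to y.

    Assumption (not from the article): candidate models are all subsets
    of components allowed to be non-zero, so the nearest model of
    dimension k retains the k entries of y with largest absolute value.
    """
    # Rank component indices by decreasing absolute value of the observation.
    order = sorted(range(len(y)), key=lambda i: abs(y[i]), reverse=True)
    keep = sorted(order[:k])
    # Retained components are estimated by the observation, the rest by zero.
    fit = [0.0] * len(y)
    for i in keep:
        fit[i] = y[i]
    return fit, keep


def residual_ss(y, fit):
    """Squared Euclidean distance between the observation and the fit."""
    return sum((a - b) ** 2 for a, b in zip(y, fit))


y = [0.3, -2.5, 0.1, 1.8, -0.4]
fit, keep = nearest_model(y, 2)
print(keep, fit, residual_ss(y, fit))
```

Because the retained components are chosen by looking at the data, the residual sum of squares of the nearest model tends to understate its true risk, which is why estimating that risk requires more care than for a model fixed in advance.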
About this article
Cite this article
Venter, J.H., Snyman, J.L.J. Linear Model Selection Based on Risk Estimation. Annals of the Institute of Statistical Mathematics 49, 321–340 (1997). https://doi.org/10.1023/A:1003119114553