Abstract
Several criteria, such as CV, C_p, AIC, CAIC, and MAIC, are used for selecting variables in linear regression models. C_p was proposed as an estimator of the expected standardized prediction error, whereas the target risk function of CV may be regarded as the expected prediction error R_PE. The target risk function of AIC, CAIC, and MAIC, on the other hand, is the expected log-predictive likelihood. In this paper, we propose a prediction error criterion, PE, which is an estimator of the expected prediction error R_PE and is therefore a competitor of CV. We show that PE is an unbiased estimator when the true model is contained in the full model, and that this property holds without the assumption of normality. PE is also shown to be more faithful to its risk function than CV. The criterion PE is extended to the multivariate case. Finally, using simulations, we examine some peculiarities of all these criteria.
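For readers who want a concrete picture of two of the established criteria named above, the following is a minimal sketch, not the authors' code. It assumes CV is the sum of squared leave-one-out prediction errors and that C_p takes Mallows' usual form RSS/s^2 - n + 2k, with s^2 estimated from the full model. The function names, the simulated data, and the candidate subsets are all hypothetical. The PE criterion itself is not sketched, because the abstract does not state its formula.

```python
import numpy as np

def residuals_and_leverage(X, y):
    """Least-squares residuals and hat-matrix diagonal (X assumed full column rank)."""
    Q, _ = np.linalg.qr(X)            # reduced QR: Q has orthonormal columns
    leverage = np.sum(Q ** 2, axis=1) # diagonal of the hat matrix Q Q^T
    resid = y - Q @ (Q.T @ y)
    return resid, leverage

def loo_cv(X, y):
    """Sum of squared leave-one-out prediction errors via the leverage shortcut."""
    resid, leverage = residuals_and_leverage(X, y)
    return float(np.sum((resid / (1.0 - leverage)) ** 2))

def mallows_cp(X_sub, X_full, y):
    """Mallows' C_p of a candidate model, with s^2 taken from the full model."""
    n, k_full = X_full.shape
    k_sub = X_sub.shape[1]
    resid_full, _ = residuals_and_leverage(X_full, y)
    s2 = float(resid_full @ resid_full) / (n - k_full)
    resid_sub, _ = residuals_and_leverage(X_sub, y)
    return float(resid_sub @ resid_sub) / s2 - n + 2 * k_sub

# Hypothetical simulated data: only the first two regressors matter.
rng = np.random.default_rng(0)
n = 50
Z = rng.normal(size=(n, 4))
X_full = np.column_stack([np.ones(n), Z])
y = 1.0 + 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + rng.normal(size=n)

# Compare nested candidate models (intercept plus the first j regressors).
for j in range(1, 5):
    X_sub = X_full[:, : j + 1]
    print(j, round(loo_cv(X_sub, y), 2), round(mallows_cp(X_sub, X_full, y), 2))
```

In this sketch the candidate with the smallest criterion value would be selected. The two criteria are on different scales, which reflects the point made above that they target different risk functions.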
Cite this article
Fujikoshi, Y., Kan, T., Takahashi, S. et al. Prediction error criterion for selecting variables in a linear regression model. Ann Inst Stat Math 63, 387–403 (2011). https://doi.org/10.1007/s10463-009-0233-5