Abstract
Suppose that the log-likelihood-ratio sequence of two models with different numbers of estimated parameters is bounded in probability, without necessarily having a chi-square limiting distribution. Then BIC and all other related “consistent” model selection criteria, meaning those that penalize the number of estimated parameters with a weight that becomes infinite with the sample size, will, with asymptotic probability 1, select the model having fewer parameters. This note presents examples of nested and non-nested regression model pairs for which the log-likelihood-ratio sequence is bounded in probability and in which the model with more estimated parameters has better predictive properties, for an independent replicate of the observed data, than the model with fewer parameters. Our second example also shows how a one-dimensional regressor can overfit the data used for estimation in comparison with the fit of a two-dimensional regressor.
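The selection rule behind the first two sentences can be spelled out. If model i has k_i estimated parameters and maximized log-likelihood l_i, a criterion of the form -2 l_i + C_n k_i prefers the larger model 2 only when 2(l_2 - l_1) > C_n (k_2 - k_1). AIC uses C_n = 2 and BIC uses C_n = log n. If 2(l_2 - l_1) is bounded in probability and C_n grows without bound, the probability of choosing model 2 therefore tends to 0. The Python sketch below is a hypothetical illustration of this, not one of the paper's examples. It assumes a nested Gaussian pair, model 1: y = e versus model 2: y = b x + e, with x and e independent N(0, 1) and a true slope b = c / sqrt(n) that shrinks with n, so that 2 log LR stays bounded in probability (approximately noncentral chi-square with one degree of freedom). The names two_log_lr, selection_rates, b and c are illustrative choices.

```python
import math
import random


def two_log_lr(n, c, rng):
    """Return 2*log(likelihood ratio) of model 2 (y = b*x + e) against
    model 1 (y = e) for one simulated sample of size n.

    Hypothetical setup: x and e are i.i.d. N(0, 1) and the true slope is
    b = c / sqrt(n), so the statistic stays bounded in probability.
    Gaussian likelihoods with the error variance estimated by RSS/n give
    2*log(LR) = n * log(RSS1 / RSS2).
    """
    b = c / math.sqrt(n)
    sxx = sxy = syy = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = b * x + rng.gauss(0.0, 1.0)
        sxx += x * x
        sxy += x * y
        syy += y * y
    rss1 = syy                      # model 1 fits no regressor
    rss2 = syy - sxy * sxy / sxx    # model 2: least-squares slope sxy/sxx
    return n * math.log(rss1 / rss2)


def selection_rates(n, c=2.0, reps=300, seed=0):
    """Fraction of replications in which AIC and BIC pick model 2.

    Model 2 has one extra parameter, so AIC picks it when 2*log(LR) > 2
    and BIC picks it when 2*log(LR) > log(n).
    """
    rng = random.Random(seed)
    stats = [two_log_lr(n, c, rng) for _ in range(reps)]
    aic = sum(s > 2.0 for s in stats) / reps
    bic = sum(s > math.log(n) for s in stats) / reps
    return aic, bic


if __name__ == "__main__":
    for n in (50, 2000):
        aic, bic = selection_rates(n)
        print(f"n={n:5d}  AIC picks model 2: {aic:.2f}  BIC: {bic:.2f}")
```

Under these assumptions the AIC rate should stay near a fixed level as n grows, while the BIC rate falls, though only slowly because log n grows slowly. In this setup, omitting x inflates the mean squared error of predicting an independent replicate by about c^2/n, whereas estimating b adds only about 1/n. With c = 2 the larger model is therefore the better predictor even as BIC increasingly rejects it.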
Additional information
An earlier version of this article was presented at the Symposium on the Analysis of Statistical Information, held at the Institute of Statistical Mathematics, Tokyo, December 5–8, 1989.
About this article
Cite this article
Findley, D.F. Counterexamples to parsimony and BIC. Ann Inst Stat Math 43, 505–514 (1991). https://doi.org/10.1007/BF00053369
DOI: https://doi.org/10.1007/BF00053369