Annals of the Institute of Statistical Mathematics, Volume 43, Issue 3, pp 505–514

Counterexamples to parsimony and BIC

  • David F. Findley
Model Selection


Suppose that the log-likelihood-ratio sequence of two models with different numbers of estimated parameters is bounded in probability, without necessarily having a chi-square limiting distribution. Then BIC and all other related “consistent” model selection criteria, meaning those which penalize the number of estimated parameters with a weight which becomes infinite with the sample size, will, with asymptotic probability 1, select the model having fewer parameters. This note presents examples of nested and non-nested regression model pairs for which the likelihood-ratio sequence is bounded in probability and which have the property that the model in each pair with more estimated parameters has better predictive properties, for an independent replicate of the observed data, than the model with fewer parameters. Our second example also shows how a one-dimensional regressor can overfit the data used for estimation in comparison to the fit of a two-dimensional regressor.
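The selection argument in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the function names, the fixed log-likelihood-ratio value of 3.0, and the sample sizes are hypothetical, and the Hannan-Quinn weight is assumed to be the common form 2 log log n. The sketch compares criteria of the form -2 log L + w(n) k. With a bounded log-likelihood ratio, a weight w(n) that grows without bound eventually favors the smaller model, while the constant AIC weight does not force this.

```python
import math


def penalty_weight(criterion: str, n: int) -> float:
    """Per-parameter penalty weight w(n) in  -2 log L + w(n) * k.

    Assumed forms (illustrative, not taken from the paper):
    AIC: 2,  BIC: log n,  HQ: 2 log log n.
    """
    if n < 3:
        raise ValueError("n must be at least 3 so that log log n is positive")
    if criterion == "AIC":
        return 2.0
    if criterion == "BIC":
        return math.log(n)
    if criterion == "HQ":
        return 2.0 * math.log(math.log(n))
    raise ValueError(f"unknown criterion: {criterion}")


def prefers_smaller_model(criterion: str, log_lr: float,
                          k_small: int, k_large: int, n: int) -> bool:
    """Return True if the criterion value of the larger model exceeds
    that of the smaller model, so the smaller model is selected.

    log_lr is the maximized log-likelihood of the larger model minus
    that of the smaller model.
    """
    diff = -2.0 * log_lr + penalty_weight(criterion, n) * (k_large - k_small)
    return diff > 0


if __name__ == "__main__":
    # Hypothetical setting: the log-likelihood ratio stays at 3.0
    # while n grows, and the larger model has one extra parameter.
    for n in (50, 500, 5000):
        choices = {c: prefers_smaller_model(c, 3.0, 1, 2, n)
                   for c in ("AIC", "BIC", "HQ")}
        print(n, choices)
```

In this sketch the BIC decision flips to the smaller model once log n exceeds twice the log-likelihood ratio. The assumed Hannan-Quinn weight also diverges, but far more slowly, so the same flip occurs only at a much larger n.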

Key words and phrases

Model selection, linear regression, misspecified models, AIC, BIC, MDL, Hannan-Quinn criterion, overfitting





Copyright information

© The Institute of Statistical Mathematics 1991

Authors and Affiliations

  • David F. Findley (1, 2)
  1. Statistical Research Division, U.S. Bureau of the Census, Washington, D.C., U.S.A.
  2. Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
