Counterexamples to parsimony and BIC

Abstract

Suppose that the log-likelihood-ratio sequence of two models with different numbers of estimated parameters is bounded in probability, without necessarily having a chi-square limiting distribution. Then BIC and all other related “consistent” model selection criteria, meaning those which penalize the number of estimated parameters with a weight which becomes infinite with the sample size, will, with asymptotic probability 1, select the model having fewer parameters. This note presents examples of nested and non-nested regression model pairs for which the likelihood-ratio sequence is bounded in probability and which have the property that the model in each pair with more estimated parameters has better predictive properties, for an independent replicate of the observed data, than the model with fewer parameters. Our second example also shows how a one-dimensional regressor can overfit the data used for estimation in comparison to the fit of a two-dimensional regressor.
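
The following is a minimal simulation sketch, not one of the paper's actual constructions: a hypothetical Gaussian regression pair in which the coefficient of the extra regressor shrinks like c/sqrt(n), one simple way to keep the log-likelihood-ratio sequence bounded in probability as the abstract requires. The setup, the value c = 2, and the helper names ols_rss and bic are illustrative assumptions, not taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols_rss(X, y):
    """Least-squares fit of y on X; returns (residual sum of squares, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid, beta


def bic(rss, n, k):
    """Gaussian-likelihood BIC up to an additive constant shared by both models:
    n*log(rss/n) + k*log(n), where k is the number of estimated regression coefficients."""
    return n * np.log(rss / n) + k * np.log(n)


n = 50_000      # sample size
c = 2.0         # the extra coefficient is c / sqrt(n): vanishing, but worth estimating
reps = 200      # Monte Carlo replications

bic_prefers_small = 0
large_predicts_better = 0

for _ in range(reps):
    # Estimation sample: y depends on x1 strongly and on x2 only through c/sqrt(n).
    x1, x2, eps = rng.normal(size=(3, n))
    beta2 = c / np.sqrt(n)
    y = x1 + beta2 * x2 + eps

    X_small = x1[:, None]                 # one estimated parameter
    X_large = np.column_stack([x1, x2])   # two estimated parameters

    rss_s, b_s = ols_rss(X_small, y)
    rss_l, b_l = ols_rss(X_large, y)

    # BIC comparison on the estimation sample.
    if bic(rss_s, n, 1) < bic(rss_l, n, 2):
        bic_prefers_small += 1

    # Independent replicate of the data, as in the abstract's predictive comparison.
    x1n, x2n, epsn = rng.normal(size=(3, n))
    yn = x1n + beta2 * x2n + epsn
    mse_s = np.mean((yn - x1n[:, None] @ b_s) ** 2)
    mse_l = np.mean((yn - np.column_stack([x1n, x2n]) @ b_l) ** 2)
    if mse_l < mse_s:
        large_predicts_better += 1

print(f"BIC preferred the smaller model in {bic_prefers_small}/{reps} replications")
print(f"Larger model had smaller out-of-sample MSE in {large_predicts_better}/{reps} replications")
```

In this stylized setting the penalty gap log(n) grows without bound while the likelihood-ratio gain stays bounded, so BIC increasingly prefers the one-regressor model as n grows; at the same time the two-regressor model typically has the smaller squared error on an independent replicate of the data, which is the pattern the abstract describes.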

Additional information

An earlier version of this article was presented at the Symposium on the Analysis of Statistical Information, held at the Institute of Statistical Mathematics, Tokyo, December 5–8, 1989.

Cite this article

Findley, D.F. Counterexamples to parsimony and BIC. Ann Inst Stat Math 43, 505–514 (1991). https://doi.org/10.1007/BF00053369
