Volume 52, Issue 3, pp 345–370

Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions

  • Hamparsum Bozdogan
Special Section


During the last fifteen years, Akaike's entropy-based Information Criterion (AIC) has had a fundamental impact in statistical model evaluation problems. This paper studies the general theory of the AIC procedure and provides its analytical extensions in two ways without violating Akaike's main principles. These extensions make AIC asymptotically consistent and penalize overparameterization more stringently to pick only the simplest of the “true” models. These selection criteria are called CAIC and CAICF. Asymptotic properties of AIC and its extensions are investigated, and empirical performances of these criteria are studied in choosing the correct degree of a polynomial model in two different Monte Carlo experiments under different conditions.
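The abstract names the criteria without stating them. As a minimal sketch, assuming the forms commonly quoted for them (AIC = -2 log L + 2k, CAIC = -2 log L + k(log n + 1), CAICF = -2 log L + k(log n + 2) + log|F|, with k free parameters, n observations, and F the estimated Fisher information matrix), the snippet below shows how the heavier CAIC penalty can favor a lower polynomial degree than AIC. The function names and the log-likelihood values are hypothetical and purely illustrative; they are not taken from the paper's Monte Carlo experiments.

```python
import math


def aic(log_lik, k):
    """AIC: -2 log L + 2k (lower is better)."""
    return -2.0 * log_lik + 2.0 * k


def caic(log_lik, k, n):
    """CAIC (assumed form): -2 log L + k (log n + 1)."""
    return -2.0 * log_lik + k * (math.log(n) + 1.0)


def caicf(log_lik, k, n, log_det_fisher):
    """CAICF (assumed form): -2 log L + k (log n + 2) + log|F|.

    log_det_fisher is the log-determinant of the estimated Fisher
    information matrix, supplied by the caller.
    """
    return -2.0 * log_lik + k * (math.log(n) + 2.0) + log_det_fisher


def select(scores):
    """Return the candidate with the smallest criterion value."""
    return min(scores, key=scores.get)


# Made-up maximized log-likelihoods for polynomial degrees 1..4.
n = 50
log_liks = {1: -80.0, 2: -70.0, 3: -68.5, 4: -68.3}

# Assumption: k = degree + 2 (coefficients incl. intercept, plus error variance).
aic_scores = {d: aic(ll, d + 2) for d, ll in log_liks.items()}
caic_scores = {d: caic(ll, d + 2, n) for d, ll in log_liks.items()}

best_aic = select(aic_scores)
best_caic = select(caic_scores)
print("AIC picks degree", best_aic)
print("CAIC picks degree", best_caic)
```

With these illustrative numbers the small gain in fit from degree 2 to degree 3 is enough to offset the AIC penalty of 2 per parameter, but not the CAIC penalty of log n + 1 per parameter, so the two criteria disagree.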

Key words

model selection, Akaike's information criterion, AIC, CAIC, CAICF, asymptotic properties





Copyright information

© The Psychometric Society 1987

Authors and Affiliations

  • Hamparsum Bozdogan
  1. Department of Mathematics, Math-Astronomy Building, University of Virginia, Charlottesville
