# Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions


## Abstract

During the last fifteen years, Akaike's entropy-based Information Criterion (AIC) has had a fundamental impact on statistical model evaluation problems. This paper studies the general theory of the AIC procedure and provides two analytical extensions of it without violating Akaike's main principles. These extensions make AIC asymptotically consistent and penalize overparameterization more stringently, so as to pick only the simplest of the “true” models. The resulting selection criteria are called CAIC and CAICF. Asymptotic properties of AIC and its extensions are investigated, and the empirical performance of these criteria is studied in choosing the correct degree of a polynomial model in two Monte Carlo experiments under different conditions.
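As a minimal sketch of the polynomial-degree problem described above, the Python below scores each candidate degree with AIC and CAIC as they are usually stated: AIC $= -2\log L(\hat\theta_k) + 2k$ and CAIC $= -2\log L(\hat\theta_k) + k(\log n + 1)$, where $k$ is the number of free parameters and $n$ is the sample size. It assumes Gaussian errors, so that $-2\log L = n(\log(2\pi\hat\sigma^2) + 1)$ with $\hat\sigma^2 = \mathrm{RSS}/n$, and it counts the error variance in $k$. The helper names (`solve`, `fit_rss`, `criteria`, `select_degree`) and the simulated quadratic data are hypothetical illustrations, not the paper's Monte Carlo design. CAICF, usually written as $-2\log L(\hat\theta_k) + k(\log n + 2) + \log\lvert I(\hat\theta_k)\rvert$ with $I$ the estimated Fisher information matrix, is left out to keep the example short.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_rss(xs, ys, degree):
    """Least-squares polynomial fit of the given degree; returns the residual sum of squares."""
    p = degree + 1
    # Normal equations (X^T X) beta = X^T y with design matrix X[i][j] = x_i ** j.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * x ** j for j, b in enumerate(beta))) ** 2
               for x, y in zip(xs, ys))


def criteria(xs, ys, degree):
    """Return (AIC, CAIC) for a Gaussian polynomial regression of the given degree."""
    n = len(xs)
    sigma2 = fit_rss(xs, ys, degree) / n  # ML estimate of the error variance
    neg2loglik = n * (math.log(2 * math.pi * sigma2) + 1)
    k = degree + 2  # (degree + 1) coefficients plus the error variance
    aic = neg2loglik + 2 * k
    caic = neg2loglik + k * (math.log(n) + 1)
    return aic, caic


def select_degree(xs, ys, max_degree):
    """Return the degrees in 0..max_degree that minimize AIC and CAIC (ties go to the lower degree)."""
    scores = {d: criteria(xs, ys, d) for d in range(max_degree + 1)}
    best_aic = min(scores, key=lambda d: scores[d][0])
    best_caic = min(scores, key=lambda d: scores[d][1])
    return best_aic, best_caic


if __name__ == "__main__":
    random.seed(0)
    xs = [i / 50 - 1 for i in range(101)]  # 101 equally spaced points on [-1, 1]
    ys = [1 + 2 * x - 3 * x ** 2 + random.gauss(0, 0.5) for x in xs]  # true degree is 2
    print("degree chosen by (AIC, CAIC):", select_degree(xs, ys, 6))
```

Because CAIC charges $\log n + 1 > 2$ per parameter once $n \ge 3$, the degree it selects on the same fits can never exceed the degree AIC selects. This is one concrete form of the more stringent penalty on overparameterization described above.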

## Key words

model selection, Akaike's information criterion, AIC, CAIC, CAICF, asymptotic properties
