
Prediction and Entropy

Conference paper in *A Celebration of Statistics*

Abstract

The emergence of the magic number 2 in recent statistical literature is explained by adopting the predictive point of view of statistics with entropy as the basic criterion of the goodness of a fitted model. The historical development of the concept of entropy is reviewed, and its relation to statistics is explained by examples. The importance of the entropy maximization principle as a basis of the unification of conventional and Bayesian statistics is discussed.
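The "magic number 2" the abstract refers to is the factor that appears in Akaike's information criterion, AIC = −2 log L + 2k, where L is the maximized likelihood and k the number of fitted parameters. As a minimal sketch of how that criterion ranks competing models (the model names and log-likelihood values below are hypothetical, chosen only for illustration):

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2 k.

    The factor of 2 multiplies both the maximized log-likelihood
    and the parameter count k; smaller AIC is preferred.
    """
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical autoregressive fits: (maximized log-likelihood, parameters)
models = {
    "AR(1)": (-142.3, 2),
    "AR(2)": (-138.9, 3),
    "AR(3)": (-138.5, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
```

Here the AR(3) fit improves the log-likelihood only slightly over AR(2), so the 2k penalty tips the choice to the more parsimonious model: the entropy-based predictive criterion trades goodness of fit against model complexity.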







Copyright information

© 1985 Springer-Verlag New York Inc.

About this paper

Cite this paper

Akaike, H. (1985). Prediction and Entropy. In: Atkinson, A.C., Fienberg, S.E. (eds.), A Celebration of Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8560-8_1


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-8562-2

  • Online ISBN: 978-1-4613-8560-8

  • eBook Packages: Springer Book Archive
