
Prediction and Entropy

Conference paper in *A Celebration of Statistics*

Abstract

The emergence of the magic number 2 in recent statistical literature is explained by adopting the predictive point of view of statistics with entropy as the basic criterion of the goodness of a fitted model. The historical development of the concept of entropy is reviewed, and its relation to statistics is explained by examples. The importance of the entropy maximization principle as a basis of the unification of conventional and Bayesian statistics is discussed.
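The "magic number 2" the abstract refers to is the factor that appears in Akaike's information criterion, AIC = −2 log L + 2k, where L is the maximized likelihood and k the number of fitted parameters. As a minimal sketch of how that criterion ranks competing models (the model names and log-likelihood values below are hypothetical, chosen only for illustration):

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: -2 log L + 2 k.

    The factor of 2 multiplies both the maximized log-likelihood
    and the parameter count k; smaller AIC is preferred.
    """
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical autoregressive fits: (maximized log-likelihood, parameters)
models = {
    "AR(1)": (-142.3, 2),
    "AR(2)": (-138.9, 3),
    "AR(3)": (-138.5, 4),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)
```

Here the AR(3) fit improves the log-likelihood only slightly over AR(2), so the 2k penalty tips the choice to the more parsimonious model: the entropy-based predictive criterion trades goodness of fit against model complexity.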







Copyright information

© 1985 Springer-Verlag New York Inc.

About this paper

Cite this paper

Akaike, H. (1985). Prediction and Entropy. In: Atkinson, A.C., Fienberg, S.E. (eds.), A Celebration of Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4613-8560-8_1


  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4613-8562-2

  • Online ISBN: 978-1-4613-8560-8

  • eBook Packages: Springer Book Archive
