Statistics and Computing, Volume 24, Issue 6, pp 997–1016

Understanding predictive information criteria for Bayesian models

Abstract

We review the Akaike, deviance, and Watanabe-Akaike information criteria from a Bayesian perspective, where the goal is to estimate expected out-of-sample prediction error using a bias-corrected adjustment of within-sample error. We focus on the choices involved in setting up these measures, and we compare them in three simple examples, one theoretical and two applied. The contribution of this paper is to put all these information criteria into a Bayesian predictive context and to better understand, through small examples, how these methods can apply in practice.
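The bias-corrected adjustment described in the abstract can be illustrated with WAIC, whose correction term is the posterior variance of the pointwise log-likelihood summed over observations. The sketch below is a minimal illustration, not code from the paper: the normal model, the simulated data, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n observations from a normal model with known sigma = 1,
# and S approximate posterior draws of the mean mu. Purely illustrative.
n, S = 20, 4000
y = rng.normal(0.5, 1.0, size=n)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix: S draws by n observations.
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

# Log pointwise predictive density (the within-sample fit).
lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))

# Effective number of parameters: posterior variance of the log-likelihood
# at each data point, summed -- the bias correction for within-sample optimism.
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

# WAIC on the deviance scale (lower is better).
waic = -2 * (lppd - p_waic)
print(round(p_waic, 2), round(waic, 1))
```

For this one-parameter model the correction p_waic comes out close to 1, matching the intuition that the penalty estimates the effective number of parameters.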

Keywords

AIC · DIC · WAIC · Cross-validation · Prediction · Bayes

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Statistics, Columbia University, New York, USA
  2. Department of Statistics, Harvard University, Cambridge, USA
  3. Department of Biomedical Engineering and Computational Science, Aalto University, Espoo, Finland
