Statistics and Computing, Volume 27, Issue 5, pp 1413–1432

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Abstract

Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.

Keywords

Bayesian computation · Leave-one-out cross-validation (LOO) · K-fold cross-validation · Widely applicable information criterion (WAIC) · Stan · Pareto smoothed importance sampling (PSIS)

References

  1. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csaki, F. (eds.) Proceedings of the Second International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest (1973)
  2. Ando, T., Tsay, R.: Predictive likelihood for Bayesian model selection and averaging. Int. J. Forecast. 26, 744–763 (2010)
  3. Arlot, S., Celisse, A.: A survey of cross-validation procedures for model selection. Stat. Surv. 4, 40–79 (2010)
  4. Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, New York (1994)
  5. Burman, P.: A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76, 503–514 (1989)
  6. Epifani, I., MacEachern, S.N., Peruggia, M.: Case-deletion importance sampling estimators: central limit theorems and related results. Electron. J. Stat. 2, 774–806 (2008)
  7. Gabry, J., Goodrich, B.: rstanarm: Bayesian applied regression modeling via Stan. R package version 2.10.0 (2016). http://mc-stan.org/interfaces/rstanarm
  8. Geisser, S., Eddy, W.: A predictive approach to model selection. J. Am. Stat. Assoc. 74, 153–160 (1979)
  9. Gelfand, A.E.: Model determination using sampling-based methods. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (eds.) Markov Chain Monte Carlo in Practice, pp. 145–162. Chapman and Hall, London (1996)
  10. Gelfand, A.E., Dey, D.K., Chang, H.: Model determination using predictive distributions with implementation via sampling-based methods. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (eds.) Bayesian Statistics 4, pp. 147–167. Oxford University Press, Oxford (1992)
  11. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis, 3rd edn. CRC Press, London (2013)
  12. Gelman, A., Hill, J.: Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge (2007)
  13. Gelman, A., Hwang, J., Vehtari, A.: Understanding predictive information criteria for Bayesian models. Stat. Comput. 24, 997–1016 (2014)
  14. Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359–378 (2007)
  15. Hoeting, J., Madigan, D., Raftery, A.E., Volinsky, C.: Bayesian model averaging. Stat. Sci. 14, 382–417 (1999)
  16. Hoffman, M.D., Gelman, A.: The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014)
  17. Ionides, E.L.: Truncated importance sampling. J. Comput. Graph. Stat. 17, 295–311 (2008)
  18. Koopman, S.J., Shephard, N., Creal, D.: Testing the assumptions behind importance sampling. J. Econom. 149, 2–11 (2009)
  19. Peruggia, M.: On the variability of case-deletion importance sampling weights in the Bayesian linear model. J. Am. Stat. Assoc. 92, 199–207 (1997)
  20. Piironen, J., Vehtari, A.: Comparison of Bayesian predictive methods for model selection. Stat. Comput. (2016, in press). http://link.springer.com/article/10.1007/s11222-016-9649-y
  21. Plummer, M.: Penalized loss functions for Bayesian model comparison. Biostatistics 9, 523–539 (2008)
  22. R Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2016). https://www.R-project.org/
  23. Rubin, D.B.: Estimation in parallel randomized experiments. J. Educ. Stat. 6, 377–401 (1981)
  24. Spiegelhalter, D.J., Best, N.G., Carlin, B.P., van der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–639 (2002)
  25. Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., Lunn, D.: BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit, Cambridge, England (1994, 2003). http://www.mrc-bsu.cam.ac.uk/bugs/
  26. Stan Development Team: The Stan C++ Library, version 2.10.0 (2016a). http://mc-stan.org/
  27. Stan Development Team: RStan: the R interface to Stan, version 2.10.1 (2016b). http://mc-stan.org/interfaces/rstan.html
  28. Stone, M.: An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion. J. R. Stat. Soc. B 39, 44–47 (1977)
  29. van der Linde, A.: DIC in variable selection. Stat. Neerl. 59, 45–56 (2005)
  30. Vehtari, A., Gelman, A.: Pareto smoothed importance sampling (2015). arXiv:1507.02646
  31. Vehtari, A., Gelman, A., Gabry, J.: loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models. R package version 0.1.6 (2016a). https://github.com/stan-dev/loo
  32. Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., Winther, O.: Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. J. Mach. Learn. Res. 17, 1–38 (2016b)
  33. Vehtari, A., Lampinen, J.: Bayesian model assessment and comparison using cross-validation predictive densities. Neural Comput. 14, 2439–2468 (2002)
  34. Vehtari, A., Ojanen, J.: A survey of Bayesian predictive methods for model assessment, selection and comparison. Stat. Surv. 6, 142–228 (2012)
  35. Vehtari, A., Riihimäki, J.: Laplace approximation for logistic Gaussian process density estimation and regression. Bayesian Anal. 9, 425–448 (2014)
  36. Watanabe, S.: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571–3594 (2010)
  37. Zhang, J., Stephens, M.A.: A new and efficient estimation method for the generalized Pareto distribution. Technometrics 51, 316–325 (2009)

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  2. Department of Statistics, Columbia University, New York, USA
