Statistics and Computing, Volume 27, Issue 5, pp 1413–1432

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

  • Aki Vehtari
  • Andrew Gelman
  • Jonah Gabry


Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.
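
As a rough illustration of what "using the log-likelihood evaluated at the posterior simulations" can mean in practice, the sketch below computes pointwise WAIC and a raw importance-sampling LOO estimate from a matrix of log-likelihood values. It is not the loo package's implementation and it omits the Pareto smoothing step that defines PSIS; the function names and the `log_lik[s][i]` layout (draw s, observation i) are assumptions made for this example.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def sample_variance(values):
    """Unbiased sample variance (divides by len(values) - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def pointwise_waic(log_lik):
    """Pointwise elpd_waic: log pointwise predictive density minus the
    posterior variance of the log-likelihood, for each observation.
    Assumed layout: log_lik[s][i] for posterior draw s, observation i."""
    n_obs = len(log_lik[0])
    result = []
    for i in range(n_obs):
        column = [row[i] for row in log_lik]
        result.append(log_mean_exp(column) - sample_variance(column))
    return result

def pointwise_is_loo(log_lik):
    """Pointwise elpd_loo with RAW importance weights 1 / p(y_i | theta_s).
    No Pareto smoothing is applied here, so this is plain IS-LOO, not PSIS-LOO."""
    n_obs = len(log_lik[0])
    result = []
    for i in range(n_obs):
        negated = [-row[i] for row in log_lik]
        result.append(-log_mean_exp(negated))
    return result

def total_and_se(pointwise):
    """Sum of pointwise values and an approximate standard error,
    sqrt(n * sample variance of the pointwise values). Needs n >= 2."""
    n = len(pointwise)
    return sum(pointwise), math.sqrt(n * sample_variance(pointwise))
```

For example, `total_and_se(pointwise_waic(log_lik))` gives an estimate together with an approximate standard error. With raw weights the LOO estimate can be unstable when a few draws dominate, which is the situation the smoothing of importance weights is meant to address.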


Bayesian computation; Leave-one-out cross-validation (LOO); K-fold cross-validation; Widely applicable information criterion (WAIC); Stan; Pareto smoothed importance sampling (PSIS)



We thank Bob Carpenter, Avraham Adler, Joona Karjalainen, Sean Raleigh, Sumio Watanabe, and Ben Lambert for helpful comments, Juho Piironen for R help, Tuomas Sivula for the Python port, and the U.S. National Science Foundation, Institute of Education Sciences, and Office of Naval Research for partial support of this research.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Espoo, Finland
  2. Department of Statistics, Columbia University, New York, USA
