# Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

## Abstract

Leave-one-out cross-validation (LOO) and the widely applicable information criterion (WAIC) are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values. LOO and WAIC have various advantages over simpler estimates of predictive error such as AIC and DIC but are less used in practice because they involve additional computational steps. Here we lay out fast and stable computations for LOO and WAIC that can be performed using existing simulation draws. We introduce an efficient computation of LOO using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. Although WAIC is asymptotically equal to LOO, we demonstrate that PSIS-LOO is more robust in the finite case with weak priors or influential observations. As a byproduct of our calculations, we also obtain approximate standard errors for estimated predictive errors and for comparison of predictive errors between two models. We implement the computations in an R package called loo and demonstrate using models fit with the Bayesian inference package Stan.
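As a rough illustration of how both quantities can be computed from existing simulation draws, the sketch below takes a matrix of pointwise log-likelihood values (draws by observations) and returns pointwise WAIC and importance-sampling LOO estimates, along with approximate standard errors. It is a minimal sketch, not the loo package's implementation. The function names and the `log_lik[s][i]` layout are assumptions made for this example. The LOO part uses raw importance ratios and omits the Pareto smoothing step that PSIS adds, so it can be unstable when the ratios have a heavy tail.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def sample_variance(values):
    """Unbiased sample variance (divides by len - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pointwise_elpd(log_lik):
    """Return (waic_i, loo_i) lists of pointwise elpd estimates.

    log_lik[s][i] is the log-likelihood of observation i at posterior
    draw s (hypothetical layout chosen for this sketch).
    """
    n_draws, n_obs = len(log_lik), len(log_lik[0])
    waic_i, loo_i = [], []
    for i in range(n_obs):
        column = [log_lik[s][i] for s in range(n_draws)]
        lpd_i = log_mean_exp(column)         # log pointwise predictive density
        p_waic_i = sample_variance(column)   # effective-parameter penalty
        waic_i.append(lpd_i - p_waic_i)
        # Raw importance sampling LOO: weights proportional to 1 / p(y_i | theta_s),
        # which reduces to the log of the harmonic mean of the likelihoods.
        # PSIS would smooth the largest weights first; that step is omitted here.
        loo_i.append(-log_mean_exp([-v for v in column]))
    return waic_i, loo_i


def total_and_se(pointwise):
    """Sum of pointwise values and its approximate standard error."""
    n = len(pointwise)
    return sum(pointwise), math.sqrt(n * sample_variance(pointwise))


def compare(pointwise_a, pointwise_b):
    """Difference in elpd between two models, with a paired standard error."""
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    return total_and_se(diffs)


# Toy input: 3 posterior draws, 3 observations (made-up numbers).
toy_log_lik = [
    [-1.2, -0.8, -2.1],
    [-1.0, -0.9, -1.7],
    [-1.4, -0.7, -2.5],
]
toy_waic_i, toy_loo_i = pointwise_elpd(toy_log_lik)
print("elpd_waic, se:", total_and_se(toy_waic_i))
print("elpd_loo (raw IS), se:", total_and_se(toy_loo_i))
```

Keeping the estimates pointwise is what makes the standard errors and the paired model comparison straightforward: both are computed from the spread of the per-observation terms.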

## Keywords

Bayesian computation · Leave-one-out cross-validation (LOO) · *K*-fold cross-validation · Widely applicable information criterion (WAIC) · Stan · Pareto smoothed importance sampling (PSIS)

## Notes

### Acknowledgments

We thank Bob Carpenter, Avraham Adler, Joona Karjalainen, Sean Raleigh, Sumio Watanabe, and Ben Lambert for helpful comments, Juho Piironen for R help, Tuomas Sivula for the Python port, and the U.S. National Science Foundation, Institute of Education Sciences, and Office of Naval Research for partial support of this research.
