WAIC and WBIC for mixture models

Abstract

In Bayesian statistical inference, an unknown probability distribution is estimated from a sample using a statistical model and a prior. In general, a given pair of model and prior may or may not be appropriate for the unknown distribution, so procedures for evaluating the estimated result are necessary. If a statistical model is regular and the likelihood function can be approximated by a Gaussian function, then AIC and BIC can be applied to such evaluation. However, if a statistical model contains hierarchical structure or latent variables, then the regularity condition is not satisfied. The information criteria WAIC and WBIC were devised to estimate the generalization loss and the free energy, respectively, even if the posterior distribution is far from any normal distribution and even if the unknown true distribution is not realizable by the statistical model. In this paper, we introduce the mathematical foundation and computing methods of WAIC and WBIC for a normal mixture, which is a typical singular statistical model, and discuss their properties in statistical inference. We also study the case in which samples are not independently and identically distributed, for example, when they are conditionally independent or exchangeable.

Author information

Corresponding author

Correspondence to Sumio Watanabe.

Additional information

Communicated by Maomi Ueno.

About this article

Cite this article

Watanabe, S. WAIC and WBIC for mixture models. Behaviormetrika 48, 5–21 (2021). https://doi.org/10.1007/s41237-021-00133-z

  • DOI: https://doi.org/10.1007/s41237-021-00133-z

Keywords

  • Information criteria
  • WAIC
  • WBIC
  • Normal mixture