Two Alternative Criteria for a Split-Merge MCMC on Dirichlet Process Mixture Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10614)

Abstract

The free energy and the generalization error are two major model selection criteria; however, in general, they are not equivalent. Previous studies of the split-merge algorithm on conjugate Dirichlet process mixture models have mainly used the complete free energy. In this work, we propose a new criterion, the complete leave-one-out cross-validation, which is based on an approximation of the generalization error. In numerical experiments, our proposal outperforms the previous methods in terms of test-set perplexity. Finally, we discuss the appropriate usage of the two criteria in light of the experimental results.
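
To make the proposed criterion concrete, below is a minimal Python sketch (not the authors' code) of a leave-one-out predictive score for a fixed complete assignment under a conjugate Normal-Normal mixture with known observation variance. The function name complete_loocv, the Gaussian model, and all parameters (mu0, tau2, sigma2) are illustrative assumptions, not the paper's definitions; in particular, the sketch scores each point only under its own assigned cluster and omits any Chinese-restaurant-process weighting the paper's criterion may include.

    import numpy as np
    from scipy.stats import norm

    def complete_loocv(x, z, mu0=0.0, tau2=10.0, sigma2=1.0):
        """Average held-out log predictive density given a complete assignment.

        x      : (n,) observations
        z      : (n,) integer cluster labels (a complete assignment)
        mu0    : prior mean of each cluster mean (assumption)
        tau2   : prior variance of each cluster mean (assumption)
        sigma2 : known observation variance (assumption)
        """
        x, z = np.asarray(x, dtype=float), np.asarray(z)
        score = 0.0
        for i in range(len(x)):
            mask = (z == z[i])
            mask[i] = False                      # hold out x[i] from its cluster
            xs, m = x[mask], int(mask.sum())
            # Conjugate posterior of the cluster mean from the m remaining points
            # (reduces to the prior when the held-out point was a singleton, m = 0)
            post_var = 1.0 / (1.0 / tau2 + m / sigma2)
            post_mean = post_var * (mu0 / tau2 + xs.sum() / sigma2)
            # Posterior predictive density of the held-out point
            score += norm.logpdf(x[i], post_mean, np.sqrt(post_var + sigma2))
        return score / len(x)

    # Usage: two well-separated clusters with the correct labeling
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
    z = np.repeat([0, 1], 50)
    print(complete_loocv(x, z))

Under these assumptions, a higher complete_loocv value indicates a better held-out predictive fit for the assignment, paralleling the paper's comparison of methods by test-set perplexity.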


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

1. Nihon Unisys, Ltd., Tokyo, Japan
