Accelerated parallel non-conjugate sampling for Bayesian non-parametric models

Abstract

Inference for latent feature models in the Bayesian nonparametric setting is generally difficult, especially in high-dimensional settings, because it usually requires proposing features from some prior distribution. In special cases, where the integration over feature parameters is tractable, we can sample new feature assignments according to a predictive likelihood. We present a novel method to accelerate the mixing of latent variable model inference by proposing feature locations based on the data, as opposed to the prior. First, we introduce an accelerated feature proposal mechanism that we show is a valid MCMC algorithm for posterior inference. Next, we propose an approximate inference strategy to perform accelerated inference in parallel. A two-stage algorithm that combines the two approaches provides a computationally attractive method that can quickly reach local convergence to the posterior distribution of our model while still allowing us to exploit parallelization.
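
The mechanism the abstract describes is easiest to see in a single update. The sketch below is a toy illustration, not the authors' released code (see Note 2 for that): a cluster's location parameter is proposed from a distribution centered on the cluster's own data rather than from the prior, and an independence Metropolis-Hastings correction keeps the move a valid posterior update. The Gaussian likelihood and prior, the function names, and all defaults are illustrative assumptions, not the paper's model.

```python
# Toy sketch of a data-driven ("accelerated") feature proposal with an
# independence Metropolis-Hastings correction. Assumed model (ours, for
# illustration only): x_n ~ N(theta, sigma^2) within a cluster, theta ~ N(0, tau^2).
import numpy as np

rng = np.random.default_rng(0)

def log_lik(x, theta, sigma=1.0):
    # Sum of log N(x_n | theta, sigma^2) over the cluster's data, up to a constant
    return -0.5 * np.sum((x - theta) ** 2) / sigma**2

def log_prior(theta, tau=10.0):
    # log N(theta | 0, tau^2), up to a constant
    return -0.5 * theta**2 / tau**2

def log_q(theta, x, s=1.0):
    # Data-driven proposal density: N(theta | mean(x), s^2 / n), up to a constant
    m, v = np.mean(x), s**2 / len(x)
    return -0.5 * (theta - m) ** 2 / v

def accelerated_update(theta, x, s=1.0):
    """One independence-MH update of a cluster location given its data x."""
    theta_new = rng.normal(np.mean(x), s / np.sqrt(len(x)))
    # MH log-ratio: (prior x likelihood) / proposal, proposed state vs. current
    log_alpha = (
        log_lik(x, theta_new) + log_prior(theta_new) - log_q(theta_new, x, s)
        - (log_lik(x, theta) + log_prior(theta) - log_q(theta, x, s))
    )
    return theta_new if np.log(rng.uniform()) < log_alpha else theta

# Data far from the prior mean: proposals drawn from the prior would almost
# never be accepted, but the data-driven proposal finds the posterior mode fast.
x = rng.normal(25.0, 1.0, size=50)
theta = 0.0  # start at the prior mean
for _ in range(10):
    theta = accelerated_update(theta, x)
print(theta)  # close to 25
```

The acceptance ratio is what makes the shortcut safe: because the proposal density appears in the correction, centering proposals on the data changes only the mixing speed, not the stationary distribution.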

Notes

  1. For a more substantial introduction to Bayesian non-parametrics, we direct the reader to Ghosh and Ramamoorthi (2003), Hjort et al. (2010), and Müller et al. (2015), as well as a recent survey (Xuan et al. 2019).

  2. Code is available at https://github.com/michaelzhang01/acceleratedDP

References

  • Aldous, D.J.: Exchangeability and related topics. In: École d'Été de Probabilités de Saint-Flour XIII (1985)

  • Antoniak, C.E.: Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2(6), 1152–1174 (1974)

  • Au, S.-K., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset simulation. Probab. Eng. Mech. 16(4), 263–277 (2001)

  • Blei, D.M., Jordan, M.I.: Variational inference for Dirichlet process mixtures. Bayesian Anal. 1(1), 121–143 (2006)

  • Broderick, T., Kulis, B., Jordan, M.: MAD-Bayes: MAP-based asymptotic derivations from Bayes. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 28, pp. 226–234. PMLR, Atlanta (2013)

  • Chang, J., Fisher III, J.W.: Parallel sampling of DP mixture models using sub-cluster splits. In: Advances in Neural Information Processing Systems, pp. 620–628 (2013)

  • Dahl, D.B.: Sequentially-allocated merge-split sampler for conjugate and nonconjugate Dirichlet process mixture models. J. Comput. Graph. Stat. 11(1), 6 (2005)

  • Damien, P., Wakefield, J., Walker, S.: Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Series B (Statistical Methodology) 61(2), 331–344 (1999)

  • Dubey, A., Zhang, M.M., Xing, E.P., Williamson, S.A.: Distributed, partially collapsed MCMC for Bayesian nonparametrics. Int. Conf. Artif. Intell. Stat. 108, 3685–3695 (2020)

  • Ferguson, T.S.: A Bayesian analysis of some nonparametric problems. Ann. Statist. 1(2), 209–230 (1973)

  • Fox, E.B., Hughes, M.C., Sudderth, E.B., Jordan, M.I.: Joint modeling of multiple time series via the beta process with application to motion capture segmentation. Ann. Appl. Stat. 8(3), 1281–1313 (2014)

  • Ge, H., Chen, Y., Wan, M., Ghahramani, Z.: Distributed inference for Dirichlet process mixture models. In: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2276–2284 (2015)

  • Gelman, A., Shalizi, C.R.: Philosophy and the practice of Bayesian statistics. British J. Math. Stat. Psych. 66(1), 8–38 (2013)

  • Ghosh, J.K., Ramamoorthi, R.V.: Bayesian Nonparametrics. Springer, New York, NY (2003)

  • Green, P.J., Hastie, D.I.: Reversible jump MCMC. Genetics 155(3), 1391–1403 (2009)

  • Griffiths, T.L., Ghahramani, Z.: The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011)

  • Hjort, N.L., Holmes, C., Müller, P., Walker, S.G.: Bayesian Nonparametrics, vol. 28. Cambridge University Press (2010)

  • Hughes, M.C., Sudderth, E.: Memoized online variational inference for Dirichlet process mixture models. In: Advances in Neural Information Processing Systems, pp. 1133–1141 (2013)

  • Ishwaran, H., James, L.F.: Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 96(453), 161–173 (2001)

  • Ishwaran, H., Zarepour, M.: Exact and approximate sum representations for the Dirichlet process. Can. J. Stat. 30(2), 269–283 (2002)

  • Jain, S., Neal, R.M.: A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. J. Comput. Graph. Stat. 13(1), 158–182 (2004)

  • Jain, S., Neal, R.M.: Splitting and merging components of a nonconjugate Dirichlet process mixture model. Bayesian Anal. 2(3), 445–472 (2007)

  • Jordan, M.I.: The era of big data. ISBA Bulletin 18(2), 1–3 (2011)

  • Katafygiotis, L.S., Zuev, K.M.: Geometric insight into the challenges of solving high-dimensional reliability problems. Probab. Eng. Mech. 23(2–3), 208–218 (2008)

  • Kim, B., Shah, J.A., Doshi-Velez, F.: Mind the gap: a generative approach to interpretable feature selection and extraction. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 2251–2259. Curran Associates, Inc. (2015)

  • Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: International Conference on Learning Representations (ICLR) (2014)

  • Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report, University of Toronto (2009)

  • LeCun, Y., Cortes, C.: The MNIST database of handwritten digits (1998)

  • Lee, K.-C., Ho, J., Kriegman, D.J.: Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 684–698 (2005)

  • Liu, J.S.: The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem. J. Am. Stat. Assoc. 89(427), 958–966 (1994)

  • Mescheder, L.M., Nowozin, S., Geiger, A.: Adversarial variational Bayes: unifying variational autoencoders and generative adversarial networks. CoRR arXiv:1701.04722 (2017)

  • Miller, J.W., Harrison, M.T.: A simple example of Dirichlet process mixture inconsistency for the number of components. In: Advances in Neural Information Processing Systems, pp. 199–206 (2013)

  • Müller, P., Quintana, F.A., Jara, A., Hanson, T.: Bayesian nonparametric data analysis. Springer, Cham (2015)

  • Murray, I., Adams, R.P., MacKay, D.J.: Elliptical slice sampling. J. Mach. Learn. Res. 9, 541–548 (2010)

  • Neal, R.M.: Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Stat. 9(2), 249–265 (2000)

  • Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10(8) (2009)

  • Papamarkou, T., Hinkle, J., Young, M.T., Womble, D.: Challenges in Markov chain Monte Carlo for Bayesian neural networks. arXiv preprint arXiv:1910.06539 (2019)

  • Papaspiliopoulos, O., Roberts, G.O.: Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95(1), 169–186 (2008)

  • Sethuraman, J.: A constructive definition of Dirichlet priors. Stat. Sin. 4, 639–650 (1994)

  • Smyth, P., Welling, M., Asuncion, A.U.: Asynchronous distributed learning of topic models. In: Advances in Neural Information Processing Systems, pp. 81–88 (2009)

  • Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101(476), 1566–1581 (2006)

  • Tran, D., Ranganath, R., Blei, D.M.: Deep and hierarchical implicit models. CoRR arXiv:1702.08896 (2017)

  • Ueda, N., Ghahramani, Z.: Bayesian model search for mixture models based on optimizing variational bounds. Neural Netw. 15(10), 1223–1241 (2002)

  • Walker, S.G.: Sampling the Dirichlet mixture model with slices. Commun. Stat. Simul. Comput. 36(1), 45–54 (2007)

  • West, M.: Hyperparameter estimation in Dirichlet process mixture models. Technical report, Duke University (1992)

  • Williamson, S.A., Dubey, A., Xing, E.: Parallel Markov chain Monte Carlo for nonparametric mixture models. In: Proceedings of the 30th International Conference on Machine Learning, pp. 98–106 (2013)

  • Xuan, J., Lu, J., Zhang, G.: A survey on Bayesian nonparametric learning. ACM Comput. Surv. 52(1), 1–36 (2019)

Author information

Corresponding author

Correspondence to Michael Minyi Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The contribution of Michael Zhang and Sinead Williamson was funded by NSF grant IIS 1447721. The contribution of Michael Zhang was also funded by the University of Hong Kong’s Seed Fund for Basic Research for New Staff.

Appendix

See Figs. 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20.

Fig. 8

Yale faces features (left), MNIST features (middle) and CIFAR features (right) obtained via uncollapsed sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 9

MNIST features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 10

Yale faces features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 11

CIFAR features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 12

MNIST features obtained via variational inference, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 13

Yale faces features obtained via variational inference, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 14

CIFAR features obtained via variational inference, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 15

MNIST features obtained via collapsed sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 16

Yale faces features obtained via collapsed sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 17

CIFAR features obtained via collapsed sampling, sorted in descending order of popularity for the multinomial-Dirichlet model

Fig. 18

MNIST features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-log normal model

Fig. 19

Yale faces features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-log normal model

Fig. 20

CIFAR features obtained via accelerated sampling, sorted in descending order of popularity for the multinomial-log normal model

About this article

Cite this article

Zhang, M.M., Williamson, S.A. & Pérez-Cruz, F. Accelerated parallel non-conjugate sampling for Bayesian non-parametric models. Stat Comput 32, 50 (2022). https://doi.org/10.1007/s11222-022-10108-z
