Journal of Signal Processing Systems, Volume 74, Issue 3, pp 359–374

Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference

  • Zhanyu Ma
  • Arne Leijon
  • Zheng-Hua Tan
  • Sheng Gao


In Bayesian analysis of a statistical model, the predictive distribution is obtained by marginalizing over the parameters with respect to their posterior distributions. Compared to the frequently used point-estimate plug-in method, the predictive distribution gives a more reliable predictive likelihood for new data, especially when the amount of training data is small. The Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we proposed a global variational inference-based method that approximates the posterior distributions of the DMM parameters analytically. In this paper, we extend that study and propose an algorithm that calculates the predictive distribution of the DMM with the local variational inference (LVI) method. The true predictive distribution of the DMM is analytically intractable. By considering the concave property of the multivariate inverse beta function, we introduce an upper bound to the true predictive distribution. As the global minimum of this upper bound exists, the problem reduces to seeking an approximation to the true predictive distribution. The approximated predictive distribution obtained by minimizing the upper bound is analytically tractable, which facilitates the computation of the predictive likelihood. Evaluations with synthesized data and real data demonstrate the good performance of the proposed LVI-based method in comparison with some conventionally used methods.
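The contrast between the plug-in method and the predictive distribution, p(x | X) = ∫ p(x | θ) p(θ | X) dθ, can be illustrated with a minimal sketch. It is not the LVI algorithm of the paper: it handles a single Dirichlet component rather than a mixture, it assumes (purely for illustration) that the posterior of each Dirichlet parameter is an independent gamma distribution with hypothetical `shape` and `rate` values, and it approximates the integral by plain Monte Carlo sampling instead of an analytical upper bound. All function names are illustrative.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) at a point x on the probability simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def plug_in_logpdf(x, shape, rate):
    """Plug-in method: evaluate the density at the posterior mean of alpha."""
    # Assumption: each alpha_k has a Gamma(shape_k, rate_k) posterior, mean shape/rate.
    alpha_mean = [s / r for s, r in zip(shape, rate)]
    return dirichlet_logpdf(x, alpha_mean)


def mc_predictive_logpdf(x, shape, rate, n_samples=5000, seed=0):
    """Monte Carlo predictive: average the density over posterior samples of alpha."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_samples):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        alpha = [rng.gammavariate(s, 1.0 / r) for s, r in zip(shape, rate)]
        logs.append(dirichlet_logpdf(x, alpha))
    # Log-mean-exp keeps the average numerically stable.
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / n_samples)


if __name__ == "__main__":
    x_new = [0.2, 0.3, 0.5]
    # Broad posterior (little training data) versus sharp posterior (much data);
    # both have posterior mean alpha = [2, 3, 5].
    for label, scale in (("broad", 1.0), ("sharp", 1000.0)):
        shape = [2.0 * scale, 3.0 * scale, 5.0 * scale]
        rate = [scale, scale, scale]
        print(label,
              "plug-in:", plug_in_logpdf(x_new, shape, rate),
              "predictive:", mc_predictive_logpdf(x_new, shape, rate))
```

With a sharp posterior the two values nearly coincide, while with a broad posterior they can differ noticeably, which is the small-training-set situation the abstract refers to. Sampling is used here only because it is easy to verify; the paper instead derives an analytically tractable approximation.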


Keywords: Predictive distribution; Dirichlet mixture model; Bayesian inference; Local variational inference



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Zhanyu Ma (1)
  • Arne Leijon (2)
  • Zheng-Hua Tan (3)
  • Sheng Gao (1)

  1. Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing, China
  2. School of Electrical Engineering, KTH - Royal Institute of Technology, Stockholm, Sweden
  3. Department of Electronic Systems, Aalborg University, Aalborg, Denmark
