## Abstract

In Bayesian analysis of a statistical model, the predictive distribution is obtained by marginalizing over the parameters with respect to their posterior distributions. Compared with the frequently used plug-in method, which substitutes a point estimate of the parameters, the predictive distribution yields a more reliable predictive likelihood for new data, especially when the amount of training data is small. Bayesian estimation of a Dirichlet mixture model (DMM) is, in general, not analytically tractable. In our previous work, we proposed a global variational inference-based method for analytically approximating the posterior distributions of the DMM parameters. In this paper, we extend that study and propose an algorithm for calculating the predictive distribution of the DMM with the local variational inference (LVI) method. Because the true predictive distribution of the DMM is analytically intractable, we use the concave property of the multivariate inverse beta function to introduce an upper bound on it. Since this upper bound has a global minimum, the problem reduces to finding that minimum, which serves as an approximation to the true predictive distribution. The approximated predictive distribution obtained by minimizing the upper bound is analytically tractable, which facilitates computation of the predictive likelihood. Evaluations on synthesized and real data demonstrate the good performance of the proposed LVI-based method in comparison with some conventionally used methods.
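The contrast between the predictive distribution and the plug-in method can be sketched numerically. The Python code below is a minimal illustration under stated assumptions, not the LVI algorithm of this paper: it hypothetically fixes the mixture weights, assumes each Dirichlet parameter has an independent gamma approximate posterior with shape \(u\) and rate \(v\), and replaces the analytical approximation with plain Monte Carlo averaging. The helper names (`dirichlet_logpdf`, `dmm_pdf`, `plug_in_density`, `mc_predictive_density`) and all numbers are illustrative.

```python
import math
import random


def dirichlet_logpdf(x, alpha):
    """Log-density of Dir(alpha) at x; x must have positive entries summing to one."""
    return (math.lgamma(sum(alpha))
            - sum(math.lgamma(a) for a in alpha)
            + sum((a - 1.0) * math.log(xk) for a, xk in zip(alpha, x)))


def dmm_pdf(x, weights, alphas):
    """Density of a Dirichlet mixture with given weights and component parameter vectors."""
    return sum(w * math.exp(dirichlet_logpdf(x, a)) for w, a in zip(weights, alphas))


def plug_in_density(x, weights, shapes, rates):
    """Plug-in method: evaluate the DMM at the posterior mean (shape / rate) of each parameter."""
    means = [[u / v for u, v in zip(us, vs)] for us, vs in zip(shapes, rates)]
    return dmm_pdf(x, weights, means)


def mc_predictive_density(x, weights, shapes, rates, n_samples=20000, seed=0):
    """Monte Carlo estimate of the predictive density (marginalizing over the parameters).

    Each parameter is drawn from an ASSUMED Gamma(shape=u, rate=v) posterior.
    random.gammavariate takes (shape, scale), so the scale passed is 1 / v.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        alphas = [[rng.gammavariate(u, 1.0 / v) for u, v in zip(us, vs)]
                  for us, vs in zip(shapes, rates)]
        total += dmm_pdf(x, weights, alphas)
    return total / n_samples


# Hypothetical two-component DMM on the 3-dimensional simplex (illustrative numbers only).
weights = [0.6, 0.4]
shapes = [[20.0, 40.0, 60.0], [30.0, 30.0, 10.0]]
rates = [[4.0, 4.0, 4.0], [5.0, 5.0, 5.0]]
x_new = [0.2, 0.3, 0.5]

print("plug-in:   ", plug_in_density(x_new, weights, shapes, rates))
print("predictive:", mc_predictive_density(x_new, weights, shapes, rates))
```

The two values generally differ because the predictive density averages over parameter uncertainty. Scaling each shape and rate up by the same factor keeps the posterior means fixed while shrinking the variances, and the Monte Carlo predictive density then approaches the plug-in value.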


## Notes

In the extreme case where the posterior distribution has no variance, the point estimate is known with absolute certainty.
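This extreme case can be made explicit with a one-line derivation. Writing \(\boldsymbol{\theta}\) for the model parameters, \(\mathbf{X}\) for the training data and \(\hat{\boldsymbol{\theta}}\) for the point estimate (symbols chosen here for illustration), a posterior with no variance is a point mass at \(\hat{\boldsymbol{\theta}}\), so the predictive distribution collapses to the plug-in density:

\[
f(x\mid\mathbf{X}) = \int f(x\mid\boldsymbol{\theta})\,f(\boldsymbol{\theta}\mid\mathbf{X})\,d\boldsymbol{\theta}
= \int f(x\mid\boldsymbol{\theta})\,\delta(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})\,d\boldsymbol{\theta}
= f(x;\hat{\boldsymbol{\theta}}).
\]

Whenever the posterior has nonzero spread, the integral instead averages \(f(x\mid\boldsymbol{\theta})\) over plausible parameter values rather than committing to \(\hat{\boldsymbol{\theta}}\).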

Another Bayesian estimation method was proposed in [28]. However, the method in [28] used the multiple lower-bound (MLB) approximation to derive an analytically tractable solution. In contrast, the method presented in Ma et al., *Bayesian estimation of Dirichlet mixture model with variational inference* (unpublished), used the single lower-bound (SLB) approximation. As discussed in that work, the MLB approximation-based solution cannot guarantee convergence, whereas the SLB approximation-based solution is more concise and guarantees convergence.

If a function \(f(x)\) is not convex in \(x\) but is convex in \(\ln x\), it is called “convex relative to” \(\ln x\).

To prevent confusion, we use \(f(x;a)\) to denote the PDF of \(x\) parameterized by \(a\), and \(f(x|a)\) to denote the conditional PDF of \(x\) given \(a\), where both \(x\) and \(a\) are random variables. \(f(x;a)\) and \(f(x|a)\) have exactly the same mathematical expression.

\(\tilde{\mathbf{u}}_{\backslash j}\) denotes all the elements in \(\tilde{\mathbf{u}}\) except \(\tilde{u}_j\).
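A small numerical check can make the notion of convexity relative to \(\ln x\) concrete. Writing \(t=\ln x\), \(f\) is convex relative to \(\ln x\) when \(g(t)=f(e^t)\) is convex in \(t\); since \(g''(t)=x^2f''(x)+xf'(x)\), on \(x>0\) this holds exactly when \(xf''(x)+f'(x)\ge 0\). The sketch below uses the hypothetical example \(f(x)=\sqrt{x}\), which is concave in \(x\) but satisfies \(xf''(x)+f'(x)=1/(4\sqrt{x})>0\). It is not the inverse beta function treated in the paper, and the grid-based second-difference test is only a heuristic check on sampled points, not a proof.

```python
import math


def second_difference(g, t, h=1e-3):
    """Central second difference: a numerical estimate of g''(t)."""
    return (g(t + h) - 2.0 * g(t) + g(t - h)) / (h * h)


def convex_on_grid(g, points, tol=1e-9):
    """True if the numerical second derivative of g is non-negative at every grid point."""
    return all(second_difference(g, t) >= -tol for t in points)


def f(x):
    """Hypothetical example function on x > 0."""
    return math.sqrt(x)


def f_of_log(t):
    """The same function expressed in t = ln x, i.e. g(t) = f(exp(t))."""
    return f(math.exp(t))


xs = [0.1 * k for k in range(1, 101)]  # grid on (0, 10]
ts = [math.log(x) for x in xs]         # the same points in t = ln x

convex_in_x = convex_on_grid(f, xs)            # False: sqrt is concave in x
convex_in_log_x = convex_on_grid(f_of_log, ts)  # True: exp(t / 2) is convex in t
print(convex_in_x, convex_in_log_x)
```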

The KL divergence from \(f(x)\) to \(g(x)\) is calculated as \(\text{KL}(f\|g)=\int f(x)\ln \frac{f(x)}{g(x)}\,dx\).

⊘ denotes element-wise division.
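The KL definition can be checked numerically. The sketch below approximates the integral with a midpoint rule on a truncated interval and compares it with the well-known closed form for two univariate Gaussians, \(\text{KL}(\mathcal{N}(\mu_1,\sigma_1^2)\|\mathcal{N}(\mu_2,\sigma_2^2))=\ln\frac{\sigma_2}{\sigma_1}+\frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2}-\frac{1}{2}\). The Gaussian pair, the integration range and the helper names are illustrative assumptions chosen only because the closed form is simple.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def kl_numeric(f, g, lo, hi, n=100000):
    """Midpoint-rule approximation of KL(f||g) = int f(x) ln(f(x)/g(x)) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        fx = f(x)
        if fx > 0.0:  # points where f(x) = 0 contribute nothing to the integral
            total += fx * math.log(fx / g(x)) * dx
    return total


def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL(N(mu1, s1^2) || N(mu2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * s2 ** 2) - 0.5


def f(x):
    return normal_pdf(x, 0.0, 1.0)


def g(x):
    return normal_pdf(x, 1.0, 2.0)


print(kl_numeric(f, g, -15.0, 15.0), kl_normal(0.0, 1.0, 1.0, 2.0))  # KL(f||g)
print(kl_numeric(g, f, -15.0, 15.0), kl_normal(1.0, 2.0, 0.0, 1.0))  # KL(g||f)
```

Up to integration error, the first line agrees with the closed-form value \(\ln 2 - 1/4 \approx 0.443\) and the second with \(2 - \ln 2 \approx 1.307\), which illustrates that the KL divergence is not symmetric in its two arguments.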

Here, the dimensionalities of the mDWT coefficients are the same for all the channels.

## References

1. Bjørnstad, J.F. (1990). Predictive likelihood: a review. *Statistical Science*, *5*, 242–254.
2. Bishop, C.M. (2006). *Pattern recognition and machine learning*. New York: Springer.
3. Sorenson, H.W. (1980). *Parameter estimation: principles and problems*. New York: Marcel Dekker.
4. Kamen, E.W., & Su, J. (1999). *Introduction to optimal estimation, ser. Advanced textbooks in control and signal processing*. London: Springer.
5. Gelman, A., Meng, X.-L., Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. *Statistica Sinica*, *6*, 733–807.
6. Sinharay, S., & Stern, H.S. (2003). Posterior predictive model checking in hierarchical models. *Journal of Statistical Planning and Inference*, *111*, 209–221.
7. Patel, J.K., & Read, C.B. (1996). *Handbook of the normal distribution, ser. Statistics, textbooks and monographs*. Marcel Dekker.
8. Jain, A.K., Duin, R.P.W., Mao, J. (2000). Statistical pattern recognition: a review. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *22*, 4–37.
9. McLachlan, G., & Peel, D. (2000). *Finite mixture models, ser. Wiley series in probability and statistics: applied probability and statistics*. Wiley.
10. Figueiredo, M.A.T., & Jain, A.K. (2002). Unsupervised learning of finite mixture models. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *24*, 381–396.
11. McLachlan, G.J., & Krishnan, T. (2008). *The EM algorithm and extensions, ser. Wiley series in probability and statistics*. Wiley-Interscience.
12. Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. *Biometrics*, *49*(3), 803–821.
13. Ma, Z. (2011). *Non-Gaussian statistical models and their applications*. Ph.D. dissertation, US-AB, Stockholm: KTH - Royal Institute of Technology.
14. Ma, Z., & Leijon, A. (2011). Bayesian estimation of beta mixture models with variational inference. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *33*(11), 2160–2173.
15. Atapattu, S., Tellambura, C., Jiang, H. (2011). A mixture Gamma distribution to model the SNR of wireless channels. *IEEE Transactions on Wireless Communications*, *10*(12), 4193–4203.
16. Ma, Z., Leijon, A., Kleijn, W.B. (2013). Vector quantization of LSF parameters with a mixture of Dirichlet distributions. *IEEE Transactions on Audio, Speech, and Language Processing*, *21*(9), 1777–1790.
17. Bouguila, N., Ziou, D., Vaillancourt, J. (2004). Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application. *IEEE Transactions on Image Processing*, *13*(11), 1533–1543.
18. Blei, D.M. (2004). *Probabilistic models of text and images*. Ph.D. dissertation. University of California, Berkeley.
19. Rana, P.K., Ma, Z., Taghia, J., Flierl, M. (2013). Multiview depth map enhancement by variational Bayes inference estimation of Dirichlet mixture models. In *Proceedings of IEEE international conference on acoustics, speech, and signal processing (ICASSP)*.
20. Ma, Z., & Leijon, A. (2010). Modeling speech line spectral frequencies with Dirichlet mixture models. In *Proceedings of INTERSPEECH* (pp. 2370–2373).
21. Blei, D.M., Ng, A.Y., Jordan, M.I. (2003). Latent Dirichlet allocation. *Journal of Machine Learning Research*, *3*, 993–1022.
22. Blei, D.M., & Jordan, M.I. (2005). Variational inference for Dirichlet process mixtures. *Bayesian Analysis*, *1*, 121–144.
23. Orbanz, P., & Teh, Y.W. (2010). Bayesian nonparametric models. *Encyclopedia of Machine Learning*, 88–89.
24. Orbanz, P. (2010). Construction of nonparametric Bayesian models from parametric Bayes equations. In *Advances in neural information processing systems*.
25. Ghahramani, Z. (2012). Bayesian non-parametrics and the probabilistic approach to modelling. *Philosophical Transactions of the Royal Society A*, *371*.
26. Minka, T.P. (2003). Estimating a Dirichlet distribution. *Annals of Physics*, *2000*(8), 1–13.
27. Bouguila, N., & Ziou, D. (2007). High-dimensional unsupervised selection and estimation of a finite generalized Dirichlet mixture model based on minimum message length. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *29*(10), 1716–1731.
28. Fan, W., Bouguila, N., Ziou, D. (2012). Variational learning for finite Dirichlet mixture models and applications. *IEEE Transactions on Neural Networks and Learning Systems*, *23*(5), 762–774.
29. Palmer, J.A. (2003). Relative convexity. ECE Dept., UCSD Tech. Rep.
30. Blei, D.M., & Lafferty, J.D. (2007). A correlated topic model of Science. *The Annals of Applied Statistics*, *1*, 17–35.
31. Jaakkola, T.S., & Jordan, M.I. (2000). Bayesian parameter estimation via variational methods. *Statistics and Computing*, *10*, 25–37.
32. Jaakkola, T.S. (2001). Tutorial on variational approximation methods. In M. Opper & D. Saad (Eds.), *Advances in mean field methods* (pp. 129–159). Cambridge: MIT Press.
33. Hoffman, M., Blei, D., Cook, P. (2010). Bayesian nonparametric matrix factorization for recorded music. In *Proceedings of the international conference on machine learning*.
34. Minka, T.P. (2001). Expectation propagation for approximate Bayesian inference. In *Proceedings of the seventeenth conference on uncertainty in artificial intelligence* (pp. 362–369).
35. Minka, T.P. (2001). *A family of algorithms for approximate Bayesian inference*. Ph.D. dissertation. Massachusetts Institute of Technology.
36. Ma, Z. (2012). Bayesian estimation of the Dirichlet distribution with expectation propagation. In *Proceedings of the 20th European signal processing conference* (pp. 689–693).
37. Ma, Z., & Leijon, A. (2011). Approximating the predictive distribution of the beta distribution with the local variational method. In *Proceedings of IEEE international workshop on machine learning for signal processing* (pp. 1–6).
38. Boyd, S., & Vandenberghe, L. (2004). *Convex optimization*. Cambridge: Cambridge University Press.
39. Brookes, M. (2013). The matrix reference manual. Available online: http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/intro.html. Accessed 9 Aug 2013.
40. Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B. (2007). A review of classification algorithms for EEG-based brain-computer interfaces. *Journal of Neural Engineering*, *4*(2), R1.
41. Prasad, S., Tan, Z.-H., Prasad, R., Cabrera, A.F., Gu, Y., Dremstrup, K. (2011). Feature selection strategy for classification of single-trial EEG elicited by motor imagery. In *International symposium on wireless personal multimedia communications (WPMC)* (pp. 1–4).
42. Ma, Z., Tan, Z.-H., Prasad, S. (2012). EEG signal classification with super-Dirichlet mixture model. In *Proceedings of IEEE statistical signal processing workshop* (pp. 440–443).
43. Subasi, A. (2007). EEG signal classification using wavelet feature extraction and a mixture of expert model. *Expert Systems with Applications*, *32*(4), 1084–1093.
44. Farina, D., Nascimento, O.F., Lucas, M.F., Doncarli, C. (2007). Optimization of wavelets for classification of movement-related cortical potentials generated by variation of force-related parameters. *Journal of Neuroscience Methods*, *162*, 357–363.
45. Ma, Z., & Leijon, A. (2011). Super-Dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification. In *Proceedings of INTERSPEECH* (pp. 2349–2352).
46. BCI competition III. http://www.bbci.de/competition/iii.
47. Lal, T.N., Schroder, M., Hinterberger, T., Weston, J., Bogdan, M., Birbaumer, N., Scholkopf, B. (2004). Support vector channel selection in BCI. *IEEE Transactions on Biomedical Engineering*, *51*(6), 1003–1010.
48. Malina, W. (1981). On an extended Fisher criterion for feature selection. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *3*(5), 611–614.


## About this article

### Cite this article

Ma, Z., Leijon, A., Tan, ZH. *et al.* Predictive Distribution of the Dirichlet Mixture Model by Local Variational Inference.
*J Sign Process Syst* **74**, 359–374 (2014). https://doi.org/10.1007/s11265-013-0769-8


DOI: https://doi.org/10.1007/s11265-013-0769-8

### Keywords

- Predictive distribution
- Dirichlet mixture model
- Bayesian inference
- Local variational inference