
A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting

Journal of Classification

Abstract

Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction over fifty years ago, and is now commonly utilized within a family setting. Families of mixture models arise when the component parameters, usually the component covariance (or scale) matrices, are decomposed and a number of constraints are imposed. Within the family setting, model selection involves choosing the member of the family, i.e., the appropriate covariance structure, in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for model selection, and the expectation-maximization (EM) algorithm is usually used for parameter estimation; indeed, this EM-BIC rubric has virtually monopolized the literature on families of mixture models. Deviating from this rubric, variational Bayes approximations are developed for parameter estimation, and the deviance information criterion (DIC) is used for model selection. The variational Bayes approach provides an alternative framework for parameter estimation by constructing a tight lower bound on the (intractable) marginal likelihood and maximizing this lower bound, which is equivalent to minimizing the associated Kullback-Leibler divergence. The framework introduced, which we refer to as VB-DIC, is applied to the most commonly used family of Gaussian mixture models, and real and simulated data are used to compare it with the EM-BIC rubric.


References

  • Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14–22.

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

  • Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.

  • Bensmail, H., Celeux, G., Raftery, A.E., Robert, C.P. (1997). Inference in model-based cluster analysis. Statistics and Computing, 7, 1–10.

  • Biernacki, C., Celeux, G., Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7), 719–725.

  • Biernacki, C., & Lourme, A. (2019). Unifying data units and models in (co-)clustering. Advances in Data Analysis and Classification, 13(1), 7–31.

  • Bingham, C. (1974). An antipodally symmetric distribution on the sphere. The Annals of Statistics, 2(6), 1201–1225.

  • Blei, D.M., Kucukelbir, A., McAuliffe, J.D. (2017). Variational inference: a review for statisticians. Journal of the American Statistical Association, 112(518), 859–877.

  • Bock, H.H. (1996). Probabilistic models in cluster analysis. Computational Statistics and Data Analysis, 23, 5–28.

  • Bock, H.H. (1998a). Data science, classification and related methods (pp. 3–21). New York: Springer-Verlag.

  • Bock, H.H. (1998b). Probabilistic approaches in cluster analysis. Bulletin of the International Statistical Institute, 57, 603–606.

  • Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373–388.

  • Boulesteix, A.-L., Durif, G., Lambert-Lacroix, S., Peyre, J., Strimmer, K. (2018). plsgenomics: PLS Analyses for Genomics. R package version 1.5-2.

  • Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Computational Statistics and Data Analysis, 71, 52–78.

  • Browne, R.P., & McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification, 8(2), 217–226.

  • Casella, G., Mengersen, K., Robert, C., Titterington, D. (2002). Perfect samplers for mixtures of distributions. Journal of the Royal Statistical Society: Series B, 64, 777–790.

  • Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28, 781–793.

  • Celeux, G., Hurn, M., Robert, C. (2000). Computational and inferential difficulties with mixture posterior distributions. Journal of the American Statistical Association, 95, 957–970.

  • Cheam, A.S.M., Marbac, M., McNicholas, P.D. (2017). Model-based clustering for spatiotemporal data on air quality monitoring. Environmetrics, 93, 192–206.

  • Corduneanu, A., & Bishop, C. (2001). Variational Bayesian model selection for mixture distributions. In Artificial intelligence and statistics (pp. 27–34). Los Altos: Morgan Kaufmann.

  • Dang, U.J., Browne, R.P., McNicholas, P.D. (2015). Mixtures of multivariate power exponential distributions. Biometrics, 71(4), 1081–1089.

  • Dang, U.J., Punzo, A., McNicholas, P.D., Ingrassia, S., Browne, R.P. (2017). Multivariate response and parsimony for Gaussian cluster-weighted models. Journal of Classification, 34(1), 4–34.

  • Day, N.E. (1969). Estimating the components of a mixture of normal distributions. Biometrika, 56(3), 463–474.

  • Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1–38.

  • Diebolt, J., & Robert, C. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society: Series B, 56, 363–375.

  • Fraley, C., & Raftery, A.E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24, 155–181.

  • Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.

  • Gallaugher, M.P.B., & McNicholas, P.D. (2018a). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83–93.

  • Gallaugher, M.P.B., & McNicholas, P.D. (2018b). A mixture of matrix variate bilinear factor analyzers. In Proceedings of the Joint Statistical Meetings. Alexandria, VA: American Statistical Association. Also available as arXiv preprint arXiv:1712.08664v3.

  • Gallaugher, M.P.B., & McNicholas, P.D. (2019a). Mixtures of skewed matrix variate bilinear factor analyzers. Advances in Data Analysis and Classification. To appear. https://doi.org/10.1007/s11634-019-00377-4.

  • Gallaugher, M.P.B., & McNicholas, P.D. (2019b). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification, 36(2), 232–265.

  • Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B. (2013). Bayesian data analysis. Boca Raton: Chapman and Hall/CRC Press.

  • Gupta, A., & Nagar, D. (2000). Matrix variate distributions. Boca Raton: Chapman & Hall/CRC Press.

  • Hartigan, J.A., & Wong, M.A. (1979). A k-means clustering algorithm. Applied Statistics, 28(1), 100–108.

  • Hasselblad, V. (1966). Estimation of parameters for a mixture of normal distributions. Technometrics, 8(3), 431–444.

  • Hoff, P. (2012). rstiefel: random orthonormal matrix generation on the Stiefel manifold. R package version 0.9.

  • Hoff, P.D. (2009). Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2), 438–456.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.

  • Jasra, A., Holmes, C.C., Stephens, D.A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20(1), 50–67.

  • Jordan, M., Ghahramani, Z., Jaakkola, T., Saul, L. (1999). An introduction to variational methods for graphical models. Machine Learning, 37, 183–233.

  • Lee, S., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24, 181–202.

  • Lee, S.X., & McLachlan, G.J. (2016). Finite mixtures of canonical fundamental skew t-distributions: the unification of the restricted and unrestricted skew t-mixture models. Statistics and Computing, 26(3), 573–589.

  • Lin, T., McLachlan, G.J., Lee, S.X. (2016). Extending mixtures of factor models using the restricted multivariate skew-normal distribution. Journal of Multivariate Analysis, 143, 398–413.

  • Lin, T.-I., McNicholas, P.D., Hsiu, J.H. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80–87.

  • MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.

  • McGrory, C., & Titterington, D. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics and Data Analysis, 51, 5352–5367.

  • McGrory, C., & Titterington, D. (2009). Variational Bayesian analysis for hidden Markov models. Australian and New Zealand Journal of Statistics, 51, 227–244.

  • McGrory, C., Titterington, D., Pettitt, A. (2009). Variational Bayes for estimating the parameters of a hidden Potts model. Statistics and Computing, 19(3), 329–340.

  • McLachlan, G.J., & Krishnan, T. (2008). The EM algorithm and extensions, 2nd edn. New York: Wiley.

  • McNicholas, P.D. (2010). Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference, 140(5), 1175–1181.

  • McNicholas, P.D. (2016a). Mixture model-based classification. Boca Raton: Chapman & Hall/CRC Press.

  • McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33(3), 331–373.

  • McNicholas, P.D., & Murphy, T.B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18, 285–296.

  • McNicholas, P.D., & Murphy, T.B. (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics, 26(21), 2705–2712.

  • Melnykov, V., & Zhu, X. (2018). On model-based clustering of skewed matrix data. Journal of Multivariate Analysis, 167, 181–194.

  • Morris, K., & McNicholas, P.D. (2016). Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures. Computational Statistics and Data Analysis, 97, 133–150.

  • Morris, K., Punzo, A., McNicholas, P.D., Browne, R.P. (2019). Asymmetric clusters and outliers: mixtures of multivariate contaminated shifted asymmetric Laplace distributions. Computational Statistics and Data Analysis, 132, 145–166.

  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2014a). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326–335.

  • Murray, P.M., Browne, R.P., McNicholas, P.D. (2019). Mixtures of hidden truncation hyperbolic factor analyzers. Journal of Classification. To appear. https://doi.org/10.1007/s00357-019-9309-y.

  • Murray, P.M., McNicholas, P.D., Browne, R.P. (2014b). A mixture of common skew-t factor analyzers. Stat, 3(1), 68–82.

  • Neath, R.C. (2013). On convergence properties of the Monte Carlo EM algorithm. In Advances in modern statistical theory and applications: a Festschrift in honor of Morris L. Eaton (pp. 43–62). Institute of Mathematical Statistics.

  • O’Hagan, A., Murphy, T.B., Gormley, I.C., McNicholas, P.D., Karlis, D. (2016). Clustering with the multivariate normal inverse Gaussian distribution. Computational Statistics and Data Analysis, 93, 18–30.

  • Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A, 185, 71–110.

  • Punzo, A., Blostein, M., McNicholas, P.D. (2020). High-dimensional unsupervised classification via parsimonious contaminated mixtures. Pattern Recognition, 98, 107031.

  • R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

  • Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.

  • Richardson, S., & Green, P. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B, 59, 731–792.

  • Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.

  • Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 205–233.

  • Spiegelhalter, D., Best, N., Carlin, B., Van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society: Series B, 64, 583–639.

  • Stephens, M. (1997). Bayesian methods for mixtures of normal distributions. Ph.D. thesis, University of Oxford, Oxford.

  • Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components — an alternative to reversible jump methods. The Annals of Statistics, 28, 40–74.

  • Subedi, S., & McNicholas, P.D. (2014). Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Advances in Data Analysis and Classification, 8(2), 167–193.

  • Subedi, S., Punzo, A., Ingrassia, S., McNicholas, P.D. (2015). Cluster-weighted t-factor analyzers for robust model-based clustering and dimension reduction. Statistical Methods and Applications, 24(4), 623–649.

  • Titterington, D.M., Smith, A.F.M., Makov, U.E. (1985). Statistical analysis of finite mixture distributions. Chichester: John Wiley & Sons.

  • Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2019). A mixture of coalesced generalized hyperbolic distributions. Journal of Classification, 36(1), 26–57.

  • Ueda, N., & Ghahramani, Z. (2002). Bayesian model search for mixture models based on optimizing variational bounds. Neural Networks, 15, 1223–1241.

  • Venables, W.N., & Ripley, B.D. (2002). Modern applied statistics with S, 4th edn. New York: Springer.

  • Viroli, C. (2011). Finite mixtures of matrix normal distributions for classifying three-way data. Statistics and Computing, 21(4), 511–522.

  • Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196–210.

  • Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32(3), 359–381.

  • Wang, X., He, C.Z., Sun, D. (2005). Bayesian inference on the patient population size given list mismatches. Statistics in Medicine, 24(2), 249–267.

  • Wolfe, J.H. (1965). A computer program for the maximum likelihood analysis of types. Technical Bulletin 65–15, U.S. Naval Personnel Research Activity.

  • Zhu, X., & Melnykov, V. (2018). Manly transformation in finite mixture modeling. Computational Statistics and Data Analysis, 121, 190–208.


Funding

This work was supported by a Postgraduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (Subedi); the Canada Research Chairs program (McNicholas); and an E.W.R. Steacie Memorial Fellowship (McNicholas).

Author information

Correspondence to Sanjeena Subedi.


Appendices

Appendix A: Posterior distributions for the parameters of the eigen-decomposed covariance matrix

Table 6 Posterior distributions of the precision parameters as well as their corresponding parameters for 12 of the members of the GPCM family

Appendix B: Posterior expected value of the precision parameters of the eigen-decomposed covariance matrix

Table 7 Posterior expected value of the precision parameters of the eigen-decomposed covariance matrix for 12 of the members of the GPCM family

Appendix C: Mathematical details for the EEV and VEV Models

C.1 EEV Model

The mixing proportions were assigned a Dirichlet prior distribution, such that

$$ q_{\rho}(\boldsymbol{\rho})=\text{Dir}(\boldsymbol{\rho};\alpha_{1}^{(0)},\ldots,\alpha_{G}^{(0)}). $$

For the mean, a Gaussian distribution conditional on the covariance matrix was used, such that

$$ q_{\boldsymbol{\mu}}(\boldsymbol{\mu} \mid\lambda,\mathbf{A},\mathbf{D}_{1},\ldots,\mathbf{D}_{G})=\prod\limits_{g=1}^{G}\phi_{d}\left(\boldsymbol{\mu}_{g};\mathbf{m}_{g}^{(0)},(\beta_{g}^{(0)})^{-1}\lambda\mathbf{D}_{g}\mathbf{A}\mathbf{D}_{g}^{\prime}\right). $$

For the parameters of the covariance matrix, the following priors were used: the \(k\)th diagonal element of \((\lambda\mathbf{A})^{-1}\) was assigned a Gamma\((a^{(0)}_{k},b^{(0)}_{k})\) distribution, and \(\mathbf{D}_{g}\) was assigned a matrix Bingham prior with parameters \(\mathbf{Q}_{g}^{(0)}\) and \(\mathbf{P}_{g}^{(0)}\), defined below. By setting \(\boldsymbol{\tau} = (\lambda\mathbf{A})^{-1}\), its prior can be written

$$ p_{\tau}(\boldsymbol{\tau})\propto \prod\limits_{k=1}^{K}\tau_{k}^{\frac{a^{(0)}_{k}}{2}-1} \exp\left\{-\frac{b^{(0)}_{k}}{2}\tau_{k}\right\}, $$

where \(\tau_{k}\) is the \(k\)th diagonal element of \(\boldsymbol{\tau} = (\lambda\mathbf{A})^{-1}\).
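
To make the prior specification concrete, the following R sketch draws once from these priors. All hyperparameter values and object names are illustrative only, not the settings used in the paper, and \(K = d\) is assumed.

    ## Illustrative draw from the priors above (all numeric values are placeholders).
    set.seed(1)
    d <- 4; G <- 3                       # dimension and number of components

    alpha0 <- rep(1, G)                  # Dirichlet hyperparameters alpha_g^(0) for rho
    m0     <- matrix(0, G, d)            # prior means m_g^(0), one row per component
    beta0  <- rep(0.01, G)               # prior precision scalars beta_g^(0)
    a0     <- rep(1, d)                  # Gamma shape hyperparameters a_k^(0)
    b0     <- rep(1, d)                  # Gamma rate hyperparameters b_k^(0)

    ## rho ~ Dirichlet(alpha0), drawn via normalized Gamma variates:
    rho <- rgamma(G, shape = alpha0, rate = 1)
    rho <- rho / sum(rho)

    ## tau = (lambda A)^{-1} has independent Gamma(a0/2, b0/2) diagonal elements:
    tau <- rgamma(d, shape = a0 / 2, rate = b0 / 2)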

The matrix \(\mathbf{D}\) has a density as defined by Gupta and Nagar (2000):

$$ p(\mathbf{D}) = b(\mathbf{Q}^{(0)},\mathbf{P}_{g}^{(0)})\exp(\text{tr}\{\mathbf{Q}^{(0)}\mathbf{D}\mathbf{P}_{g}^{(0)}\mathbf{D}^{\prime}\}) [d\mathbf{D}], $$

for \(\mathbf{D}\in O(d,d)\), where \(O(d,d)\) is the Stiefel manifold of orthonormal d × d matrices, \([d\mathbf{D}]\) is the unit invariant measure on \(O(d,d)\), and \(\mathbf{Q}^{(0)}\) and \(\mathbf{P}_{g}^{(0)}\) are symmetric and diagonal matrices, respectively.

The joint prior distribution of μ1,…,μG, τ, and D1,…,DG is

$$ \begin{array}{@{}rcl@{}} &&p(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G},\boldsymbol{\tau}, \mathbf{D}_{1},\ldots,\mathbf{D}_{G}) \propto\prod\limits_{g=1}^{G}|\beta_{g}^{(0)}\boldsymbol{\tau}|^{\frac{1}{2}} \exp \left\{-\frac{(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})\beta_{g}^{(0)}\mathbf{D}_{g}^{\prime}\boldsymbol{\tau} \mathbf{D}_{g}(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})^{\prime}}{2} \right\}\\ &&\quad\times\exp \left\{\text{tr}(\mathbf{Q}_{g}^{(0)}\mathbf{D}_{g}\mathbf{P}_{g}^{(0)}\mathbf{D}_{g}^{\prime})\right\}\prod\limits_{k=1}^{K}\tau_{k}^{\frac{a^{(0)}_{k}}{2}-1} \exp\left\{-\frac{b^{(0)}_{k}}{2}\tau_{k}\right\}. \end{array} $$

The likelihood of the data can be written

$$ \mathcal{L}(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G},\boldsymbol{\tau}, \mathbf{D}_{1},\ldots,\mathbf{D}_{G} \mid \mathbf{y}_{1}, \ldots, \mathbf{y}_{n}) \propto\prod\limits_{i=1}^{n}\prod\limits_{g=1}^{G}|\boldsymbol{\tau}|^{{\hat{z}_{ig}}/{2}} \exp \left\{-\frac{\hat{z}_{ig}}{2}(\mathbf{y}_{i} - \boldsymbol{\mu}_{g})\mathbf{D}_{g}^{\prime}\boldsymbol{\tau} \mathbf{D}_{g}(\mathbf{y}_{i} - \boldsymbol{\mu}_{g})^{\prime}\right\}. $$

Therefore, the joint posterior distribution of μ, τ, and D can be written

$$ p(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G},\boldsymbol{\tau}, \mathbf{D} \mid \mathbf{y}_{1}, \ldots, \mathbf{y}_{n}) \propto p(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G},\boldsymbol{\tau}, \mathbf{D})\, \mathcal{L}(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G},\boldsymbol{\tau}, \mathbf{D} \mid \mathbf{y}_{1}, \ldots, \mathbf{y}_{n}). $$

Thus, the posterior distribution of the means becomes

$$q_{\boldsymbol{\mu}}(\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{G} \mid \boldsymbol{\tau},\mathbf{D}_{1},\ldots,\mathbf{D}_{G})=\prod\limits_{g=1}^{G}\phi_{d}(\boldsymbol{\mu}_{g};\mathbf{m}_{g},(\beta_{g}\mathbf{D}_{g}^{\prime}\boldsymbol{\tau}\mathbf{D}_{g})^{-1}),$$

where \(\beta_{g} = \beta_{g}^{(0)}+\sum_{i=1}^{n}\hat{z}_{ig}\) and

$$\mathbf{m}_{g} =\frac{1}{\beta_{g}}\left( \beta_{g}^{(0)} \mathbf{m}_{g}^{(0)}+ \sum\limits_{i=1}^{n}\hat{z}_{ig}\mathbf{y}_{i}\right).$$
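
In code, this mean update is a single weighted average per component. A minimal R sketch under our own naming conventions, where zhat is the n × G matrix of responsibilities \(\hat{z}_{ig}\) and y is the n × d data matrix:

    ## Variational update for the component means (formulas directly above).
    update_means <- function(y, zhat, beta0, m0) {
      ng   <- colSums(zhat)              # effective sample size per component
      beta <- beta0 + ng                 # beta_g = beta_g^(0) + sum_i zhat_ig
      ## m_g = (beta_g^(0) m_g^(0) + sum_i zhat_ig y_i) / beta_g, row-wise over g
      m <- (beta0 * m0 + t(zhat) %*% y) / beta
      list(beta = beta, m = m)
    }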

The posterior distribution for the \(k\)th diagonal element of \(\boldsymbol{\tau} = (\lambda\mathbf{A})^{-1}\) is

$$q_{\tau}(\tau_{k}) = \text{Gamma}(a_{k},b_{k}),$$

where \(a_{k}=a_{k}^{(0)}+d\sum_{g=1}^{G}\sum_{i=1}^{n}\hat{z}_{ig}=a_{k}^{(0)}+dn\) and

$$b_{k}=b_{k}^{(0)}+\sum\limits_{g=1}^{G}\left( \sum\limits_{i=1}^{n}\hat{z}_{ig}y_{ik}^{2}+\beta_{g}^{(0)}\big(m_{gk}^{(0)}\big)^{2}- \beta_{g}m_{gk}^{2}\right).$$
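
Continuing the sketch, the Gamma parameters of the variational posterior for τ follow from the same quantities; beta and m are the outputs of update_means above, and the other argument names are ours:

    ## Variational update for the Gamma parameters of tau_k, k = 1, ..., d.
    update_tau <- function(y, zhat, a0, b0, beta0, m0, beta, m) {
      n <- nrow(y); d <- ncol(y)
      a <- a0 + d * n                    # a_k = a_k^(0) + d n
      Sy2 <- t(zhat) %*% (y^2)           # G x d matrix of sum_i zhat_ig y_ik^2
      b <- b0 + colSums(Sy2 + beta0 * m0^2 - beta * m^2)
      list(a = a, b = b)                 # posterior mean of tau_k is a[k] / b[k]
    }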

We have

$$ \begin{array}{@{}rcl@{}} q(\mathbf{D}_{g}\mid\mathbf{y};\boldsymbol{\mu}_{g},\boldsymbol{\tau})&\propto& \exp\left\{ \text{tr} \left (-\frac{1}{2}{(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})\beta_{g}^{(0)}\mathbf{D}_{g}^{\prime}\boldsymbol{\tau} \mathbf{D}_{g}(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})^{\prime}}\right ) \right \} \\ &&\times\exp\left\{ \text{tr} \left (-\frac{1}{2}{\sum\limits_{i=1}^{n}\hat{z}_{ig}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})\mathbf{D}_{g}^{\prime}\boldsymbol{\tau} \mathbf{D}_{g}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})^{\prime}}+\mathbf{Q}_{g}^{(0)}\mathbf{D}_{g}\mathbf{P}_{g}^{(0)}\mathbf{D}_{g}^{\prime}\right )\right \}, \end{array} $$

which has the functional form of a Bingham matrix distribution, i.e., the form

$$\exp \left\{\text{tr}(\mathbf{Q}_{g} \mathbf{D}_{g}\mathbf{P}_{g} \mathbf{D}_{g}^{\prime})\right\},$$

where \(\mathbf {Q}_{g} = \mathbf {Q}_{g}^{(0)}+\boldsymbol {\tau }\) and

$$ \mathbf{P}_{g} = \mathbf{P}_{g}^{(0)}-\frac{1}{2}\left[\sum\limits_{i=1}^{n}\hat{z}_{ig}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})^{\prime}+(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})\beta_{g}^{(0)}(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})^{\prime}\right]. $$
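
The update for D_g can then be refreshed with the same rstiefel machinery used for the prior. Because P_g here is a full symmetric matrix while rbing.matrix.gibbs expects a diagonal second parameter, the sketch below first rotates by the eigenvectors of P_g; this reparameterization, like all argument names, is our own handling rather than the authors' code:

    ## One Gibbs move for D_g ~ Bingham(Q_g, P_g) in the EEV model.
    ## zhat_g is the g-th column of the responsibilities; tau is the vector of
    ## posterior means of tau_k; Q0 and P0 are the prior parameters.
    library(rstiefel)

    update_Dg <- function(y, zhat_g, mu_g, m0_g, beta0_g, tau, Q0, P0, D) {
      Qg <- Q0 + diag(tau)               # Q_g = Q_g^(0) + tau
      yc <- sweep(y, 2, mu_g)            # rows y_i - mu_g
      S  <- crossprod(yc * zhat_g, yc)   # sum_i zhat_ig (y_i - mu_g)(y_i - mu_g)'
      dm <- mu_g - m0_g
      Pg <- P0 - 0.5 * (S + beta0_g * tcrossprod(dm))
      ## tr(Qg D Pg D') = tr(Qg (D V) L (D V)') with Pg = V L V' (eigen), so
      ## sample the rotated matrix against the diagonal L and rotate back.
      ep <- eigen(Pg, symmetric = TRUE)
      X  <- rbing.matrix.gibbs(Qg, diag(ep$values), D %*% ep$vectors)
      X %*% t(ep$vectors)
    }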

C.2 VEV Model

Similarly, the posterior distribution of Dg for the VEV model has the form

$$ \begin{array}{@{}rcl@{}} q(\mathbf{D}_{g}\mid\mathbf{y};\boldsymbol{\mu}_{g},\boldsymbol{\tau}_{g})&\propto& \exp\left\{ \text{tr} \left (-\frac{1}{2}{(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})\beta_{g}^{(0)}\mathbf{D}_{g}^{\prime}\boldsymbol{\tau}_{g} \mathbf{D}_{g}(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})^{\prime}}\right ) \right \} \\ &&\times \exp\left\{ \text{tr} \left (-\frac{1}{2} \sum\limits_{i=1}^{n}\hat{z}_{ig}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})\mathbf{D}_{g}^{\prime}\boldsymbol{\tau}_{g} \mathbf{D}_{g}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})^{\prime}+\mathbf{Q}_{g}^{(0)}\mathbf{D}_{g}\mathbf{P}_{g}^{(0)}\mathbf{D}_{g}^{\prime}\right )\right \}, \end{array} $$

which has the functional form of a Bingham matrix distribution, i.e., the form

$$\exp\left\{\text{tr}(\mathbf{Q}_{g} \mathbf{D}_{g}\mathbf{P}_{g} \mathbf{D}_{g}^{\prime})\right\},$$

where \(\mathbf {Q}_{g}=\mathbf {Q}_{g}^{(0)}+\boldsymbol {\tau }_{g}\) and

$$\mathbf{P}_{g} = \mathbf{P}_{g}^{(0)}-\frac{1}{2}\left[\sum\limits_{i=1}^{n}\hat{z}_{ig}(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})(\mathbf{y}_{i}-\boldsymbol{\mu}_{g})^{\prime}+(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})\beta_{g}^{(0)}(\boldsymbol{\mu}_{g}-\mathbf{m}_{g}^{(0)})^{\prime}\right].$$
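
Schematically, the updates in Appendices C.1 and C.2 slot into a coordinate-ascent loop that alternates with the responsibility update until the variational lower bound stabilizes. The sketch below (EEV case) reuses the helper functions defined earlier; init_state, update_zhat, and elbo are hypothetical stand-ins for the initialization, responsibility, and lower-bound computations given in the body of the paper:

    ## Schematic VB loop for one GPCM member (EEV); not the authors' code.
    fit_vb_eev <- function(y, G, hyper, max_iter = 200, tol = 1e-6) {
      state <- init_state(y, G, hyper)       # hypothetical initializer
      bound <- -Inf
      for (iter in seq_len(max_iter)) {
        state$zhat <- update_zhat(y, state)  # hypothetical q(Z) update
        mu <- update_means(y, state$zhat, hyper$beta0, hyper$m0)
        state$beta <- mu$beta; state$m <- mu$m
        tg <- update_tau(y, state$zhat, hyper$a0, hyper$b0,
                         hyper$beta0, hyper$m0, state$beta, state$m)
        state$tau <- tg$a / tg$b             # plug in posterior mean of tau
        for (g in seq_len(G)) {
          state$D[[g]] <- update_Dg(y, state$zhat[, g], state$m[g, ],
                                    hyper$m0[g, ], hyper$beta0[g],
                                    state$tau, hyper$Q0, hyper$P0, state$D[[g]])
        }
        new_bound <- elbo(y, state)          # hypothetical lower-bound evaluation
        if (abs(new_bound - bound) < tol) break
        bound <- new_bound
      }
      state
    }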


About this article

Cite this article

Subedi, S., McNicholas, P.D. A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting. J Classif 38, 89–108 (2021). https://doi.org/10.1007/s00357-019-09351-3

