Abstract
Mixture model-based clustering has become an increasingly popular data analysis technique since its introduction over fifty years ago, and is now commonly utilized within a family setting. Families of mixture models arise when the component parameters, usually the component covariance (or scale) matrices, are decomposed and a number of constraints are imposed. Within the family setting, model selection involves choosing the member of the family, i.e., the appropriate covariance structure, in addition to the number of mixture components. To date, the Bayesian information criterion (BIC) has proved most effective for model selection, and the expectation-maximization (EM) algorithm is usually used for parameter estimation. In fact, this EM-BIC rubric has virtually monopolized the literature on families of mixture models. Deviating from this rubric, variational Bayes approximations are developed for parameter estimation and the deviance information criterion (DIC) is used for model selection. The variational Bayes approach provides an alternative framework for parameter estimation by constructing a tight lower bound on the complex marginal likelihood and maximizing this lower bound via minimization of the associated Kullback-Leibler divergence. The framework introduced, which we refer to as VB-DIC, is applied to the most commonly used family of Gaussian mixture models, and real and simulated data are used to compare it with the EM-BIC rubric.
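The lower-bound construction described in the abstract can be checked numerically. The following sketch (not from the paper; all names and values are illustrative) builds a toy two-component mixture with a single latent label and verifies the identity log p(x) = ELBO(q) + KL(q ‖ p(z|x)), which is why maximizing the lower bound is equivalent to minimizing the Kullback-Leibler divergence.

```python
import numpy as np

# Toy setting: one observation x from a two-component mixture with known
# component densities; the only latent variable is the component label z.
prior = np.array([0.6, 0.4])        # p(z)
lik = np.array([0.5, 2.0])          # p(x | z) evaluated at the fixed x
evidence = np.sum(prior * lik)      # p(x), the marginal likelihood
posterior = prior * lik / evidence  # p(z | x)

def elbo(q):
    """Evidence lower bound: E_q[log p(x, z)] - E_q[log q(z)]."""
    return np.sum(q * (np.log(prior * lik) - np.log(q)))

def kl(q, p):
    """Kullback-Leibler divergence KL(q || p) for discrete distributions."""
    return np.sum(q * np.log(q / p))

q = np.array([0.5, 0.5])  # an arbitrary variational distribution

# Identity: log p(x) = ELBO(q) + KL(q || posterior); the bound is tight
# exactly when q equals the true posterior.
assert np.isclose(np.log(evidence), elbo(q) + kl(q, posterior))
assert np.isclose(elbo(posterior), np.log(evidence))
```

Because KL(q ‖ p(z|x)) is nonnegative, the ELBO never exceeds the log marginal likelihood, and driving the KL term to zero recovers the exact posterior.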
References
Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14–22.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19 (6), 716–723.
Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49 (3), 803–821.
Bensmail, H., Celeux, G., Raftery, A.E., Robert, C.P. (1997). Inference in model-based cluster analysis. Statistics and Computing, 7, 1–10.
Biernacki, C., Celeux, G., Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (7), 719–725.
Biernacki, C., & Lourme, A. (2019). Unifying data units and models in (co-)clustering. Advances in Data Analysis and Classification, 13 (1), 7–31.
Bingham, C. (1974). An antipodally symmetric distribution on the sphere. The Annals of Statistics, 2 (6), 1201–1225.
Blei, D.M., Kucukelbir, A., McAuliffe, J.D. (2017). Variational inference: a review for statisticians. Journal of the American Statistical Association, 112 (518), 859–877.
Bock, H.H. (1996). Probabilistic models in cluster analysis. Computational Statistics and Data Analysis, 23, 5–28.
Bock, H.H. (1998a). Data science, classification and related methods (pp. 3–21). New York: Springer-Verlag.
Bock, H.H. (1998b). Probabilistic approaches in cluster analysis. Bulletin of the International Statistical Institute, 57, 603–606.
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373–388.
Boulesteix, A.-L., Durif, G., Lambert-Lacroix, S., Peyre, J., Strimmer, K. (2018). plsgenomics: PLS Analyses for Genomics. R package version 1.5-2.
Bouveyron, C., & Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Computational Statistics and Data Analysis, 71, 52–78.
Browne, R.P., & McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification, 8 (2), 217–226.
Casella, G., Mengersen, K., Robert, C., Titterington, D. (2002). Perfect samplers for mixtures of distributions. Journal of the Royal Statistical Society: Series B, 64, 777–790.
Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28, 781–793.
Celeux, G., Hurn, M., Robert, C. (2000). Computational and inferential difficulties with mixture posterior distributions. Journal of the American Statistical Association, 95, 957–970.
Cheam, A.S.M., Marbac, M., McNicholas, P.D. (2017). Model-based clustering for spatiotemporal data on air quality monitoring. Environmetrics, 93, 192–206.
Corduneanu, A., & Bishop, C. (2001). Variational Bayesian model selection for mixture distributions. In Artificial intelligence and statistics (pp. 27–34). Los Altos: Morgan Kaufmann.
Dang, U.J., Browne, R.P., McNicholas, P.D. (2015). Mixtures of multivariate power exponential distributions. Biometrics, 71 (4), 1081–1089.
Dang, U.J., Punzo, A., McNicholas, P.D., Ingrassia, S., Browne, R.P. (2017). Multivariate response and parsimony for Gaussian cluster-weighted models. Journal of Classification, 34 (1), 4–34.
Day, N.E. (1969). Estimating the components of a mixture of normal distributions. Biometrika, 56 (3), 463–474.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39 (1), 1–38.
Diebolt, J., & Robert, C. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society: Series B, 56, 363–375.
Fraley, C., & Raftery, A.E. (2007). Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24, 155–181.
Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (6), 1149–1157.
Gallaugher, M.P.B., & McNicholas, P.D. (2018a). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83–93.
Gallaugher, M.P.B., & McNicholas, P.D. (2018b). A mixture of matrix variate bilinear factor analyzers. In: Proceedings of the Joint Statistical Meetings. Alexandria, VA: American Statistical Association. Also available as arXiv preprint. arXiv:1712.08664v3.
Gallaugher, M.P.B., & McNicholas, P.D. (2019a). Mixtures of skewed matrix variate bilinear factor analyzers. Advances in Data Analysis and Classification. To appear. https://doi.org/10.1007/s11634-019-00377-4.
Gallaugher, M.P.B., & McNicholas, P.D. (2019b). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification, 36 (2), 232–265.
Gelman, A., Stern, H.S., Carlin, J.B., Dunson, D.B., Vehtari, A., Rubin, D.B. (2013). Bayesian data analysis. Boca Raton: Chapman and Hall/CRC Press.
Gupta, A., & Nagar, D. (2000). Matrix variate distributions. Boca Raton: Chapman & Hall/CRC Press.
Hartigan, J.A., & Wong, M.A. (1979). A k-means clustering algorithm. Applied Statistics, 28 (1), 100–108.
Hasselblad, V. (1966). Estimation of parameters for a mixture of normal distributions. Technometrics, 8 (3), 431–444.
Hoff, P. (2012). rstiefel: random orthonormal matrix generation on the Stiefel manifold. R package version 0.9.
Hoff, P.D. (2009). Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18 (2), 438–456.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
Jasra, A., Holmes, C.C., Stephens, D.A. (2005). Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Statistical Science, 20 (1), 50–67.
Jordan, M., Ghahramani, Z., Jaakkola, T., Saul, L. (1999). An introduction to variational methods for graphical models. Machine Learning, 37, 183–233.
Lee, S., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24, 181–202.
Lee, S.X., & McLachlan, G.J. (2016). Finite mixtures of canonical fundamental skew t-distributions – the unification of the restricted and unrestricted skew t-mixture models. Statistics and Computing, 26 (3), 573–589.
Lin, T., McLachlan, G.J., Lee, S.X. (2016). Extending mixtures of factor models using the restricted multivariate skew-normal distribution. Journal of Multivariate Analysis, 143, 398–413.
Lin, T. -I., McNicholas, P. D., Hsiu, J. H. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80–87.
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press.
McGrory, C., & Titterington, D. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics and Data Analysis, 51, 5352–5367.
McGrory, C., & Titterington, D. (2009). Variational Bayesian analysis for hidden Markov models. Australian and New Zealand Journal of Statistics, 51, 227–244.
McGrory, C., Titterington, D., Pettitt, A. (2009). Variational Bayes for estimating the parameters of a hidden Potts model. Statistics and Computing, 19 (3), 329–340.
McLachlan, G.J., & Krishnan, T. (2008). The EM algorithm and extensions, 2nd edn. New York: Wiley.
McNicholas, P.D. (2010). Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference, 140 (5), 1175–1181.
McNicholas, P.D. (2016a). Mixture model-based classification. Boca Raton: Chapman & Hall/CRC Press.
McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33 (3), 331–373.
McNicholas, P.D., & Murphy, T.B. (2008). Parsimonious Gaussian mixture models. Statistics and Computing, 18, 285–296.
McNicholas, P.D., & Murphy, T.B. (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics, 26 (21), 2705–2712.
Melnykov, V., & Zhu, X. (2018). On model-based clustering of skewed matrix data. Journal of Multivariate Analysis, 167, 181–194.
Morris, K., & McNicholas, P.D. (2016). Clustering, classification, discriminant analysis, and dimension reduction via generalized hyperbolic mixtures. Computational Statistics and Data Analysis, 97, 133–150.
Morris, K., Punzo, A., McNicholas, P.D., Browne, R.P. (2019). Asymmetric clusters and outliers: mixtures of multivariate contaminated shifted asymmetric Laplace distributions. Computational Statistics and Data Analysis, 132, 145–166.
Murray, P.M., Browne, R.B., McNicholas, P.D. (2014a). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326–335.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2019). Mixtures of hidden truncation hyperbolic factor analyzers. Journal of Classification. To appear. https://doi.org/10.1007/s00357-019-9309-y.
Murray, P.M., McNicholas, P.D., Browne, R.P. (2014b). A mixture of common skew-t factor analyzers. Stat, 3 (1), 68–82.
Neath, R.C. (2013). On convergence properties of the Monte Carlo EM algorithm. In Advances in modern statistical theory and applications: a Festschrift in honor of Morris L. Eaton (pp. 43–62). Institute of Mathematical Statistics.
O’Hagan, A., Murphy, T.B., Gormley, I.C., McNicholas, P.D., Karlis, D. (2016). Clustering with the multivariate normal inverse Gaussian distribution. Computational Statistics and Data Analysis, 93, 18–30.
Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A, 185, 71–110.
Punzo, A., Blostein, M., McNicholas, P.D. (2020). High-dimensional unsupervised classification via parsimonious contaminated mixtures. Pattern Recognition, 98, 107031.
R Core Team. (2018). R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.
Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66, 846–850.
Richardson, S., & Green, P. (1997). On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society: Series B, 59, 731–792.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6 (2), 461–464.
Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8 (1), 205–233.
Spiegelhalter, D., Best, N., Carlin, B., Van der Linde, A. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society: Series B, 64, 583–639.
Stephens, M. (1997). Bayesian methods for mixtures of normal distributions. Ph.D. thesis, University of Oxford.
Stephens, M. (2000). Bayesian analysis of mixture models with an unknown number of components — an alternative to reversible jump methods. The Annals of Statistics, 28, 40–74.
Subedi, S., & McNicholas, P.D. (2014). Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions. Advances in Data Analysis and Classification, 8 (2), 167–193.
Subedi, S., Punzo, A., Ingrassia, S., McNicholas, P.D. (2015). Cluster-weighted t-factor analyzers for robust model-based clustering and dimension reduction. Statistical Methods and Applications, 24 (4), 623–649.
Titterington, D.M., Smith, A.F.M., Makov, U.E. (1985). Statistical analysis of finite mixture distributions. Chichester: John Wiley & Sons.
Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2019). A mixture of coalesced generalized hyperbolic distributions. Journal of Classification, 36 (1), 26–57.
Ueda, N., & Ghahramani, Z. (2002). Bayesian model search for mixture models based on optimizing variational bounds. Neural Networks, 15, 1223–1241.
Venables, W.N., & Ripley, B.D. (2002). Modern applied statistics with S, 4th edn. New York: Springer.
Viroli, C. (2011). Finite mixtures of matrix normal distributions for classifying three-way data. Statistics and Computing, 21 (4), 511–522.
Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196–210.
Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32 (3), 359–381.
Wang, X., He, C.Z., Sun, D. (2005). Bayesian inference on the patient population size given list mismatches. Statistics in Medicine, 24 (2), 249–267.
Wolfe, J.H. (1965). A computer program for the maximum likelihood analysis of types. Technical Bulletin 65–15, U.S. Naval Personnel Research Activity.
Zhu, X., & Melnykov, V. (2018). Manly transformation in finite mixture modeling. Computational Statistics & Data Analysis, 121, 190–208.
Funding
This work was supported by a Postgraduate Scholarship from the Natural Science and Engineering Research Council of Canada (Subedi); the Canada Research Chairs program (McNicholas); and an E.W.R. Steacie Memorial Fellowship (McNicholas).
Appendices
Appendix A: Posterior distributions for the parameters of the eigen-decomposed covariance matrix
Appendix B: Posterior expected value of the precision parameters of the eigen-decomposed covariance matrix
Appendix C: Mathematical details for the EEV and VEV Models
C.1 EEV Model
The mixing proportions were assigned a Dirichlet prior distribution, such that
\[(\pi_{1},\ldots,\pi_{G})\sim\text{Dirichlet}\left(\alpha_{1}^{(0)},\ldots,\alpha_{G}^{(0)}\right).\]
For the mean, a Gaussian distribution conditional on the covariance matrix was used, such that
\[\boldsymbol{\mu}_{g}\mid\boldsymbol{\Sigma}_{g}\sim N\left(\mathbf{m}_{g}^{(0)},\boldsymbol{\Sigma}_{g}/\beta_{g}^{(0)}\right).\]
For the parameters of the covariance matrix, the following priors were used: the k th diagonal elements of \((\lambda\mathbf{A})^{-1}\) were assigned a Gamma \((a^{(0)}_{k},b^{(0)}_{k})\) distribution and \(\mathbf{D}_{g}\) was assigned a matrix von Mises-Fisher \((\mathbf{C}^{(0)}_{g})\) distribution. By setting \(\boldsymbol{\tau}=(\lambda\mathbf{A})^{-1}\), its prior can be written
\[p(\boldsymbol{\tau})\propto\prod_{k=1}^{d}\tau_{k}^{a_{k}^{(0)}-1}\exp\left\{-b_{k}^{(0)}\tau_{k}\right\},\]
where \(\tau_{k}\) is the k th diagonal element of \(\boldsymbol{\tau}=(\lambda\mathbf{A})^{-1}\).
The matrix D has a density as defined by Gupta and Nagar (2000):
\[p\left(\mathbf{D}\mid\mathbf{A}^{(0)},\mathbf{B}_{g}^{(0)},\mathbf{C}_{g}^{(0)}\right)\propto\operatorname{etr}\left(\mathbf{C}_{g}^{(0)\top}\mathbf{D}+\mathbf{B}_{g}^{(0)}\mathbf{D}^{\top}\mathbf{A}^{(0)}\mathbf{D}\right)[d\mathbf{D}],\]
for D ∈ O(d,d), where O(d,d) is the Stiefel manifold of d × d matrices, [dD] is the unit invariant measure on O(d,d), and \(\mathbf{A}^{(0)}\) and \(\mathbf{B}_{g}^{(0)}\) are symmetric and diagonal matrices, respectively.
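As an illustration of working with this density, the following Python sketch (names and hyperparameter values are illustrative, not from the paper) evaluates the unnormalized log kernel etr(C⊤D + BD⊤AD) of the matrix Bingham-von Mises-Fisher distribution, in the parameterization used by Hoff (2009), at a random orthogonal matrix:

```python
import numpy as np

def log_bmf_kernel(D, A, B, C):
    """Unnormalized log density etr(C^T D + B D^T A D) of the matrix
    Bingham-von Mises-Fisher distribution (cf. Hoff 2009)."""
    return np.trace(C.T @ D + B @ D.T @ A @ D)

d = 3
rng = np.random.default_rng(0)

# A symmetric, B diagonal, as stated in the text; C unrestricted.
M = rng.normal(size=(d, d))
A = (M + M.T) / 2
B = np.diag(rng.normal(size=d))
C = rng.normal(size=(d, d))

# A point on O(d, d): the Q factor of a QR decomposition is orthogonal.
D, _ = np.linalg.qr(rng.normal(size=(d, d)))

val = log_bmf_kernel(D, A, B, C)
```

The Hoff (2012) rstiefel package cited in the references provides samplers for this distribution; the snippet above only evaluates the kernel, which is what the posterior derivations below manipulate.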
The joint distribution of μ1,…,μG, τ, and D is
The likelihood of the data can be written
Therefore, the joint posterior distribution of μ, τ, and D can be written
Thus, the posterior distribution of the mean becomes
where \(\beta_{g} = \beta_{g}^{(0)}+{\sum}_{i=1}^{n}\hat{z}_{ig}\) and \(\mathbf{m}_{g}=\frac{1}{\beta_{g}}\left(\beta_{g}^{(0)}\mathbf{m}_{g}^{(0)}+{\sum}_{i=1}^{n}\hat{z}_{ig}\mathbf{x}_{i}\right)\).
The posterior distribution for the k th diagonal element of \(\boldsymbol{\tau}=(\lambda\mathbf{A})^{-1}\) is
where \(a_{k}=a_{k}^{(0)}+d{\sum }_{g=1}^{G}{\sum }_{i=1}^{n}\hat {z}_{ig}=a_{k}^{(0)}+dn\) and
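The hyperparameter updates quoted above can be sketched in code. In the snippet below, the responsibilities ẑ_ig, the prior values, and the posterior-mean formula for m_g are illustrative assumptions following standard Gaussian-mixture conjugacy, not taken verbatim from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, G = 8, 3, 2
X = rng.normal(size=(n, d))                # observations, one row per x_i
z_hat = rng.dirichlet(np.ones(G), size=n)  # responsibilities; rows sum to 1

beta0 = np.full(G, 1.0)    # prior beta_g^{(0)} (illustrative value)
m0 = np.zeros((G, d))      # prior means m_g^{(0)} (illustrative value)
a0 = np.full(d, 2.0)       # prior shape a_k^{(0)} (illustrative value)

n_g = z_hat.sum(axis=0)    # effective component sizes, sum_i z_ig
beta = beta0 + n_g         # beta_g = beta_g^{(0)} + sum_i z_ig

# Standard conjugate form for the posterior mean (an assumption here):
m = (beta0[:, None] * m0 + z_hat.T @ X) / beta[:, None]

# Shape update from the text: a_k = a_k^{(0)} + d * sum_g sum_i z_ig
#                                 = a_k^{(0)} + d * n.
a = a0 + d * n_g.sum()
```

Each update adds the responsibility-weighted data summaries to the prior hyperparameters, which is the pattern every conjugate update in this appendix follows.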
We have
which has the functional form of a Bingham matrix distribution, i.e., the form
where \(\mathbf {Q}_{g} = \mathbf {Q}_{g}^{(0)}+\boldsymbol {\tau }\) and
C.2 VEV Model
Similarly, the posterior distribution of Dg for the VEV model has the form
which has the functional form of a Bingham matrix distribution, i.e., the form
where \(\mathbf {Q}_{g}=\mathbf {Q}_{g}^{(0)}+\boldsymbol {\tau }_{g}\) and
Subedi, S., McNicholas, P.D. A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting. J Classif 38, 89–108 (2021). https://doi.org/10.1007/s00357-019-09351-3