
A data driven equivariant approach to constrained Gaussian mixture modeling


Maximum likelihood estimation of Gaussian mixture models with different class-specific covariance matrices is known to be problematic, owing to the unboundedness of the likelihood together with the presence of spurious maximizers. Existing methods to bypass this obstacle rely on the fact that unboundedness is avoided if the eigenvalues of the covariance matrices are bounded away from zero. This can be done by imposing constraints on the covariance matrices, i.e. by incorporating a priori information on the covariance structure of the mixture components. The present work introduces a constrained approach, in which the class-conditional covariance matrices are shrunk towards a pre-specified target matrix \(\varvec{\varPsi}\). Data-driven choices of the matrix \(\varvec{\varPsi}\), when a priori information is not available, and the optimal amount of shrinkage are investigated. Constraints based on a data-driven \(\varvec{\varPsi}\) are then shown to be equivariant with respect to linear affine transformations, provided that the method used to select the target matrix is also equivariant. The effectiveness of the proposal is evaluated in a simulation study and an empirical example.
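To give a concrete flavor of the kind of constraint described above, the sketch below (Python/NumPy, not taken from the paper) clips the eigenvalues of a covariance estimate, standardized by a target matrix \(\varvec{\varPsi}\), into an interval \([c, 1/c]\), which keeps them bounded away from zero relative to the target. The function name `constrain_covariance` and the exact truncation rule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def constrain_covariance(S, Psi, c):
    """Illustrative sketch: shrink a covariance estimate S toward a target
    matrix Psi by clipping the eigenvalues of Psi^{-1/2} S Psi^{-1/2}
    into [c, 1/c]. The exact constraint in the paper may differ."""
    # Symmetric square root and inverse square root of the target matrix
    w, V = np.linalg.eigh(Psi)
    Psi_sqrt = V @ np.diag(np.sqrt(w)) @ V.T
    Psi_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Eigendecomposition of the covariance standardized by the target
    lam, U = np.linalg.eigh(Psi_inv_sqrt @ S @ Psi_inv_sqrt)

    # Truncate the standardized eigenvalues so they stay in [c, 1/c]
    lam_clipped = np.clip(lam, c, 1.0 / c)

    # Map back to the original scale
    return Psi_sqrt @ (U @ np.diag(lam_clipped) @ U.T) @ Psi_sqrt
```

With \(\varvec{\varPsi}\) equal to the identity, this reduces to plain eigenvalue truncation of `S`; a data-driven \(\varvec{\varPsi}\) instead anchors the constraint to the scale of the data, which is what makes an equivariant choice of target matter.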







The authors are grateful to the associate editor and the two anonymous referees for their useful comments, which have led to a considerable improvement of the paper.

Author information

Correspondence to Roberto Di Mari.



Cite this article

Rocci, R., Gattone, S.A. & Di Mari, R. A data driven equivariant approach to constrained Gaussian mixture modeling. Adv Data Anal Classif 12, 235–260 (2018). https://doi.org/10.1007/s11634-016-0279-1



Keywords

  • Model based clustering
  • Gaussian mixture models
  • Equivariant estimators

Mathematics Subject Classification

  • 62H30
  • 62-07