
Eigenvalues and constraints in mixture modeling: geometric and computational issues

Abstract

This paper reviews the use of eigenvalue restrictions for constrained parameter estimation in mixtures of elliptical distributions under the likelihood approach. The restrictions serve a twofold purpose: to avoid convergence to degenerate solutions and to reduce the onset of uninteresting (spurious) local maximizers related to complex likelihood surfaces. The paper shows how the constraints may play a key role in the theory of Euclidean data clustering. The aim here is to provide a reasoned survey of the constraints and their applications, considering the contributions of many authors and spanning the literature of the last 30 years.
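
To see concretely the degeneracy that such restrictions prevent, consider the following minimal Python sketch (ours, not taken from the paper; the sample and constants are purely illustrative): centering one component of a two-component Gaussian mixture on a single observation and letting its scale shrink makes the log-likelihood diverge, so the unconstrained maximum likelihood problem is ill-posed.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=50)  # a univariate sample

    def mixture_loglik(x, mu1, s1, mu2, s2, pi=0.5):
        """Log-likelihood of a two-component univariate Gaussian mixture."""
        dens = pi * norm.pdf(x, loc=mu1, scale=s1) \
            + (1 - pi) * norm.pdf(x, loc=mu2, scale=s2)
        return np.sum(np.log(dens))

    # Center component 1 on the first observation and shrink its scale:
    for s in (1.0, 1e-2, 1e-4, 1e-8):
        print(f"sigma_1 = {s:.0e}  loglik = {mixture_loglik(x, x[0], s, 0.0, 1.0):.2f}")
    # The log-likelihood grows without bound as sigma_1 -> 0 (a degenerate
    # solution); a constraint 0 < a <= sigma_1^2 <= b rules it out.

The eigenvalue constraints reviewed in the paper generalize this simple variance bound to the spectra of the component covariance matrices.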


References

  1. Andrews JL, McNicholas PD (2011) Extending mixtures of multivariate \(t\)-factor analyzers. Stat Comput 21:361–373

  2. Andrews JL, McNicholas PD, Subedi S (2011) Model-based classification via mixtures of multivariate \(t\)-distributions. Comput Stat Data Anal 55:520–529

  3. Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821

  4. Baudry J-P, Celeux G (2015) EM for mixtures: initialization requires special care. Stat Comput 25(4):713–726

  5. Biernacki C (2004) Initializing EM using the properties of its trajectories in Gaussian mixtures. Stat Comput 14:267–279

  6. Biernacki C, Chrétien S (2003) Degeneracy in the maximum likelihood estimation of univariate Gaussian mixtures with the EM. Stat Probab Lett 61:373–382

  7. Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575

  8. Boyles RA (1983) On the convergence of the EM algorithm. J R Stat Soc B 45:47–50

  9. Celeux G, Govaert G (1995) Gaussian parsimonious clustering models. Pattern Recognit 28:781–793

  10. Cerioli A, García-Escudero LA, Mayo-Iscar A, Riani M (2016) Finding the number of groups in model-based clustering via constrained likelihoods. http://uvadoc.uva.es/handle/10324/18093

  11. Chanda KC (1954) A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41:56–61

  12. Ciuperca G, Ridolfi A, Idier J (2003) Penalized maximum likelihood estimator for normal mixtures. Scand J Stat 30:45–59

  13. Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton

  14. Day N (1969) Estimating the components of a mixture of normal distributions. Biometrika 56(3):463–474

  15. Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–38

  16. Dennis JE (1981) Algorithms for nonlinear fitting. In: Proceedings of the NATO advanced research symposium, Cambridge, England. Cambridge University

  17. Dykstra RL (1983) An algorithm for restricted least squares regression. J Am Stat Assoc 78(384):837–842

  18. Fang K, Anderson T (1990) Statistical inference in elliptically contoured and related distributions. Allerton Press, New York

  19. Fraley C, Raftery AE (2006) Mclust version 3: an R package for normal mixture modeling and model-based clustering. Technical report, DTIC Document

  20. Fraley C, Raftery A (2007) Bayesian regularization for normal mixture estimation and model-based clustering. J Classif 24(2):155–181

  21. Fraley C, Raftery A, Murphy T, Scrucca L (2012) mclust version 4 for R: normal mixture modeling for model-based clustering, classification, and density estimation. University of Washington, Seattle

  22. Fritz H, García-Escudero LA, Mayo-Iscar A (2012) tclust: an R package for a trimming approach to cluster analysis. J Stat Softw 47(12):1–26

  23. Fritz H, García-Escudero LA, Mayo-Iscar A (2013) A fast algorithm for robust constrained clustering. Comput Stat Data Anal 61:124–136

  24. Gallegos MT (2002) Maximum likelihood clustering with outliers. In: Jajuga K, Sokolowski A, Bock H-H (eds) Classification, clustering, and data analysis: recent advances and applications. Springer, Berlin, pp 247–255

  25. Gallegos M, Ritter G (2009) Trimmed ML estimation of contaminated mixtures. Sankhya (Ser A) 71:164–220

  26. García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3):1324–1345

  27. García-Escudero LA, Gordaliza A, Mayo-Iscar A (2014) A constrained robust proposal for mixture modeling avoiding spurious solutions. Adv Data Anal Classif 8(1):27–43

  28. García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2015) Avoiding spurious local maximizers in mixture modelling. Stat Comput 25:619–633

  29. García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2016) The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers. Comput Stat Data Anal 99:131–147

  30. García-Escudero LA, Gordaliza A, Greselin F, Ingrassia S, Mayo-Iscar A (2017) Eigenvalues in robust approaches to mixture modeling: a review. Technical report, in preparation

  31. Ghahramani Z, Hinton G (1997) The EM algorithm for factor analyzers. Technical report CRG-TR-96-1, University of Toronto

  32. Greselin F, Ingrassia S (2010) Constrained monotone EM algorithms for mixtures of multivariate \(t\)-distributions. Stat Comput 20:9–22

  33. Greselin F, Ingrassia S (2015) Maximum likelihood estimation in constrained parameter spaces for mixtures of factor analyzers. Stat Comput 25(2):215–226

  34. Greselin F, Ingrassia S, Punzo A (2011) Assessing the pattern of covariance matrices via an augmentation multiple testing procedure. Stat Methods Appl 20:141–170

  35. Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13(2):795–800

  36. Hathaway RJ (1986) A constrained EM algorithm for univariate normal mixtures. J Stat Comput Simul 23:211–230

  37. Hennig C (2004) Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Stat 32:1313–1340

  38. Ingrassia S (1992) A comparison between the simulated annealing and the EM algorithms in normal mixture decompositions. Stat Comput 2:203–211

  39. Ingrassia S (2004) A likelihood-based constrained algorithm for multivariate normal mixture models. Stat Methods Appl 13(4):151–166

  40. Ingrassia S, Rocci R (2007) Constrained monotone EM algorithms for finite mixture of multivariate Gaussians. Comput Stat Data Anal 51(11):5339–5351

  41. Ingrassia S, Rocci R (2011) Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints. Comput Stat Data Anal 55(4):1715–1725

  42. Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27(4):887–906

  43. Kiefer NM (1978) Discrete parameter variation: efficient estimation of a switching regression model. Econometrica 46(2):427–434

  44. Lindsay BG (1995) Mixture models: theory, geometry and applications. NSF-CBMS regional conference series in probability and statistics, vol 5. Institute of Mathematical Statistics, Hayward, CA

  45. McLachlan G, Krishnan T (2008a) The EM algorithm and extensions, 2nd edn, vol 589. Wiley, New York

  46. McLachlan GJ, Krishnan T (2008b) The EM algorithm and extensions, 2nd edn. Wiley, New York

  47. McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York

  48. McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18:285–296

  49. Meng X-L (1994) On the rate of convergence of the ECM algorithm. Ann Stat 22(1):326–339

  50. Meng X-L, van Dyk D (1997) The EM algorithm: an old folk-song sung to a fast new tune. J R Stat Soc B 59(3):511–567

  51. Nettleton D (1999) Convergence properties of the EM algorithm in constrained spaces. Canad J Stat 27(3):639–644

  52. O’Hagan A, White A (2016) Improved model-based clustering performance using Bayes initialization averaging. Technical report, arXiv:1504.06870v4

  53. O’Hagan A, Murphy TB, Gormley C (2013) Computational aspects of fitting mixture models via the expectation-maximisation algorithm. Comput Stat Data Anal 56(12):3843–3864

  54. Puntanen S, Styan GP, Isotalo J (2011) Matrix tricks for linear statistical models. Springer, Berlin

  55. Redner RA, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26(2):195–239

  56. Ritter G (2014) Cluster analysis and variable selection. CRC Press, Boca Raton

  57. Rocci R, Gattone SA, Di Mari R (2016) A data driven equivariant approach to constrained Gaussian mixture modeling. Adv Data Anal Classif. doi:10.1007/s11634-016-0279-1

  58. Rousseeuw PJ, Leroy AM (2005) Robust regression and outlier detection. Wiley, New York

  59. Rudin W (1976) Principles of mathematical analysis, 3rd edn. McGraw-Hill, Tokyo

  60. Scrucca L, Fop M, Murphy TB, Raftery A (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):289–317

  61. Seo B, Kim D (2012) Root selection in normal mixture models. Comput Stat Data Anal 56:2454–2470

  62. Subedi S, Punzo A, Ingrassia S, McNicholas PD (2013) Clustering and classification via cluster-weighted factor analyzers. Adv Data Anal Classif 7:5–40

  63. Subedi S, Punzo A, Ingrassia S, McNicholas PD (2015) Cluster-weighted \(t\)-factor analyzers for robust model-based clustering and dimension reduction. Stat Methods Appl 24(4):623–649

  64. Tanaka K (2009) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when penalty is imposed on the ratios of the scale parameters. Scand J Stat 36(1):171–184

  65. Tanaka K, Takemura A (2006) Strong consistency of the maximum likelihood estimator for finite mixtures of location-scale distributions when the scale parameters are exponentially small. Bernoulli 12(6):1003–1017

  66. Tarone RD, Gruenhage G (1975) A note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 70(352):903–904

  67. Tarone RD, Gruenhage G (1979) Corrigenda: a note on the uniqueness of the roots of the likelihood equations for vector-valued parameters. J Am Stat Assoc 74(367):744

  68. Theobald C (1975) An inequality with application to multivariate analysis. Biometrika 62(2):461–466

  69. Theobald C (1976) Corrections and amendments: an inequality with application to multivariate analysis. Biometrika 63(3):685

  70. Tipping M, Bishop CM (1999) Mixtures of probabilistic principal component analysers. Neural Comput 11(2):443–482

  71. Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, New York

  72. van Laarhoven PJM, Aarts EHL (1988) Simulated annealing: theory and practice. D. Reidel, Dordrecht

  73. Wu CFJ (1983) On convergence properties of the EM algorithm. Ann Stat 11:95–103

Author information

Correspondence to Salvatore Ingrassia.

Appendix: Proof of Theorem 4

Maximizing \(\mathcal {L}(\mathbf {\psi })\) amounts to jointly maximizing \(|\varvec{\varSigma }_g|^{-1/2}\) and minimizing the argument of the exponential, i.e., \((\mathbf {x}_n- \varvec{\mu }_g)' \varvec{\varSigma }_g^{-1}(\mathbf {x}_n- \varvec{\mu }_g)\), for each \(g=1,\ldots ,G\). Hence, we first show that, for a given \(\varvec{\varSigma }_g\), the mean vector \(\varvec{\mu }_g\) must lie in a compact subset of \(\mathbb {R}^d\). Let \(C(\mathcal {X})\) be the convex hull of \(\mathcal {X}\), i.e., the intersection of all convex sets containing the N points, given by

$$\begin{aligned} C(\mathcal {X})=\left\{ \sum _{n=1}^N u_n \mathbf {x}_n \; | \; \sum _{n=1}^N u_n=1, u_n \ge 0\right\} . \end{aligned}$$
(53)

Suppose now that \(\bar{\mathbf {\psi }} \in \mathbf {\Psi }_{a,b}\) satisfies \(\bar{\varvec{\mu }}_g \notin C(\mathcal {X})\). Then \(\mathcal {L}(\bar{\mathbf {\psi }}) \le \mathcal {L}(\mathbf {\psi }^*)\) where \(\mathbf {\psi }^* \in \mathbf {\Psi }_{a,b}\) is obtained from \(\bar{\mathbf {\psi }}\) by changing the gth mean component to \(\varvec{\mu }_g=\alpha \bar{\varvec{\mu }}_g\) for some \(\alpha \in (0,1)\) (i.e., along the line joining \(\mathbf{0}\) and \(\bar{\varvec{\mu }}_g\)) such that \(\varvec{\mu }_g \in C(\mathcal {X})\).
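
As a numerical aside (our illustration, not part of the proof: an identity covariance is used for simplicity, and the sample mean stands in for a point of \(C(\mathcal {X})\)), one can check that replacing a mean lying far outside the convex hull by a point of the hull reduces every quadratic form, so the likelihood cannot decrease:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 2))     # N = 20 points in R^2
    Sigma_inv = np.eye(2)            # identity Sigma for simplicity

    def quad_forms(X, mu, Sigma_inv):
        """All terms (x_n - mu)' Sigma^{-1} (x_n - mu), n = 1, ..., N."""
        D = X - mu
        return np.einsum('ni,ij,nj->n', D, Sigma_inv, D)

    mu_out = np.array([10.0, 10.0])  # a mean far outside C(X)
    mu_in = X.mean(axis=0)           # a convex combination of the data, hence in C(X)

    # Every exponent term decreases, so the likelihood can only increase:
    print(np.all(quad_forms(X, mu_in, Sigma_inv) <= quad_forms(X, mu_out, Sigma_inv)))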

Let us set \(S= \{ \mathbf {\psi }\in \mathbf {\Psi }_{a,b} \, | \, \varvec{\mu }_g\in C(\mathcal {X}); 0< a \le \lambda _i(\varvec{\varSigma }_g)\le b < + \infty \quad g=1, \ldots , G\}\). Then, it follows that

$$\begin{aligned} \sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi }) = \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) . \end{aligned}$$

By the compactness of S and the continuity of \(\mathcal {L}(\mathbf {\psi })\), there exists a parameter \(\hat{\mathbf {\psi }}\in \mathbf {\Psi }_{a,b}\) satisfying

$$\begin{aligned} \mathcal {L}(\hat{\mathbf {\psi }}) =\sup _{\mathbf {\psi }\in \mathbf {\Psi }_{a,b}} \mathcal {L}(\mathbf {\psi })= \sup _{\mathbf {\psi }\in S} \mathcal {L}(\mathbf {\psi }) \end{aligned}$$

by Weierstrass’ theorem, see, e.g., Theorem 4.16 in Rudin (1976). \(\square \)
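
In practice, the constrained space \(\mathbf {\Psi }_{a,b}\) over which the theorem guarantees a maximizer is enforced within EM-type iterations by truncating the eigenvalues of each estimated covariance matrix into \([a,b]\), in the spirit of Ingrassia (2004) and Fritz et al. (2013). A minimal Python sketch (the helper name is ours):

    import numpy as np

    def constrain_eigenvalues(Sigma, a, b):
        """Truncate the eigenvalues of a symmetric matrix Sigma to [a, b]."""
        vals, vecs = np.linalg.eigh(Sigma)    # real spectral decomposition
        vals = np.clip(vals, a, b)            # enforce a <= lambda_i <= b
        return vecs @ np.diag(vals) @ vecs.T  # reassemble the constrained matrix

    # A nearly singular covariance (eigenvalues ~ 1.999 and 0.001) is pushed
    # away from degeneracy:
    Sigma = np.array([[1.0, 0.999],
                      [0.999, 1.0]])
    print(np.linalg.eigvalsh(constrain_eigenvalues(Sigma, a=0.1, b=10.0)))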


About this article

Cite this article

García-Escudero, L.A., Gordaliza, A., Greselin, F. et al. Eigenvalues and constraints in mixture modeling: geometric and computational issues. Adv Data Anal Classif 12, 203–233 (2018). https://doi.org/10.1007/s11634-017-0293-y

Keywords

  • Mixture model
  • EM algorithm
  • Eigenvalues
  • Model-based clustering

Mathematics Subject Classification

  • 62F10 Point estimation
  • 62F12 Asymptotic properties of estimators
  • 62F30 Inference under constraints
  • 62F35 Robustness and adaptive procedures