# Maximum likelihood estimation of Gaussian mixture models without matrix operations


## Abstract

The Gaussian mixture model (GMM) is a popular tool for multivariate analysis, in particular, cluster analysis. The expectation–maximization (EM) algorithm is generally used to perform maximum likelihood (ML) estimation for GMMs, because its M-step exists in closed form and it enjoys desirable numerical properties, such as monotonicity. However, the EM algorithm has been criticized as being slow to converge and thus computationally expensive in some situations. In this article, we introduce the linear regression characterization (LRC) of the GMM. We show that the parameters of an LRC of the GMM can be mapped back to the natural parameters, and that a minorization–maximization (MM) algorithm can be constructed which retains the desirable numerical properties of the EM algorithm, without the use of matrix operations. We prove that the ML estimators of the LRC parameters are consistent and asymptotically normal, like their natural counterparts. Furthermore, we show that the LRC allows for simple handling of singularities in the ML estimation of GMMs. Using numerical simulations in the R programming environment, we then demonstrate that the MM algorithm can be faster than the EM algorithm in various large data situations, where sample sizes range from the tens of thousands to the hundreds of thousands, for models with up to 16 mixture components on multivariate data with up to 16 variables.
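To make the baseline concrete, the following is a minimal sketch of the standard EM algorithm the abstract refers to, for a two-component univariate GMM. This is an illustrative implementation only: the paper's matrix-operation-free MM algorithm via the LRC is not reproduced here, and the function name and initialization scheme are choices of this sketch, not the authors'.

```python
import numpy as np

def em_gmm_1d(x, n_iter=200, tol=1e-8):
    """Standard EM for a two-component univariate Gaussian mixture.

    Illustrative baseline only; the paper's matrix-operation-free MM
    algorithm (via the linear regression characterization) is not shown.
    """
    n = x.size
    pi1 = 0.5                          # mixing weight of component 1
    mu = np.quantile(x, [0.25, 0.75])  # deterministic initialization
    var = np.full(2, x.var())
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibilities r[k, i] = P(Z_i = k | x_i)
        dens = np.stack([
            np.exp(-(x - mu[k]) ** 2 / (2.0 * var[k]))
            / np.sqrt(2.0 * np.pi * var[k])
            for k in range(2)
        ])
        joint = np.array([pi1, 1.0 - pi1])[:, None] * dens
        total = joint.sum(axis=0)
        r = joint / total
        # M-step: closed-form weighted updates; each iteration cannot
        # decrease the log-likelihood (the monotonicity property)
        nk = r.sum(axis=1)
        pi1 = nk[0] / n
        mu = (r * x).sum(axis=1) / nk
        var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / nk
        ll = np.log(total).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi1, mu, var, ll
```

Note that both the E-step and M-step above reduce to elementwise vector arithmetic in the univariate case; in the multivariate setting, the standard EM updates require covariance-matrix inversions and determinants at each iteration, which is precisely the cost the paper's LRC-based MM algorithm avoids.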

## Keywords

Gaussian mixture model · Minorization–maximization algorithm · Matrix operation-free · Linear regression

## Mathematics Subject Classification

65C60 · 62E10