Advances in Data Analysis and Classification

Volume 9, Issue 4, pp 371–394

Maximum likelihood estimation of Gaussian mixture models without matrix operations

  • Hien D. Nguyen
  • Geoffrey J. McLachlan
Regular Article


The Gaussian mixture model (GMM) is a popular tool for multivariate analysis, in particular cluster analysis. The expectation–maximization (EM) algorithm is generally used to perform maximum likelihood (ML) estimation for GMMs because its M-step exists in closed form and it has desirable numerical properties, such as monotonicity. However, the EM algorithm has been criticized as being slow to converge and thus computationally expensive in some situations. In this article, we introduce the linear regression characterization (LRC) of the GMM. We show that the parameters of an LRC of the GMM can be mapped back to the natural parameters, and that a minorization–maximization (MM) algorithm can be constructed that retains the desirable numerical properties of the EM algorithm without the use of matrix operations. We prove that the ML estimators of the LRC parameters are consistent and asymptotically normal, like their natural counterparts. Furthermore, we show that the LRC allows for simple handling of singularities in the ML estimation of GMMs. Using numerical simulations in the R programming environment, we then demonstrate that the MM algorithm can be faster than the EM algorithm in various large-data situations, with sample sizes in the tens to hundreds of thousands, models with up to 16 mixture components, and multivariate data with up to 16 variables.
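The abstract does not spell out the LRC itself. As a rough illustration of the general idea behind describing a Gaussian through regressions, the sketch below relies on the standard fact that a bivariate Gaussian density factors into a univariate marginal times a univariate linear regression of the second variable on the first. The function names and the bivariate restriction are assumptions made for illustration only; this is not the authors' parametrization or their MM algorithm.

```python
import math


def normal_logpdf(x, mean, var):
    """Log-density of a univariate normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bivariate_logpdf_direct(x1, x2, mu1, mu2, s11, s12, s22):
    """Bivariate Gaussian log-density via the explicit 2x2 determinant and inverse."""
    det = s11 * s22 - s12 ** 2
    d1, d2 = x1 - mu1, x2 - mu2
    quad = (s22 * d1 ** 2 - 2.0 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def bivariate_logpdf_regression(x1, x2, mu1, mu2, s11, s12, s22):
    """Same log-density, written as marginal of x1 plus regression of x2 on x1."""
    slope = s12 / s11                   # regression coefficient of x2 on x1
    intercept = mu2 - slope * mu1       # regression intercept
    resid_var = s22 - s12 ** 2 / s11    # residual (conditional) variance
    return (normal_logpdf(x1, mu1, s11)
            + normal_logpdf(x2, intercept + slope * x1, resid_var))


# Hypothetical parameter values, chosen only to exercise both functions.
direct = bivariate_logpdf_direct(0.3, -1.2, 0.0, 1.0, 2.0, 0.6, 1.5)
chained = bivariate_logpdf_regression(0.3, -1.2, 0.0, 1.0, 2.0, 0.6, 1.5)
print(abs(direct - chained) < 1e-12)
```

Once the regression-style parameters (intercept, slope, residual variance) are in hand, evaluating the density needs only scalar arithmetic, which gives some intuition for how an estimation scheme might avoid determinants and inverses.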


Gaussian mixture model; Minorization–maximization algorithm; Matrix operation-free; Linear regression

Mathematics Subject Classification

65C60 62E10 



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Mathematics, School of Mathematics and Physics, University of Queensland, St. Lucia, Australia
