A comparison of the $L_2$ minimum distance estimator and the EM-algorithm when fitting $k$-component univariate normal mixtures

Clarke, Brenton R.; Davidson, Thomas; Hammarstrand, Robert

doi:10.1007/s00362-016-0747-x

Regular Article
Published: 24 February 2016

Volume 58, pages 1247–1266, (2017)

Statistical Papers

Brenton R. Clarke¹ (ORCID: orcid.org/0000-0003-1419-0768), Thomas Davidson² & Robert Hammarstrand¹

Abstract

The method of maximum likelihood using the EM-algorithm has been the accepted method for fitting finite mixtures of normal distributions ever since it was shown to be superior to the method of moments. Recent books testify to this. There has, however, been criticism of the method of maximum likelihood for this problem, the main criticism being that when the variances of the component distributions are unequal the likelihood is in fact unbounded and there can be multiple local maxima. Another major criticism is that the maximum likelihood estimator is not robust. Several alternative minimum distance estimators have since been proposed as a way of dealing with the first problem. This paper deals with one of these estimators, which is not only superior due to its robustness but can in fact have an advantage in numerical studies even at the model distribution. Importantly, robust alternatives to the EM-algorithm, ostensibly fitting t distributions when in fact the data are mixtures of normals, are also not competitive at the normal mixture model when compared to the chosen minimum distance estimator. It is argued, for instance, that natural processes should lead to mixtures whose component distributions are normal as a result of the Central Limit Theorem. On the other hand, data can be contaminated by extraneous sources, as is typically assumed in robustness studies. This calls for a robust estimator.
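The unboundedness of the likelihood can be seen numerically. The following is a minimal sketch for illustration only, not code from the paper: the sample, the fixed second component and the helper names `normal_pdf` and `mixture_loglik` are all assumptions. One component is centred exactly on a single observation and its standard deviation is shrunk towards zero, and the log-likelihood of the two-component mixture keeps increasing.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_loglik(data, weights, means, sigmas):
    """Log-likelihood of a k-component univariate normal mixture."""
    total = 0.0
    for x in data:
        density = sum(w * normal_pdf(x, m, s)
                      for w, m, s in zip(weights, means, sigmas))
        total += math.log(density)
    return total

# Hypothetical sample, chosen only for illustration.
data = [-1.3, -0.4, 0.1, 0.7, 1.9, 4.2, 5.0, 5.8]

# Component 1 sits exactly on the first observation and becomes ever narrower;
# component 2 is a fixed broad component (assumed N(2, 3^2)) covering the rest.
sigma1_values = (1.0, 1e-2, 1e-4, 1e-8)
logliks = []
for sigma1 in sigma1_values:
    ll = mixture_loglik(data, weights=(0.5, 0.5),
                        means=(data[0], 2.0), sigmas=(sigma1, 3.0))
    logliks.append(ll)
    print(f"sigma1 = {sigma1:g}: log-likelihood = {ll:.3f}")
```

Because the density of the narrow component at its own centre is proportional to 1/sigma1, each further reduction of sigma1 adds roughly the log of the reduction factor to the log-likelihood, so no finite maximum exists over the unrestricted parameter space.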

Acknowledgments

The authors are indebted to Emeritus Professor C.R. Heathcote, now retired, for his pioneering work on minimum distance estimators. Views expressed in this paper are those of the authors and do not necessarily represent those of the Australian Bureau of Statistics. The authors also acknowledge the helpful suggestions on presentation afforded by two anonymous referees that led to an improved paper.

Author information

Authors and Affiliations

Mathematics and Statistics, School of Eng. and I.T., Murdoch University, Murdoch, WA, 6150, Australia
Brenton R. Clarke & Robert Hammarstrand
Australian Bureau of Statistics, Perth, WA, 6000, Australia
Thomas Davidson

Corresponding author

Correspondence to Brenton R. Clarke.

Appendix: Averages and mean squared errors of estimates

For completeness we include here averages and mean squared errors for individual parameters for parametric models (1)–(7) (Tables 10 and 11) and models (8)–(12) (Tables 12 and 13) for the $L_2$ method and the EM algorithm for the MLE obtained using mixtools, that is EM_N.
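As a reminder of how such summaries are usually formed, the sketch below computes the average and the mean squared error of replicate estimates of a single parameter. The replicate values and the true value are made up for illustration and are not taken from Tables 10 to 13; the function name `average_and_mse` is hypothetical.

```python
def average_and_mse(estimates, true_value):
    """Return (average, mean squared error) of replicate estimates."""
    n = len(estimates)
    average = sum(estimates) / n
    # Squared error is measured against the true parameter, not the average.
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return average, mse

# Made-up replicate estimates of a component mean whose true value is 2.0.
estimates = [1.9, 2.1, 2.4, 1.6]
avg, mse = average_and_mse(estimates, true_value=2.0)
print(avg, mse)
```

The mean squared error defined this way combines the variance of the estimates with the squared bias of their average, so a method can report an average close to the true value and still have a large mean squared error.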

Table 10 Average of 100 successful estimates for models (1)–(7)

Table 11 Mean squared errors from 100 successful estimates for models (1)–(7)

Table 12 Average of 100 successful estimates for models (8)–(12)

Table 13 Mean squared errors from 100 successful estimates for models (8)–(12)

About this article

Cite this article

Clarke, B.R., Davidson, T. & Hammarstrand, R. A comparison of the $L_2$ minimum distance estimator and the EM-algorithm when fitting $k$-component univariate normal mixtures. Stat Papers 58, 1247–1266 (2017). https://doi.org/10.1007/s00362-016-0747-x

Received: 23 December 2014
Revised: 11 January 2016
Published: 24 February 2016
Issue Date: December 2017
DOI: https://doi.org/10.1007/s00362-016-0747-x
