Advances in Data Analysis and Classification, Volume 4, Issue 2–3, pp 111–135

A simulation study to compare robust clustering methods based on mixtures

Regular Article

Abstract

The following mixture model-based clustering methods are compared in a simulation study with one-dimensional data, fixed number of clusters and a focus on outliers and uniform “noise”: an ML-estimator (MLE) for Gaussian mixtures, an MLE for a mixture of Gaussians and a uniform distribution (interpreted as “noise component” to catch outliers), an MLE for a mixture of Gaussian distributions where a uniform distribution over the range of the data is fixed (Fraley and Raftery in Comput J 41:578–588, 1998), a pseudo-MLE for a Gaussian mixture with improper fixed constant over the real line to catch “noise” (RIMLE; Hennig in Ann Stat 32(4): 1313–1340, 2004), and MLEs for mixtures of t-distributions with and without estimation of the degrees of freedom (McLachlan and Peel in Stat Comput 10(4):339–348, 2000). The RIMLE (using a method to choose the fixed constant first proposed in Coretto, The noise component in model-based clustering. Ph.D thesis, Department of Statistical Science, University College London, 2008) is the best method in some, and acceptable in all, simulation setups, and can therefore be recommended.
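The idea of adding a constant "noise" density to a Gaussian mixture can be illustrated with a minimal EM-style sketch for one-dimensional data. This is not the authors' implementation: the function name, the quantile-based initialisation, the lower bound `min_sd` on the standard deviations, and the choice to pass the constant `c` in by hand are all assumptions made for illustration. In particular, the method for choosing the fixed constant (Coretto 2008) is not described here and is not reproduced.

```python
import numpy as np


def em_gaussian_mixture_with_constant_noise(x, k, c, n_iter=200, min_sd=1e-3):
    """EM-style iterations for k Gaussian components plus a constant density c.

    Illustrative sketch only. `c` is a user-supplied constant (hypothetical
    choice); `min_sd` is a crude guard against degenerate components.
    Returns (props, means, sds, resp); column 0 of resp and props[0]
    belong to the noise component.
    """
    x = np.asarray(x, dtype=float)
    n = x.size

    # Deterministic start: means at evenly spaced quantiles, common scale,
    # equal proportions for the noise term and the k Gaussians.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    sds = np.full(k, max(x.std() / k, min_sd))
    props = np.full(k + 1, 1.0 / (k + 1))

    for _ in range(n_iter):
        # E-step: weighted densities, then posterior membership weights.
        dens = np.empty((n, k + 1))
        dens[:, 0] = props[0] * c
        z = (x[:, None] - means[None, :]) / sds[None, :]
        dens[:, 1:] = props[1:] * np.exp(-0.5 * z ** 2) / (sds * np.sqrt(2.0 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M-step: update proportions, means and standard deviations.
        weights = resp.sum(axis=0)
        props = weights / n
        wk = np.maximum(weights[1:], 1e-12)
        means = (resp[:, 1:] * x[:, None]).sum(axis=0) / wk
        var = (resp[:, 1:] * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / wk
        sds = np.maximum(np.sqrt(var), min_sd)

    return props, means, sds, resp
```

Observations whose largest membership weight falls in column 0 of `resp` would be treated as noise rather than assigned to a cluster. Replacing the constant `c` by one over the range of the data gives a rough analogue of a uniform noise component fixed over the data range, again only as a sketch.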

Keywords

Model-based clustering · Gaussian mixture · Mixture of t-distributions · Noise component

Mathematics Subject Classification (2000)

62H30 62F35 62F10 62F12 

References

  1. Banfield J, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49: 803–821
  2. Coretto P (2008) The noise component in model-based clustering. PhD thesis, Department of Statistical Science, University College London. http://www.ontherubicon.com/pietro/docs/phdthesis.pdf
  3. Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed k-means: an attempt to robustify quantizers. Ann Stat 25: 553–576
  4. Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41: 578–588
  5. Fraley C, Raftery AE (2006) MCLUST version 3 for R: normal mixture modeling and model-based clustering. Technical report 504, Department of Statistics, University of Washington
  6. Gallegos MT, Ritter G (2005) A robust method for cluster analysis. Ann Stat 33(5): 347–380
  7. García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 38(3): 1324–1345
  8. Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13: 795–800
  9. Hennig C (2004) Breakdown points for maximum likelihood estimators of location-scale mixtures. Ann Stat 32(4): 1313–1340
  10. Hennig C (2005) Robustness of ML estimators of location-scale mixtures. In: Baier D, Wernecke KD (eds) Innovations in classification, data science, and information systems. Springer, Heidelberg, pp 128–137
  11. Hennig C, Coretto P (2008) The noise component in model-based cluster analysis. In: Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R (eds) Data analysis, machine learning and applications. Springer, Berlin, pp 127–138
  12. Hosmer DW (1978) Comment on “Estimating mixtures of normal distributions and switching regressions” by R. Quandt and J.B. Ramsey. J Am Stat Assoc 73(364): 730–752
  13. Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4): 577–590
  14. Liu C (1997) ML estimation of the multivariate t distribution and the EM algorithms. J Multivar Anal 63: 296–312
  15. McLachlan G, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York
  16. McLachlan G, Peel D (2000) Robust mixture modelling using the t-distribution. Stat Comput 10(4): 339–348
  17. Neykov N, Filzmoser P, Dimova R, Neytchev P (2007) Robust fitting of mixtures using the trimmed likelihood estimator. Comput Stat Data Anal 17(3): 299–308
  18. Redner R, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26: 195–239

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  1. Dipartimento di Scienze Economiche e Statistiche, Università degli Studi di Salerno, Fisciano, Italy
  2. Department of Statistical Sciences, University College London, London, UK
