Clustering with EM: Complex Models vs. Robust Estimation
Clustering multivariate data contaminated by noise is a difficult problem, particularly in the framework of mixture model estimation, because noisy data can significantly distort the parameter estimates. This paper addresses the problem in the context of likelihood maximization with the Expectation-Maximization (EM) algorithm. Two approaches are compared. The first defines mixture models that take noise into account. The second relies on robust estimation of the model parameters in the maximization (M) step of EM. Both approaches are first tested separately, then jointly. Finally, a hybrid model is proposed. Results on artificial data are presented and discussed.
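To make the two approaches concrete, here is a minimal sketch in Python. It is not the authors' implementation, and it rests on several illustrative assumptions: the data are one-dimensional rather than multivariate, noise is modelled as a uniform component over the data range (one common way of building noise into a mixture), and the robust M-step multiplies the posterior weights by Huber weights with the conventional tuning constant c = 1.345. The names `em_fit`, `huber_weight` and `make_demo_data` are hypothetical. Setting `noise=True` corresponds to the first approach, `robust=True` to the second, and enabling both runs them jointly.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def huber_weight(r, c=1.345):
    """Huber weight for a standardized residual r: 1 inside [-c, c], c/|r| outside."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def em_fit(xs, mus, variances, n_iter=50, noise=True, robust=False):
    """EM for a 1-D Gaussian mixture (illustrative sketch).

    noise=True  adds a uniform component over [min(xs), max(xs)] (noise in the model).
    robust=True multiplies the M-step weights by Huber weights (robust M-step).
    Returns (means, variances, mixing proportions); with noise=True the last
    proportion belongs to the uniform noise component.
    """
    k = len(mus)
    mus, variances = list(mus), list(variances)
    noise_density = 1.0 / max(max(xs) - min(xs), 1e-12)
    m = k + 1 if noise else k
    pis = [1.0 / m] * m
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior membership probabilities of each point
        resp = []
        for x in xs:
            dens = [pis[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            if noise:
                dens.append(pis[k] * noise_density)
            total = sum(dens) or 1e-300  # guard against underflow far from all components
            resp.append([d / total for d in dens])
        # M-step: mixing proportions
        pis = [sum(r[j] for r in resp) / n for j in range(m)]
        # M-step: Gaussian means and variances, optionally Huber-reweighted
        for j in range(k):
            w = [r[j] for r in resp]
            if robust:
                sd = math.sqrt(variances[j])
                w = [wi * huber_weight((x - mus[j]) / sd) for wi, x in zip(w, xs)]
            sw = sum(w)
            if sw < 1e-12:
                continue  # empty component: keep previous parameters
            mus[j] = sum(wi * x for wi, x in zip(w, xs)) / sw
            var = sum(wi * (x - mus[j]) ** 2 for wi, x in zip(w, xs)) / sw
            variances[j] = max(var, 1e-6)  # avoid a degenerate zero variance
    return mus, variances, pis


def make_demo_data(seed=0):
    """Artificial data: two Gaussian clusters (means 0 and 8, sd 1) plus 20 uniform outliers."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
    outliers = [rng.uniform(-30.0, 40.0) for _ in range(20)]
    return clean + outliers
```

As a usage example, `em_fit(make_demo_data(), [1.0, 6.0], [4.0, 4.0], noise=True)` should recover means close to 0 and 8, while the uniform component absorbs most of the outliers. Note that the Huber-reweighted variance in this sketch has no consistency correction, so with `robust=True` it tends to underestimate the true variance. A proper M-estimator of scale would correct for this.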
Keywords: Clustering, Expectation-Maximization, Robustness, M-estimation