The Noise Component in Model-based Cluster Analysis
The so-called noise component was introduced by Banfield and Raftery (1993) to improve the robustness of cluster analysis based on the normal mixture model. The idea is to add a uniform distribution over the convex hull of the data as an additional mixture component. While this yields good results in many practical applications, the original proposal has some problems: 1) as shown by Hennig (2004), the method is not breakdown-robust; 2) the original approach does not define a proper ML estimator and does not have satisfactory asymptotic properties.
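In formula form, the model with a noise component can be sketched as follows. The notation is introduced here only for illustration: data $x_1,\dots,x_n \in \mathbb{R}^p$, $k$ normal components with densities $\varphi(\cdot;\mu_j,\Sigma_j)$, proportions $\pi_0,\dots,\pi_k \ge 0$ summing to one, $H$ the convex hull of $x_1,\dots,x_n$ and $V(H)$ its volume:

```latex
f(x) = \pi_0 \, \frac{\mathbf{1}\{x \in H\}}{V(H)} + \sum_{j=1}^{k} \pi_j \, \varphi(x; \mu_j, \Sigma_j)
```

Because $H$ and $V(H)$ are computed from the sample, the noise density is data dependent: a single point far outside the bulk of the data enlarges $V(H)$ and thereby lowers the noise density at every observation.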
We discuss two alternatives. The first replaces the uniform distribution by a fixed constant, modelling an improper uniform distribution that does not depend on the data. This approach can be proven to be more robust, although the choice of the involved tuning constant is tricky. The second alternative approximates the ML estimator of a mixture of normals with a uniform distribution more precisely than the “convex hull” approach does. The approaches are compared in simulations and on a real data example.
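The difference between the data-dependent uniform and the fixed improper constant can be illustrated with a minimal one-dimensional EM sketch. It assumes univariate normal components plus one noise component: with `noise_const=None` the noise density is $1/(\max_i x_i - \min_i x_i)$, the one-dimensional analogue of the convex-hull uniform, and with `noise_const=c` it is the fixed constant $c$. The function names, the variance floor `min_sd` (added because the normal-mixture likelihood is unbounded as a standard deviation tends to zero), the starting values and the fixed iteration count are assumptions of this sketch, not details taken from the paper; the second, more precise ML approximation is not implemented here.

```python
import math
from typing import List, Optional, Tuple


def normal_pdf(x: float, mu: float, sd: float) -> float:
    """Density of N(mu, sd^2) evaluated at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x: List[float], w: List[float], mu: List[float],
               sd: List[float], noise_dens: float) -> List[List[float]]:
    """E-step: row i holds P(noise | x_i), then P(normal j | x_i) for each j."""
    rows = []
    for xi in x:
        terms = [w[0] * noise_dens]
        terms += [w[j + 1] * normal_pdf(xi, mu[j], sd[j]) for j in range(len(mu))]
        total = sum(terms)  # > 0 as long as w[0] > 0 and noise_dens > 0
        rows.append([t / total for t in terms])
    return rows


def em_noise_mixture(
    x: List[float],
    mu0: List[float],
    sd0: List[float],
    noise_const: Optional[float] = None,
    n_iter: int = 200,
    min_sd: float = 1e-3,
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Hypothetical EM sketch: 1-D normal mixture plus one noise component.

    noise_const=None -> noise density 1/(max(x) - min(x)), i.e. uniform on the
                        1-D convex hull of the data (data dependent).
    noise_const=c    -> noise density is the fixed improper constant c.
    Returns (weights, means, sds, noise posteriors); weights[0] is the noise weight.
    """
    noise_dens = 1.0 / (max(x) - min(x)) if noise_const is None else noise_const
    k, n = len(mu0), len(x)
    mu, sd = list(mu0), list(sd0)
    w = [1.0 / (k + 1)] * (k + 1)  # start with equal proportions
    for _ in range(n_iter):
        post = posteriors(x, w, mu, sd, noise_dens)
        # M-step: proportions for noise and normals, then means and sds of normals
        w = [sum(row[j] for row in post) / n for j in range(k + 1)]
        for j in range(k):
            r = [row[j + 1] for row in post]
            rs = sum(r)
            mu[j] = sum(ri * xi for ri, xi in zip(r, x)) / rs
            var = sum(ri * (xi - mu[j]) ** 2 for ri, xi in zip(r, x)) / rs
            sd[j] = max(math.sqrt(var), min_sd)  # floor keeps the likelihood bounded
    # Final E-step so the returned noise posteriors match the returned parameters
    noise_post = [row[0] for row in posteriors(x, w, mu, sd, noise_dens)]
    return w, mu, sd, noise_post


if __name__ == "__main__":
    data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0, 50.0]
    for c in (None, 1e-4):
        w, mu, sd, noise = em_noise_mixture(data, [0.0, 10.0], [1.0, 1.0], noise_const=c)
        print(c, [round(m, 2) for m in mu], round(noise[-1], 3))
```

With the convex-hull variant, appending a point at, say, 10^6 to the data shrinks the noise density to roughly 10^-6 for all observations, whereas with a fixed `noise_const` the noise density stays the same; the value of c, like the tuning constant mentioned above, has to be chosen by the user.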
Keywords: Mixture Model, Mixture Component, Noise Component, Breakdown Point, Extreme Outlier
- CORETTO, P. and HENNIG, C. (2006): Identifiability for mixtures of distributions from a location-scale family with uniforms. DISES Working Papers No. 3.186, University of Salerno.
- CORETTO, P. and HENNIG, C. (2007): Choice of the improper density in robust improper ML for finite normal mixtures. Submitted.
- DONOHO, D. L. and HUBER, P. J. (1983): The notion of breakdown point. In: P. J. Bickel, K. Doksum and J. L. Hodges Jr. (Eds.): A Festschrift for Erich L. Lehmann. Wadsworth, Belmont, CA, 157-184.