Robust model-based clustering with mild and gross outliers
We propose a model-based clustering procedure in which each component accommodates cluster-specific mild outliers through a flexible distributional assumption, while a proportion of observations, the gross outliers, is additionally trimmed. Estimation and the selection of the proportions of mild and gross outliers are carried out through a penalized likelihood approach, for which we obtain a theoretically grounded penalty parameter. Simulation studies illustrate the advantages of our procedure over flexible mixtures without trimming and over trimmed normal mixture models (tclust). We conclude with an original real-data example on identifying the source of illicit drug shipments seized in Italy and Spain. The proposed methodology has been implemented in R functions, which can be downloaded from https://github.com/afarcome/cntclust.
Keywords: tclust, Contaminated normal, Penalized likelihood
Mathematics Subject Classification: 62H30, 91C20, 62F35
The authors are grateful to two referees for constructive and helpful suggestions.