Abstract
An iteratively reweighted approach for robust clustering is presented in this work. The method is initialized with a very robust clustering partition based on an high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution has both good properties in terms of robustness and efficiency and naturally adapts to the true underlying contamination level.
Similar content being viewed by others
References
Ballard, T.J., Kepple, A.W., Cafiero, C.: The food insecurity experience scale: developing a global standard for monitoring hunger worldwide. Technical report, Food and Agriculture Organization of the United Nations, Rome (2013)
Butler, R.W., Davies, P.L., Jhun, M.: Asymptotics for the Minimum Covariance Determinant estimator. Ann. Stat. 21, 1385–1400 (1993)
Cafiero, C., Melgar-Quinonez, H.R., Ballard, T.J., Kepple, A.W.: Validity and reliability of food security measures. Ann. N. Y. Acad. Sci. 1331, 230–248 (2014)
Cafiero, C., Nord, M., Viviani, S., del Grossi, M.E., Ballard, T.J., Kepple, A.W., Miller, M., Nwosu, C.: Methods for estimating comparable rates of food insecurity experienced by adults throughout the world. Technical report, Food and Agriculture Organization of the United Nations, Rome (2016)
Cerioli, A.: Multivariate outlier detection with high-breakdown estimators. J. Am. Stat. Assoc. 105, 147–156 (2010)
Cerioli, A., Farcomeni, A.: Error rates for multivariate outlier detection. Comput. Stat. Data Anal. 55, 544–553 (2011)
Cerioli, A., Farcomeni, A., Riani, M.: Strong consistency and robustness of the forward search estimator of multivariate location and scatter. J. Multivar. Anal. 126, 167–183 (2014)
Coretto, P., Hennig, C.: Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. J. Am. Stat. Assoc. 111, 1648–1659 (2016)
Cuesta-Albertos, J.A., Gordaliza, A., Matrán, C.: Trimmed \(k\)-means: an attempt to robustify quantizers. Ann. Stat. 25, 553–576 (1997)
Cuesta-Albertos, J.A., Matran, C., Mayo-Iscar, A.: Robust estimation in the normal mixture model based on robust clustering. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70, 779–802 (2008)
Farcomeni, A., Greco, L.: Robust Methods for Data Reduction. CRC Press, Boca Raton (2015)
Flury, B., Riedwyl, H.: Multivariate Statistics. A Practical Approach. Chapman and Hall, London (1988)
Fritz, H., García-Escudero, L.A., Mayo-Iscar, A.: A fast algorithm for robust constrained clustering. Comput. Stat. Data Anal. 61, 124–136 (2013)
Gallegos, M.T., Ritter, G.: A robust method for cluster analysis. Ann. Stat. 33, 347–380 (2005)
Gallup: Worldwide Research Methodology and Codebook. Gallup Inc, Washington (2015)
García-Escudero, L.A., Gordaliza, A.: The importance of the scales in heterogeneous robust clustering. Comput. Stat. Data Anal. 51, 4403–4412 (2007)
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: A general trimming approach to robust cluster analysis. Ann. Stat. 36, 1324–1345 (2008)
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: A review of robust clustering methods. Adv. Data Anal. Classif. 4, 89–109 (2010)
García-Escudero, L.A., Gordaliza, A., Matrán, C., Mayo-Iscar, A.: Exploring the number of groups in robust model-based clustering. Stat. Comput. 21, 585–599 (2011)
Godfray, H.C.J., Beddington, J.R., Crute, I.R., Haddad, K., Lawrence, D., Muir, J.F., Pretty, J., Robinson, S., Thomas, S.M., Toulmin, C.: Food security: the challenge of feeding 9 billion people. Science 327, 812–818 (2010)
Hardin, J., Rocke, D.M.: Outlier detection in the multiple cluster setting using the Minimum Covariance Determinant estimator. Comput. Stat. Data Anal. 44, 625–638 (2004)
Hardin, J., Rocke, D.M.: The distribution of robust distances. J. Comput. Graph. Stat. 14, 928–946 (2005)
Hennig, C.: Breakdown points for maximum likelihood-estimators of location-scale mixtures. Ann. Stat. 32, 1313–1340 (2004)
Hennig, C.: Fuzzy and crisp Mahalanobis fixed point clusters. In: Baier, D., Decker, R., Schmidt-Thieme, L. (eds.) Data Analysis and Decision Support, pp. 47–56. Springer, Heidelberg (2005)
Hennig, C.: Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. J. Multivar. Anal. 99, 1154–1176 (2008)
Jones, A.D., Ngure, F.M., Pelto, G., Young, S.L.: What are we assessing when we measure food security? A compendium and review of current metrics. Adv. Nutr. 4, 481–505 (2013)
Liu, R.Y., Parelius, J.M., Singh, K.: Multivariate analysis by data depth: descriptive statistics, graphics and inference. Ann. Stat. 27, 783–858 (1999)
Lopuhaa, H.P.: Asymptotics of reweighted estimators of multivariate location and scatter. Ann. Stat. 27, 1638–1665 (1999)
Neykov, N., Filzmoser, P., Dimova, R., Neytchev, P.: Robust fitting of mixtures using the trimmed likelihood estimator. Comput. Stat. Data Anal. 52, 299–308 (2007)
Riani, M., Atkinson, A., Cerioli, A.: Finding an unknown number of multivariate outliers. J. R. Stat. Soc. Ser. B 71, 447–466 (2009)
Ritter, G.: Robust Cluster Analysis and Variable Selection. CRC Press, Boca Raton (2014)
Rousseeuw, P.J.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283–297 (1985)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley-Interscience, New York (1987)
Rousseeuw, P.J., van Driessen, K.: A fast algorithm for the minimum covariance determinant estimator. Technometrics 41, 212–223 (1999)
Acknowledgements
The research was partially supported by the Spanish Ministerio de Economía y Competitividad y fondos FEDER, Grant MTM2014-56235-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León, Grant VA212U13. We are grateful to Gallup, Inc. and the Voices of the Hungry project, FAO, for access to the GWP/FIES data. We also would like to thank the AE and two referees for kind comments.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Dotto, F., Farcomeni, A., García-Escudero, L.A. et al. A reweighting approach to robust clustering. Stat Comput 28, 477–493 (2018). https://doi.org/10.1007/s11222-017-9742-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11222-017-9742-x