Statistics and Computing, Volume 28, Issue 2, pp 477–493

A reweighting approach to robust clustering

  • Francesco Dotto
  • Alessio Farcomeni (corresponding author)
  • Luis Angel García-Escudero
  • Agustín Mayo-Iscar


An iteratively reweighted approach to robust clustering is presented in this work. The method is initialized with a very robust clustering partition based on a high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution is both robust and efficient, and that it adapts naturally to the true underlying contamination level.
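As a rough illustration of the refinement idea in the abstract, and not the authors' algorithm (whose details are not given on this page), the univariate Python sketch below starts from a heavily trimmed partition and repeatedly readmits observations whose squared standardized distance to the nearest cluster falls below a chi-square cutoff. The function name `reweight_1d`, the 0.975 level, the label `-1` for trimmed points, and the toy data are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, variance


def reweight_1d(x, labels, k, level=0.975, max_iter=50):
    """Toy univariate reweighting (illustrative assumption, not the paper's method).

    labels[i] is a cluster index in 0..k-1, or -1 if observation i is trimmed.
    """
    # Cutoff: the `level` quantile of a chi-square with 1 degree of freedom,
    # computed from the normal quantile because Z**2 follows that distribution.
    cutoff = NormalDist().inv_cdf((1 + level) / 2) ** 2
    labels = list(labels)
    for _ in range(max_iter):
        # Re-estimate each cluster's centre and variance from retained points.
        params = []
        for j in range(k):
            members = [xi for xi, li in zip(x, labels) if li == j]
            if len(members) < 2 or variance(members) == 0:
                raise ValueError(f"cluster {j} has too few distinct points")
            params.append((mean(members), variance(members)))
        # Reassign every observation, including previously trimmed ones.
        new_labels = []
        for xi in x:
            d2 = [(xi - m) ** 2 / v for m, v in params]
            j = min(range(k), key=d2.__getitem__)
            new_labels.append(j if d2[j] <= cutoff else -1)
        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels
    return labels


# Two tight groups plus one gross outlier; the start trims all but the cores.
x = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0, 50.0]
start = [-1, 0, 0, 0, -1, -1, 1, 1, 1, -1, -1]
result = reweight_1d(x, start, k=2)
print(result)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

In this toy run the four wrongly trimmed boundary points are recovered while the observation at 50 stays discarded, which mirrors the trade-off between robustness and efficiency that the abstract describes.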


Cluster analysis, Trimming, Robustness, Minimum covariance determinant estimator



The research was partially supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds, Grant MTM2014-56235-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León, Grant VA212U13. We are grateful to Gallup, Inc. and the Voices of the Hungry project, FAO, for access to the GWP/FIES data. We would also like to thank the Associate Editor and two referees for their kind comments.

Supplementary material

Supplementary material 1: 11222_2017_9742_MOESM1_ESM.pdf (PDF, 559 KB)



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Dipartimento di Scienze Statistiche, Università di Roma “La Sapienza”, Rome, Italy
  2. Dipartimento di Sanità Pubblica e Malattie Infettive, Università di Roma “La Sapienza”, Rome, Italy
  3. Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
  4. Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
