A reweighting approach to robust clustering

Statistics and Computing

Abstract

This work presents an iteratively reweighted approach to robust clustering. The method is initialized with a very robust clustering partition obtained with a high trimming level. The initial partition is then refined to reduce the number of wrongly discarded observations and to substantially increase efficiency. Simulation studies and real data examples indicate that the final clustering solution has good properties in terms of both robustness and efficiency, and that it adapts naturally to the true underlying contamination level.
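The refinement idea in the abstract can be illustrated with a deliberately simplified, one-dimensional sketch. It is not the authors' algorithm: the function name `reweight_1d`, the fixed cutoff `threshold` (set here near the 0.99 quantile of a chi-square distribution with one degree of freedom), and the use of plain sample variances are all assumptions made for illustration. Starting from a heavily trimmed partition (label `-1` marks a discarded point), each pass re-estimates the cluster centres and variances from the retained points and re-admits any point whose squared standardized distance to its nearest cluster falls below the cutoff.

```python
from statistics import mean, pvariance


def reweight_1d(x, labels, threshold=6.63, max_iter=50):
    """Illustrative reweighting loop for 1-D data (hypothetical helper).

    x         : list of floats
    labels    : initial cluster index per point, or -1 if trimmed
    threshold : cutoff on the squared standardized distance (assumed value)
    Assumes every cluster always retains at least two distinct points.
    """
    labels = list(labels)
    k = max(labels) + 1
    for _ in range(max_iter):
        # Estimate centre and variance of each cluster from retained points only.
        # No consistency correction for trimming is applied in this sketch.
        params = []
        for j in range(k):
            members = [xi for xi, lab in zip(x, labels) if lab == j]
            params.append((mean(members), max(pvariance(members), 1e-12)))

        # Assign each point to its nearest cluster, or discard it if too far.
        new_labels = []
        for xi in x:
            d2 = [(xi - m) ** 2 / v for m, v in params]
            j = min(range(k), key=d2.__getitem__)
            new_labels.append(j if d2[j] <= threshold else -1)

        if new_labels == labels:  # the retained set has stabilized
            break
        labels = new_labels
    return labels


# Two tight groups plus one gross outlier; the start keeps only the cores.
x = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8, 50.0]
init = [0, 0, 0, -1, -1, 1, 1, 1, -1, -1, -1]
print(reweight_1d(x, init))
```

In this toy run the four wrongly discarded points are recovered while the value 50.0 stays trimmed. Variances computed from a trimmed sample underestimate the true spread, so a practical procedure would need to account for that; the sketch leaves it out for brevity.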

Acknowledgements

The research was partially supported by the Spanish Ministerio de Economía y Competitividad and FEDER funds, Grant MTM2014-56235-C2-1-P, and by the Consejería de Educación de la Junta de Castilla y León, Grant VA212U13. We are grateful to Gallup, Inc. and the Voices of the Hungry project, FAO, for access to the GWP/FIES data. We would also like to thank the Associate Editor and two referees for their kind comments.

Author information

Corresponding author

Correspondence to Alessio Farcomeni.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 559 KB)

About this article

Cite this article

Dotto, F., Farcomeni, A., García-Escudero, L.A. et al. A reweighting approach to robust clustering. Stat Comput 28, 477–493 (2018). https://doi.org/10.1007/s11222-017-9742-x

  • DOI: https://doi.org/10.1007/s11222-017-9742-x
