Advances in Data Analysis and Classification, Volume 11, Issue 4, pp 691–710

A fuzzy approach to robust regression clustering

  • Francesco Dotto
  • Alessio Farcomeni
  • Luis Angel García-Escudero
  • Agustín Mayo-Iscar
Regular Article

Abstract

A new robust fuzzy regression clustering method is proposed. We estimate the coefficients of a linear regression model in each unknown cluster. The method achieves robustness by trimming a fixed proportion of observations. Assignments to clusters are fuzzy: an observation may contribute to the estimates of more than one cluster. We describe general criteria for tuning the method. The proposed method appears to be robust with respect to different types of contamination.
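
The ingredients named in the abstract (a linear regression per cluster, fuzzy memberships, and trimming of a fixed proportion of observations) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain fuzzy c-regression update on squared residuals, followed by discarding the observations with the largest fuzzy cost. The function name `fuzzy_trimmed_regression` and the parameters `k`, `alpha`, `m`, `n_iter`, and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuzzy_trimmed_regression(X, y, k=2, alpha=0.1, m=2.0, n_iter=50, seed=0):
    """Illustrative fuzzy regression clustering with trimming.

    Assumption: a simplified fuzzy c-regression scheme, NOT the method
    proposed in the paper. k = number of clusters, alpha = trimmed
    proportion, m > 1 = fuzzifier.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])      # design matrix with intercept
    u = rng.dirichlet(np.ones(k), size=n)     # random initial memberships
    n_trim = int(np.ceil(alpha * n))          # number of observations to trim
    eps = 1e-12                               # guards against zero residuals

    for _ in range(n_iter):
        # 1. Weighted least squares in each cluster, weights u_ij^m.
        betas = np.empty((k, A.shape[1]))
        for j in range(k):
            sw = np.sqrt(u[:, j] ** m)
            betas[j] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]

        # 2. Squared residuals of every observation to every cluster line.
        r2 = (y[:, None] - A @ betas.T) ** 2 + eps

        # 3. Fuzzy membership update (each row sums to 1).
        inv = r2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)

        # 4. Trim the observations with the largest fuzzy cost:
        #    their memberships are set to zero in every cluster.
        cost = (u ** m * r2).sum(axis=1)
        trimmed = np.zeros(n, dtype=bool)
        if n_trim > 0:
            trimmed[np.argsort(cost)[n - n_trim:]] = True
        u[trimmed] = 0.0

    return betas, u, trimmed
```

Calling `fuzzy_trimmed_regression(x, y, k=2, alpha=0.1)` returns one row of coefficients (intercept first) per cluster, the membership matrix, and a boolean mask of trimmed observations. Like other alternating schemes of this kind, the sketch can stop at a local optimum, so in practice one would probably try several random starts.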

Keywords

  • Robustness
  • Fuzzy clustering
  • Trimming
  • Regression clustering

Mathematics Subject Classification

62H30 

Notes

Acknowledgments

The authors are grateful to three referees and the Associate Editor for several constructive suggestions. This research was partially supported by the Spanish Ministerio de Economía y Competitividad, Grant MTM2014-56235-C2-1-P, and by Consejería de Educación de la Junta de Castilla y León, Grant VA212U13.

Supplementary material

11634_2016_271_MOESM1_ESM.pdf (554 kb)
Supplementary material 1 (pdf 553 KB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Francesco Dotto (1)
  • Alessio Farcomeni (2)
  • Luis Angel García-Escudero (3)
  • Agustín Mayo-Iscar (4)

  1. Dipartimento di Scienze Statistiche, Università di Roma “La Sapienza”, Rome, Italy
  2. Dipartimento di Sanità Pubblica e Malattie Infettive, Università di Roma “La Sapienza”, Rome, Italy
  3. Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain
  4. Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Valladolid, Spain