The influence function of the TCLUST robust clustering procedure

  • Regular Article
Advances in Data Analysis and Classification

Abstract

The TCLUST procedure performs robust clustering with the aim of finding clusters with different scatter structures and weights. TCLUST imposes an eigenvalue-ratio constraint, which yields a wide range of clustering alternatives depending on how much the cluster scatter matrices are allowed to differ. This constraint also prevents the procedure from finding uninteresting spurious clusters. To guarantee robustness against outliers and background noise, the method trims a given proportion of observations, and the data themselves determine which observations are trimmed. Because of this “impartial trimming”, the procedure is assumed to have good robustness properties. As was done for the trimmed k-means method, this article studies the robustness properties of the TCLUST procedure in the univariate case with two clusters by means of the influence function. The conclusion is that TCLUST has a robustness behavior close to that of trimmed k-means, even though it addresses a more general clustering approach.
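The two ingredients named in the abstract, the eigenvalue-ratio constraint and impartial trimming, can be illustrated in the univariate setting with a minimal Python sketch. This is not the authors' algorithm: the function names are hypothetical, the constraint is reduced to a ratio of cluster variances (its univariate form), and trimming the `int(alpha * n)` lowest-scoring points is an assumption made for illustration.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def variance_ratio_ok(variances, c):
    """Univariate eigenvalue-ratio check: largest variance / smallest variance <= c."""
    return max(variances) / min(variances) <= c


def trim_and_assign(xs, weights, means, variances, alpha):
    """Label each point with its best-fitting cluster, or -1 if it is trimmed.

    Assumption (for illustration only): the int(alpha * n) points with the
    smallest best-cluster score weight_j * pdf_j(x) are the ones trimmed.
    """
    scored = []
    for i, x in enumerate(xs):
        scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        best = max(range(len(scores)), key=scores.__getitem__)
        scored.append((scores[best], i, best))
    n_trim = int(alpha * len(xs))
    # The data decide which points are discarded: those that fit no cluster well.
    trimmed = {i for _, i, _ in sorted(scored)[:n_trim]}
    return [-1 if i in trimmed else best for _, i, best in scored]


# Two clusters near 0 and 5, plus one gross outlier at 50.
xs = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 50.0]
labels = trim_and_assign(xs, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0], alpha=0.1)
```

With these illustrative parameters, the outlier at 50.0 is the single trimmed point, and `variance_ratio_ok` rejects parameter sets whose variances differ by more than the chosen factor `c`.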

Corresponding author

Correspondence to C. Ruwet.

About this article

Cite this article

Ruwet, C., García-Escudero, L.A., Gordaliza, A. et al. The influence function of the TCLUST robust clustering procedure. Adv Data Anal Classif 6, 107–130 (2012). https://doi.org/10.1007/s11634-012-0107-1

  • DOI: https://doi.org/10.1007/s11634-012-0107-1
