The influence function of the TCLUST robust clustering procedure

Ruwet, C.; García-Escudero, L. A.; Gordaliza, A.; Mayo-Iscar, A.

doi:10.1007/s11634-012-0107-1

Regular Article
Published: 19 April 2012

Volume 6, pages 107–130, (2012)

Advances in Data Analysis and Classification

C. Ruwet¹, L. A. García-Escudero², A. Gordaliza² & A. Mayo-Iscar²


Abstract

The TCLUST procedure performs robust clustering with the aim of finding clusters with different scatter structures and weights. TCLUST imposes an eigenvalues-ratio constraint, which yields a wide range of clustering alternatives depending on the differences allowed among cluster scatter matrices. This constraint also avoids finding uninteresting spurious clusters. To guarantee robustness against outliers and background noise, the method trims a given proportion of observations, and which observations are trimmed is self-determined by the data. Because of this “impartial trimming”, the procedure is assumed to have good robustness properties. As was done for the trimmed k-means method, this article studies the robustness properties of the TCLUST procedure in the univariate case with two clusters by means of the influence function. The conclusion is that TCLUST has a robustness behavior close to that of trimmed k-means, even though it addresses a more general clustering approach.
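To make the idea of measuring robustness through the effect of a single added observation more concrete, the following is a minimal sketch, not the authors' method. It uses the simpler trimmed k-means (here with two univariate clusters) rather than TCLUST itself, and it computes a finite-sample sensitivity curve, which is only an empirical analogue of the influence function. The function names `trimmed_2means_1d` and `sensitivity_curve`, the quartile-based starting values, and the default trimming level `alpha=0.1` are all assumptions made for illustration.

```python
import numpy as np


def trimmed_2means_1d(x, alpha=0.1, n_iter=100):
    """Univariate trimmed 2-means via simple concentration steps (illustrative)."""
    x = np.asarray(x, dtype=float)
    # Number of observations kept after trimming a proportion alpha.
    n_keep = int(np.ceil((1.0 - alpha) * x.size))
    # Deterministic start (an assumption): lower and upper quartiles.
    centers = np.quantile(x, [0.25, 0.75])
    for _ in range(n_iter):
        # Distance from every point to each of the two centers.
        dist = np.abs(x[:, None] - centers[None, :])
        labels = dist.argmin(axis=1)
        nearest = dist.min(axis=1)
        # Impartial trimming: keep the n_keep points closest to a center.
        keep = np.argsort(nearest, kind="stable")[:n_keep]
        new_centers = centers.copy()
        for j in range(2):
            members = x[keep][labels[keep] == j]
            if members.size:
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)


def sensitivity_curve(x, z, alpha=0.1):
    """Scaled change in the two sorted centers when one point z is added."""
    x = np.asarray(x, dtype=float)
    base = trimmed_2means_1d(x, alpha)
    contaminated = trimmed_2means_1d(np.append(x, z), alpha)
    return (x.size + 1) * (contaminated - base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
    # Compare the effect of a gross outlier with and without trimming.
    print("trimmed  :", sensitivity_curve(x, 1000.0, alpha=0.1))
    print("untrimmed:", sensitivity_curve(x, 1000.0, alpha=0.0))
```

With `alpha=0.0` the procedure reduces to ordinary 2-means, so the comparison shows how trimming limits the effect of one far-away point on the estimated centers. A TCLUST version would additionally estimate cluster weights and variances under the eigenvalues-ratio constraint, which this sketch leaves out.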



Author information

Authors and Affiliations

Department of Mathematics, University of Liège, B37, Grande Traverse 12, 4000, Liege, Belgium
C. Ruwet
Departamento de Estadística e Investigación Operativa, University of Valladolid, 47002, Valladolid, Spain
L. A. García-Escudero, A. Gordaliza & A. Mayo-Iscar


Corresponding author

Correspondence to C. Ruwet.


About this article

Cite this article

Ruwet, C., García-Escudero, L.A., Gordaliza, A. et al. The influence function of the TCLUST robust clustering procedure. Adv Data Anal Classif 6, 107–130 (2012). https://doi.org/10.1007/s11634-012-0107-1


Received: 19 January 2012
Revised: 21 March 2012
Accepted: 29 March 2012
Published: 19 April 2012
Issue Date: July 2012
DOI: https://doi.org/10.1007/s11634-012-0107-1
