Abstract
Clustering procedures that allow general covariance structures for the obtained clusters require some constraints on the solutions. With this in mind, several proposals have been introduced in the literature. The TCLUST procedure works with a restriction on the "eigenvalues-ratio" of the clusters' scatter matrices. To achieve robustness with respect to outliers, the procedure allows the trimming of a proportion α of the most outlying observations. The resistance of TCLUST to infinitesimal contamination has already been studied. This paper examines its resistance to a larger amount of contamination through a study of its breakdown behavior. The rather new concept of restricted breakdown point is used to show that the TCLUST procedure resists a proportion α of contamination as soon as the data set is sufficiently "well clustered".
Notes
A set is relatively compact if its closure is a compact set.
The pigeonhole principle states that if n items are put into m pigeonholes with n>m, then at least one pigeonhole must contain more than one item.
References
Cuesta-Albertos JA, Gordaliza A, Matrán C (1997) Trimmed k-means: an attempt to robustify quantizers. Ann Stat 25:553–576
Dennis JE Jr. (1982) Algorithms for nonlinear fitting. In: Nonlinear optimization, Cambridge, 1981. Academic Press, London, pp 67–78
Donoho D, Huber PJ (1983) The notion of breakdown point. In: A festschrift for Erich L. Lehmann. Wadsworth, Belmont, pp 157–184
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631
Gallegos MT, Ritter G (2005) A robust method for cluster analysis. Ann Stat 33:347–380
Gallegos MT, Ritter G (2009a) Trimmed ML estimation of contaminated mixtures. Sankhyā 71:164–220
Gallegos MT, Ritter G (2009b) Trimming algorithms for clustering contaminated grouped data and their robustness. Adv Data Anal Classif 3:135–167
García-Escudero LA, Gordaliza A (1999) Robustness properties of k means and trimmed k means. J Am Stat Assoc 94:956–969
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36:1324–1345
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4:89–109
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2011) Exploring the number of groups in robust model-based clustering. Stat Comput 21:585–599
Genton MG, Lucas A (2003) Comprehensive definitions of breakdown points for independent and dependent observations. J R Stat Soc, Ser B, Stat Methodol 65:81–94
Hathaway RJ (1985) A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Ann Stat 13:795–800
Hennig C (2008) Dissolution point and isolation robustness: robustness criteria for general cluster analysis methods. J Multivar Anal 99:1154–1176
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley–Interscience, New York
McLachlan G, Peel D (2000) Finite mixture models. Wiley–Interscience, New York
Neykov N, Filzmoser P, Dimova R, Neytchev P (2007) Robust fitting of mixtures using the trimmed likelihood estimator. Comput Stat Data Anal 52:299–308
Ruwet C, García-Escudero LA, Gordaliza A, Mayo-Iscar A (2012) The influence function of the TCLUST robust clustering procedure. Adv Data Anal Classif 6:107–130
Zhong S, Ghosh J (2004) A unified framework for model-based clustering. J Mach Learn Res 4:1001–1037
Acknowledgements
This research has been partially supported by the Spanish Ministerio de Ciencia y Tecnología and the FEDER grant MTM2011-28657-C02-01.
Appendix
Proof of Lemma 1
(a) From (1), we have
since \(0\leq p_{j}\leq 1\) for all j. This inequality is strict as soon as \(0<p_{j}<1\) for at least one j∈{1,…,g}. Then, using the expression of the normal pdf, we obtain
by the (ER) restriction. Finally, we get
since \(m_{j}=\bar{x}_{R_{j}}\) (Proposition 4 of García-Escudero et al. 2008).
(b) Since \(|R\cap X_{n}|\geq gd+1\), at least one of the sets, w.l.o.g. \(R_{l}\), contains at least d+1 elements of \(X_{n}\) (pigeonhole principle, see Note 2). Since \(X_{n}\) is in general position, \(W_{R_{l}\cap X_{n}}\) is regular and
with some constant K>0 that depends only on X n . The result follows directly from this relation and part (a). □
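The pigeonhole count used in part (b) can be made explicit. The following display is our sketch, not part of the original proof:

```latex
% The untrimmed set R meets X_n in at least gd+1 points, and R splits
% into the g clusters R_1,...,R_g. If every cluster contained at most d
% original points, the g clusters would cover at most gd of them, a
% contradiction. Hence
\[
  |R\cap X_{n}| \;\geq\; gd+1
  \quad\Longrightarrow\quad
  \max_{1\leq l\leq g} |R_{l}\cap X_{n}|
  \;\geq\; \Bigl\lceil \tfrac{gd+1}{g} \Bigr\rceil \;=\; d+1 .
\]
```

With d+1 points of \(X_{n}\) in general position, the scatter matrix \(W_{R_{l}\cap X_{n}}\) is then regular, as claimed.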
Proof of Proposition 1
(a) Let M be the modified data set where n−r+g−1 observations have been replaced arbitrarily and denote by \(\mathcal{R}^{*}\) and \(\theta^{*}=(\mathbf{p}^{*},\mathbf{m}^{*},\mathbf{V}^{*})\) the optimal partition and the optimal parameters for M, i.e. \(\mathcal{R}_{\theta^{*}}=\mathcal{R}^{*}\) and \(\theta_{\mathcal{R}^{*}}=\theta^{*}\). The maximum trimmed likelihood remains bounded from below by a strictly positive constant that depends only on \(X_{n}\). To see this, take \(R=\{x_{1},\dots,x_{r-g+1}\}\cup\{y_{1},\dots,y_{g-1}\}\) with r−g+1 original observations and g−1 replacements, and set \(p_{j}=1/g\) and \(V_{j}=I_{d}\) for all j=1,…,g, \(m_{1}=0\) and \(m_{j}=y_{j-1}\) for all j=2,…,g. Then, by optimality of \(\mathcal{R}^{*}\) and \((\mathbf{p}^{*},\mathbf{m}^{*},\mathbf{V}^{*})\), we have
which is also larger than the likelihood computed on R with \((\mathbf{p},\mathbf{m},\mathbf{V})\) and the following assignments: \(x_{i}\in R_{1}\) for every i=1,…,r−g+1 and \(y_{i}\in R_{i+1}\) for every i=1,…,g−1. This leads to
By assumption, any subset of M of size r contains at least r−(n−r+g−1)≥gd+1 original data points. Lemma 1(b) can therefore be applied and, combined with the previous step, gives
which implies that \(m_{n}\) and \(M_{n}\) are uniformly bounded and bounded away from zero.
(b) Let M be the modified data set where n−r+g observations have been replaced arbitrarily and denote by \(\mathcal{R}^{*}\) and (p ∗,m ∗,V ∗) the optimal set of untrimmed observations and the optimal parameters for M. By assumption, \(\mathcal{R}^{*}\) contains at least g replacements. Then, either one cluster contains at least 2 replacements or each cluster contains at least one replacement. In the first case, let \(R_{l}^{*}\) be a cluster with two replacements x and y. If each cluster contains at least one replacement, let \(R_{l}^{*}\) be a cluster that contains at least d+1≥2 elements (such a cluster exists due to the general assumption r≥gd+1 and the pigeonhole principle). This cluster contains at least one replacement y and some other element x (replacement or not). In both cases, it is easy to see that \(\operatorname{tr}(S^{*}_{l}) \geq \frac{1}{4}\|x-y\|^{2}\). Moreover, by (3.4) in García-Escudero et al. (2008), we know that the eigenvalues of matrices \({V^{*}_{j}}^{-1}\), j=1,…,g, denoted by λ i,j for i=1,…,d and j=1,…,g, satisfy
(the absence of the weights in their expression is a typo). If \(\mathbf{V}^{*}\in\mathcal{V}_{c}\), then \(2\mathbf{V}^{*}\in\mathcal{V}_{c}\) and the previous equation leads to
since the eigenvalues are positive. Then, using \(\lambda_{i}\bigl({V^{*}_{l}}^{-1}\bigr)\geq \lambda_{\mathrm{min}}\bigl({V^{*}_{l}}^{-1}\bigr)\) and \(p^{*}_{l}\geq 2/r\), we have
which implies
This shows that the smallest eigenvalue of \({V^{*}_{j}}^{-1}\), i.e. the inverse of the largest eigenvalue of \(V^{*}_{j}\), could tend to zero if a replacement were chosen far away from the original observations and far away from the other replacements.
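The "easy to see" trace bound in part (b) rests on a standard two-point inequality; the following display is our sketch of that step (the exact constant 1/4 then follows from the paper's normalization of \(S^{*}_{l}\)):

```latex
% For any center m, write a = x - m and b = y - m. The parallelogram law
% ||a||^2 + ||b||^2 = (1/2)||a-b||^2 + (1/2)||a+b||^2 gives
\[
  \|x-m\|^{2}+\|y-m\|^{2}
  \;=\;\tfrac{1}{2}\|x-y\|^{2}
       +2\Bigl\|\tfrac{x+y}{2}-m\Bigr\|^{2}
  \;\geq\;\tfrac{1}{2}\|x-y\|^{2},
\]
% with equality exactly when m is the midpoint (x+y)/2.
```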
(c) Direct from (a) and (b). □
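For readers who want to experiment numerically with the (ER) restriction invoked above, here is a minimal sketch in Python. The function names are ours, and we assume, as in García-Escudero et al. (2008), that the ratio is taken over the pooled eigenvalues of all g cluster scatter matrices:

```python
import numpy as np

def er_ratio(scatters):
    # Pool the eigenvalues of all cluster scatter matrices and
    # return the ratio of the largest to the smallest one.
    eigs = np.concatenate([np.linalg.eigvalsh(S) for S in scatters])
    return eigs.max() / eigs.min()

def satisfies_er(scatters, c):
    # The (ER) restriction holds when the pooled eigenvalue
    # ratio does not exceed the constant c >= 1.
    return er_ratio(scatters) <= c
```

For instance, two identity scatter matrices give ratio 1 and satisfy (ER) for any c≥1, while a very elongated cluster breaks the restriction for small c, which is exactly the degeneracy the constraint is designed to rule out.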
Proof of Lemma 2
Assume \(y_{1}\in R_{j}\). Proposition 4 of García-Escudero et al. (2008) implies
which suffices since the differences \((y_{i}-y_{1})\), i=2,…,q, are bounded. □
Proof of Proposition 2
(a) The proof follows that of Theorem 2(a) in Gallegos and Ritter (2009b) with some adaptations. While they use the fact that \(\det(W_{\mathcal{R}^{*}})\rightarrow\infty\) when ∥y∥→∞, we use the fact that \(\lambda_{\mathrm{max}}(W_{\mathcal{R}^{*}})\rightarrow\infty\). Then, Lemma 1(a) and the previous claim show that the maximum of the log-likelihood function (1) tends to −∞ if the optimal solution does not trim the replacement. On the other hand, if the replacement is trimmed off, the log-likelihood value obtained with \(p_{j}=1/g\), \(m_{j}=0\) and \(V_{j}=I_{d}\) for all j=1,…,g provides a finite lower bound for the trimmed log-likelihood function, which leads to a contradiction.
(b) The proof follows that of Theorem 3.5(b) in Gallegos and Ritter (2009a). Parts (α), the construction of the data set, and (β), the bounded behavior of the maximum likelihood function, are the same. In part (γ), we can use the same reasoning as in the proof of Proposition 1(b) to show that \(\operatorname{tr}(S_{l}^{*})\rightarrow\infty\) if \(K_{1},K_{2}\rightarrow\infty\) (in place of (3.1)), and the pigeonhole principle with the general assumption r≥gd+1 to show that \(W_{\mathcal{R}^{*}}\succeq W_{R_{j}}\succeq c_{F} I_{d}\) (in place of (3.2)). Then, Lemmas 1(a) and 2, the first two steps and the two previous claims allow us to conclude as in Gallegos and Ritter (2009a).
(c) Direct from (a) and (b). □
Proof of Proposition 3
This proof follows that of Proposition 2 in Gallegos and Ritter (2009b). Let M be any admissible data set obtained from \(X_{n}\) by modifying at most r−q elements and let \(\mathcal{R}^{*}\) and \(\theta^{*}=(\mathbf{p}^{*},\mathbf{m}^{*},\mathbf{V}^{*})\) be the optimal partition and optimal parameters for M, i.e. \(\mathcal{R}_{\theta^{*}}=\mathcal{R}^{*}\) and \(\theta_{\mathcal{R}^{*}}=\theta^{*}\).
(α) \(M_{n}^{*}\) and \(m_{n}^{*}\) are bounded and bounded away from zero by constants that depend only on \(X_{n}\).
Since \(R^{*}\) contains at most r−q replacements, it contains at least q≥gd+1 original observations. The proof concludes as that of Proposition 1(a).
(β) If \(R_{j}^{*}\) contains some original observations, then \(m_{j}^{*}\) is bounded by a constant that depends only on \(X_{n}\).
Let \(x\in R_{j}^{*}\cap X_{n}\). We have \(\operatorname{tr}(S_{j}^{*}) \geq \frac{1}{|R_{j}^{*}|}\|x-m_{j}^{*}\|^{2}\). Following the argument in the proof of Proposition 1(b), we show that \(\|x-m_{j}^{*}\|^{2}\leq d r^{2} M_{n}^{*} \log 2 \) and the claim follows from the previous step.
(δ) If there exists j∈{1,…,g} such that \(0<p_{j}^{*}<1\), then
$$ L\bigl(X_{\mathcal{R}^*}|\mathbf{p}^*,\mathbf{m}^*,\mathbf{V}^* \bigr)< c_{d,r}+\frac{d r}{2}\log c -\frac{r}{2}\log \biggl( \frac{\operatorname{tr}(S_{\mathcal{R}^*})}{d} \biggr)^d. $$
(6)
By Lemma 1(a), and since \(M^{*}_{n}\leq c m_{n}^{*}\) due to (ER), we have
(ϵ) \(R^{*}\) contains no modification with a sufficiently large norm.
The reasoning before (12) of Gallegos and Ritter (2009b) still holds in our case. This, together with (3), implies that
$$\biggl(\frac{\operatorname{tr}(W_{\mathcal{R}^*})}{d} \biggr)^d\geq \biggl(\frac{\operatorname{tr}(W_\mathcal{T})}{d} \biggr)^d\geq g^2\max_{T\subseteq R\in \bigl(\substack{X_n\\r} \bigr)}\det(c W_{\mathcal{P}\cap R})$$
and then,
$$ \max_{T\subseteq R\in \bigl(\substack{X_n\\r} \bigr)} \log\det(S_{\mathcal{P}\cap R})\leq -2\log g -d\log c +\log \biggl(\frac{\operatorname{tr}(S_{\mathcal{R}^*})}{d} \biggr)^d. $$
(7)
With their notation \(\bar{x}_{\mathcal{P}\cap R}\) and for all R⊆X n of cardinality r, we have
Then,
We can use (7) and (6) to obtain
which contradicts the optimality of \(\mathcal{R}^{*}\) and \((\mathbf{p}^{*},\mathbf{m}^{*},\mathbf{V}^{*})\). Hence \(R^{*}\) contains no replacement with a large norm, and step (β) shows that the means remain bounded.
□
Proof of Lemma 3
The first half of the proof follows that of Lemma 5 of Gallegos and Ritter (2009b). Then, we can use the linearity of the trace operator to obtain
as we can also use Lemma A1 of Gallegos and Ritter (2009b). Then
and the separation condition (4) leads to the conclusion. □
Proof of Proposition 4
(a) Direct from Proposition 3 and Lemma 3.
(b) Follows the proof of Theorem 3(b) of Gallegos and Ritter (2009b), combined with Lemma 2.
(c) Follows proof of Theorem 3(c) of Gallegos and Ritter (2009b). □
Ruwet, C., García-Escudero, L.A., Gordaliza, A. et al. On the breakdown behavior of the TCLUST clustering procedure. TEST 22, 466–487 (2013). https://doi.org/10.1007/s11749-012-0312-4