Abstract
Silhouette information evaluates the quality of the partition detected by a clustering technique. Because it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a modification of the Silhouette information that evaluates the quality of clusters in a density-based framework. The modified index is based on estimates of the posterior probabilities that each observation belongs to each cluster. It may be used both to measure our confidence in the allocation of observations to clusters and to choose the best among competing partitions.
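To make the contrast in the abstract concrete, the following minimal Python sketch computes the standard distance-based silhouette width, s(i) = (b_i - a_i) / max(a_i, b_i), alongside a posterior-based score. The second function is only one plausible form of such a score, an assumption of this sketch rather than the definition given in the article. The names `silhouette_widths` and `posterior_log_ratio_score` are hypothetical, the posterior matrix is assumed to be strictly positive, and labels are assumed to be integer column indices into it.

```python
import numpy as np


def silhouette_widths(X, labels):
    """Standard distance-based silhouette width for each observation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all observations.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # Mean distance to the other members of the same cluster
        # (D[i, i] is 0, so dividing by n_own - 1 excludes the point itself).
        a = D[i, own].sum() / (n_own - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def posterior_log_ratio_score(post, labels):
    """Illustrative posterior-based analogue (assumed form, not the article's).

    post[i, k] is an estimated posterior probability that observation i
    belongs to cluster k; labels[i] is the cluster observation i was assigned to.
    """
    post = np.asarray(post, dtype=float)
    labels = np.asarray(labels)
    idx = np.arange(len(post))
    own = post[idx, labels]
    others = post.copy()
    others[idx, labels] = -np.inf  # mask the assigned cluster
    best_other = others.max(axis=1)
    log_ratio = np.log(own / best_other)
    # Scale by the largest absolute log ratio so scores lie in [-1, 1].
    return log_ratio / np.abs(log_ratio).max()
```

In this sketch a score near 1 indicates an observation whose assigned cluster is far more probable than any alternative, while a negative score flags an observation for which another cluster has a higher estimated posterior. The scaling step divides by zero if every log ratio is zero, so a real implementation would need to handle that case.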
Cite this article
Menardi, G. Density-based Silhouette diagnostics for clustering methods. Stat Comput 21, 295–308 (2011). https://doi.org/10.1007/s11222-010-9169-0
DOI: https://doi.org/10.1007/s11222-010-9169-0