Density-based Silhouette diagnostics for clustering methods

Menardi, Giovanna

doi:10.1007/s11222-010-9169-0

Published: 04 February 2010

Statistics and Computing, volume 21, pages 295–308 (2011)


Abstract

Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on estimating the posterior probabilities that the observations belong to the clusters, and it may be used both to measure our confidence in the allocation of data to the clusters and to choose the best among competing partitions.
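To make the idea concrete, the following is a minimal Python sketch of a silhouette-like diagnostic computed from posterior probabilities rather than distances. The abstract does not give the formula, so the specific form used here is an assumption made for illustration: the log-ratio between the posterior probability of the assigned cluster and that of the best competing cluster, rescaled by its largest absolute value. The function name `density_based_silhouette`, its arguments `posteriors` and `labels`, and the `eps` guard are likewise hypothetical.

```python
import numpy as np


def density_based_silhouette(posteriors, labels, eps=1e-12):
    """Silhouette-like score from posterior membership probabilities.

    posteriors: (n, k) array; row i holds the estimated probabilities that
                observation i belongs to each of the k clusters.
    labels:     (n,) integer array; cluster index assigned to each observation.
    eps:        illustrative floor that keeps the logarithms finite.

    Assumed form (not taken from the abstract): normalized log-ratio of the
    assigned-cluster posterior to the best competing posterior.
    """
    post = np.clip(np.asarray(posteriors, dtype=float), eps, None)
    labels = np.asarray(labels, dtype=int)
    n, k = post.shape
    if k < 2:
        raise ValueError("at least two clusters are required")

    rows = np.arange(n)
    own = post[rows, labels]          # posterior of the assigned cluster

    others = post.copy()
    others[rows, labels] = -np.inf    # mask the assigned cluster
    best_other = others.max(axis=1)   # strongest competing cluster

    log_ratio = np.log(own) - np.log(best_other)
    scale = np.max(np.abs(log_ratio))
    if scale == 0:                    # every observation is a perfect tie
        return np.zeros(n)
    return log_ratio / scale          # values lie in [-1, 1]


if __name__ == "__main__":
    # Toy posteriors for three observations and two clusters.
    p = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
    print(density_based_silhouette(p, [0, 0, 1]))
```

Under this assumed form, values near 1 indicate observations allocated with high confidence, values near 0 indicate observations lying between two clusters, and negative values flag observations whose assigned cluster is not the most probable one. Averaging the scores over a partition gives one possible summary for comparing competing partitions.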


Author information

Authors and Affiliations

Department of Economics and Statistics, University of Trieste, P.le Europa, 1, Trieste, Italy
Giovanna Menardi

Corresponding author

Correspondence to Giovanna Menardi.

About this article

Cite this article

Menardi, G. Density-based Silhouette diagnostics for clustering methods. Stat Comput 21, 295–308 (2011). https://doi.org/10.1007/s11222-010-9169-0

Received: 23 February 2009
Accepted: 06 January 2010
Published: 04 February 2010
Issue Date: July 2011
DOI: https://doi.org/10.1007/s11222-010-9169-0
