
Density-based Silhouette diagnostics for clustering methods

Statistics and Computing

Abstract

Silhouette information evaluates the quality of the partition detected by a clustering technique. Since it is based on a measure of distance between the clustered observations, its standard formulation is not adequate when a density-based clustering technique is used. In this work we propose a suitable modification of the Silhouette information aimed at evaluating the quality of clusters in a density-based framework. It is based on estimates of the posterior probabilities that each observation belongs to each cluster, and may be used both to measure our confidence in the allocation of observations to the clusters and to choose the best partition among competing ones.
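
The idea of scoring each observation from its estimated posterior probabilities can be illustrated with a short sketch. The abstract does not state the formula, so the code below rests on an assumption: each observation is scored by the log-ratio between the posterior probability of its allocated cluster and the largest posterior probability among the other clusters, rescaled by the largest absolute log-ratio in the sample so that scores fall in [-1, 1]. The function name `density_based_silhouette`, its arguments, and the `eps` guard are hypothetical choices made for this illustration; the full article should be consulted for the exact definition.

```python
import numpy as np


def density_based_silhouette(posteriors, labels, eps=1e-12):
    """Illustrative density-based silhouette scores (assumed form, see lead-in).

    posteriors: (n, k) array of estimated posterior membership probabilities.
    labels:     length-n integer array with the allocated cluster of each row.
    eps:        hypothetical floor that keeps the logarithm finite.
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, None)
    labels = np.asarray(labels, dtype=int)
    if p.ndim != 2 or p.shape[1] < 2:
        raise ValueError("posteriors must be an (n, k) array with k >= 2")
    if labels.shape != (p.shape[0],):
        raise ValueError("labels must contain one cluster index per row")

    rows = np.arange(p.shape[0])
    own = p[rows, labels]              # posterior of the allocated cluster

    others = p.copy()
    others[rows, labels] = -np.inf     # mask the allocated cluster
    best_other = others.max(axis=1)    # strongest competing cluster

    log_ratio = np.log(own / best_other)
    scale = np.abs(log_ratio).max()
    if scale == 0.0:
        return np.zeros_like(log_ratio)
    return log_ratio / scale           # scores lie in [-1, 1]


# Toy example: three observations, all allocated to cluster 0.
posteriors = np.array([[0.9, 0.1],
                       [0.6, 0.4],
                       [0.2, 0.8]])
labels = np.array([0, 0, 0])
dbs = density_based_silhouette(posteriors, labels)
```

Under this assumed form, a score near 1 marks an observation that is confidently allocated, a score near 0 marks one lying between two clusters, and a negative score marks one whose allocated cluster is less probable than a competitor. Averaging the scores over a partition gives one possible summary for comparing alternative partitions.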

Author information

Corresponding author

Correspondence to Giovanna Menardi.

About this article

Cite this article

Menardi, G. Density-based Silhouette diagnostics for clustering methods. Stat Comput 21, 295–308 (2011). https://doi.org/10.1007/s11222-010-9169-0

  • DOI: https://doi.org/10.1007/s11222-010-9169-0
