On Validation of Hierarchical Clustering

  • Hans-Joachim Mucha
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)

Abstract

An automatic validation of hierarchical clustering based on resampling techniques is recommended; it can be considered a three-level assessment of stability. The first and most general level is the decision about the appropriate number of clusters, based on measures of correspondence between partitions such as the adjusted Rand index. At the second level, the stability of each individual cluster is assessed using measures of similarity between sets such as the Jaccard coefficient. At the third and most detailed level, the reliability of the cluster membership of each individual observation is assessed. The built-in validation is demonstrated on the wine data set from the UCI repository, where both the number of clusters and the class memberships are known beforehand.
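The three-level assessment described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: it uses bootstrap resampling, Ward's hierarchical clustering, and a nearest-neighbour transfer of bootstrap cluster labels back to the original observations, together with standard SciPy/scikit-learn routines (linkage, fcluster, adjusted_rand_score) and scikit-learn's copy of the UCI wine data; the number of replicates and the candidate numbers of clusters are arbitrary choices.

```python
# Sketch of a three-level stability assessment for hierarchical clustering
# (assumptions: bootstrap resampling, Ward's method, nearest-neighbour label
# transfer; not the authors' original procedure).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score

X = load_wine().data                        # 178 wines, 13 features
rng = np.random.default_rng(0)
n, B, ks = X.shape[0], 50, range(2, 7)      # 50 bootstrap replicates, k = 2..6

def cut(data, k):
    """Ward's hierarchical clustering of `data`, cut into k clusters (labels 1..k)."""
    return fcluster(linkage(data, method="ward"), k, criterion="maxclust")

def transfer(train, train_labels, full):
    """Give every original observation the label of its nearest resampled point."""
    return train_labels[cdist(full, train).argmin(axis=1)]

reference = {k: cut(X, k) for k in ks}       # partitions of the original data
ari = {k: [] for k in ks}                    # level 1: adjusted Rand indices
jac = {k: np.zeros((B, k)) for k in ks}      # level 2: cluster-wise Jaccard values
stable = {k: np.zeros(n) for k in ks}        # level 3: per-observation counts

for b in range(B):
    idx = rng.choice(n, size=n, replace=True)            # bootstrap sample
    for k in ks:
        full = transfer(X[idx], cut(X[idx], k), X)
        ari[k].append(adjusted_rand_score(reference[k], full))
        for c in range(1, k + 1):
            members = reference[k] == c
            # Jaccard coefficient of reference cluster c with every bootstrap cluster
            scores = [(members & (full == d)).sum() / (members | (full == d)).sum()
                      for d in range(1, k + 1)]
            best = int(np.argmax(scores)) + 1
            jac[k][b, c - 1] = scores[best - 1]
            # an observation counts as stably assigned in this replicate if it falls
            # into the bootstrap cluster that best matches its reference cluster
            stable[k][members & (full == best)] += 1

for k in ks:
    print(f"k={k}  mean ARI={np.mean(ari[k]):.3f}  "
          f"cluster Jaccard={np.round(jac[k].mean(axis=0), 2)}  "
          f"stable obs={(stable[k] / B >= 0.8).sum()}/{n}")
```

In this reading, the mean adjusted Rand index over replicates supports the choice of the number of clusters, the averaged per-cluster Jaccard values indicate which individual clusters are stable, and the per-observation counts give the most detailed level, the reliability of each observation's cluster membership.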

Keywords

Hierarchical Cluster · Cluster Membership · Individual Cluster · Rand Index · Cluster Validation

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Hans-Joachim Mucha
  1. Weierstraß-Institut für Angewandte Analysis und Stochastik, Berlin, Germany