On Validation of Hierarchical Clustering
An automatic validation of hierarchical clustering based on resampling techniques is recommended; it can be viewed as a three-level assessment of stability. The first and most general level is the decision about the appropriate number of clusters, which is based on measures of correspondence between partitions such as the adjusted Rand index. At the second level, the stability of each individual cluster is assessed with measures of similarity between sets such as the Jaccard coefficient. At the third and most detailed level, the reliability of the cluster membership of each individual observation is assessed. The built-in validation is demonstrated on the wine data set from the UCI repository, for which both the number of clusters and the class membership are known beforehand.
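The three levels can be illustrated with a minimal Python sketch. It is not the author's implementation. It assumes that the resampled partitions are already available as label lists over the same observations as the reference partition, and it leaves out the resampling scheme and the clustering step, which the abstract does not specify. The function names (`adjusted_rand_index`, `jaccard`, `clusters_of`, `three_level_stability`) and the best-match rule used for levels two and three are hypothetical choices made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert-Arabie form) between two label sequences."""
    n = len(a)
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


def jaccard(s, t):
    """Jaccard coefficient |s & t| / |s | t| between two sets."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def clusters_of(labels):
    """Map each cluster label to the set of observation indices it contains."""
    out = {}
    for i, lab in enumerate(labels):
        out.setdefault(lab, set()).add(i)
    return out


def three_level_stability(reference, resampled):
    """Summarise the stability of `reference` over resampled partitions.

    Assumption: every partition in `resampled` labels the same observations,
    in the same order, as `reference`.
    """
    ref = clusters_of(reference)
    # Level 1: agreement between whole partitions.
    ari = [adjusted_rand_index(reference, p) for p in resampled]
    cluster_scores = {lab: [] for lab in ref}
    hits = [0] * len(reference)
    for p in resampled:
        other = clusters_of(p)
        for lab, members in ref.items():
            # Level 2: best-matching resampled cluster by Jaccard coefficient.
            best = max(other.values(), key=lambda s: jaccard(members, s))
            cluster_scores[lab].append(jaccard(members, best))
            # Level 3: count observations that stay in the matched cluster.
            for i in members & best:
                hits[i] += 1
    mean_ari = sum(ari) / len(ari)
    per_cluster = {lab: sum(v) / len(v) for lab, v in cluster_scores.items()}
    per_observation = [h / len(resampled) for h in hits]
    return mean_ari, per_cluster, per_observation


# Toy usage with made-up labels (not the wine data).
reference = [0, 0, 0, 1, 1, 1]
resampled = [[0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]]
mean_ari, per_cluster, per_observation = three_level_stability(reference, resampled)
print(mean_ari, per_cluster, per_observation)
```

In this sketch, the mean adjusted Rand index would be compared across candidate numbers of clusters, the per-cluster Jaccard averages would flag unstable clusters, and the per-observation frequencies would flag observations whose membership is unreliable.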
Keywords: Hierarchical Cluster, Cluster Membership, Individual Cluster, Rand Index, Cluster Validation