Estimating the number of clusters from distributional results of partitioning a given data set
When estimating the optimal number of clusters, C, in a given data set, one typically uses a single (final) result of the clustering algorithm for each candidate value of C. If distributional data of size T are used, these data come from T data sets obtained, e.g., by a bootstrapping technique. Here a new approach is introduced that uses distributional data generated by clustering the original data T times, within the framework of cost function optimization and cluster validity indices. Results of this method are reported for model data (100 realizations) and gene expression data. The probability of correctly estimating the number of clusters was often higher than in recently published results for several classical methods and for a new statistical approach (Clest).
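The abstract does not say which cost function, validity index, or decision rule the method uses. The Python sketch below only illustrates the general idea of clustering the *same* data T times per candidate C and working with the resulting distribution of index values. Everything concrete in it is an assumption rather than the authors' method: k-means (Lloyd iterations with k-means++ seeding) stands in for "cost function optimization", the Calinski–Harabasz index stands in for "a cluster validity index", and the rule "pick the C whose index values have the largest median" is a hypothetical way of summarizing the distribution. The function names (`kmeans`, `calinski_harabasz`, `estimate_num_clusters`) are illustrative.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean_point(pts: Sequence[Point]) -> Point:
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def kmeans(points: Sequence[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """One clustering trial: Lloyd's k-means (minimizes the within-cluster sum of
    squares) with k-means++ seeding, so each trial starts from a different random state."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # k-means++: sample the next centre with probability proportional to D^2.
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean_point(members) if members else centroids[j])  # keep an emptied centre
        if new == centroids:  # converged: assignments no longer change the centres
            break
        centroids = new
    return labels


def calinski_harabasz(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Assumed validity index: (between-SS / (k-1)) / (within-SS / (n-k)); larger is better."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    if k < 2 or k >= n:
        return 0.0  # index undefined for a single cluster or all-singleton clusters
    overall = mean_point(points)
    centres = {lab: mean_point(g) for lab, g in groups.items()}
    between = sum(len(g) * sq_dist(centres[lab], overall) for lab, g in groups.items())
    within = sum(sq_dist(p, centres[lab]) for lab, g in groups.items() for p in g)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def estimate_num_clusters(points: Sequence[Point], candidates: Sequence[int],
                          trials: int = 20, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Cluster the original data `trials` (= T) times per candidate C, collect the
    distribution of index values, and (hypothetical rule) pick the C with the largest median."""
    rng = random.Random(seed)
    summary: Dict[int, float] = {}
    for c in candidates:
        scores = [calinski_harabasz(points, kmeans(points, c, rng)) for _ in range(trials)]
        summary[c] = statistics.median(scores)
    best = max(summary, key=lambda c: summary[c])
    return best, summary


if __name__ == "__main__":
    gen = random.Random(42)
    centres = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
    data = [(cx + gen.gauss(0, 0.3), cy + gen.gauss(0, 0.3)) for cx, cy in centres for _ in range(50)]
    best, summary = estimate_num_clusters(data, [2, 3, 4, 5])
    print(best, summary)
```

On three well-separated synthetic blobs, the median index is largest at C = 3. Using the median rather than the single best run is one way to make the estimate less sensitive to trials that end in a poor local optimum of the cost function; the paper may aggregate the distribution differently.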
Keywords: Cluster Algorithm; Validity Index; Cluster Validity Index; Cluster Trial; Cost Function Optimization