Cluster Stability Assessment Based on Theoretic Information Measures
Cluster validation, in particular determining the right number of clusters, is an important issue in clustering. This work introduces a strategy for cluster validation based on cluster stability properties. The proposed stability index is based on information measures and takes into account how some of these measures vary as a result of the variability in the clustering solutions produced by different sample sets of the same problem. Experiments on synthetic and real databases show that the cluster stability index is effective when the clustering algorithm is based on a data structure model adequate to the problem.
Keywords: cluster validation, stability indices, information theory
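The abstract describes the stability index only in general terms, so the following is a minimal sketch of the underlying idea rather than the index proposed in the paper. It assumes that several clusterings of the same points are already available (for example, obtained from different samples and extended to a shared set of points), and it uses the variation of information as a stand-in information measure. The function names `entropy`, `mutual_information`, `variation_of_information`, and `mean_pairwise_vi` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in nats) of a cluster labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same points."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 I(a; b); zero when the partitions coincide."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same points")
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def mean_pairwise_vi(labelings):
    """Average VI over all pairs of labelings (assumed stand-in for instability)."""
    pairs = list(combinations(labelings, 2))
    if not pairs:
        raise ValueError("need at least two labelings")
    return sum(variation_of_information(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a lower average indicates that the clustering solutions agree more closely across samples. One could compute it for each candidate number of clusters and compare the values, although the criterion actually used in the paper may differ.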