Cluster Stability Assessment Based on Theoretic Information Measures

  • Damaris Pascual
  • Filiberto Pla
  • J. Salvador Sánchez
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5197)


Determining the right number of clusters is an important problem in cluster validation. This work introduces a cluster validation strategy based on cluster stability properties. The proposed stability index is built on information measures and accounts for how some of these measures vary as the clustering solutions change across different sample sets drawn from the same problem. Experiments on synthetic and real databases show that the stability index is effective when the clustering algorithm relies on a data structure model suited to the problem.
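The abstract does not state the formula of the proposed index, so the following is only a generic sketch of the idea it describes: cluster several sample sets of the same data and measure, with an information measure, how much the resulting partitions disagree. Everything specific here is an assumption made for illustration and is not taken from the paper: the use of variation of information as the measure, plain k-means as the clustering algorithm, and the hypothetical helpers `kmeans`, `nearest`, `instability` and `make_blobs`.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same points."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 I(a; b); zero when the partitions coincide."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)

def nearest(p, centers):
    """Index of the centre closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((x - y) ** 2 for x, y in zip(p, centers[i])))

def kmeans(points, k, rng, iters=30):
    """Plain Lloyd iterations from a random start; returns the centres."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous centre when a cluster becomes empty.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def instability(points, k, rng, pairs=5, frac=0.8):
    """Mean VI between clusterings fitted on two random subsamples and
    extended to the full data set by nearest centre (lower = more stable)."""
    m = int(frac * len(points))
    total = 0.0
    for _ in range(pairs):
        c1 = kmeans(rng.sample(points, m), k, rng)
        c2 = kmeans(rng.sample(points, m), k, rng)
        a = [nearest(p, c1) for p in points]
        b = [nearest(p, c2) for p in points]
        total += variation_of_information(a, b)
    return total / pairs

def make_blobs(rng, centers, n_per=60, sd=0.5):
    """Synthetic 2-D data: one Gaussian blob per centre (illustrative only)."""
    return [(rng.gauss(cx, sd), rng.gauss(cy, sd))
            for cx, cy in centers for _ in range(n_per)]

rng = random.Random(0)
data = make_blobs(rng, [(0, 0), (6, 0), (3, 6)])
scores = {k: instability(data, k, rng) for k in range(2, 6)}
for k, v in scores.items():
    print(f"k={k}: mean VI between subsample clusterings = {v:.3f}")
```

Raw variation of information is not normalised with respect to the number of clusters, so the values for different k are not directly comparable; the sketch only illustrates the mechanism of measuring variability across sample sets.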


Keywords: cluster validation, stability indices, information theory



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Damaris Pascual (1)
  • Filiberto Pla (2)
  • J. Salvador Sánchez (2)
  1. Center for Pattern Recognition and Data Mining, Universidad de Oriente, Santiago de Cuba, Cuba
  2. Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Castelló, Spain
