Trail-and-Error Approach for Determining the Number of Clusters

  • Haojun Sun
  • Mei Sun
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3930)


Automatically determining the number of clusters is an important issue in cluster analysis. In this paper, we explore a “trial-and-error” approach to determining the number of clusters in a given data set. The fuzzy clustering algorithm FCM is selected as the basic “trial” algorithm, and cluster validity optimization corresponds to the “error” procedure. To improve computation speed, we propose two strategies, eliminating and splitting, which make the FCM-based algorithms more efficient. To improve on existing validity measures, we use a new validity function that is particularly suited to data sets containing overlapping clusters. Experimental results are given to illustrate the performance of the new algorithms.
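The trial-and-error loop outlined in the abstract can be illustrated with a minimal sketch: run FCM for each candidate number of clusters (the "trial"), score each result with a validity index (the "error"), and keep the best-scoring count. The abstract does not give the authors' validity function or the details of the eliminating and splitting strategies, so this sketch makes several assumptions: it substitutes the well-known Xie-Beni index (lower is better) for the paper's validity function, it uses a deterministic farthest-first initialization for reproducibility, and it reruns FCM from scratch for every candidate count. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, c):
    """Assumed initialization: start at points[0], then repeatedly add the
    point farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(sq_dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers


def memberships(points, centers, m):
    """FCM membership update; u[k][i] is the membership of point k in cluster i."""
    power = -1.0 / (m - 1.0)
    u = []
    for x in points:
        # Clamp distances so a point lying exactly on a center is handled.
        w = [max(sq_dist(x, v), 1e-12) ** power for v in centers]
        total = sum(w)
        u.append([wi / total for wi in w])
    return u


def fcm(points, c, m=2.0, iters=100, tol=1e-9):
    """Plain fuzzy c-means: alternate membership and center updates."""
    centers = farthest_first(points, c)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, points)) / total
                 for j in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index (a stand-in, not the paper's validity function):
    fuzzy compactness divided by the minimum squared center separation."""
    compact = sum(u[k][i] ** m * sq_dist(x, v)
                  for k, x in enumerate(points)
                  for i, v in enumerate(centers))
    sep = min(sq_dist(a, b)
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(points) * max(sep, 1e-12))


def estimate_cluster_count(points, c_max):
    """Trial: run FCM for c = 2..c_max. Error: score each run and keep the
    count with the lowest validity index."""
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = fcm(points, c)
        scores[c] = xie_beni(points, centers, u)
    return min(scores, key=scores.get), scores


# Toy data: three compact, well-separated 3x3 grids of points.
offsets = (-0.5, 0.0, 0.5)
data = [(cx + dx, cy + dy)
        for cx, cy in [(0, 0), (10, 0), (0, 10)]
        for dx in offsets for dy in offsets]
best, scores = estimate_cluster_count(data, c_max=5)
```

Rerunning FCM independently for each candidate count is the naive baseline; the eliminating and splitting strategies mentioned in the abstract are aimed at reducing exactly this cost, and they are not reproduced here.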


Keywords: Cluster Center; True Number; Cluster Validity; Random Initialization; Validity Function
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Haojun Sun (1)
  • Mei Sun (1)
  1. College of Mathematics and Computer Science, University of Hebei, Baoding, China
