
Initialization Dependence of Clustering Algorithms

  • Wim De Mulder
  • Stefan Schliebs
  • René Boel
  • Martin Kuiper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5507)

Abstract

It is well known that the clusters produced by a clustering algorithm depend on the chosen initial centers. In this paper we present a measure of the degree to which a given clustering algorithm depends on the choice of initial centers, for a given data set. This measure is calculated for four well-known offline clustering algorithms (k-means Forgy, k-means Hartigan, k-means Lloyd and fuzzy c-means) on five benchmark data sets. The measure is also calculated for ECM, an online algorithm that does not require the number of initial centers as input, but whose resulting clusters can depend on the order in which the input arrives. Our main finding is that this initialization dependence measure can also be used to determine the optimal number of clusters.
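The abstract does not define the measure itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical proxy: run Lloyd-style k-means from several random starts and report one minus the mean pairwise Rand index between the resulting partitions. The function names (`lloyd_kmeans`, `rand_index`, `init_dependence`) and the choice of the Rand index are assumptions made for illustration.

```python
import random
from itertools import combinations


def lloyd_kmeans(points, k, rng, iters=50):
    """Plain Lloyd k-means on a list of tuples; returns one label per point."""
    centers = rng.sample(points, k)  # random start: k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def init_dependence(points, k, runs=10, seed=0):
    """Hypothetical proxy (an assumption, not the paper's measure):
    1 - mean pairwise Rand index over `runs` random restarts."""
    rng = random.Random(seed)
    labelings = [lloyd_kmeans(points, k, rng) for _ in range(runs)]
    scores = [rand_index(x, y) for x, y in combinations(labelings, 2)]
    return 1.0 - sum(scores) / len(scores)
```

Under this proxy, a value near 0 means every restart produced the same partition, and larger values mean the partitions disagree more. Evaluating `init_dependence` over a range of `k` is one way to explore the abstract's suggestion that such a measure relates to the number of clusters, though whether this particular proxy behaves like the paper's measure is not something the abstract lets us confirm.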



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Wim De Mulder (1)
  • Stefan Schliebs (2)
  • René Boel (1)
  • Martin Kuiper (3)

  1. SYSTeMS, Ghent University, Ghent, Belgium
  2. Knowledge Engineering and Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand
  3. Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
