K-Means Clustering Seeds Initialization Based on Centrality, Sparsity, and Isotropy

  • Pilsung Kang
  • Sungzoon Cho
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5788)

Abstract

K-Means is the most commonly used clustering algorithm. Despite its numerous advantages, it has a crucial drawback: the final cluster structure depends entirely on the choice of initial seeds. In this paper, a new seeds initialization algorithm based on centrality, sparsity, and isotropy is proposed. Preliminary experiments show that the proposed algorithm not only yields better clustering structures but also accelerates convergence.
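
The dependence on initial seeds can be illustrated with a minimal sketch. It uses a plain Lloyd-style K-Means loop, not the initialization algorithm proposed in the paper, whose details are not given on this page. The function names `squared_distance` and `kmeans`, the four toy points, and both seed choices are assumptions made purely for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd-style K-Means started from explicitly given seeds.

    Returns (centroids, sum of squared errors, iterations used).
    Hypothetical helper for illustration only.
    """
    centroids = [tuple(float(x) for x in s) for s in seeds]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse, iteration


# Toy data (assumed): corners of a 4 x 1 rectangle, two clusters wanted.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_a, _ = kmeans(points, seeds=[(0, 0), (4, 0)])  # one seed per short side
_, sse_b, _ = kmeans(points, seeds=[(0, 0), (0, 1)])  # both seeds on one side
print(sse_a, sse_b)  # prints: 1.0 16.0
```

With the first seed pair the points split into the left and right sides of the rectangle. With the second pair the loop converges to a top/bottom split whose squared error is sixteen times larger, even though the data and the number of clusters are identical.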

Keywords

Cluster Algorithm, Reconstruction Error, Initial Seed, Class Accuracy, Pattern Recognition Letter
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Pilsung Kang (1)
  • Sungzoon Cho (1)
  1. Department of Industrial Engineering, Seoul National University, Seoul, Republic of Korea
