Stable initialization scheme for K-means clustering

Wuhan University Journal of Natural Sciences

Abstract

Though K-means is very popular for general clustering, its performance depends heavily on the initial cluster centers, because the algorithm generally converges to one of numerous local minima. This paper proposes a novel initialization scheme for selecting initial cluster centers for K-means clustering. The algorithm is based on reverse nearest neighbor (RNN) search, which retrieves all points in a given data set whose nearest neighbor is a given query point. The initial cluster centers computed with this method are found to be very close to the desired cluster centers for iterative clustering algorithms. The procedure is applicable to clustering algorithms for continuous data, and its application to the K-means clustering algorithm is demonstrated. Experiments are carried out on several popular datasets, and the results show the advantages of the proposed method.
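To make the RNN query in the abstract concrete, the following is a minimal Python sketch assuming Euclidean distance and a brute-force neighbor search. The function `reverse_nearest_neighbors` follows the definition given above. The function `rnn_count_seeds` is a hypothetical seeding rule (pick the points with the largest RNN sets) included only for illustration; the abstract does not say how the centers are derived from the RNN sets, so this should not be read as the authors' procedure. All function names are assumptions.

```python
import numpy as np


def nearest_neighbor_indices(points):
    """For each point, return the index of its nearest other point (Euclidean)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return dists.argmin(axis=1)


def reverse_nearest_neighbors(points, query_index):
    """Indices of all points whose nearest neighbor is points[query_index]."""
    nn = nearest_neighbor_indices(points)
    return np.flatnonzero(nn == query_index)


def rnn_count_seeds(points, k):
    """Hypothetical seeding rule (an assumption, not the paper's method):
    return the k points that have the largest RNN sets."""
    nn = nearest_neighbor_indices(points)
    counts = np.bincount(nn, minlength=len(points))  # RNN set size per point
    order = np.argsort(-counts, kind="stable")       # largest RNN sets first
    return points[order[:k]]


# Two small, well-separated groups on a line.
points = np.array([
    [0.0, 0.0], [1.0, 0.0], [-1.1, 0.0],
    [10.0, 0.0], [11.0, 0.0], [8.9, 0.0],
])
rnn_of_first = reverse_nearest_neighbors(points, 0)
seeds = rnn_count_seeds(points, 2)
```

The intuition behind the illustrative rule is that a point that is the nearest neighbor of many others tends to sit in a dense region, so it is a plausible starting center. The brute-force search costs O(n²) in time and memory, which is acceptable only for small data sets.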

Author information

Corresponding author

Correspondence to Baowen Xu.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086), the Natural Science Foundation of Jiangsu Province (BK2006094), the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714), and the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082).

About this article

Cite this article

Xu, J., Xu, B., Zhang, W. et al. Stable initialization scheme for K-means clustering. Wuhan Univ. J. Nat. Sci. 14, 24–28 (2009). https://doi.org/10.1007/s11859-009-0106-z

  • DOI: https://doi.org/10.1007/s11859-009-0106-z
