Abstract
Although K-means is widely used for general clustering, its performance depends strongly on the initial cluster centers, and the algorithm typically converges to one of many local minima. In this paper, a novel initialization scheme for selecting initial cluster centers for K-means clustering is proposed. The algorithm is based on reverse nearest neighbor (RNN) search, which retrieves all points in a given data set whose nearest neighbor is a given query point. The initial cluster centers computed with this method are found to lie very close to the desired cluster centers of iterative clustering algorithms. The procedure applies to clustering algorithms for continuous data, and its application to the K-means algorithm is demonstrated. Experiments on several popular data sets show the advantages of the proposed method.
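The RNN-based seeding idea from the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: it assumes centers are chosen as the k points with the largest reverse-nearest-neighbor sets, and the function names `rnn_counts` and `rnn_init_centers` are hypothetical.

```python
import numpy as np

def rnn_counts(X):
    """For each point, count how many other points have it as their
    (single) nearest neighbor, i.e. the size of its RNN set."""
    # pairwise squared Euclidean distances
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)      # a point is not its own neighbor
    nn = d.argmin(axis=1)            # index of each point's nearest neighbor
    return np.bincount(nn, minlength=len(X))

def rnn_init_centers(X, k):
    """Pick the k points with the largest RNN sets as initial centers;
    such points tend to sit in locally dense regions."""
    counts = rnn_counts(X)
    idx = np.argsort(counts)[::-1][:k]
    return X[idx]
```

The returned points can then be passed to any standard K-means implementation as its initial centers in place of random seeding.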
Additional information
Foundation item: Supported by the National Natural Science Foundation of China (60503020, 60503033, 60703086), the Natural Science Foundation of Jiangsu Province (BK2006094), the Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University (KJS0714) and the Research Foundation of Nanjing University of Posts and Telecommunications (NY207052, NY207082)
Cite this article
Xu, J., Xu, B., Zhang, W. et al. Stable initialization scheme for K-means clustering. Wuhan Univ. J. Nat. Sci. 14, 24–28 (2009). https://doi.org/10.1007/s11859-009-0106-z