Abstract
This paper presents a new method for selecting initial cluster centers in k-means clustering. The method first identifies high-density neighborhoods in the data and then selects the central point of each neighborhood as an initial center. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for the high-density neighborhoods. The new clustering algorithm, NK-means, integrates NBC into the k-means clustering process to improve the performance of k-means while preserving its efficiency. NBC is enhanced with a new cell-based neighborhood search method that accelerates the search for initial cluster centers. A merging method filters out insignificant initial centers so that too many clusters are not generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy compared with the random k-means and the refinement k-means algorithms.
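The general idea of density-based seeding followed by k-means can be illustrated with a minimal sketch. This is not the NK-means algorithm itself: the abstract does not give NBC's density measure, its cell-based search, or its merging rule, so the sketch substitutes simple stand-ins. Density is approximated by counting points within a fixed `radius` using a brute-force scan, the "central point" of a neighborhood is taken to be its mean, and merging is approximated by skipping any candidate closer than `min_sep` to a seed already chosen. The names `density_seeds`, `kmeans`, `radius`, and `min_sep` are hypothetical.

```python
import math


def density_seeds(points, radius, min_sep, k):
    """Pick up to k initial centers from the densest neighborhoods.

    Assumption: density is the number of points within `radius` of a point
    (itself included). This is a stand-in, not NBC's density measure.
    """
    # Brute-force O(n^2) neighbor counts; the paper uses a cell-based search instead.
    counts = [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]
    # Visit candidates from densest to sparsest.
    order = sorted(range(len(points)), key=lambda i: -counts[i])

    seeds = []
    for i in order:
        p = points[i]
        # Stand-in for merging: drop candidates too close to an existing seed.
        if any(math.dist(p, s) <= min_sep for s in seeds):
            continue
        # Use the mean of the neighborhood as its central point.
        nbrs = [q for q in points if math.dist(p, q) <= radius]
        seeds.append(tuple(sum(c) / len(nbrs) for c in zip(*nbrs)))
        if len(seeds) == k:
            break
    return seeds  # may hold fewer than k seeds if the data has few dense regions


def kmeans(points, centers, max_iters=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

A call such as `kmeans(points, density_seeds(points, radius=1.5, min_sep=3.0, k=3))` runs the two stages in sequence on a list of coordinate tuples. Because the seeds come from dense regions rather than random draws, the result is deterministic for a given input, which is the practical appeal of this family of initialization methods.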
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ye, Y., Huang, J.Z., Chen, X., Zhou, S., Williams, G., Xu, X. (2006). Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_23
DOI: https://doi.org/10.1007/11731139_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)