Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering

Ye, Yunming; Huang, Joshua Zhexue; Chen, Xiaojun; Zhou, Shuigeng; Williams, Graham; Xu, Xiaofei

doi:10.1007/11731139_23

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

This paper presents a new method for effectively selecting initial cluster centers in k-means clustering. The method first identifies high-density neighborhoods in the data and then selects the central points of those neighborhoods as initial centers. The recently published Neighborhood-Based Clustering (NBC) algorithm is used to search for high-density neighborhoods. The new clustering algorithm, NK-means, integrates NBC into the k-means clustering process to improve the performance of k-means while preserving its efficiency. NBC is enhanced with a new cell-based neighborhood search method to accelerate the search for initial cluster centers. A merging method filters out insignificant initial centers so that too many clusters are not generated. Experimental results on synthetic data sets show significant improvements in clustering accuracy in comparison with the random k-means and the refinement k-means algorithms.
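
The core idea of the abstract (seed k-means from the centers of dense neighborhoods, and merge seeds that sit too close together) can be illustrated with a short sketch. This is not the NK-means or NBC algorithm from the paper: it substitutes a plain k-nearest-neighbor density estimate with brute-force distances for NBC and its cell-based search, and it uses a simple distance threshold as a stand-in for the merging step. The names `density_seeds`, `kmeans`, `n_neighbors`, and `min_sep` are hypothetical, chosen only for this example.

```python
import numpy as np

def density_seeds(X, n_clusters, n_neighbors=10, min_sep=1.0):
    """Pick initial centers from dense neighborhoods (illustrative stand-in, not NBC)."""
    n = len(X)
    # Brute-force pairwise Euclidean distances: O(n^2) memory, small demos only.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Nearest neighbors of each point; column 0 is the point itself (assumes distinct points).
    nn = np.argsort(d, axis=1)[:, 1:n_neighbors + 1]
    # Density proxy: inverse of the mean distance to the nearest neighbors.
    density = 1.0 / (d[np.arange(n)[:, None], nn].mean(axis=1) + 1e-12)
    seeds = []
    for i in np.argsort(-density):  # visit points from densest to sparsest
        # Candidate seed: the central point (mean) of the neighborhood around point i.
        c = X[np.append(nn[i], i)].mean(axis=0)
        # Simplified merging: drop candidates closer than min_sep to an accepted seed.
        if all(np.linalg.norm(c - s) >= min_sep for s in seeds):
            seeds.append(c)
        if len(seeds) == n_clusters:
            break
    return np.array(seeds)

def kmeans(X, centers, n_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment under the returned centers.
    labels = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    return centers, labels

# Synthetic demo: three tight Gaussian blobs (made-up data, not from the paper).
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in true_centers])

seeds = density_seeds(X, n_clusters=3, n_neighbors=10, min_sep=2.0)
centers, labels = kmeans(X, seeds)
print(centers)
```

In this sketch `min_sep` has to be chosen by hand relative to the scale of the data; too small a value lets several seeds fall in one dense region, which is the failure mode the merging step in the abstract is meant to avoid.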

Author information

Authors and Affiliations

Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, 518055, China
Yunming Ye, Xiaojun Chen & Xiaofei Xu
E-Business Technology Institute, University of Hong Kong, Pokfulam Road, Hong Kong
Joshua Zhexue Huang
Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, China
Shuigeng Zhou
Australian Taxation Office, Australia
Graham Williams

Editor information

Editors and Affiliations

Nanyang Technological University, Singapore
Wee-Keong Ng
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Computer Engineering, Nanyang Technological University, 639798, Singapore, Singapore
Kuiyu Chang

About this paper

Cite this paper

Ye, Y., Huang, J.Z., Chen, X., Zhou, S., Williams, G., Xu, X. (2006). Neighborhood Density Method for Selecting Initial Cluster Centers in K-Means Clustering. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_23

DOI: https://doi.org/10.1007/11731139_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science, Computer Science (R0)
