On Careful Selection of Initial Centers for K-means Algorithm
The K-means clustering algorithm has been studied extensively in the literature, and its success stems from its simplicity and computational efficiency. Its key limitation is that the solution it converges to depends on the initial partition: an improper selection of initial centroids may lead to poor results. This paper proposes a method called Deterministic Initialization using Constrained Recursive Bi-partitioning (DICRB) for the careful selection of initial centers. First, a set of probable centers is identified using recursive binary partitioning. Then, the initial centers for the K-means algorithm are determined by applying graph clustering to the probable centers. Experimental results demonstrate the efficacy and deterministic nature of the proposed method.
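The abstract describes the two stages only at a high level, so the Python sketch below illustrates the general idea under assumptions of our own rather than reproducing DICRB. Assumed here: each cell is split at the mean of its highest-variance coordinate; the "constraint" is a hypothetical minimum cell size `min_size`; the probable centers are the cell means; and the graph-clustering stage is replaced by a plain minimum-spanning-tree (single-linkage) cut into K components. The names `bipartition`, `mst_groups`, `dicrb_like_init` and `kmeans` are hypothetical. No random numbers are drawn anywhere, so repeated runs give identical initial centers.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n, d = len(points), len(points[0])
    return tuple(sum(p[j] for p in points) / n for j in range(d))


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bipartition(points: List[Point], min_size: int) -> List[Tuple[Point, int]]:
    """Stage 1 (assumed variant): split a cell at the mean of its highest-variance
    coordinate, recursively. Hypothetical constraint: never create a cell with
    fewer than min_size points. Returns (cell mean, cell size) for each leaf."""
    c = mean(points)
    if len(points) < 2 * min_size:
        return [(c, len(points))]
    var = [sum((p[j] - c[j]) ** 2 for p in points) for j in range(len(c))]
    axis = max(range(len(c)), key=lambda j: var[j])
    left = [p for p in points if p[axis] <= c[axis]]
    right = [p for p in points if p[axis] > c[axis]]
    if len(left) < min_size or len(right) < min_size:
        return [(c, len(points))]  # this split would violate the size constraint
    return bipartition(left, min_size) + bipartition(right, min_size)


def mst_groups(centers: List[Point], k: int) -> List[List[int]]:
    """Stage 2 (swapped-in graph clustering): Prim's minimum spanning tree over
    the probable centers, drop its k-1 heaviest edges, return the k components."""
    m = len(centers)
    if not 1 <= k <= m:
        raise ValueError("need 1 <= k <= number of probable centers")
    in_tree, best, parent = [False] * m, [math.inf] * m, [-1] * m
    best[0] = 0.0
    edges = []
    for _ in range(m):
        u = min((i for i in range(m) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(m):
            if not in_tree[v] and dist2(centers[u], centers[v]) < best[v]:
                best[v], parent[v] = dist2(centers[u], centers[v]), u
    edges.sort()  # ties are broken by vertex index, keeping the result deterministic
    root = list(range(m))

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, a, b in edges[: len(edges) - (k - 1)]:  # keep all but k-1 heaviest
        root[find(a)] = find(b)
    groups: Dict[int, List[int]] = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def dicrb_like_init(points: List[Point], k: int, min_size: int = 5) -> List[Point]:
    """Initial centers: size-weighted mean of the cells in each MST group."""
    if min_size < 1:
        raise ValueError("min_size must be >= 1")
    leaves = bipartition(points, min_size)
    cents, sizes = [c for c, _ in leaves], [n for _, n in leaves]
    init = []
    for g in mst_groups(cents, k):
        total = sum(sizes[i] for i in g)
        init.append(tuple(sum(cents[i][j] * sizes[i] for i in g) / total
                          for j in range(len(cents[0]))))
    return init


def kmeans(points: List[Point], centers: List[Point], iters: int = 100):
    """Plain Lloyd iterations started from the given centers."""
    centers, labels = list(centers), []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        new = [mean([p for p, lab in zip(points, labels) if lab == c])
               if c in labels else centers[c] for c in range(len(centers))]
        if new == centers:
            break
        centers = new
    return centers, labels


if __name__ == "__main__":
    offs = (-1.0, -0.5, 0.0, 0.5, 1.0)
    pts = [(cx + dx, cy + dy) for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
           for dx in offs for dy in offs]
    init = dicrb_like_init(pts, k=3)
    print(sorted(init))  # identical on every run: nothing random is drawn
    print(kmeans(pts, init)[0])
```

The size-weighted mean of a group's cell means equals the mean of all points in those cells, so each initial center is the centroid of the region the group covers. Swapping in a different graph clustering (for example, a spectral or threshold-graph method) only changes `mst_groups`.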
Keywords: Clustering · K-means algorithm · Initialization · Bi-partitioning