Extended K-means with an Efficient Estimation of the Number of Clusters
We present a non-hierarchical clustering algorithm that determines the optimal number of clusters by iterating k-means and applying a stopping rule based on the Bayesian Information Criterion (BIC). The procedure requires twice the computation of k-means. In exchange, with no prior information about the number of clusters, our method finds the optimal clusters using an information-theoretic criterion rather than a heuristic.
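The abstract does not spell out the exact procedure, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a split-and-test scheme: starting from k_min clusters, each cluster is tentatively split in two by 2-means, and the split is kept only if a BIC score improves. The score is assumed to be log L - (p/2) log R for a hard-assignment mixture of spherical Gaussians with one shared variance, where R is the number of points and p counts the centroid coordinates, the K - 1 mixing weights and the variance. Clustering stops when no split raises the score. The names `estimate_clusters`, `bic`, `kmeans`, `k_min`, `k_max` and `seed` are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(pts) for col in zip(*pts)]


def kmeans(points, centroids, iters=100):
    """Lloyd's k-means from the given starting centroids; returns (centroids, labels)."""
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def bic(points, centroids, labels):
    """Assumed BIC form (higher is better): log L - (p / 2) * log R, for a
    hard-assignment mixture of spherical Gaussians sharing one variance."""
    R, M, K = len(points), len(points[0]), len(centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    var = max(sse / (R * M), 1e-12)  # ML per-dimension variance, floored
    counts = [labels.count(j) for j in range(K)]
    log_l = sum(n * math.log(n / R) for n in counts if n > 0)  # mixing weights
    log_l -= R * M / 2 * math.log(2 * math.pi * var) + R * M / 2  # Gaussian terms
    p = K * M + (K - 1) + 1  # centroids, mixing weights, shared variance
    return log_l - p / 2 * math.log(R)


def estimate_clusters(points, k_min=1, k_max=10, seed=0):
    """Hypothetical driver: grow k by BIC-tested 2-way splits until none helps."""
    rng = random.Random(seed)
    centroids, labels = kmeans(points, rng.sample(points, k_min))
    while len(centroids) < k_max:
        budget = k_max - len(centroids)  # how many more splits are allowed
        proposal = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if budget > 0 and len(members) >= 4:
                kids, kid_labels = kmeans(members, rng.sample(members, 2))
                if bic(members, kids, kid_labels) > bic(members, [c], [0] * len(members)):
                    proposal.extend(kids)
                    budget -= 1
                    continue
            proposal.append(c)
        if len(proposal) == len(centroids):
            break  # stopping rule: no split raised the BIC
        # Refine all centres together, seeded by the accepted splits.
        centroids, labels = kmeans(points, proposal)
    return centroids, labels
```

Each split test compares BIC only on the points of the cluster being tested, which keeps every test as cheap as one 2-means run on that subset. For example, on two well-separated Gaussian blobs the first split should be accepted and the splits of the individual blobs rejected, so the sketch returns two centroids.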