BK-means Algorithm with Minimal Performance Degradation Caused by Improper Initial Centroid
The K-means algorithm suffers from performance degradation when its initial centroids are chosen improperly. To address this problem, we propose the BK-means (Balanced K-means) algorithm for document clustering. The algorithm uses a value, α, first defined in this paper, to adjust the weight of each cluster. We compared it with the general K-means algorithm on Reuters-21578. The experimental results show about 11% higher performance than the general K-means algorithm, as measured by the balanced F-measure (BFM).
Keywords: Clustering, Information Retrieval, K-means, BK-means, Outlier
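The initialization problem that motivates BK-means can be illustrated with a minimal sketch of plain K-means. This is not the BK-means algorithm itself: the α-based cluster weighting is not specified in the abstract and is not reproduced here. The toy 2-D points, the function names `sq_dist` and `kmeans`, and the two sets of initial centroids are all assumptions chosen for illustration.

```python
# Minimal sketch (plain K-means, NOT BK-means): the same toy data is clustered
# from two different sets of initial centroids to show how a poor start can
# leave the algorithm in a worse local optimum. All data here is hypothetical.

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd-style K-means; returns (centroids, assignment, sse)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse

# Three well-separated groups of four points each (illustrative data).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (5, 10), (5, 11), (6, 10), (6, 11),
]

# One initial centroid per group.
_, _, sse_good = kmeans(points, [(0, 0), (10, 0), (5, 10)])
# Two initial centroids inside the same group, one between the other two.
_, _, sse_bad = kmeans(points, [(0, 0), (1, 1), (8, 5)])

print("SSE with well-spread initial centroids:", sse_good)
print("SSE with poorly placed initial centroids:", sse_bad)
```

On this toy data the second run converges with two centroids splitting one group and a single centroid covering the other two, so its sum of squared errors stays far above that of the first run (about 255.3 versus 6.0). This is the kind of degradation the abstract attributes to improper initial centroids.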