Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm
The K-means algorithm, the most classic partition-based clustering method, has well-known disadvantages. If the data set contains outliers, the cluster means can deviate seriously from the true centers. In addition, K-means with random initialization is very sensitive to the input data parameters. In this paper, we propose initialization and outlier detection based on distance and density for the K-means algorithm (KMIDDO), an improved method that optimizes the initial center points and is especially effective when outliers are present. Moreover, we extend an outlier detection method to improve the clustering result. We want the distance between every two center points to be as large as possible and the density of the center points to be as high as possible. For initialization, we calculate the distance and density of the points. For outlier detection, we treat the outliers as a single class based on distance and density. Experiments on several synthetic and real datasets illustrate the effectiveness and accuracy of the proposed algorithms.
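The idea of choosing centers that are both far apart and dense, and of setting aside distant low-density points as outliers, can be sketched as follows. This is a minimal illustration and not the KMIDDO procedure itself: the density estimate (inverse mean distance to the nearest neighbors), the selection score (distance to the closest chosen center times density), the quantile thresholds, and all function and parameter names are assumptions made for this example.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def local_density(D, n_neighbors):
    """Assumed density: inverse of the mean distance to the nearest neighbors."""
    # Column 0 of each sorted row is the zero distance of a point to itself.
    nearest = np.sort(D, axis=1)[:, 1:n_neighbors + 1]
    return 1.0 / (nearest.mean(axis=1) + 1e-12)


def init_centers(X, k, n_neighbors=5):
    """Pick k data points that are dense and far from each other."""
    D = pairwise_distances(X)
    rho = local_density(D, n_neighbors)
    chosen = [int(np.argmax(rho))]  # start from the densest point
    while len(chosen) < k:
        # Distance of every point to its closest already-chosen center.
        min_dist = D[:, chosen].min(axis=1)
        score = min_dist * rho      # assumed trade-off: far AND dense
        score[chosen] = -np.inf     # never pick the same point twice
        chosen.append(int(np.argmax(score)))
    return X[chosen], rho


def flag_outliers(X, centers, rho, dist_quantile=0.95, density_quantile=0.05):
    """Mark points that are both far from every center and in a sparse region."""
    diff = X[:, None, :] - centers[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    far = d > np.quantile(d, dist_quantile)
    sparse = rho < np.quantile(rho, density_quantile)
    return far & sparse  # flagged points would form their own class
```

The returned centers would then seed an ordinary K-means loop, with the flagged points held out as a separate class so that they do not pull the means away from the dense regions.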
Keywords: K-means, Outlier detection, Initial center points, Clustering
This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61702218, the Shandong Provincial Key R&D Program under Grant No. 2016GGX101001, and the CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404.