Cluster Center Initialization and Outlier Detection Based on Distance and Density for the K-Means Algorithm

  • Qi He
  • Zhenxiang Chen (Email author)
  • Ke Ji
  • Lin Wang
  • Kun Ma
  • Chuan Zhao
  • Yuliang Shi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)

Abstract

The K-means algorithm, the classic partition-based clustering method, has well-known disadvantages. If the data set contains outliers, they can pull the cluster means far from where they should be. In addition, the outcome of random initialization is very sensitive to the input data parameters. In this paper, we propose initialization and outlier detection based on distance and density for the K-means algorithm (KMIDDO), a method that optimizes the initial center points and is especially effective when outliers are present. We also extend an outlier detection method to improve the clustering result. The goal is for every pair of center points to be as far apart as possible and for the density around each center point to be as high as possible. For initialization, we calculate the distance and density of points. For outlier detection, we treat the outliers as a single class, identified by distance and density. Experiments on several synthetic and real datasets illustrate the effectiveness and accuracy of the proposed algorithms.
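The abstract does not give the KMIDDO formulas, so the following is only a minimal sketch of the general idea it describes: prefer initial centers that are both dense and far apart, and set low-density points aside as outliers. The neighbor-count density, the `radius` and `min_neighbors` parameters, and the density-times-distance score are all illustrative assumptions, not the authors' definitions.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def local_density(dist, radius):
    """Assumed density: number of other points within `radius` of each point."""
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        for i, row in enumerate(dist)
    ]

def pick_initial_centers(points, k, radius):
    """Greedy choice of k center indices that are dense and mutually distant."""
    dist = pairwise_distances(points)
    density = local_density(dist, radius)
    # Start from the densest point (first one wins on ties).
    centers = [max(range(len(points)), key=lambda i: density[i])]
    while len(centers) < k:
        candidates = [i for i in range(len(points)) if i not in centers]
        # Hypothetical score: density times distance to the nearest chosen center.
        # An isolated point has density 0, so it scores 0 and is not preferred.
        best = max(
            candidates,
            key=lambda i: density[i] * min(dist[i][c] for c in centers),
        )
        centers.append(best)
    return centers, density

def flag_outliers(density, min_neighbors):
    """Treat points with fewer than `min_neighbors` neighbors as one outlier class."""
    return [i for i, d in enumerate(density) if d < min_neighbors]

# Two tight groups of four points plus one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
centers, density = pick_initial_centers(points, k=2, radius=2.0)
outliers = flag_outliers(density, min_neighbors=1)
print(centers)   # [0, 7]
print(outliers)  # [8]
```

In this toy data the isolated point at (50, 50) is the farthest from everything, so a purely distance-based initialization would be drawn to it; weighting by density keeps it out of the center set, and the same density values flag it as an outlier.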

Keywords

K-means, Outlier detection, Initial center points, Clustering

Notes

Acknowledgment

This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61702218, the Shandong Provincial Key R&D Program under Grant No. 2016GGX101001, and the CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Qi He (1, 2)
  • Zhenxiang Chen (1, 2), Email author
  • Ke Ji (1, 2)
  • Lin Wang (1, 2)
  • Kun Ma (1, 2)
  • Chuan Zhao (1, 2)
  • Yuliang Shi (3)

  1. University of Jinan, Jinan, China
  2. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
  3. Shandong University, Jinan, China
