A Method to Estimate the Number of Clusters Using Gravity

  • Hui Du (Email author)
  • Xiaoniu Wang
  • Mengyin Huang
  • Xiaoli Wang
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 891)

Abstract

The number of clusters is crucial to the correctness of clustering. However, most existing clustering algorithms have two main issues: (1) the number of clusters must be specified by the user; (2) they easily fall into local optima because the initial centers are selected at random. To address these problems, we propose a novel algorithm that uses gravity to automatically determine the number of clusters and, at the same time, obtain better initial centers. The proposed algorithm first scatters detectors uniformly over the data space; the detectors then move according to the law of universal gravitation, and two detectors are merged whenever the distance between them is less than a given threshold. When no detector moves any further, we take the number of remaining detectors as the number of clusters and use the detectors themselves as the initial center points. Experimental results show that the proposed method automatically determines the number of clusters and generates better initial centers, so the clustering accuracy is improved noticeably.
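The procedure described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the detector count, step size, merge threshold, and iteration cap (`n_detectors`, `step`, `merge_eps`, `max_iter`) are assumed parameters, and unit masses are assumed for the data points in the gravitational attraction.

```python
import numpy as np

def estimate_k_by_gravity(X, n_detectors=20, merge_eps=0.5,
                          step=0.1, max_iter=100, tol=1e-4):
    """Sketch: estimate the number of clusters with gravity-driven detectors.

    Detectors are scattered uniformly over the bounding box of X, pulled
    toward the data points by an inverse-square (Newtonian) attraction,
    and merged when closer than merge_eps. The surviving detectors give
    the cluster count k and the initial centers.
    """
    rng = np.random.default_rng(0)
    lo, hi = X.min(axis=0), X.max(axis=0)
    detectors = rng.uniform(lo, hi, size=(n_detectors, X.shape[1]))

    for _ in range(max_iter):
        # Move each detector a small step along the resultant
        # gravitational force exerted by the data points.
        moved = []
        for d in detectors:
            diff = X - d                                # vectors toward points
            dist = np.linalg.norm(diff, axis=1) + 1e-9  # avoid division by 0
            force = (diff / dist[:, None] ** 3).sum(axis=0)
            norm = np.linalg.norm(force)
            if norm > 0:
                d = d + step * force / norm
            moved.append(d)
        moved = np.array(moved)

        # Merge detector groups whose pairwise distance is below merge_eps.
        merged, used = [], np.zeros(len(moved), dtype=bool)
        for i in range(len(moved)):
            if used[i]:
                continue
            group = [moved[i]]
            for j in range(i + 1, len(moved)):
                if not used[j] and np.linalg.norm(moved[i] - moved[j]) < merge_eps:
                    used[j] = True
                    group.append(moved[j])
            merged.append(np.mean(group, axis=0))
        merged = np.array(merged)

        # Stop when no merge happened and no detector moved noticeably.
        if (len(merged) == len(detectors)
                and np.max(np.linalg.norm(merged - detectors, axis=1)) < tol):
            detectors = merged
            break
        detectors = merged

    return len(detectors), detectors
```

The returned detector positions would then seed a standard k-means run in place of random initial centers.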

Keywords

Clustering · Number of clusters · Initial centers · Gravity · Detector

Notes

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Nos. 61472297, 61402350, and 61662068).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Hui Du¹ (Email author)
  • Xiaoniu Wang¹
  • Mengyin Huang¹
  • Xiaoli Wang¹
  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
