A Method to Determine the Number of Clusters Based on Multi-validity Index

  • Ning Sun
  • Hong Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11103)

Abstract

Cluster analysis is an unsupervised learning technique that plays an increasingly important role in data mining. A basic and difficult question in clustering, however, is how to determine the number of clusters automatically. The traditional solution is to introduce a single validity index, which may fail because each index is biased toward specific conditions. In addition, most existing clustering algorithms are based on hard partitioning, which cannot reflect the uncertainty of the data during the clustering process. To address these drawbacks, this paper proposes a method that determines the number of clusters automatically based on three-way decisions and multiple validity indexes. The method has three parts: (1) a k-means clustering algorithm is devised to obtain three-way clustering results; (2) multiple validity indexes are employed to evaluate the results, and each evaluation is weighted by the mean similarity between the corresponding clustering result and the others, following the idea of the median partition in clustering ensembles; and (3) the comprehensive evaluation results are sorted and the best-ranked k value is selected as the optimal number of clusters. The experimental results show that the proposed method determines the number of clusters automatically better than any of the single validity indexes used in the fusion.
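Parts (2) and (3) of the method can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the validity indexes, the normalization, or how three-way (core and fringe) results enter the comparison, so everything concrete below is an assumption: hard labelings stand in for the clustering results, the Rand index stands in for the similarity, index values are min-max normalized with higher meaning better, and the names `rand_index` and `select_k` are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which two hard labelings agree
    (assumed stand-in for the paper's partition similarity)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_k(labelings, scores):
    """Fuse several validity indexes and pick the best-ranked k.

    labelings: {k: list of cluster labels}, one clustering result per k.
    scores:    {index_name: {k: value}}, assumed "higher is better".
    Returns (best_k, fused), where fused maps each k to its weighted score.
    """
    ks = sorted(labelings)
    if len(ks) < 2:
        raise ValueError("need at least two candidate values of k")

    # Weight of each result: mean similarity to all other results,
    # so results close to the "median partition" count more.
    weights = {
        k: sum(rand_index(labelings[k], labelings[o]) for o in ks if o != k)
        / (len(ks) - 1)
        for k in ks
    }

    fused = {k: 0.0 for k in ks}
    for per_k in scores.values():
        lo, hi = min(per_k.values()), max(per_k.values())
        span = (hi - lo) or 1.0  # avoid division by zero for a flat index
        for k in ks:
            # Min-max normalization (an assumption) makes indexes comparable.
            fused[k] += weights[k] * (per_k[k] - lo) / span

    best_k = max(ks, key=lambda k: fused[k])
    return best_k, fused


# Toy, made-up inputs purely to show the call shape.
labelings = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 2, 3, 3],
}
scores = {
    "index_a": {2: 0.7, 3: 0.5, 4: 0.3},
    "index_b": {2: 0.6, 3: 0.65, 4: 0.2},
}
best_k, fused = select_k(labelings, scores)
print(best_k, fused)
```

An index for which lower values are better would need to be negated or inverted before being passed in, and a three-way result would need a similarity defined over core and fringe regions rather than over single labels.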

Keywords

Clustering, Uncertainty, Three-way decisions, Number of clusters, Multi-validity index

Notes

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 61533020, 61751312 and 61379114.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, People’s Republic of China
