A new initialization and performance measure for the rough k-means clustering

Abstract

A new initialization algorithm is proposed in this study to address the issue of random initialization in the rough k-means clustering algorithm refined by Peters. A new means of choosing appropriate zeta values in Peters' algorithm is also proposed. In addition, a new performance measure, the S/O index [within-variance (S)/total-variance (O)], is introduced for the rough clustering algorithm. Performance criteria such as the root-mean-square standard deviation, the S/O index, and running time are used to compare the proposed initialization and random initialization against Peters' approach. Other popular initialization algorithms, namely k-means++, Peters' Π, Bradley, and Ioannis, are also compared. The proposed initialization algorithm is found to outperform the existing initialization algorithms when used with Peters' refined rough k-means clustering algorithm on different datasets with varying zeta values.
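The S/O idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes crisp cluster assignments, whereas the rough k-means version would weight points in the lower and upper approximations of each cluster; the function name `s_over_o` is our own.

```python
import numpy as np

def s_over_o(X, labels):
    """S/O index: within-cluster variance (S) divided by total variance (O).

    Smaller values indicate clusters that are tight relative to the
    overall spread of the data.
    """
    X = np.asarray(X, dtype=float)
    # O: total variance, i.e. squared deviations from the grand mean
    total_var = ((X - X.mean(axis=0)) ** 2).sum()
    # S: within-cluster variance, summed over all clusters
    within_var = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        within_var += ((members - members.mean(axis=0)) ** 2).sum()
    return within_var / total_var

# Two well-separated clusters yield a small S/O value
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
print(s_over_o(X, labels))
```

Since S is bounded above by O, the index lies in [0, 1], so it can be compared across clusterings of the same dataset produced by different initializations.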



References

  1. Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035. https://doi.org/10.1145/1283383.1283494

  2. Bhargava R, Tripathy BK, Tripathy A et al (2013) Rough intuitionistic fuzzy C-means algorithm and a comparative analysis. In: Compute 2013—6th ACM India computing convention: next generation computing paradigms and technologies

  3. Bradley PS, Fayyad UM (1998) Refining initial points for k-means clustering. In: Proceedings of the fifteenth international conference on machine learning (ICML), 24–27 July 1998, pp 91–99. ISBN 1-55860-556-8

  4. Bubeck S, Meila M, von Luxburg U (2012) How the initialization affects the stability of the k-means algorithm. ESAIM Probab Stat 16:436–452. https://doi.org/10.1051/ps/2012013

  5. Darken C, Moody J (1990) Fast adaptive k-means clustering: some empirical results. In: IJCNN international joint conference on neural networks, San Diego, CA, USA, 17–21 June 1990, pp 233–238. https://doi.org/10.1109/IJCNN.1990.137720

  6. Davies DL, Bouldin DW (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell PAMI-1:224–227. https://doi.org/10.1109/TPAMI.1979.4766909

  7. Deng W, Zhao H, Yang X et al (2017a) Study on an improved adaptive PSO algorithm for solving multi-objective gate assignment. Appl Soft Comput 59:288–302. https://doi.org/10.1016/j.asoc.2017.06.004

  8. Deng W, Zhao H, Zou L et al (2017b) A novel collaborative optimization algorithm in solving complex optimization problems. Soft Comput 21:4387–4398. https://doi.org/10.1007/s00500-016-2071-8

  9. Deng W, Xu J, Zhao H (2019) An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem. IEEE Access 7:20281–20292. https://doi.org/10.1109/ACCESS.2019.2897580

  10. Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188

  11. Forgy CL (1982) Rete: a fast algorithm for the many pattern/many object pattern match problem. Artif Intell 19:17–37. https://doi.org/10.1016/0004-3702(82)90020-0

  12. Gonzalez TF (1985) Clustering to minimize the maximum intercluster distance. Theor Comput Sci 38:293–306

  13. Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part II. ACM SIGMOD Rec 31:19–27. https://doi.org/10.1145/601858.601862

  14. Han J, Kamber M, Pei J (2012) Data mining: concepts and techniques. Morgan Kaufmann, Waltham

  15. Hu J, Li T, Wang H, Fujita H (2016) Hierarchical cluster ensemble model based on knowledge granulation. Knowl Based Syst 91:179–188. https://doi.org/10.1016/j.knosys.2015.10.006

  16. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, Englewood Cliffs

  17. Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31:264–323. https://doi.org/10.1145/331499.331504

  18. Katsavounidis I, Kuo CCJ, Zhang Z (1994) A new initialization technique for generalized Lloyd iteration. IEEE Signal Process Lett 1:144–146. https://doi.org/10.1109/97.329844

  19. Kim EH, Oh SK, Pedrycz W (2018) Design of reinforced interval type-2 fuzzy C-means-based fuzzy classifier. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2017.2785244

  20. Lingras P, Peters G (2011) Rough clustering. Wiley Interdiscip Rev Data Min Knowl Discov 1:64–72. https://doi.org/10.1002/widm.16

  21. Lingras P, Triff M (2016) Advances in rough and soft clustering: meta-clustering, dynamic clustering, data-stream clustering. In: Lecture notes in computer science, vol 9920 LNAI, pp 3–22. https://doi.org/10.1007/978-3-319-47160-0_1

  22. Lingras P, West C (2004) Interval set clustering of web users with rough k-means. J Intell Inf Syst 23:5–16. https://doi.org/10.1023/B:JIIS.0000029668.88665.1a

  23. Lord E, Willems M, Lapointe FJ, Makarenkov V (2017) Using the stability of objects to determine the number of clusters in datasets. Inf Sci 393:29–46. https://doi.org/10.1016/j.ins.2017.02.010

  24. Maji P, Pal SK (2007) Rough set based generalized fuzzy C-means algorithm and quantitative indices. IEEE Trans Syst Man Cybern Part B 37:1529–1540. https://doi.org/10.1109/TSMCB.2007.906578

  25. Mitra S, Banka H (2007) Application of rough sets in pattern recognition. In: Lecture notes in computer science, pp 151–169

  26. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356. https://doi.org/10.1007/BF01001956

  27. Pawlak Z, Skowron A (2007) Rough sets: some extensions. Inf Sci 177:28–40. https://doi.org/10.1016/j.ins.2006.06.006

  28. Peters G (2006) Some refinements of rough k-means clustering. Pattern Recognit 39:1481–1491. https://doi.org/10.1016/j.patcog.2006.02.002

  29. Peters G (2014) Rough clustering utilizing the principle of indifference. Inf Sci 277:358–374. https://doi.org/10.1016/j.ins.2014.02.073

  30. Peters G (2015) Is there any need for rough clustering? Pattern Recognit Lett 53:31–37. https://doi.org/10.1016/j.patrec.2014.11.003

  31. Peters G, Crespo F, Lingras P, Weber R (2013) Soft clustering—fuzzy and rough approaches and their extensions and derivatives. Int J Approx Reason 54:307–322. https://doi.org/10.1016/j.ijar.2012.10.003

  32. Stetco A, Zeng XJ, Keane J (2015) Fuzzy C-means++: fuzzy C-means with effective seeding initialization. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2015.05.014

  33. Su T, Dy JG (2007) In search of deterministic methods for initializing K-means and Gaussian mixture clustering. Intell Data Anal 11:319–338. https://doi.org/10.3233/ida-2007-11402

  34. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353. https://doi.org/10.1016/S0019-9958(65)90241-X

  35. Zhang K (2019) A three-way c-means algorithm. Appl Soft Comput. https://doi.org/10.1016/j.asoc.2019.105536

  36. Zhang T, Ma F, Yue D et al (2019) Interval type-2 fuzzy local enhancement based rough k-means clustering considering imbalanced clusters. IEEE Trans Fuzzy Syst. https://doi.org/10.1109/tfuzz.2019.2924402

  37. Zhao H, Liu H, Xu J, Deng W (2019a) Performance prediction using high-order differential mathematical morphology gradient spectrum entropy and extreme learning machine. IEEE Trans Instrum Meas. https://doi.org/10.1109/TIM.2019.2948414

  38. Zhao H, Zheng J, Xu J, Deng W (2019b) Fault diagnosis method based on principal component analysis and broad learning system. IEEE Access 7:99263–99272. https://doi.org/10.1109/access.2019.2929094


Author information


Corresponding author

Correspondence to Punniyamoorthy Murugesan.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by V. Loia.


About this article

Cite this article

Murugesan, V.P., Murugesan, P. A new initialization and performance measure for the rough k-means clustering. Soft Comput 24, 11605–11619 (2020). https://doi.org/10.1007/s00500-019-04625-9


Keywords

  • Rough k-means
  • Zeta values
  • Davies–Bouldin index
  • Root-mean-square standard deviation