Soft Computing

Volume 22, Issue 6, pp 1921–1931

A novel cluster validity index for fuzzy C-means algorithm

  • Shuling Yang
  • Kangshun Li
  • Zhengping Liang
  • Wei Li
  • Yu Xue
Methodologies and Application


To address the problem of determining the cluster number, which arises in many clustering applications, this paper proposes a new clustering approach that combines an improved morphology similarity distance with a novel cluster validity index. An optimized morphology similarity distance, based on the Standard Euclidean distance and the ReliefF algorithm, is used to construct the new validity index, which balances intra-cluster consistency against inter-cluster consistency. The proposed validity index is combined with fuzzy C-means to produce a new algorithm, named OMS-OSC. Experimental results on several artificial and real-world data sets show that the new algorithm not only performs well but also detects the correct cluster number.
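The general idea of pairing fuzzy C-means with a validity index to pick the cluster number can be sketched as follows. This is only an illustration under stated assumptions: it uses the plain Euclidean distance and the well-known Xie-Beni index as stand-ins, not the optimized morphology similarity distance or the new validity index of the paper, neither of which is defined on this page. The function names, the deterministic farthest-point initialisation, and the toy data are all hypothetical choices made for the sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, c):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sq_dist(p, v) for v in centers)))
    return [list(v) for v in centers]


def memberships(points, centers, m):
    """FCM membership update: u_ik is proportional to d_ik^(-2/(m-1))."""
    u = []
    for p in points:
        # Clamp tiny distances so a point sitting on a center does not divide by zero.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def fuzzy_c_means(points, c, m=2.0, max_iter=200, tol=1e-9):
    """Plain fuzzy C-means; returns (centers, membership matrix u[k][i])."""
    centers = init_centers(points, c)
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights for cluster i
            total = sum(w)
            new_centers.append(
                [sum(wk * p[j] for wk, p in zip(w, points)) / total for j in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness over minimum center separation (lower is better)."""
    compact = sum(
        (u[k][i] ** m) * sq_dist(p, v)
        for k, p in enumerate(points)
        for i, v in enumerate(centers)
    )
    sep = min(
        sq_dist(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compact / (len(points) * max(sep, 1e-12))


def select_cluster_number(points, c_max, m=2.0):
    """Run FCM for c = 2..c_max and return (best c, {c: index value})."""
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = fuzzy_c_means(points, c, m)
        scores[c] = xie_beni(points, centers, u, m)
    return min(scores, key=scores.get), scores


def demo_points():
    """Toy data: three tight, well-separated groups of five 2-D points."""
    offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
    return [(cx + dx, cy + dy) for cx, cy in [(0, 0), (10, 0), (0, 10)] for dx, dy in offsets]


if __name__ == "__main__":
    best_c, scores = select_cluster_number(demo_points(), c_max=5)
    print("selected cluster number:", best_c)
```

Any other validity index that trades compactness off against separation could be dropped in place of `xie_beni` without changing the selection loop, which is the part this sketch is meant to illustrate.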


Keywords: Clustering applications; Optimized morphology similarity distance; New validity index; Fuzzy C-means; Cluster number



This work is supported by the National Natural Science Foundation of China (Grant Nos. 61573157, 61561024 and 61562038), the Fund of Natural Science Foundation of Guangdong Province of China (Grant No. 2014A030313454), and the Key Project of Natural Statistical Science and Research (Grant No. 2015LZ30).

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Shuling Yang (1)
  • Kangshun Li (1, 2)
  • Zhengping Liang (3)
  • Wei Li (1, 4)
  • Yu Xue (5)
  1. College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
  2. Shenzhen Saudi Statistician Company Limited, Shenzhen, China
  3. College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
  4. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China
  5. School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China
