A Review of Cluster Validation with an Example of Type-2 Fuzzy Application in R

Chapter in: Advances in Type-2 Fuzzy Sets and Systems

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 301)

Abstract

Interval-valued type-2 fuzziness can be represented by membership functions obtained with upper and lower values of the level of fuzziness. These upper and lower values for the level of fuzziness in the FCM algorithm were obtained in our previous studies. This chapter shows a particular application of interval-valued type-2 fuzziness to cluster validity analysis. For this purpose, we introduce a brief taxonomy of cluster validity indices to clarify the contribution of our novel approach. To make our technique reproducible, the source code is written in the freely available language ‘R’ and can be found on our web site.
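The idea in the abstract can be sketched in R as follows. This is a minimal illustration, not the chapter's own code: it evaluates the standard FCM membership formula at a lower and an upper level of fuzziness and takes the element-wise envelope as an interval-valued membership. The function name `fcm_membership`, the fixed centers (per-species means of the built-in iris data) and the bounds 1.4 and 2.6 are assumptions made for this example; the abstract itself does not state the bounds.

```r
# Standard FCM membership of each row of x in each cluster,
# for a given level of fuzziness m (m > 1).
fcm_membership <- function(x, centers, m) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # Squared Euclidean distance from every point (rows) to every center (columns).
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  d2 <- pmax(d2, .Machine$double.eps)  # guard against division by zero
  w <- d2^(-1 / (m - 1))               # (d^2)^(-1/(m-1)) equals d^(-2/(m-1))
  w / rowSums(w)                       # each row sums to 1
}

x <- iris[, 1:4]

# Assumption: fixed, hypothetical centers (per-species means) so the
# example is deterministic; a real run would estimate them with FCM.
centers <- as.matrix(aggregate(x, by = list(iris$Species), FUN = mean)[, -1])

m_lower <- 1.4  # assumed lower level of fuzziness
m_upper <- 2.6  # assumed upper level of fuzziness

u_lo <- fcm_membership(x, centers, m_lower)
u_hi <- fcm_membership(x, centers, m_upper)

# Interval-valued membership: element-wise lower and upper envelope.
lower <- pmin(u_lo, u_hi)
upper <- pmax(u_lo, u_hi)

# Membership interval of the first few observations in cluster 1.
head(cbind(lower = lower[, 1], upper = upper[, 1]))
```

With the smaller level of fuzziness the memberships are crisper, so for each observation the largest membership at `m_lower` is at least as large as the one at `m_upper`; the width `upper - lower` is one simple way to look at how much the partition depends on the choice of m.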

Notes

  1. The ‘R’ software can be downloaded from the http://cran.r-project.org/ web site.

  2. See http://cran.r-project.org/web/packages/cluster/index.html.

  3. See http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/e1071/html/cmeans.html.

  4. The Euclidean distance is defined as \( \sqrt{\sum\limits_{i} {(x_{i} - y_{i} )^{2} }} \) and the Manhattan distance as \( \sum\limits_{i} {\left| x_{i} - y_{i} \right|} \).

  5. Kim et al. [27] used a similar intuition and suggested that the optimal number of clusters can be found by minimizing the change in cluster center with respect to the number of clusters.

  6. The \( l_{\infty } \) norm is defined as \( \lim_{p \to \infty } l_{p} \), where \( l_{p} \) is the p-norm. Since the p-norm is given as \( \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{p} = \left( \sum\limits_{i = 1}^{nv} {\left| v_{c,i}^{{m_{u} }} - v_{c,i}^{{m_{l} }} \right|^{p} } \right)^{\frac{1}{p}} \), it follows that \( \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{\infty } = \lim_{p \to \infty } \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{p} = \max_{i = 1}^{nv} \left| {v_{c,i}^{{m_{u} }} - v_{c,i}^{{m_{l} }} } \right| \).

  7. The iris data is available in R. The wine data can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine/ manually or from within R, as shown in the “iris and wine data ex.r” script file.

  8. http://www.ics.uci.edu/~mlearn/databases.html

  9. This software can be downloaded from http://cran.r-project.org/. Several good documents describe this statistical computing environment, and more than 3500 packages are already available.
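The distance definitions in Note 4 and the limit in Note 6 can be checked numerically. The base-R sketch below uses two made-up vectors, `v_mu` and `v_ml`, as hypothetical stand-ins for one cluster center obtained at the upper and lower level of fuzziness; the helper `p_norm` is likewise introduced only for this illustration.

```r
# Hypothetical cluster-center vectors (not taken from the chapter).
v_mu <- c(1, 4, 2)
v_ml <- c(3, 1, 2)
delta <- v_mu - v_ml

# Note 4: Euclidean and Manhattan distances.
euclidean <- sqrt(sum(delta^2))
manhattan <- sum(abs(delta))

# Note 6: p-norm of a vector, and the l-infinity norm as its limit.
p_norm <- function(v, p) sum(abs(v)^p)^(1 / p)
l_inf <- max(abs(delta))

# The p-norm decreases towards the largest absolute coordinate as p grows.
sapply(c(1, 2, 8, 64), function(p) p_norm(delta, p))

# Cross-check against stats::dist, which implements the same three distances.
pts <- rbind(v_mu, v_ml)
stopifnot(
  isTRUE(all.equal(euclidean, as.numeric(dist(pts, method = "euclidean")))),
  isTRUE(all.equal(manhattan, as.numeric(dist(pts, method = "manhattan")))),
  isTRUE(all.equal(l_inf, as.numeric(dist(pts, method = "maximum"))))
)
```

The `"maximum"` method of `dist()` is the \( l_{\infty } \) distance, so it can be applied directly to a matrix whose rows are the centers obtained at the two levels of fuzziness.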

References

  1. Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, vol. 7, pp. 6–17. World Scientific Publishing Co., New Jersey (2002)

  2. Ben-Hur, A., Guyon, I.: Detecting stable clusters using principal component analysis. In: Brownstein, M.J., Khodursky, A. (eds.) Methods in Molecular Biology, pp. 159–182. Humana Press, Clifton (2003)

  3. Bezdek, J.C.: Fuzzy mathematics in pattern classification. Dissertation, Applied Mathematics Center, Cornell University, Ithaca (1973)

  4. Bezdek, J.C.: Cluster validity with fuzzy sets. J. Cybern. 3, 58–72 (1974)

  5. Bezdek, J.C.: Mathematical models for systematics and taxonomy. In: Estabrook, G. (ed.) Proceedings of the 8th International Conference on Numerical Taxonomy, pp. 143–166. Freeman, San Francisco (1975)

  6. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)

  7. Bolshakova, N., Azuaje, F.: Cluster validation techniques for genome expression data. Signal Process. 83, 825–833 (2003)

  8. Bolshakova, N., Azuaje, F., Cunningham, P.: A knowledge-driven approach to cluster validity assessment. Bioinformatics 21, 2546–2547 (2005)

  9. Boudraa, A.O.: Dynamic estimation of number of clusters in data set. Electron. Lett. 35, 1606–1608 (1999)

  10. Bouguessa, M., Wang, S., Sun, H.: An objective approach to cluster validation. Pattern Recogn. Lett. 27, 1419–1430 (2006)

  11. Breckenridge, J.: Replicating cluster analysis: method, consistency and validity. Multivar. Behav. Res. 24, 147–161 (1989)

  12. Brock, G.N., Pihur, V., Datta, S., Datta, S.: clValid: an R package for cluster validation. J. Stat. Softw. 25(4), 1–22 (2008). http://www.jstatsoft.org/v25/i04

  13. Celikyilmaz, A., Türksen, I.B.: Enhanced fuzzy system models with improved fuzzy clustering algorithm. IEEE Trans. Fuzzy Syst. 16, 779–794 (2008)

  14. Celikyilmaz, A., Türksen, I.B.: Validation criteria for enhanced fuzzy clustering. Pattern Recogn. Lett. 29, 97–108 (2008)

  15. Chen, M.Y., Linkens, A.: Rule-base self-generation and simplification for data-driven fuzzy models. J. Fuzzy Sets Syst. 142, 243–265 (2004)

  16. Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)

  17. Dudoit, S., Fridlyand, J.: A prediction based re-sampling method for estimating the number of clusters in a data set. Genome Biol. 3, 1–21 (2002)

  18. Dunn, J.C.: Well separated clusters and fuzzy partitions. J. Cybern. 4, 95–104 (1974)

  19. Falasconi, M., Gutierrez, A., Pardo, M., Sberveglieri, G., Marco, S.: A stability based validity method for fuzzy clustering. Pattern Recogn. 43, 1292–1305 (2010)

  20. Fukuyama, Y., Sugeno, M.: A new method of choosing the number of clusters for the fuzzy c-means method. In: Proceedings of Fifth Fuzzy Systems Symposium, pp. 247–250 (in Japanese) (1989)

  21. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia; ASA, Alexandria (2007)

  22. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity methods part I. ACM SIGMOD Rec. 31, 40–45 (2002)

  23. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity methods part II. ACM SIGMOD Rec. 31, 19–27 (2002)

  24. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17, 107–145 (2001)

  25. Handl, J., Knowles, J., Kell, D.B.: Computational cluster validation in post-genomic data analysis. Bioinformatics 21, 3201–3212 (2005)

  26. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)

  27. Kim, D.W., Lee, K.H., Lee, D.: On cluster validity index for estimation of the optimal number of fuzzy clusters. Pattern Recogn. 37, 2009–2025 (2004)

  28. Kwon, S.K.: Cluster validity index for fuzzy clustering. Electron. Lett. 34, 2176–2177 (1998)

  29. Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability based validation of clustering solutions. Neural Comput. 16, 1299–1323 (2004)

  30. Levine, E., Domany, E.: Resampling method for unsupervised estimation of cluster validity. Neural Comput. 13, 2573–2593 (2001)

  31. Mufti, G.B., Bertrand, P., El Moubarki, L.: Determining the number of groups from measures of cluster validity. In: Proceedings of ASMDA2005, pp. 404–414 (2005)

  32. Ozkan, I., Türksen, I.B.: Entropy assessment of type-2 fuzziness. IEEE Int. Conf. Fuzzy Syst. 2, 1111–1115 (2004)

  33. Ozkan, I., Turksen, I.B.: Upper and lower values for the level of fuzziness in FCM. Inf. Sci. 177, 5143–5152 (2007)

  34. Ozkan, I., Turksen, I.B.: MiniMax ε-stable cluster validity index for type-2 fuzziness. Inf. Sci. 184, 64–74 (2012)

  35. Pal, N.R., Bezdek, J.C.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379 (1995)

  36. Pascual, D., Pla, F., Sánchez, J.S.: Cluster validation using information stability measures. Pattern Recogn. Lett. 31, 454–461 (2010)

  37. Rezaee, M.R., Lelieveldt, B.P.F., Reiber, J.H.C.: A new cluster validity index for the fuzzy c-means. Pattern Recogn. Lett. 19, 237–246 (1998)

  38. Salem, S.A., Nandi, A.K.: Development of assessment criteria for clustering algorithms. Pattern Anal. Appl. 12, 79–98 (2009)

  39. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 and 27, 623–656 (in two parts) (1948)

  40. Sugar, C., James, G.: Finding the number of clusters in a data set: an information theoretic approach. J. Am. Stat. Assoc. 98, 750–763 (2003)

  41. Sugar, C., Lenert, L., Olshen, R.: An application of cluster analysis to health services research: empirically defined health states for depression from the sf-12. Technical Report, Stanford University, Stanford (1999)

  42. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. B 63(2), 411–423 (2001)

  43. Tibshirani, R., Walther, G.: Cluster validation by prediction strength. J. Comput. Graph. Stat. 14, 511–528 (2005)

  44. Volkovich, Z., Barzily, Z., Morozensky, L.: A statistical model of cluster stability. Pattern Recogn. 41, 2174–2188 (2008)

  45. Wang, W., Zhang, Y.: On fuzzy cluster validity indices. Fuzzy Sets Syst. 158, 2095–2117 (2007)

  46. Wu, K.L., Yang, M.S., Hsieh, J.N.: Robust cluster validity indexes. Pattern Recogn. 42, 2541–2550 (2009)

  47. Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841–847 (1991)

  48. Yu, J., Cheng, Q., Huang, H.: Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybern. B 34, 634–639 (2004)

  49. Zhang, Y., Wang, W., Zhang, X., Li, Y.: A cluster validity index for fuzzy clustering. Inf. Sci. 178, 1205–1218 (2008)

  50. Yue, S., Wang, J.S., Wu, T., Wang, H.: A new separation measure for improving the effectiveness of validity indices. Inf. Sci. 180, 748–764 (2010)

Acknowledgments

This work was partially supported by the Natural Sciences and Engineering Research Council (NSERC) Grant (RPGIN 7698-05) to the University of Toronto. Partial support was also provided by Hacettepe University and TOBB Economics and Technology University. Their support is greatly appreciated.

Author information

Corresponding author

Correspondence to Ibrahim Ozkan.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Ozkan, I., Burhan Türkşen, I. (2013). A Review of Cluster Validation with an Example of Type-2 Fuzzy Application in R. In: Sadeghian, A., Mendel, J., Tahayori, H. (eds) Advances in Type-2 Fuzzy Sets and Systems. Studies in Fuzziness and Soft Computing, vol 301. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6666-6_14

  • DOI: https://doi.org/10.1007/978-1-4614-6666-6_14

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-6665-9

  • Online ISBN: 978-1-4614-6666-6

  • eBook Packages: Engineering, Engineering (R0)
