A Review of Cluster Validation with an Example of Type-2 Fuzzy Application in R

Ozkan, Ibrahim; Türkşen, I. Burhan

doi:10.1007/978-1-4614-6666-6_14

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 301)

Abstract

Interval-valued type-2 fuzziness can be represented by membership functions obtained with upper and lower values of the level of fuzziness. These upper and lower values for the level of fuzziness in the fuzzy c-means (FCM) algorithm were obtained in our previous studies. This chapter shows a particular application of interval-valued type-2 fuzziness to cluster validity analysis. For this purpose, we introduce a brief taxonomy of cluster validity indices to clarify the contribution of our novel approach. To make our technique reproducible, the source code is written in the freely available language ‘R’ and can be found on our web site.
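
As a rough illustration of how two levels of fuzziness yield an interval-valued membership, the sketch below computes FCM memberships for fixed cluster centers at a lower and an upper level of fuzziness and takes element-wise bounds. It is not the chapter's source code: the function name `fcm_membership`, the use of per-species means of the iris data as centers, and the values m = 1.4 and m = 2.6 are all assumptions chosen for illustration.

```r
# FCM membership of every point in every cluster for a given level of
# fuzziness m (m > 1), with the cluster centers held fixed.
fcm_membership <- function(x, centers, m) {
  # Squared Euclidean distance from each row of x to each center (n x c matrix).
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  d2 <- pmax(d2, .Machine$double.eps)  # avoid division by zero at a center
  w <- d2^(-1 / (m - 1))               # membership is proportional to d^(-2 / (m - 1))
  w / rowSums(w)                       # normalise so each row sums to 1
}

x <- as.matrix(iris[, 1:4])
# Assumed centers: per-species means, used only to have fixed centers to work with.
centers <- apply(x, 2, function(col) tapply(col, iris$Species, mean))

m_lower <- 1.4  # assumed lower level of fuzziness
m_upper <- 2.6  # assumed upper level of fuzziness
u_at_lower <- fcm_membership(x, centers, m_lower)
u_at_upper <- fcm_membership(x, centers, m_upper)

# Interval-valued membership: element-wise bounds over the two levels.
u_lo <- pmin(u_at_lower, u_at_upper)
u_hi <- pmax(u_at_lower, u_at_upper)
interval_width <- u_hi - u_lo
```

A smaller m gives crisper memberships (closer to 0 or 1) and a larger m gives softer ones; `interval_width` records how far each membership moves between the two levels.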

Notes

1.
The ‘R’ software can be downloaded from the http://cran.r-project.org/ web site.
2.
See http://cran.r-project.org/web/packages/cluster/index.html.
3.
http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/e1071/html/cmeans.html.
4.
Euclidean distance is defined as \( \sqrt{\sum\limits_{i} {(x_{i} - y_{i} )^{2} }} \) and Manhattan distance as \( \sum\limits_{i} {\left| x_{i} - y_{i} \right|} \).
5.
Kim et al. [27] used a similar intuition and suggested that the optimal number of clusters can be found by minimizing the change in the cluster centers with respect to the number of clusters.
6.
The \( l_{\infty } \) norm is defined as \( \lim_{p \to \infty } l_{p} \), where \( l_{p} \) is the p-norm. Since the p-norm is given by \( \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{p} = \left( \sum\limits_{i = 1}^{nv} {\left| v_{c,i}^{{m_{u} }} - v_{c,i}^{{m_{l} }} \right|^{p} } \right)^{\frac{1}{p}} \), it follows that \( \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{\infty } = \lim_{p \to \infty } \left\| {v_{c}^{{m_{u} }} - v_{c}^{{m_{l} }} } \right\|_{p} = \max_{1 \le i \le nv} \left| {v_{c,i}^{{m_{u} }} - v_{c,i}^{{m_{l} }} } \right| \).
7.
The iris data set is available in R. The wine data set can be downloaded from http://archive.ics.uci.edu/ml/machine-learning-databases/wine/ manually or from within R, as shown in the “iris and wine data ex.r” script file.
8.
http://www.ics.uci.edu/~mlearn/databases.html
9.
This software can be downloaded from http://cran.r-project.org/. Several good documents describe this statistical computing environment, and more than 3500 packages are already available.
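
The distance and norm definitions in notes 4 and 6 can be checked numerically with a few lines of base R. This is a minimal sketch: the vectors `x` and `y` and the helper `p_norm` (the standard p-norm, the p-th root of the sum of absolute differences raised to the power p) are made up for illustration and do not come from the chapter's scripts.

```r
x <- c(5.1, 3.5, 1.4, 0.2)
y <- c(4.9, 3.0, 1.4, 0.2)

euclidean <- sqrt(sum((x - y)^2))  # note 4: Euclidean distance
manhattan <- sum(abs(x - y))       # note 4: Manhattan distance

# Standard p-norm of a vector; hypothetical helper for this example.
p_norm <- function(v, p) sum(abs(v)^p)^(1 / p)

l_inf <- max(abs(x - y))           # note 6: l-infinity norm of x - y

# The p-norm approaches the l-infinity norm as p grows.
sapply(c(1, 2, 10, 50), function(p) p_norm(x - y, p))

# Base R's dist() provides the same three distances.
pts <- rbind(x, y)
c(dist(pts, method = "euclidean"),
  dist(pts, method = "manhattan"),
  dist(pts, method = "maximum"))
```

In this setting `x` and `y` would play the role of the two cluster-center vectors obtained at the upper and lower levels of fuzziness, and `l_inf` is the largest coordinate-wise gap between them.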

References

Ben-Hur, A., Elisseeff, A., Guyon, I.: A stability based method for discovering structure in clustered data. In: Pacific Symposium on Biocomputing, vol. 7, pp. 6–17. World Scientific Publishing Co., New Jersey (2002)
Ben-Hur, A., Guyon, I.: Detecting stable clusters using principal component analysis. In: Brownstein, M.J., Khodursky, A. (eds.) Methods in Molecular Biology, pp. 159–182. Humana Press, Clifton (2003)
Bezdek, J.C.: Fuzzy mathematics in pattern classification. Dissertation, Applied Mathematics Center, Cornell University, Ithaca (1973)
Bezdek, J.C.: Cluster validity with fuzzy sets. J. Cybern. 3, 58–72 (1974)
Bezdek, J.C.: Mathematical models for systematics and taxonomy. In: Estabrook, G. (ed.) Proceedings of the 8th International Conference on Numerical Taxonomy, pp. 143–166. Freeman, San Francisco (1975)
Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
Bolshakova, N., Azuaje, F.: Cluster validation techniques for genome expression data. Sig. Process 83, 825–833 (2003)
Bolshakova, N., Azuaje, F., Cunningham, P.: A knowledge-driven approach to cluster validity assessment. Bioinformatics 21, 2546–2547 (2005)
Boudraa, A.O.: Dynamic estimation of number of clusters in data set. Electron. Lett. 35, 1606–1608 (1999)
Bouguessa, M., Wang, S., Sun, H.: An objective approach to cluster validation. Pattern Recogn. Lett. 27, 1419–1430 (2006)
Breckenridge, J.: Replicating cluster analysis: method, consistency and validity. Multivar. Behav. Res. 24, 147–161 (1989)
Brock, G.N., Pihur, V., Datta, S., Datta, S.: clValid: an R package for cluster validation. J. Stat. Softw. 25(4), 1–22 (2008). http://www.jstatsoft.org/v25/i04
Celikyilmaz, A., Türksen, I.B.: Enhanced fuzzy system models with improved fuzzy clustering algorithm. IEEE Trans. Fuzzy Syst. 16, 779–794 (2008)
Celikyilmaz, A., Türksen, I.B.: Validation criteria for enhanced fuzzy clustering. Pattern Recogn. Lett. 29, 97–108 (2008)
Chen, M.Y., Linkens, A.: Rule-base self-generation and simplification for data-driven fuzzy models. J. Fuzzy Sets Syst. 142, 243–265 (2004)
Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1, 224–227 (1979)
Dudoit, S., Fridlyand, J.: A prediction based re-sampling method for estimating the number of clusters in a data set. Genome Biol. 3, 1–21 (2002)
Dunn, J.C.: Well separated clusters and fuzzy partitions. J. Cybern. 4, 95–104 (1974)
Falasconi, M., Gutierrez, A., Pardo, M., Sberveglieri, G., Marco, S.: A stability based validity method for fuzzy clustering. Pattern Recogn. 43, 1292–1305 (2010)
Fukuyama, Y., Sugeno, M.: A new method of choosing the number of clusters for the fuzzy c-means method. In: Proceedings of Fifth Fuzzy Systems Symposium, pp. 247–250 (in Japanese) (1989)
Gan, G., Chaoqun, M., Jianhong, W.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability. SIAM, Philadelphia, ASA, Alexandria (2007)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity methods part I. ACM SIGMOD Rec. 31, 40–45 (2002)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: Clustering validity methods part II. ACM SIGMOD Rec. 31, 19–27 (2002)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. Intell. Inf. Syst. J. 17, 107–145 (2001). Kluwer Publishers
Handl, J., Knowles, J., Kell, D.B.: Computational cluster validation in post-genomic data analysis. Bioinformatics 21, 3201–3212 (2005)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice Hall, Englewood Cliffs (1988)
Kim, D.W., Lee, K.H., Lee, D.: On cluster validity index for estimation of the optimal number of fuzzy clusters. Pattern Recogn. 37, 2009–2025 (2004)
Kwon, S.K.: Cluster validity index for fuzzy clustering. Electron. Lett. 34, 2176–2177 (1998)
Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability based validation of clustering solutions. Neural Comput. 16, 1299–1323 (2004)
Levine, E., Domany, E.: Resampling method for unsupervised estimation of cluster validity. Neural Comput. 13, 2573–2593 (2001)
Mufti, G.B., Bertrand, P., El Moubarki, L.: Determining the number of groups from measures of cluster validity. In: Proceedings of ASMDA2005, pp. 404–414 (2005)
Ozkan, I., Türksen, I.B.: Entropy assessment of type-2 fuzziness. IEEE Int. Conf. Fuzzy Syst. 2, 1111–1115 (2004)
Ozkan, I., Türksen, I.B.: Upper and lower values for the level of fuzziness in FCM. Inf. Sci. 177, 5143–5152 (2007)
Ozkan, I., Türksen, I.B.: MiniMax ɛ-stable cluster validity index for type-2 fuzziness. Inf. Sci. 184, 64–74 (2012)
Pal, N.R., Bezdek, J.C.: On cluster validity for the fuzzy c-means model. IEEE Trans. Fuzzy Syst. 3, 370–379 (1995)
Pascual, D., Pla, F., Sánchez, J.S.: Cluster validation using information stability measures. Pattern Recogn. Lett. 31, 454–461 (2010)
Rezaee, M.R., Lelieveldt, B.P.F., Reiber, J.H.C.: A new cluster validity index for the fuzzy c-means. Pattern Recogn. Lett. 19, 237–246 (1998)
Salem, S.A., Nandi, A.K.: Development of assessment criteria for clustering algorithms. Pattern Anal. Appl. 12, 79–98 (2009)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423 and 27, 623–656 (in two parts) (1948)
Sugar, C., James, G.: Finding the number of clusters in a data set: an information theoretic approach. J. Am. Stat. Assoc. 98, 750–763 (2003)
Sugar, C., Lenert, L., Olshen, R.: An application of cluster analysis to health services research: empirically defined health states for depression from the SF-12. Technical Report, Stanford University, Stanford (1999)
Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. B 63(2), 411–423 (2001)
Tibshirani, R., Walther, G.: Cluster validation by prediction strength. J. Comput. Graph. Stat. 14, 511–528 (2005)
Volkovich, Z., Barzily, Z., Morozensky, L.: A statistical model of cluster stability. Pattern Recogn. 41, 2174–2188 (2008)
Wang, W., Zhang, Y.: On fuzzy cluster validity indices. Fuzzy Sets Syst. 158, 2095–2117 (2007)
Wu, K.L., Yang, M.S., Hsieh, J.N.: Robust cluster validity indexes. Pattern Recogn. 42, 2541–2550 (2009)
Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 13, 841–847 (1991)
Yu, J., Cheng, Q., Huang, H.: Analysis of the weighting exponent in the FCM. IEEE Trans. Syst. Man Cybern. B 34, 634–639 (2004)
Zhang, Y., Wang, W., Zhang, X., Li, Y.: A cluster validity index for fuzzy clustering. Inf. Sci. 178, 1205–1218 (2008)
Yue, S., Wang, J.S., Wu, T., Wang, H.: A new separation measure for improving the effectiveness of validity indices. Inf. Sci. 180, 748–764 (2010)

Acknowledgments

This work was partially supported by a Natural Sciences and Engineering Research Council (NSERC) Grant (RPGIN 7698-05) to the University of Toronto. Partial support was also provided by Hacettepe University and TOBB Economics and Technology University. Their support is greatly appreciated.

Author information

Authors and Affiliations

Department of Economics, Hacettepe University, Beytepe, 06800, Ankara, Turkey
Ibrahim Ozkan
TOBB ETU, Ankara, Turkey
I. Burhan Türkşen
Department of Mechanical and Industrial Engineering, University of Toronto, Ontario, M5S3G8, Canada
I. Burhan Türkşen

Corresponding author

Correspondence to Ibrahim Ozkan.

Editor information

Editors and Affiliations

Ryerson University, Victoria Street 350, Toronto, M5B 2K3, Ontario, Canada
Alireza Sadeghian
Hughes Aircraft Electrical Engineering B, Department of Electrical Engineering - S, University of Southern California, 3740 McClintock Ave., Los Angeles, 90089-2564, California, USA
Jerry M. Mendel
Ryerson University, Victoria Street 350, Toronto, M5B 2K3, Ontario, Canada
Hooman Tahayori

About this chapter

Cite this chapter

Ozkan, I., Burhan Türkşen, I. (2013). A Review of Cluster Validation with an Example of Type-2 Fuzzy Application in R. In: Sadeghian, A., Mendel, J., Tahayori, H. (eds) Advances in Type-2 Fuzzy Sets and Systems. Studies in Fuzziness and Soft Computing, vol 301. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6666-6_14

DOI: https://doi.org/10.1007/978-1-4614-6666-6_14
Published: 25 June 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6665-9
Online ISBN: 978-1-4614-6666-6
eBook Packages: Engineering, Engineering (R0)
