Sankhya B, Volume 80, Issue 1, pp 19–36

Similar Coefficient of Cluster for Discrete Elements

  • Tai VoVan
  • Thao Nguyen Trang


This article proposes a new concept, the Cluster Similar Coefficient (CSC), for discrete elements. CSC serves both as a criterion for building clusters by hierarchical and non-hierarchical approaches and as a measure for evaluating the quality of the established clusters. Based on CSC, we also propose four algorithms: to determine a suitable number of clusters, to analyze non-fuzzy clusters, to analyze fuzzy clusters, and to build clusters with a given CSC. The proposed algorithms are implemented as Matlab procedures, allowing users to apply them efficiently and conveniently in practice. Numerical examples demonstrate the suitability and advantages of CSC as a clustering criterion in comparison with other criteria.

Keywords and phrases

Cluster; Hierarchical; Non-hierarchical; Similar coefficient; Distance

AMS (2000) subject classification

Primary 62H30; Secondary 68T10





Copyright information

© Indian Statistical Institute 2018

Authors and Affiliations

  1. Department of Mathematics, Can Tho University, Can Tho, Vietnam
  2. Division of Computational Mathematics and Engineering, Institute for Computational Science, Ton Duc Thang University, Ho Chi Minh, Vietnam
  3. Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
