Improvement of the Simplified Silhouette Validity Index

  • Artur Starczewski
  • Krzysztof Przybyszewski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)

Abstract

A fundamental issue in data clustering is the evaluation of the results of clustering algorithms. Many methods have been proposed for cluster validation. The most popular approach is based on internal cluster validity indices. Among these indices, the Silhouette index and its computationally simpler version, the Simplified Silhouette, are frequently used. In this paper, a modification of the Simplified Silhouette index is proposed. The suggested approach uses an additional component that improves cluster validity assessment. The performance of the new cluster validity indices has been demonstrated on artificial and real datasets, with the PAM clustering algorithm applied as the underlying clustering technique.
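For orientation, the baseline Simplified Silhouette can be sketched as follows. This is a minimal illustration of the commonly used definition, not the modified index proposed in the paper (the additional component is not specified in this abstract). The function name `simplified_silhouette`, the toy data, and the choice of Euclidean distance are assumptions made for the example. With PAM, the cluster representatives would be medoids rather than centroids.

```python
import math


def simplified_silhouette(points, labels, centers):
    """Mean simplified silhouette over all points (illustrative sketch).

    points  : list of coordinate tuples
    labels  : labels[i] is the index of the cluster that points[i] belongs to
    centers : one representative per cluster (centroid, or medoid for PAM)
    """
    if len(centers) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for p, k in zip(points, labels):
        # a: distance from the point to its own cluster representative
        a = math.dist(p, centers[k])
        # b: distance to the nearest representative of any other cluster
        b = min(math.dist(p, c) for j, c in enumerate(centers) if j != k)
        denom = max(a, b)
        # Per-point score lies in [-1, 1]; defined as 0 when a = b = 0
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Toy data (assumed for illustration): two well-separated pairs of points
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
centers = [(0.0, 0.5), (10.0, 0.5)]
score = simplified_silhouette(points, labels, centers)
```

Values close to 1 indicate compact, well-separated clusters. Because each point is compared only with the cluster representatives rather than with every other point, the cost is linear in the number of points for a fixed number of clusters, which is what makes this variant cheaper than the original Silhouette index.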

Keywords

Clustering · Cluster validity index · PAM clustering technique

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Computational Intelligence, Częstochowa University of Technology, Częstochowa, Poland
  2. Information Technology Institute, University of Social Sciences, Łódź, Poland
  3. Clark University, Worcester, USA
