New Internal Clustering Evaluation Index Based on Line Segments

  • Juan Carlos Rojas Thomas
  • Matilde Santos Peñas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11871)

Abstract

This work proposes a new internal clustering evaluation index that uses line segments as the central elements of the clusters. Data dispersion is calculated as the average distance from the points of each cluster to its respective line segment. The work also defines a new distance measure based on the line segment that connects the centroids of the clusters, from which an approximation of the edges of their geometries is obtained. The proposed index is validated in a series of experiments on 10 artificial data sets, generated with different cluster characteristics such as size, shape, noise and dimensionality, and on 8 real data sets. In these experiments, the new index is compared with 12 representative indexes from the literature and outperforms all of them. These results support the effectiveness of the proposal and show that it is appropriate to include geometric properties in the definition of internal indexes.
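
The dispersion idea in the abstract (an average of distances to a line segment) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the segment of each cluster is chosen or how the dispersion enters the final index, so the endpoints `a` and `b` are simply passed in, and the function names `point_segment_distance` and `mean_segment_dispersion` are hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the closed segment [a, b]."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    ab_sq = sum(c * c for c in ab)
    if ab_sq == 0.0:
        # Degenerate segment (a == b): fall back to point-to-point distance.
        return math.dist(p, a)
    # Projection parameter of p on the line through a and b,
    # clamped to [0, 1] so the closest point stays on the segment.
    t = sum(x * y for x, y in zip(ap, ab)) / ab_sq
    t = max(0.0, min(1.0, t))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def mean_segment_dispersion(points: Sequence[Point], a: Point, b: Point) -> float:
    """Average distance from the points of one cluster to the segment [a, b].

    Assumption: the segment endpoints are given; the paper's own rule for
    fitting a segment to a cluster is not described in the abstract.
    """
    if not points:
        raise ValueError("cluster must contain at least one point")
    return sum(point_segment_distance(p, a, b) for p in points) / len(points)
```

For example, `mean_segment_dispersion([(0, 1), (0, -1), (2, 0)], (-1, 0), (1, 0))` averages two perpendicular distances and one distance to the nearer endpoint. Clamping the projection parameter is what distinguishes a segment prototype from an infinite line: points beyond the ends are measured to the endpoint rather than to the line's extension.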

Keywords

Clustering evaluation, Internal indexes, Geometry, Line segment

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Juan Carlos Rojas Thomas (1)
  • Matilde Santos Peñas (2, corresponding author)
  1. Departamento de Informática y Automática, UNED, Madrid, Spain
  2. Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain
