Gabriel graph-based connectivity and density for internal validity of clustering

  • Regular Paper
  • Progress in Artificial Intelligence

Abstract

Clustering plays an important role in the data mining field. However, there is a large variety of clustering algorithms, and each can generate quite different results depending on its input parameters. In the research literature, several cluster validity indices have been proposed to evaluate clustering results and find the partition that best fits the input dataset. These validity indices may nevertheless fail to achieve satisfactory results, especially in the case of clusters with arbitrary shapes. In this paper, we propose a new cluster validity index for density-based, arbitrarily shaped clusters. The new index is based on the density and connectivity relations among the data points, extracted from a proximity graph, the Gabriel graph. Incorporating the connectivity and density relations makes it possible to identify the best clustering results for clusters of any shape, size or density. Experimental results on synthetic and real datasets, using the well-known neighborhood-based clustering (NBC) algorithm and the DBSCAN (density-based spatial clustering of applications with noise) algorithm, illustrate the superiority of the proposed index over some classical and recent indices and show its effectiveness for evaluating clustering algorithms and selecting their appropriate parameters.
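
To make the proximity graph mentioned in the abstract concrete, the following is a minimal sketch of the standard Gabriel graph definition: two points are joined when no third point lies inside the ball whose diameter is the segment between them. It does not reproduce the validity index proposed in the paper; the function names `squared_distance` and `gabriel_graph` and the sample points are assumptions chosen for illustration, and the brute-force O(n^3) loop is only suitable for small datasets.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gabriel_graph(points):
    """Return the Gabriel edges (i, j) with i < j, found by brute force.

    Illustrative helper (not from the paper): an edge (i, j) is kept when
    every other point k satisfies d(i,k)^2 + d(j,k)^2 >= d(i,j)^2, which
    means k is not strictly inside the ball with diameter ij.
    """
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        d_ij = squared_distance(points[i], points[j])
        if all(
            squared_distance(points[i], points[k])
            + squared_distance(points[j], points[k]) >= d_ij
            for k in range(n)
            if k != i and k != j
        ):
            edges.add((i, j))
    return edges


# Hypothetical sample data: three nearby points and one distant point.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (5.0, 5.0)]
edges = gabriel_graph(points)
print(sorted(edges))  # [(0, 2), (1, 2), (1, 3)]
```

In this sample, the pair (0, 1) is not an edge because point 2 lies inside the ball spanned by points 0 and 1. Connectivity relations of this kind are what a graph-based validity index can build on.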

Author information

Corresponding author

Correspondence to Fatima Boudane.

About this article

Cite this article

Boudane, F., Berrichi, A. Gabriel graph-based connectivity and density for internal validity of clustering. Prog Artif Intell 9, 221–238 (2020). https://doi.org/10.1007/s13748-020-00209-z

  • DOI: https://doi.org/10.1007/s13748-020-00209-z
