Abstract
Clustering plays an important role in data mining. However, there is a wide variety of clustering algorithms, and each can produce quite different results depending on its input parameters. Several cluster validity indices have been proposed in the literature to evaluate clustering results and to find the partition that best fits the input dataset. These indices may nevertheless fail to give satisfactory results, especially for clusters with arbitrary shapes. In this paper, we propose a new cluster validity index for density-based, arbitrarily shaped clusters. The index is based on density and connectivity relations among the data points, extracted from a proximity graph, the Gabriel graph. Incorporating connectivity and density relations allows the index to identify the best clustering results for clusters of any shape, size, or density. Experimental results on synthetic and real datasets, using the well-known neighborhood-based clustering (NBC) algorithm and the DBSCAN (density-based spatial clustering of applications with noise) algorithm, illustrate the superiority of the proposed index over several classical and recent indices and show its effectiveness for evaluating clustering algorithms and selecting appropriate parameters for them.
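For readers unfamiliar with the Gabriel graph mentioned above, the following is a minimal brute-force sketch of its standard construction: two points are joined when no third point lies inside the ball whose diameter is the segment between them. It does not reproduce the proposed validity index. The function name `gabriel_edges` is hypothetical, and treating a point that lies exactly on the ball's boundary as non-blocking is an assumed convention.

```python
from itertools import combinations


def gabriel_edges(points):
    """Return Gabriel graph edges as (i, j) index pairs with i < j.

    `points` is a sequence of equal-length coordinate tuples.
    Illustrative helper only; not taken from the paper.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance between two coordinate tuples.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(points)
    edges = []
    for i, j in combinations(range(n), 2):
        d_ij = sq_dist(points[i], points[j])
        # A third point r lies strictly inside the ball with diameter (i, j)
        # exactly when d(i, r)^2 + d(j, r)^2 < d(i, j)^2.
        # Assumption: points exactly on the boundary do not block the edge.
        blocked = any(
            sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) < d_ij
            for r in range(n)
            if r != i and r != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


if __name__ == "__main__":
    # The third point sits inside the disk spanned by the first two,
    # so the edge (0, 1) is blocked and only two edges remain.
    print(gabriel_edges([(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)]))
```

This version checks every pair against every other point, so it runs in cubic time and suits only small datasets. For larger inputs, a common alternative is to filter the edges of a Delaunay triangulation, of which the Gabriel graph is a subgraph.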
Cite this article
Boudane, F., Berrichi, A. Gabriel graph-based connectivity and density for internal validity of clustering. Prog Artif Intell 9, 221–238 (2020). https://doi.org/10.1007/s13748-020-00209-z
DOI: https://doi.org/10.1007/s13748-020-00209-z