Abstract
One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools used to address this problem. A large number of CVIs have been proposed, and reports that compare different CVIs suggest that no single CVI always outperforms the others. Following suggestions found in prior art, in this paper we formalize the concept of using multiple CVIs for cluster number estimation in the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy in estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.
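As a rough illustration of what decision-level fusion of CVIs could look like, the sketch below lets each index pick its preferred number of clusters and then combines those picks by plurality vote. The abstract does not state which fusion rule the paper uses, so the voting rule, the tie-break toward the smaller k, the helper names (pick_k, fuse_by_vote) and the index values are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List


def pick_k(scores: Dict[int, float], higher_is_better: bool) -> int:
    """Return the candidate k that a single validity index prefers."""
    chooser = max if higher_is_better else min
    return chooser(scores, key=scores.get)


def fuse_by_vote(decisions: List[int]) -> int:
    """Plurality vote over per-index decisions.

    Ties are broken toward the smaller k, which is an arbitrary
    choice for this sketch and not taken from the paper.
    """
    counts = Counter(decisions)
    top = max(counts.values())
    return min(k for k, c in counts.items() if c == top)


# Made-up index values for candidate k = 2..5 (not real results).
# Each entry: (scores per k, whether a higher score is better).
cvi_scores = {
    "davies_bouldin": ({2: 0.90, 3: 0.45, 4: 0.60, 5: 0.75}, False),
    "calinski_harabasz": ({2: 120.0, 3: 310.0, 4: 290.0, 5: 200.0}, True),
    "dunn": ({2: 0.20, 3: 0.35, 4: 0.50, 5: 0.30}, True),
}

# Each index makes its own decision; fusion acts only on those decisions.
decisions = [pick_k(scores, hib) for scores, hib in cvi_scores.values()]
estimated_k = fuse_by_vote(decisions)
print(decisions, estimated_k)
```

With these made-up values, two indices prefer k = 3 and one prefers k = 4, so the vote settles on 3. Other decision-level rules (for example, taking the median or the mean of the individual decisions) would slot into the same place as fuse_by_vote.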
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kryszczuk, K., Hurley, P. (2010). Estimation of the Number of Clusters Using Multiple Clustering Validity Indices. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_12
DOI: https://doi.org/10.1007/978-3-642-12127-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12126-5
Online ISBN: 978-3-642-12127-2
eBook Packages: Computer Science, Computer Science (R0)