Estimation of the Number of Clusters Using Multiple Clustering Validity Indices

Kryszczuk, Krzysztof; Hurley, Paul

doi:10.1007/978-3-642-12127-2_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5997)

Abstract

One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools for addressing this problem. A large number of CVIs have been proposed, and reports that compare different CVIs suggest that no single CVI consistently outperforms the others. Following suggestions found in prior art, in this paper we formalize the concept of using multiple CVIs for cluster number estimation in the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy in estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.
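
As a rough illustration of decision-level fusion, the sketch below lets each validity index vote for the candidate number of clusters it prefers and then combines the votes. It is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the abstract names neither the indices nor the fusion rules, so the function names (`vote_per_index`, `fuse_votes`), the two rules shown (mode and lower median), the tie-break toward the smaller k, and the toy score values are all hypothetical. In practice the per-k scores could come from functions such as scikit-learn's `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score`.

```python
from collections import Counter
from statistics import median_low


def vote_per_index(scores, maximize):
    """Return the candidate k preferred by a single validity index.

    scores   : dict mapping candidate k -> index value at that k
    maximize : True if larger index values indicate a better partition
    """
    pick = max if maximize else min
    return pick(scores, key=scores.get)


def fuse_votes(index_scores, directions, rule="mode"):
    """Combine the per-index decisions into one estimate of k.

    index_scores : dict mapping index name -> {k: value}
    directions   : dict mapping index name -> True (maximize) / False (minimize)
    rule         : "mode" (majority vote) or "median" (lower median of votes)
    """
    votes = [
        vote_per_index(index_scores[name], directions[name])
        for name in index_scores
    ]
    if rule == "mode":
        counts = Counter(votes)
        top = max(counts.values())
        # Assumption: ties are broken toward the smaller number of clusters.
        return min(k for k, c in counts.items() if c == top)
    if rule == "median":
        return median_low(votes)
    raise ValueError(f"unknown fusion rule: {rule!r}")


# Toy values invented purely for illustration; they are not results
# from the paper or from any real dataset.
toy_scores = {
    "silhouette": {2: 0.41, 3: 0.58, 4: 0.52, 5: 0.47},
    "calinski_harabasz": {2: 120.0, 3: 310.0, 4: 295.0, 5: 260.0},
    "davies_bouldin": {2: 0.95, 3: 0.80, 4: 0.62, 5: 0.88},
}
toy_directions = {
    "silhouette": True,          # higher is better
    "calinski_harabasz": True,   # higher is better
    "davies_bouldin": False,     # lower is better
}

k_mode = fuse_votes(toy_scores, toy_directions, rule="mode")
k_median = fuse_votes(toy_scores, toy_directions, rule="median")
print(k_mode, k_median)
```

Here the first two indices vote for k = 3 and the third for k = 4, so both fusion rules settle on 3. Fusing at the decision level means only each index's preferred k is combined, which avoids having to normalize index values that live on very different scales.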

Author information

Authors and Affiliations

IBM Zurich Research Laboratory, Switzerland
Krzysztof Kryszczuk & Paul Hurley

Editor information

Editors and Affiliations

Center for Informatics Science, Nile University, 12677, Giza, Egypt
Neamat El Gayar
Centre for Vision, Speech and Signal Processing, University of Surrey, GU2 7XH, Guildford, Surrey, UK
Josef Kittler
Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, 09123, Cagliari, Italy
Fabio Roli

About this paper

Cite this paper

Kryszczuk, K., Hurley, P. (2010). Estimation of the Number of Clusters Using Multiple Clustering Validity Indices. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_12

DOI: https://doi.org/10.1007/978-3-642-12127-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12126-5
Online ISBN: 978-3-642-12127-2
eBook Packages: Computer Science, Computer Science (R0)
