Estimation of the Number of Clusters Using Multiple Clustering Validity Indices

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5997)

Abstract

One of the challenges in unsupervised machine learning is finding the number of clusters in a dataset. Clustering Validity Indices (CVIs) are popular tools for addressing this problem. A large number of CVIs have been proposed, and comparative studies suggest that no single CVI consistently outperforms the others. Following suggestions found in prior art, in this paper we formalize the concept of using multiple CVIs for cluster number estimation in the framework of multi-classifier fusion. Using a large number of datasets, we show that decision-level fusion of multiple CVIs can lead to significant gains in accuracy in estimating the number of clusters, in particular for high-dimensional datasets with a large number of clusters.
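
As a rough illustration of what decision-level fusion of several CVIs can look like, the sketch below lets each index pick its preferred number of clusters and then combines those picks. It is a minimal sketch under stated assumptions, not the procedure evaluated in the paper: the function names `decide` and `fuse_decisions` are hypothetical, the two fusion rules shown (majority vote and lower median) are assumed for illustration, and the scores are invented toy numbers rather than results from any dataset.

```python
from collections import Counter
from statistics import median_low


def decide(scores, higher_is_better=True):
    """Return the candidate k that a single validity index prefers."""
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


def fuse_decisions(index_scores):
    """Fuse per-index decisions about the number of clusters.

    index_scores maps an index name to (scores_by_k, higher_is_better).
    Returns the individual decisions plus two fused estimates.
    """
    decisions = {
        name: decide(scores, hib)
        for name, (scores, hib) in index_scores.items()
    }
    votes = Counter(decisions.values())
    top = max(votes.values())
    # Majority vote; ties go to the smaller k (an arbitrary choice here).
    mode_k = min(k for k, n in votes.items() if n == top)
    # Lower median, so the fused estimate is always one of the votes.
    median_k = median_low(decisions.values())
    return decisions, mode_k, median_k


# Invented toy scores for candidate k = 2..5 (hypothetical, illustration only).
# Davies-Bouldin is minimized; Calinski-Harabasz and Dunn are maximized.
toy = {
    "davies_bouldin": ({2: 0.90, 3: 0.55, 4: 0.60, 5: 0.75}, False),
    "calinski_harabasz": ({2: 110.0, 3: 180.0, 4: 175.0, 5: 150.0}, True),
    "dunn": ({2: 0.20, 3: 0.28, 4: 0.35, 5: 0.22}, True),
}

decisions, mode_k, median_k = fuse_decisions(toy)
print(decisions, mode_k, median_k)
```

In this toy case two of the three indices vote for k = 3 and one for k = 4, so both fusion rules return 3. In practice the per-k scores would come from running a clustering algorithm for each candidate k and evaluating every index on the resulting partition.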

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kryszczuk, K., Hurley, P. (2010). Estimation of the Number of Clusters Using Multiple Clustering Validity Indices. In: El Gayar, N., Kittler, J., Roli, F. (eds) Multiple Classifier Systems. MCS 2010. Lecture Notes in Computer Science, vol 5997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12127-2_12

  • DOI: https://doi.org/10.1007/978-3-642-12127-2_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12126-5

  • Online ISBN: 978-3-642-12127-2

  • eBook Packages: Computer Science, Computer Science (R0)