A Robust Methodology for Comparing Performances of Clustering Validity Criteria

Vendramin, Lucas; Campello, Ricardo J. G. B.; Hruschka, Eduardo R.

doi:10.1007/978-3-540-88190-2_29

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5249)

Included in the following conference series:

Brazilian Symposium on Artificial Intelligence

Abstract

Many different clustering validity measures exist that are very useful in practice as quantitative criteria for evaluating the quality of data partitions. However, it is a hard task for the user to choose a specific measure when he or she faces such a variety of possibilities. The present paper introduces an alternative, robust methodology for comparing clustering validity measures that has been especially designed to get around some conceptual flaws of the comparison paradigm traditionally adopted in the literature. An illustrative example involving the comparison of the performances of four well-known validity measures over a collection of 7776 data partitions of 324 different data sets is presented.
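To make concrete what a validity measure does when it scores a data partition, here is a minimal sketch that computes the mean silhouette width for two candidate partitions of the same toy data set. The silhouette is used only as a familiar example; the abstract does not say which four measures the paper compares, and this sketch is not the comparison methodology itself. The function name `silhouette`, the toy points, and both label vectors are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(points, labels):
    """Mean silhouette width of a hard partition (higher is better)."""
    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        # a: mean distance from p to the other members of its own cluster
        # (dist(p, p) is 0, so summing over the whole cluster is safe).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of another cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight, well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # partition that follows the two groups
bad = [0, 1, 0, 1, 0, 1]   # partition that mixes the two groups

score_good = silhouette(points, good)
score_bad = silhouette(points, bad)
print(f"good partition: {score_good:.3f}")
print(f"bad partition:  {score_bad:.3f}")
```

The partition that follows the two groups receives the clearly higher score, which is the sense in which such a measure acts as a quantitative criterion for partition quality.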

Author information

Authors and Affiliations

Department of Computer Sciences, University of São Paulo at São Carlos SCC/ICMC/USP, C.P. 668, São Carlos, SP, 13560-970, Brazil
Lucas Vendramin, Ricardo J. G. B. Campello & Eduardo R. Hruschka

Editor information

Editors and Affiliations

Department of Systems Engineering and Computer Science - COPPE, Federal University of Rio de Janeiro (UFRJ), Brazil
Gerson Zaverucha
Department of Automation and Systems, Federal University of Santa Catarina, CEP 88.040-900, Brazil
Augusto Loureiro da Costa

Cite this paper

Vendramin, L., Campello, R.J.G.B., Hruschka, E.R. (2008). A Robust Methodology for Comparing Performances of Clustering Validity Criteria. In: Zaverucha, G., da Costa, A.L. (eds) Advances in Artificial Intelligence - SBIA 2008. SBIA 2008. Lecture Notes in Computer Science, vol 5249. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88190-2_29

DOI: https://doi.org/10.1007/978-3-540-88190-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88189-6
Online ISBN: 978-3-540-88190-2
eBook Packages: Computer Science, Computer Science (R0)
