The area under the ROC curve as a measure of clustering quality

Jaskowiak, Pablo A.; Costa, Ivan G.; Campello, Ricardo J. G. B.

doi:10.1007/s10618-022-00829-0

Published: 26 April 2022

Volume 36, pages 1219–1245, (2022)

Data Mining and Knowledge Discovery

Pablo A. Jaskowiak ORCID: orcid.org/0000-0002-6377-3372¹,
Ivan G. Costa² &
Ricardo J. G. B. Campello³

Abstract

The area under the receiver operating characteristic (ROC) curve, referred to as AUC, is a well-known performance measure in the supervised learning domain. Due to its compelling features, it has been employed in a number of studies to evaluate and compare the performance of different classifiers. In this work, we explore AUC as a performance measure in the unsupervised learning domain, more specifically, in the context of cluster analysis. In particular, we elaborate on the use of AUC as an internal/relative measure of clustering quality, which we refer to as Area Under the Curve for Clustering (AUCC). We show that the AUCC of a given candidate clustering solution has an expected value under a null model of random clustering solutions, regardless of the size of the dataset and, more importantly, regardless of the number or the (im)balance of clusters under evaluation. In addition, we elaborate on the fact that, in the context of internal/relative clustering validation considered here, AUCC is actually a linear transformation of the Gamma criterion from Baker and Hubert (1975), for which we also formally derive a theoretical expected value for chance clusterings. We also discuss the computational complexity of these criteria and show that, while an ordinary implementation of Gamma can be computationally prohibitive and impractical for most real applications of cluster analysis, its equivalence with AUCC actually unveils a much more efficient algorithmic procedure. Our theoretical findings are supported by experimental results. These results show that, in addition to an effective and robust quantitative evaluation provided by AUCC, visual inspection of the ROC curves themselves can be useful to further assess a candidate clustering solution from a broader, qualitative perspective as well.
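
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; it assumes that every unordered pair of objects is treated as one instance, labeled positive when both objects share a cluster and scored by Euclidean distance (smaller distance meaning a stronger prediction of co-membership). The function names `aucc` and `gamma_brute_force` and the convention of counting ties as one half are assumptions made for illustration. The first function sorts the pairs once; the second compares every within-cluster pair against every between-cluster pair, which is the kind of ordinary Gamma computation that becomes impractical as the number of pairs grows.

```python
from itertools import combinations
from math import dist


def pairwise_terms(points, labels):
    """Return (distance, same_cluster) for every unordered pair of objects."""
    return [(dist(points[i], points[j]), labels[i] == labels[j])
            for i, j in combinations(range(len(points)), 2)]


def aucc(points, labels):
    """AUC over object pairs, computed with a single sort of the pairs.

    Equals the probability that a within-cluster pair is closer than a
    between-cluster pair; tied distances count one half (an assumption).
    """
    terms = sorted(pairwise_terms(points, labels))
    n_within = sum(same for _, same in terms)
    n_between = len(terms) - n_within
    if n_within == 0 or n_between == 0:
        raise ValueError("need both within- and between-cluster pairs")
    wins = 0.0
    seen_within = 0  # within-cluster pairs with strictly smaller distance
    i = 0
    while i < len(terms):
        j = i
        while j < len(terms) and terms[j][0] == terms[i][0]:
            j += 1  # extend the group of pairs tied at this distance
        tie_within = sum(same for _, same in terms[i:j])
        tie_between = (j - i) - tie_within
        wins += tie_between * (seen_within + 0.5 * tie_within)
        seen_within += tie_within
        i = j
    return wins / (n_within * n_between)


def gamma_brute_force(points, labels):
    """Gamma from concordant/discordant counts; quadratic in the pair count."""
    terms = pairwise_terms(points, labels)
    within = [d for d, same in terms if same]
    between = [d for d, same in terms if not same]
    s_plus = sum(w < b for w in within for b in between)   # concordant
    s_minus = sum(w > b for w in within for b in between)  # discordant
    if s_plus + s_minus == 0:
        raise ValueError("all comparisons are tied")
    return (s_plus - s_minus) / (s_plus + s_minus)


if __name__ == "__main__":
    # Toy data with no tied distances: positions 0, 1, 3, 7, 15, 31 on a line.
    pts = [(0,), (1,), (3,), (7,), (15,), (31,)]
    lab = [0, 1, 0, 1, 0, 1]
    print(aucc(pts, lab), (gamma_brute_force(pts, lab) + 1) / 2)
```

When no within-cluster distance ties with a between-cluster distance, the two printed values coincide, which is the standard form of the linear relation between AUC and Gamma, AUCC = (Gamma + 1) / 2. With ties the two quantities treat tied comparisons differently, so the sketch should not be read as establishing the relation in that case.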

Notes

In fact, non-random classifiers can also exhibit such a performance (Flach 2010).
This result was first described, in preliminary form, by Jaskowiak (2015). An equivalent result, relating AUC to Goodman and Kruskal's (1954) rank correlation, was recently rediscovered by Higham and Higham (2019) in an unrelated context involving measures of resolution in meta-cognitive studies.
Assuming that (a) all dissimilarities \(||\cdot ||\) are given in advance (otherwise an additional cost to compute them would be required: \(O(n^2d)\) in the case of Euclidean distance, where \(d\) is the dimension of the data space), and (b) cluster sizes are balanced (all proportional to \(n/k\), possibly differing by a constant factor) (Vendramin et al. 2010).
Apart from the cost to obtain the dissimilarity matrix, \({\mathbf {D}}\), which is also required by Gamma.
Note that \(C_m\) is not necessarily different from \(C_l\); they may or may not be the same cluster in partition \({\mathcal {C}}_k\).
This dataset consists of 9 clusters, with 50 objects each, obtained from normal distributions with variance equal to 4.5, centered at (0, 0), (0, 20), (0, 40), (20, 0), (20, 20), (20, 40), (40, 0), (40, 20), and (40, 40).
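
A dataset with the layout described in the last note can be generated along the following lines. This is a hedged sketch only: the note does not say whether the variance of 4.5 applies to each coordinate, so the code assumes isotropic clusters with independent coordinates, each of variance 4.5. The function name `make_grid_dataset` and the seed are illustrative and do not come from the article.

```python
import random
from math import sqrt

# Cluster centers in the order listed in the note: (0, 0), (0, 20), ..., (40, 40).
CENTERS = [(x, y) for x in (0, 20, 40) for y in (0, 20, 40)]


def make_grid_dataset(n_per_cluster=50, variance=4.5, seed=0):
    """Draw n_per_cluster 2-D points around each center (assumed isotropic)."""
    rng = random.Random(seed)
    sigma = sqrt(variance)  # random.gauss expects a standard deviation
    points, labels = [], []
    for label, (cx, cy) in enumerate(CENTERS):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(label)
    return points, labels


if __name__ == "__main__":
    points, labels = make_grid_dataset()
    print(len(points), len(set(labels)))
```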

References

Amigó E, Gonzalo J, Artiles J, Verdejo F (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retr 12(5):613
Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM, Perona I (2013) An extensive comparative study of cluster validity indices. Pattern Recognit 46(1):243–256
Baker FB, Hubert LJ (1975) Measuring the power of hierarchical cluster analysis. J Am Stat Assoc 70(349):31–38
Bezdek JC, Pal NR (1998) Some new indexes of cluster validity. IEEE Trans Syst, Man Cybern, Part B 28(3):301–315
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit 30(7):1145–1159
Brock G, Pihur V, Datta S, Datta S (2008) clValid: an R package for cluster validation. J Stat Softw 25(4):1–22
Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Commun Stat 3:1–27
Ceriani L, Verme P (2012) The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by Corrado Gini. J Econ Inequal 10(3):421–443
Charrad M, Ghazzali N, Boiteau V, Niknafs A (2014) NbClust: an R package for determining the relevant number of clusters in a data set. J Stat Softw 61(6):1–36
Davies D, Bouldin D (1979) A cluster separation measure. IEEE Trans Pattern Anal Mach Intell 1:224–227
Desgraupes B (2016) clusterCrit: clustering indices. R package version 1.2.7
Dunn J (1974) Well separated clusters and optimal fuzzy partitions. J Cybern 4:95–104
Everitt B (1974) Cluster analysis. Heinemann Educational for the Social Science Research Council, London
Färber I, Günnemann S, Kriegel H-P, Kröger P, Müller E, Schubert E, Seidl T, Zimek A (2010) On using class-labels in evaluation of clusterings. In: MultiClust: 1st international workshop on discovering, summarizing and using multiple clusterings, Washington, DC
Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. Technical report
Fawcett T (2006) An introduction to ROC analysis. Pattern Recognit Lett 27(8):861–874
Flach P, Hernández-Orallo J, Ferri C (2011) A coherent interpretation of AUC as a measure of aggregated classification performance. In: International Conference on Machine Learning (ICML)
Flach PA (2010) ROC analysis. In: Encyclopedia of machine learning. Springer US, Boston, MA, pp 869–875
Giancarlo R, Lo Bosco G, Pinello L, Utro F (2013) A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data analysis. BMC Bioinformatics 14(Suppl 1):S6
Gini C (1912) Variabilità e mutabilità. Tipogr. di P. Cuppini
Goodman L, Kruskal W (1954) Measures of association for cross-classifications. J Am Stat Assoc 49:732–764
Halkidi M, Batistakis Y, Vazirgiannis M (2001) On clustering validation techniques. J Intell Inf Syst 17(2–3):107–145
Halkidi M, Vazirgiannis M (2008) A density-based cluster validity approach using multi-representatives. Pattern Recognit Lett 29:773–786
Hand DJ, Till RJ (2001) A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach Learn 45(2):171–186
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
Hennig C (2015) What are the true clusters? Pattern Recognit Lett 64:53–62
Hennig C, Meila M, Murtagh F, Rocci R (2015) Handbook of cluster analysis. CRC Press
Hernández-Orallo J, Flach P, Ferri C (2013) ROC curves in cost space. Mach Learn 93(1):71–91
Higham PA, Higham DP (2019) New improved gamma: enhancing the accuracy of Goodman-Kruskal’s gamma using ROC curves. Behav Res Methods 51(1):108–125
Hill RS (1980) A stopping rule for partitioning dendrograms. Botanical Gazette 141:321–324
Hruschka ER, Campello RJGB, Castro LN (2004) Improving the efficiency of a clustering genetic algorithm. In: Ibero-American conference on artificial intelligence – IBERAMIA 3315: 861–870
Hruschka ER, Campello RJGB, de Castro LN (2006) Evolving clusters in gene-expression data. Inf Sci 176(13):1898–1927
Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
Hubert LJ, Levin JR (1976) A general statistical framework for assessing categorical clustering in free recall. Psychol Bull 10:1072–1080
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall
Jaskowiak PA (2015) On the evaluation of clustering results: measures, ensembles, and gene expression data analysis. Ph. D. thesis, University of São Paulo, Brazil (https://doi.org/10.11606/T.55.2016.tde-23032016-111454)
Jaskowiak PA, Campello RJGB, Costa IG (2012) Evaluating correlation coefficients for clustering gene expression profiles of cancer. In: 7th Brazilian symposium on bioinformatics (BSB2012), Volume 7409 of LNCS, pp. 120–131. Springer / Berlin Heidelberg
Jaskowiak PA, Campello RJGB, Costa IG (2014) On the selection of appropriate distances for gene expression data clustering. BMC Bioinformatics 15(Suppl 2):S2
Jaskowiak PA, Campello RJGB, Costa Filho IG (2013) Proximity measures for clustering gene expression microarray data: a validation methodology and a comparative analysis. IEEE/ACM Trans Comput Biol Bioinf 10(4):845–857
Jaskowiak PA, Moulavi D, Furtado ACS, Campello RJGB, Zimek A, Sander J (2016) On strategies for building effective ensembles of relative clustering validity criteria. Knowl Inf Syst 47(2):329–354
Kim B, Lee H, Kang P (2018) Integrating cluster validity indices based on data envelopment analysis. Appl Soft Comput 64:94–108
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: 5th Berkeley symposium on mathematical statistics and probability 1:281–297
Majnik M, Bosnić Z (2013) ROC analysis of classifiers in machine learning: a survey. Intell Data Anal 17(3):531–558
Mason SJ, Graham NE (2002) Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: statistical significance and interpretation. Q J Royal Meteorol Soc 128(584):2145–2166
Maulik U, Bandyopadhyay S (2002) Performance evaluation of some clustering algorithms and validity indices. IEEE Trans Pattern Anal and Mach Intell 24(12):1650–1654
Milligan GW (1981) A Monte Carlo study of thirty internal criterion measures for cluster analysis. Psychometrika 46(2):187–199
Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179
Moulavi D, Jaskowiak PA, Campello RJGB, Zimek A, Sander J (2014) Density-based clustering validation. In: Proceedings of the 14th SIAM international conference on data mining (SDM), Philadelphia, PA, pp. 839–847
Nguyen T, Viehman J, Yeboah D, Olbricht GR, Obafemi-Ajayi T (2020) Statistical comparative analysis and evaluation of validation indices for clustering optimization. In: 2020 IEEE symposium series on computational intelligence (SSCI), pp. 3081–3090
Pakhira MK, Bandyopadhyay S, Maulik U (2004) Validity index for crisp and fuzzy clusters. Pattern Recognit 37:487–501
Pearson K (1895) Contributions to the mathematical theory of evolution. III. Regression, heredity, and panmixia. Proc Royal Soc London 59:69–71
Provost F, Fawcett T (1997) Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In: Proceedings of the third international conference on knowledge discovery and data mining, pp. 43–48. AAAI Press
Provost FJ, Fawcett T, Kohavi R (1998) The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the fifteenth international conference on machine learning, ICML ’98, San Francisco, CA, USA, pp. 445–453. Morgan Kaufmann Publishers Inc
Ratkowsky DA, Lance GN (1978) A criterion for determining the number of groups in a classification. Aust Comput J 10:115–117
Romano S, Vinh NX, Bailey J, Verspoor K (2016) Adjusting for chance clustering comparison measures. J Mach Learn Res 17(1):4635–4666
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
Spackman KA (1989) Signal detection theory: Valuable tools for evaluating inductive learning. In: Proceedings of the sixth international workshop on machine learning, San Francisco, CA, USA, pp. 160–163. Morgan Kaufmann Publishers Inc
Vendramin L, Campello RJGB, Hruschka ER (2009) On the comparison of relative clustering validation criteria. In: Proceedings of the 9th SIAM international conference on data mining (SDM), Sparks, NV, pp. 733–744
Vendramin L, Campello RJGB, Hruschka ER (2010) Relative clustering validity criteria: a comparative overview. Stat Anal Data Min 3(4):209–235
Vendramin L, Jaskowiak PA, Campello RJGB (2013) On the combination of relative clustering validity criteria. In: Proceedings of the 25th International conference on scientific and statistical database management (SSDBM), Baltimore, MD, pp. 4:1–12
Xu R, Wunsch D II (2009) Clustering. IEEE Press
Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL (2001) Model-based clustering and data transformations for gene expression data. Bioinformatics 17(10):977–987
Zhou S, Liu F, Song W (2021) Estimating the optimal number of clusters via internal validity index. Neural Process Lett 53(2):1013–1034

Acknowledgements

This project was partially funded by Brazilian research agencies FAPESP (Process #2011/04247-5) and CNPq (#302161/2017-1). Ivan G. Costa was supported by the Interdisciplinary Center for Clinical Research (IZKF) Faculty of Medicine at the RWTH Aachen.

Author information

Authors and Affiliations

Federal University of Santa Catarina (UFSC), Joinville, SC, Brazil
Pablo A. Jaskowiak
Institute for Computational Genomics, RWTH Aachen University Medical Faculty, Aachen, Germany
Ivan G. Costa
School of Mathematical and Physical Sciences, University of Newcastle, Callaghan, Australia
Ricardo J. G. B. Campello

Corresponding author

Correspondence to Pablo A. Jaskowiak.

Additional information

Responsible editors: Albrecht Zimmermann and Peggy Cellier.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jaskowiak, P.A., Costa, I.G. & Campello, R.J.G.B. The area under the ROC curve as a measure of clustering quality. Data Min Knowl Disc 36, 1219–1245 (2022). https://doi.org/10.1007/s10618-022-00829-0

Received: 04 May 2021
Accepted: 16 March 2022
Published: 26 April 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10618-022-00829-0
