Understanding the Rand Index

  • Conference paper
Advanced Studies in Classification and Data Science

Abstract

The Rand index remains one of the most popular indices for assessing agreement between two partitions. It combines two sources of information: object pairs placed in the same cluster in both partitions, and object pairs assigned to different clusters in both partitions. Via a decomposition of the Rand index into four asymmetric indices, we show that in many situations object pairs that were assigned to different clusters have considerable impact on the value of the overall Rand index.
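As a rough illustration of the two sources of information named in the abstract, the sketch below computes the Rand index with the standard pair-counting definition, (a + d) / (a + b + c + d). The function names `pair_counts` and `rand_index`, the symbols a, b, c, d, and the toy label vectors are assumptions made for this example. It does not reproduce the chapter's decomposition into four asymmetric indices, which is not stated in the abstract.

```python
from itertools import combinations


def pair_counts(labels_u, labels_v):
    """Classify every object pair by how the two partitions treat it.

    Returns (a, b, c, d), hypothetical names for this sketch:
      a: pair is together in both partitions
      b: together in U only
      c: together in V only
      d: pair is separated in both partitions
    """
    if len(labels_u) != len(labels_v):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_u)), 2):
        same_u = labels_u[i] == labels_u[j]
        same_v = labels_v[i] == labels_v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(labels_u, labels_v):
    """Rand index: fraction of object pairs on which the partitions agree."""
    a, b, c, d = pair_counts(labels_u, labels_v)
    total = a + b + c + d
    if total == 0:
        raise ValueError("need at least two objects to form a pair")
    return (a + d) / total


# Toy example (made-up labels): six objects, three clusters per partition.
u = [0, 0, 0, 1, 1, 2]
v = [0, 0, 1, 1, 2, 2]
a, b, c, d = pair_counts(u, v)
print(a, b, c, d)        # 1 3 2 9
print(rand_index(u, v))  # 10 / 15, about 0.667
```

In this toy case 9 of the 10 agreeing pairs are pairs separated in both partitions, which is the kind of situation the abstract describes: the different-cluster pairs account for most of the index value.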

Author information

Corresponding author

Correspondence to Matthijs J. Warrens.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Warrens, M.J., van der Hoef, H. (2020). Understanding the Rand Index. In: Imaizumi, T., Okada, A., Miyamoto, S., Sakaori, F., Yamamoto, Y., Vichi, M. (eds) Advanced Studies in Classification and Data Science. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Singapore. https://doi.org/10.1007/978-981-15-3311-2_24
