Advances in Data Analysis and Classification, Volume 13, Issue 1, pp 303–323

sARI: a soft agreement measure for class partitions incorporating assignment probabilities

  • Abby Flynt (corresponding author)
  • Nema Dean
  • Rebecca Nugent
Regular Article

Abstract

Agreement indices are commonly used to summarize the performance of both classification and clustering methods. The easy interpretation/intuition and desirable properties of the Rand and adjusted Rand indices have led to their popularity over other available indices. While more algorithmic clustering approaches like k-means and hierarchical clustering produce hard partition assignments (assigning each observation to a single cluster), other techniques like model-based clustering include information about the certainty of allocation of objects through class membership probabilities (soft partitions). To assess performance using traditional indices, e.g., the adjusted Rand index (ARI), the soft partition is mapped to a hard set of assignments, which commonly overstates the certainty of correct assignments. This paper proposes an extension of the ARI, the soft adjusted Rand index (sARI), with similar intuition and interpretation but also incorporating information from one or two soft partitions. It can be used in conjunction with the ARI, comparing the similarities of hard to soft, or soft to soft, partitions to the similarities of the mapped hard partitions. Simulation study results support the intuition that, in general, mapping to hard partitions tends to increase the measure of similarity between partitions. In applications, the sARI more accurately reflects the cluster boundary overlap commonly seen in real data.
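
As an illustration of the problem the abstract describes, the following minimal sketch maps a soft partition to a hard one by taking the most probable class for each observation and then computes the standard Hubert–Arabie ARI. It does not implement the sARI, whose definition is not given on this page. The function names `harden` and `adjusted_rand_index` and the toy membership probabilities are assumptions chosen for the example.

```python
from collections import Counter
from math import comb


def harden(soft):
    """Map each row of class membership probabilities to its most probable class.

    Ties are broken in favor of the lowest class index (an arbitrary choice).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


def adjusted_rand_index(labels_a, labels_b):
    """Standard Hubert-Arabie adjusted Rand index for two hard partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Count pairs of observations placed together in both partitions
    # (contingency-table cells) and in each partition alone (margins).
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2          # largest attainable agreement
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_both - expected) / (maximum - expected)


# Hypothetical toy data: two observations sit close to the class boundary.
truth = [0, 0, 0, 1, 1, 1]
soft = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.55, 0.45],  # nearly ambiguous
    [0.45, 0.55],  # nearly ambiguous
    [0.20, 0.80],
    [0.10, 0.90],
]

hard = harden(soft)
ari = adjusted_rand_index(truth, hard)
print(hard, ari)
```

In this toy case the hardened partition matches the reference labels exactly, so the ARI reports perfect agreement even though two of the six assignments are close to 50/50. This is the overstatement of certainty that motivates a soft index.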

Keywords

Adjusted Rand index, Model-based clustering, Mixture models, Soft partition, Posterior probabilities, Class membership probabilities

Mathematics Subject Classification

62H30, 91C20, 62H86

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Mathematics, Bucknell University, Lewisburg, USA
  2. School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
  3. Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, USA
