Advances in Data Analysis and Classification

Volume 5, Issue 3, pp 179–200

Correcting Jaccard and other similarity indices for chance agreement in cluster analysis

  • Ahmed N. Albatineh
  • Magdalena Niewiadomska-Bugaj
Regular Article

DOI: 10.1007/s11634-011-0090-y

Cite this article as:
Albatineh, A.N. & Niewiadomska-Bugaj, M. Adv Data Anal Classif (2011) 5: 179. doi:10.1007/s11634-011-0090-y

Abstract

Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre, this expectation cannot easily be found. We show how such indices can be expressed as functions of other indices, and how their expectations can then be approximated so that an approximate correction is possible. A second approach is based on a Taylor series expansion. A simulation study using structured and unstructured data generated from bivariate normal distributions illustrates the effectiveness of the resulting corrections.
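The abstract does not give formulas, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes the usual pair-counting definitions (a = pairs placed together in both partitions, b and c = pairs together in only one), the Jaccard index a / (a + b + c), and the standard correction form (I − E[I]) / (max I − E[I]) with max I = 1. Under fixed marginals, a + b and a + c are constants, so J depends on a alone and a first-order Taylor expansion around E[a] gives a plug-in approximation of E[J]. As a cross-check, E[J] is also estimated by Monte Carlo label permutation, a swapped-in technique that the abstract does not mention. All function names are hypothetical.

```python
import random
from collections import Counter
from math import comb


def pair_counts(u, v):
    """Pair counts from the matching counts (contingency) matrix of labelings u, v.

    Returns (a, A, B, N): a = pairs placed together in both partitions,
    A = pairs together in u (a + b), B = pairs together in v (a + c),
    N = total number of pairs.
    """
    cells = Counter(zip(u, v))  # n_ij of the matching counts matrix
    a = sum(comb(m, 2) for m in cells.values())
    A = sum(comb(m, 2) for m in Counter(u).values())  # from row totals n_i.
    B = sum(comb(m, 2) for m in Counter(v).values())  # from column totals n_.j
    return a, A, B, comb(len(u), 2)


def jaccard(u, v):
    """Jaccard index a / (a + b + c); assumes some cluster has >= 2 points."""
    a, A, B, _ = pair_counts(u, v)
    return a / (A + B - a)  # a + b + c = A + B - a


def expected_jaccard_taylor(u, v):
    """First-order Taylor (plug-in) approximation of E[J] under fixed marginals.

    J = a / (A + B - a) is a function of a alone once the marginals fix A and B,
    so expanding around E[a] to first order gives E[J] ~ E[a] / (A + B - E[a]).
    """
    _, A, B, N = pair_counts(u, v)
    ea = A * B / N  # exact E[a] under random permutation of labels
    return ea / (A + B - ea)


def expected_jaccard_mc(u, v, reps=2000, seed=0):
    """Monte Carlo estimate of E[J]: shuffling v keeps both marginals fixed."""
    rng = random.Random(seed)
    v = list(v)  # copy so the caller's labels are not shuffled
    total = 0.0
    for _ in range(reps):
        rng.shuffle(v)
        total += jaccard(u, v)
    return total / reps


def corrected(index, expected, max_value=1.0):
    """Chance-corrected index (I - E[I]) / (max I - E[I])."""
    return (index - expected) / (max_value - expected)


if __name__ == "__main__":
    u = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    v = [0, 0, 1, 1, 1, 2, 2, 2, 2]
    j = jaccard(u, v)
    e_taylor = expected_jaccard_taylor(u, v)
    e_mc = expected_jaccard_mc(u, v)
    print(round(j, 3))  # 0.357
    print(round(e_taylor, 3))  # 0.152
    print(round(corrected(j, e_taylor), 3))  # 0.242
    print(round(e_mc, 3), round(corrected(j, e_mc), 3))
```

Comparing the two expectation estimates on a given pair of labelings gives a rough sense of how much the first-order approximation misses. A second-order expansion would add f''(E[a]) · Var(a) / 2, which requires Var(a) and is omitted from this sketch.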

Keywords

Similarity indices; Matching counts matrix; Correction for chance agreement; Jaccard index; Cluster analysis; Comparing partitions

Mathematics Subject Classification (2000)

62H30 

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Ahmed N. Albatineh (1)
  • Magdalena Niewiadomska-Bugaj (2)
  1. Department of Epidemiology and Biostatistics, Florida International University, Miami, USA
  2. Department of Statistics, Western Michigan University, Kalamazoo, USA
