Correcting Jaccard and other similarity indices for chance agreement in cluster analysis
- Cite this article as:
- Albatineh, A.N. & Niewiadomska-Bugaj, M. Adv Data Anal Classif (2011) 5: 179. doi:10.1007/s11634-011-0090-y
Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre, the expectations cannot be easily found. We show how such indices can be expressed as functions of other indices, and how their expectations can then be found by approximation, so that an approximate correction is possible. A second approach is based on a Taylor series expansion. A simulation study illustrates the effectiveness of the resulting correction of similarity indices using structured and unstructured data generated from bivariate normal distributions.
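To make the idea of chance correction concrete, the following is a minimal Python sketch, not the approximation method of the article. It computes the Jaccard index from pair counts and applies the usual correction form (S - E[S]) / (1 - E[S]), which assumes the index has a maximum of 1. Because E[S] for Jaccard has no simple closed form, the sketch swaps in a plain Monte Carlo estimate: the labels of one partition are shuffled, which leaves the marginal totals of the matching counts matrix unchanged. The function names `pair_counts`, `jaccard`, and `corrected_index`, and the parameters `n_perm` and `seed`, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb
import random


def pair_counts(labels_a, labels_b):
    """Count object pairs by whether they are co-clustered in each partition."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))   # matching counts matrix n_ij
    rows = Counter(labels_a)                   # row marginals
    cols = Counter(labels_b)                   # column marginals
    n11 = sum(comb(c, 2) for c in cells.values())     # together in both
    n10 = sum(comb(c, 2) for c in rows.values()) - n11  # together in A only
    n01 = sum(comb(c, 2) for c in cols.values()) - n11  # together in B only
    n00 = comb(n, 2) - n11 - n10 - n01                  # apart in both
    return n11, n10, n01, n00


def jaccard(labels_a, labels_b):
    """Jaccard index on pairs: n11 / (n11 + n10 + n01)."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0


def corrected_index(index, labels_a, labels_b, n_perm=2000, seed=0):
    """Chance-corrected index using a Monte Carlo estimate of E[S].

    Assumption: shuffling labels_b simulates random agreement with the
    marginal totals held fixed. This is a stand-in for the analytical
    approximations discussed in the text.
    """
    rng = random.Random(seed)
    observed = index(labels_a, labels_b)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += index(labels_a, shuffled)
    expected = total / n_perm
    if expected == 1.0:
        return 0.0  # degenerate case: no room above chance agreement
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    b = [0, 0, 1, 1, 1, 1, 2, 2, 0]
    print("Jaccard:", jaccard(a, b))
    print("Corrected Jaccard:", corrected_index(jaccard, a, b))
```

Under this sketch, identical partitions give a corrected value of 1, while unrelated partitions give values near 0. The number of permutations trades run time against the accuracy of the estimated expectation.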