, Volume 5, Issue 3, pp 179-200
Date: 10 Jun 2011

Correcting Jaccard and other similarity indices for chance agreement in cluster analysis

Rent the article at a discount

Rent now

* Final gross prices may vary according to local VAT.

Get Access

Abstract

Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices and expectations found by approximations such that approximate correction is possible. A second approach is based on Taylor series expansion. A simulation study illustrates the effectiveness of the resulting correction of similarity indices using structured and unstructured data generated from bivariate normal distributions.