
Dissimilarity and similarity measures for comparing dendrograms and their applications

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose a new index Z for measuring the dissimilarity between two hierarchical clusterings (or dendrograms). The index is a metric, since it satisfies the axioms of non-negativity, symmetry and the triangle inequality. A desirable property of the index is that it can be decomposed into the contributions pertaining to each stage of the hierarchies. We show how these components relate to the criteria currently used for comparing two partitions. We obtain a global similarity index as the complement to one of the proposed dissimilarity, and we derive its adjustment for agreement due to chance. Similarity indexes pertaining to each stage of the hierarchies are obtained likewise, as the complement to one of each additive part of the global distance Z. We also consider the use of the proposed distance with more than two dendrograms, for the consensus of classifications, and for variable selection in cluster analysis. A series of simulation experiments and an application to a real data set are presented.
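
The abstract does not give the formula for Z, so the following is only a rough illustration of the general idea of a dissimilarity that splits into per-stage contributions, not the authors' index. It assumes each dendrogram is supplied as a list of label vectors, one per stage, with both hierarchies cut at the same numbers of clusters. As a stand-in per-stage criterion it uses the fraction of object pairs joined in one partition but not in the other (the complement to one of the Rand index). The function names are hypothetical.

```python
from itertools import combinations


def comembership(labels):
    """Return the set of index pairs (i, j), i < j, sharing a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def stage_disagreements(hier_a, hier_b):
    """Per-stage fraction of pairs joined in one partition but not the other.

    hier_a, hier_b: lists of label vectors, one per stage, over the same
    objects. This pair-disagreement criterion is an assumed stand-in, not
    the stage component defined in the paper.
    """
    if len(hier_a) != len(hier_b):
        raise ValueError("hierarchies must have the same number of stages")
    parts = []
    for labels_a, labels_b in zip(hier_a, hier_b):
        n = len(labels_a)
        if len(labels_b) != n or n < 2:
            raise ValueError("each stage needs the same >= 2 objects")
        n_pairs = n * (n - 1) // 2
        # Symmetric difference: pairs together in exactly one partition.
        differing = comembership(labels_a) ^ comembership(labels_b)
        parts.append(len(differing) / n_pairs)
    return parts


def global_dissimilarity(hier_a, hier_b):
    """Average of the per-stage parts, so the total decomposes by stage."""
    parts = stage_disagreements(hier_a, hier_b)
    return sum(parts) / len(parts)


# Two toy hierarchies on four objects, cut at 3 and then 2 clusters.
hier_a = [[0, 0, 1, 2], [0, 0, 1, 1]]
hier_b = [[0, 1, 1, 2], [0, 1, 1, 1]]
parts = stage_disagreements(hier_a, hier_b)
total = global_dissimilarity(hier_a, hier_b)
similarity = 1.0 - total  # complement to one of the dissimilarity
```

Each per-stage term here is a normalized Hamming distance between co-membership pair sets, so it is itself a metric on partitions and the average inherits non-negativity, symmetry and the triangle inequality. No chance adjustment is attempted in this sketch.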



Author information

Corresponding author

Correspondence to Isabella Morlini.


About this article

Cite this article

Morlini, I., Zani, S. Dissimilarity and similarity measures for comparing dendrograms and their applications. Adv Data Anal Classif 6, 85–105 (2012). https://doi.org/10.1007/s11634-012-0106-2


  • DOI: https://doi.org/10.1007/s11634-012-0106-2
