Date: 26 Jun 2008
Solving Non-Uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms
In agglomerative hierarchical clustering, pair-group methods suffer from non-uniqueness when two or more distances between different clusters coincide during the amalgamation process. The traditional remedy has been to adopt an arbitrary criterion to break ties between distances, which yields different hierarchical classifications depending on the criterion chosen. In this article we propose a variable-group algorithm that groups more than two clusters at the same time when ties occur. We give a tree representation for the results of the algorithm, which we call a multidendrogram, as well as a generalization of Lance and Williams’ formula that enables the algorithm to be implemented recursively.
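The idea of merging all tied clusters in one step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single linkage only (so the generalized Lance and Williams update mentioned above is not needed and is not reproduced here), treats distances within a hypothetical tolerance `tol` as tied, and uses an illustrative function name, `tie_aware_single_linkage`, that does not come from the article.

```python
from itertools import combinations


def tie_aware_single_linkage(labels, d, tol=1e-12):
    """Agglomerate with single linkage, merging all tied clusters at once.

    labels: list of hashable, sortable item names
    d: dict mapping frozenset({a, b}) -> distance between items a and b
    Returns a list of (height, new_clusters) steps, where new_clusters lists
    the clusters (as sorted tuples) formed at that height.
    """
    clusters = [frozenset([x]) for x in labels]
    # Distances between current clusters, keyed by an unordered pair.
    dist = {frozenset((frozenset([a]), frozenset([b]))): d[frozenset((a, b))]
            for a, b in combinations(labels, 2)}
    steps = []
    while len(clusters) > 1:
        pairs = list(combinations(clusters, 2))
        height = min(dist[frozenset(p)] for p in pairs)
        # Every pair whose distance ties with the minimum (assumed tolerance).
        tied = [p for p in pairs if dist[frozenset(p)] - height <= tol]

        # Union-find: clusters connected through tied pairs form one group.
        parent = {c: c for c in clusters}

        def find(c):
            while parent[c] != c:
                c = parent[c]
            return c

        for a, b in tied:
            parent[find(a)] = find(b)

        groups = {}
        for c in clusters:
            groups.setdefault(find(c), []).append(c)
        parts = list(groups.values())
        merged = [frozenset().union(*g) for g in parts]

        # Single linkage: distance between groups is the minimum over members.
        new_dist = {}
        for (g1, m1), (g2, m2) in combinations(list(zip(parts, merged)), 2):
            new_dist[frozenset((m1, m2))] = min(
                dist[frozenset((a, b))] for a in g1 for b in g2)

        steps.append((height, [tuple(sorted(m))
                               for g, m in zip(parts, merged) if len(g) > 1]))
        clusters, dist = merged, new_dist
    return steps


# Toy data (made up for illustration): A-B and B-C tie at distance 1.
toy = {frozenset(("A", "B")): 1, frozenset(("B", "C")): 1,
       frozenset(("A", "C")): 2, frozenset(("A", "D")): 3,
       frozenset(("B", "D")): 3, frozenset(("C", "D")): 3}
result = tie_aware_single_linkage(["A", "B", "C", "D"], toy)
```

On this toy input, A, B and C are joined in a single step instead of through an arbitrarily ordered pair of binary merges, which is the behaviour a pair-group method cannot express without breaking the tie.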
The authors thank A. Arenas for discussion and helpful comments. This work was partially supported by DGES of the Spanish Government Project No. FIS2006-13321-C02-02 and by a grant from Universitat Rovira i Virgili.
ARNAU, V., MARS, S., and MARÍN, I. (2005), “Iterative Cluster Analysis of Protein Interaction Data,” Bioinformatics, 21(3), 364–378.
BACKELJAU, T., DE BRUYN, L., DE WOLF, H., JORDAENS, K., VAN DONGEN, S., and WINNEPENNINCKX, B. (1996), “Multiple UPGMA and Neighbor-Joining Trees and the Performance of Some Computer Packages,” Molecular Biology and Evolution, 13(2), 309–313.
GORDON, A.D. (1999), Classification (2nd ed.), London/Boca Raton, FL: Chapman & Hall/CRC.
HART, G. (1983), “The Occurrence of Multiple UPGMA Phenograms,” in Numerical Taxonomy, ed. J. Felsenstein, Berlin Heidelberg: Springer-Verlag, pp. 254–258.
LANCE, G.N., and WILLIAMS, W.T. (1966), “A Generalized Sorting Strategy for Computer Classifications,” Nature, 212, 218.
MACCUISH, J., NICOLAOU, C., and MACCUISH, N.E. (2001), “Ties in Proximity and Clustering Compounds,” Journal of Chemical Information and Computer Sciences, 41, 134–146.
SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy: The Principles and Practice of Numerical Classification, San Francisco: W. H. Freeman and Company.
VAN DER KLOOT, W.A., SPAANS, A.M.J., and HEISER, W.J. (2005), “Instability of Hierarchical Cluster Analysis Due to Input Order of the Data: The PermuCLUSTER Solution,” Psychological Methods, 10(4), 468–476.
Journal of Classification
Volume 25, Issue 1, pp 43–65
- Agglomerative methods
- Cluster analysis
- Hierarchical classification
- Lance and Williams’ formula
- Ties in proximity