Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering

Abstract

Agglomerative hierarchical clustering can be implemented with several strategies that differ in the way elements of a collection are grouped together to build a hierarchy of clusters. Here we introduce versatile linkage, a new infinite system of agglomerative hierarchical clustering strategies based on generalized means, ranging from single linkage to complete linkage and passing through arithmetic average linkage and other, as yet unexplored, clustering methods such as geometric linkage and harmonic linkage. We compare the different clustering strategies in terms of cophenetic correlation and mean absolute error, as well as tree balance and space distortion, two new measures proposed to describe hierarchical trees. We show that, unlike the β-flexible clustering system, the versatile linkage family is space-conserving.
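
As an illustration of how a single exponent can interpolate between the linkage strategies named above, the following minimal sketch computes the distance between two clusters as a generalized (power) mean of their pairwise distances. It is not the authors' implementation: the function names `power_mean` and `linkage_distance`, the use of unweighted pairwise distances, and the Euclidean metric are all assumptions made for this example, and the exact definition in the full article may differ.

```python
import math
from itertools import product


def power_mean(values, p):
    """Generalized (power) mean with exponent p.

    Assumes strictly positive values whenever p <= 0.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if p == -math.inf:          # limit p -> -inf: minimum
        return min(values)
    if p == math.inf:           # limit p -> +inf: maximum
        return max(values)
    if p == 0:                  # limit p -> 0: geometric mean
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)


def linkage_distance(cluster_a, cluster_b, dist, p):
    """Hypothetical helper: power mean of all pairwise distances
    between the members of two clusters."""
    pairwise = [dist(a, b) for a, b in product(cluster_a, cluster_b)]
    return power_mean(pairwise, p)


# Toy data (illustrative only, not taken from the article).
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

for name, p in [("single", -math.inf), ("harmonic", -1), ("geometric", 0),
                ("arithmetic", 1), ("complete", math.inf)]:
    d = linkage_distance(cluster_a, cluster_b, math.dist, p)
    print(f"{name:>10} (p={p}): {d:.4f}")
```

Because the power mean is non-decreasing in its exponent, the five distances printed by this sketch are ordered from the single-linkage value (the minimum pairwise distance) to the complete-linkage value (the maximum).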



Author information

Corresponding author

Correspondence to Alberto Fernández.


Cite this article

Fernández, A., Gómez, S. Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering. J Classif 37, 584–597 (2020). https://doi.org/10.1007/s00357-019-09339-z

Keywords

  • Hierarchical clustering
  • Versatile linkage
  • Space distortion
  • Tree balance
  • Multidendrogram