Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering

Journal of Classification

Abstract

Agglomerative hierarchical clustering can be implemented with several strategies that differ in how the elements of a collection are grouped together to build a hierarchy of clusters. Here we introduce versatile linkage, a new infinite system of agglomerative hierarchical clustering strategies based on generalized means. The system ranges from single linkage to complete linkage, passing through arithmetic average linkage and other, as yet unexplored, clustering methods such as geometric linkage and harmonic linkage. We compare the different clustering strategies in terms of cophenetic correlation and mean absolute error, as well as tree balance and space distortion, two new measures proposed to describe hierarchical trees. We show that, unlike the β-flexible clustering system, the versatile linkage family is space-conserving.
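To illustrate how a generalized mean can interpolate between the linkage strategies named in the abstract, the following minimal Python sketch computes an inter-cluster distance as the power mean of all pairwise distances between two clusters. It is an illustration under stated assumptions, not the definition used in the article: the function names `power_mean` and `linkage_distance`, the exponent parameter `p`, and the unweighted averaging over all pairs are hypothetical choices made here. All pairwise distances are assumed to be strictly positive.

```python
import math

def power_mean(values, p):
    """Generalized (power) mean of strictly positive values with exponent p."""
    values = list(values)
    if p == math.inf:
        return max(values)          # limit p -> +inf
    if p == -math.inf:
        return min(values)          # limit p -> -inf
    if p == 0:
        # limit p -> 0 is the geometric mean
        return math.exp(sum(math.log(v) for v in values) / len(values))
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def linkage_distance(cluster_a, cluster_b, dist, p):
    """Power mean of all pairwise distances between two clusters.

    Assumption for illustration: an unweighted mean over all pairs;
    this is not necessarily the article's exact formulation.
    """
    pairwise = (dist(a, b) for a in cluster_a for b in cluster_b)
    return power_mean(pairwise, p)

# Toy one-dimensional example (hypothetical data).
cluster_a = [0.0, 1.0]
cluster_b = [3.0, 5.0]

def absolute_difference(x, y):
    return abs(x - y)

single = linkage_distance(cluster_a, cluster_b, absolute_difference, -math.inf)
harmonic = linkage_distance(cluster_a, cluster_b, absolute_difference, -1)
geometric = linkage_distance(cluster_a, cluster_b, absolute_difference, 0)
arithmetic = linkage_distance(cluster_a, cluster_b, absolute_difference, 1)
complete = linkage_distance(cluster_a, cluster_b, absolute_difference, math.inf)
```

In this sketch the exponent `p` acts as the single tuning parameter: the limits p → −∞ and p → +∞ recover the minimum and maximum pairwise distance (single and complete linkage), while p = −1, 0, and 1 give the harmonic, geometric, and arithmetic means. Because a power mean is non-decreasing in `p`, the five values above are ordered from `single` to `complete`.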

Author information

Corresponding author

Correspondence to Alberto Fernández.

About this article

Cite this article

Fernández, A., Gómez, S. Versatile Linkage: a Family of Space-Conserving Strategies for Agglomerative Hierarchical Clustering. J Classif 37, 584–597 (2020). https://doi.org/10.1007/s00357-019-09339-z

  • DOI: https://doi.org/10.1007/s00357-019-09339-z
