A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms

Journal of Classification

Abstract

A general scheme for divisive hierarchical clustering algorithms is proposed. It consists of three main steps: first, a splitting procedure that subdivides a cluster into two subclusters; second, a local evaluation of the bipartitions resulting from the tentative splits; and third, a formula for determining the node levels of the resulting dendrogram. A set of 12 such algorithms is presented and compared to their agglomerative counterparts (when available). These algorithms are evaluated using the Goodman-Kruskal correlation coefficient. Used as a global criterion, it is an internal goodness-of-fit measure that compares the order induced by the hierarchy with the order associated with the given dissimilarities. Applied to a hundred random data tables and to three real-life examples, these comparisons favor methods based on unusual ratio-type formulas for evaluating the intermediate bipartitions, namely the Silhouette formula, Dunn's formula and the Mollineda et al. formula. These formulas take into account both the within-cluster and the between-cluster mean dissimilarities. Used in divisive algorithms, they perform very well, and slightly better than in their agglomerative counterparts.
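As an illustration of the global criterion mentioned in the abstract, the following is a minimal sketch of a Goodman-Kruskal coefficient computed between the given dissimilarities and the node levels read from a dendrogram. The function name, the matrix representation, the toy data and the choice to ignore ties are assumptions made for this example; the paper's own handling of ties and its implementation may differ.

```python
from itertools import combinations


def goodman_kruskal_gamma(d, u):
    """Gamma between two symmetric dissimilarity matrices on the same objects.

    d: given dissimilarities (list of lists).
    u: ultrametric levels induced by a hierarchy, i.e. for each pair of
       objects the level of the lowest dendrogram node joining them.
    """
    n = len(d)
    pairs = list(combinations(range(n), 2))
    concordant = discordant = 0
    # Compare every pair of object pairs in both orderings.
    for (i, j), (k, l) in combinations(pairs, 2):
        sign = (d[i][j] - d[k][l]) * (u[i][j] - u[k][l])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Ties in either ordering are ignored (assumption of this sketch).
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Hypothetical toy data: four objects forming two tight pairs.
d = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
# Hierarchy {0,1} at level 1, {2,3} at level 2, root at level 6.
u_good = [
    [0, 1, 6, 6],
    [1, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]
# A poorly fitting hierarchy: {0,3} at level 1, {1,2} at level 2, root at 6.
u_bad = [
    [0, 6, 6, 1],
    [6, 0, 2, 6],
    [6, 2, 0, 6],
    [1, 6, 6, 0],
]

gamma_good = goodman_kruskal_gamma(d, u_good)  # 1.0
gamma_bad = goodman_kruskal_gamma(d, u_bad)
```

A hierarchy whose node levels never contradict the order of the dissimilarities reaches the maximum value of 1, while a hierarchy that joins distant objects early scores lower. The loop over pairs of pairs grows as the fourth power of the number of objects, so this sketch is only suitable for small tables.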

References

  • BOLEY, D. (1998), “Principal Directions Divisive Partitioning”, Data Mining and Knowledge Discovery, 2(4), 325–344.

  • CUNNINGHAM, K.M., and OGILVIE, J.C. (1972), “Evaluation of Hierarchical Grouping Techniques: A Preliminary Study”, Computer Journal, 15(3), 209–213.

  • DUNN, J.C. (1974), “Well Separated Clusters and Optimal Fuzzy Partitions”, Journal of Cybernetics, 4, 95–104.

  • EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21(2), 362–375.

  • FISHER, R.A. (1936), “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179–188.

  • GOLUB, T.R., SLONIM, D.K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J.P., COLLER, H., LOH, M.L., DOWNING, J.R., CALIGIURI, M.A., BLOOMFIELD, C.D., and LANDER, E.S. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring”, Science, 286, 531–537.

  • GOODMAN, L., and KRUSKAL, W. (1954), “Measures of Association for Cross Classifications, Part 1”, Journal of the American Statistical Association, 49, 732–764.

  • GOWER, J.C. (1966), “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis”, Biometrika, 53(3,4), 325–338.

  • HANDL, J., KNOWLES, J., and KELL, D.B. (2005), “Computational Cluster Validation in Post-Genomic Data Analysis”, Bioinformatics, 21(15), 3201–3212.

  • HUBERT, L. (1973), “Monotone Invariant Clustering Procedures”, Psychometrika, 38(1), 47–62.

  • KAUFMAN, L., and ROUSSEEUW, P.J. (1990), Finding Groups in Data, New York: Wiley.

  • KENDALL, M.G. (1938), “A New Measure of Rank Correlation”, Biometrika, 30(1-2), 81–93.

  • MACNAUGHTON-SMITH, P., WILLIAMS, W.T., DALE, M.B., and MOCKETT, L.G. (1964), “Dissimilarity Analysis: A New Technique of Hierarchical Sub-Division”, Nature, 202, 1034–1035.

  • MOLLINEDA, R.A., and VIDAL, E. (2000), “A Relative Approach to Hierarchical Clustering”, in Pattern Recognition and Applications, eds. M.I. Torres and A. Sanfeliu, Amsterdam: IOS Press, pp. 19–28.

  • MURTAGH, F., and LEGENDRE, P. (2014), “Ward’s Hierarchical Agglomerative Method: Which Algorithms Implement Ward’s Criterion?”, Journal of Classification, 31, 274–295.

  • REINERT, M. (1983), “Une Méthode de Classification Descendante Hiérarchique: Application à l'Analyse Lexicale par Contexte”, Les Cahiers de l'Analyse des Données, 8(2), 187–198.

  • ROUSSEEUW, P.J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis”, Journal of Computational and Applied Mathematics, 20, 53–65.

  • ROUX, M. (1991), “Basic Procedures in Hierarchical Cluster Analysis”, in Applied Multivariate Analysis in SAR and Environmental Studies, eds. J. Devillers and W. Karcher, Dordrecht: Kluwer Academic Publishers, pp. 115–135.

  • ROUX, M. (1995), “About Divisive Methods in Hierarchical Clustering”, in Data Science and Its Applications, eds. Y. Escoufier, C. Hayashi, B. Fichet, N. Ohsumi, E. Diday, Y. Baba, and L. Lebart, Tokyo: Academic Press, pp. 101–106.

  • SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy, San Francisco: W.H. Freeman and Co.

  • SOKAL, R.R., and ROHLF, F.J. (1962), “The Comparison of Dendrograms by Objective Methods”, Taxon, 11(2), 33–40.

  • STEINBACH, M., KARYPIS, G., and KUMAR, V. (2000), “A Comparison of Document Clustering Techniques”, Technical Report TR 00-034, University of Minnesota, Minneapolis, USA.

  • SZÉKELY, G.J., and RIZZO, M.L. (2005), “Hierarchical Clustering Via Joint Between-Within Distances: Extending Ward's Minimum Variance Method”, Journal of Classification, 22, 151–183.

  • TUBB, A., PARKER, N.J., and NICKLESS, G. (1980), “The Analysis of Romano-British Pottery by Atomic Absorption Spectrophotometry”, Archaeometry, 22, 153–171.

  • WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58, 236–244.

  • WILLIAMS, W.T., and LAMBERT, J.M. (1959), “Multivariate Methods in Plant Ecology. I. Association Analysis in Plant Communities”, Journal of Ecology, 47(1), 83–101.

Author information

Corresponding author

Correspondence to Maurice Roux.

About this article

Cite this article

Roux, M. A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms. J Classif 35, 345–366 (2018). https://doi.org/10.1007/s00357-018-9259-9

  • DOI: https://doi.org/10.1007/s00357-018-9259-9