Abstract
A general scheme for divisive hierarchical clustering algorithms is proposed. It consists of three main steps: first, a splitting procedure that subdivides a cluster into two subclusters; second, a local evaluation of the bipartitions resulting from the tentative splits; and third, a formula for determining the node levels of the resulting dendrogram. A set of 12 such algorithms is presented and compared to their agglomerative counterparts (when available). These algorithms are evaluated using the Goodman-Kruskal correlation coefficient. Used as a global criterion, this coefficient is an internal goodness-of-fit measure that compares the order induced by the hierarchy with the order associated with the given dissimilarities. Applied to a hundred random data tables and to three real-life examples, these comparisons favor methods that evaluate the intermediate bipartitions with unusual ratio-type formulas, namely the Silhouette formula, Dunn's formula and the Mollineda et al. formula. These formulas take into account both the within-cluster and the between-cluster mean dissimilarities. Used in divisive algorithms, they perform very well, and slightly better than in their agglomerative counterparts.
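To make the evaluation criterion more concrete, here is a minimal Python sketch of a Goodman-Kruskal coefficient computed between two lists of values indexed by the same object pairs. It assumes the comparison is made between the input dissimilarities and the node levels at which each pair of objects is first joined in the dendrogram (the ultrametric distances). The function name `goodman_kruskal_gamma`, the tie handling and the toy data are illustrative assumptions, not taken from the article.

```python
from itertools import combinations


def goodman_kruskal_gamma(dissimilarities, ultrametric):
    """Goodman-Kruskal gamma between two lists indexed by the same object pairs.

    Assumption: entry k of both lists refers to the same pair of objects.
    Pairs of entries tied in either list are ignored.
    """
    if len(dissimilarities) != len(ultrametric):
        raise ValueError("both lists must describe the same object pairs")

    concordant = 0
    discordant = 0
    # Compare every pair of object pairs and check whether both lists
    # order them the same way (concordant) or the opposite way (discordant).
    for i, j in combinations(range(len(dissimilarities)), 2):
        sign = (dissimilarities[i] - dissimilarities[j]) * (
            ultrametric[i] - ultrametric[j]
        )
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    compared = concordant + discordant
    if compared == 0:
        raise ValueError("every comparison is tied; gamma is undefined")
    return (concordant - discordant) / compared


# Hypothetical toy data: four objects a, b, c, d,
# pairs listed in the order ab, ac, ad, bc, bd, cd.
toy_dissimilarities = [1, 4, 5, 3, 6, 2]
# Levels from a hypothetical dendrogram: {a, b} joined at level 1,
# {c, d} at level 2, and the two groups joined at level 5.
toy_ultrametric = [1, 5, 5, 5, 5, 2]
toy_gamma = goodman_kruskal_gamma(toy_dissimilarities, toy_ultrametric)
```

This brute-force version compares all pairs of object pairs, so its cost grows with the fourth power of the number of objects. It is meant only to illustrate the definition on small examples.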
References
BOLEY, D. (1998), “Principal Directions Divisive Partitioning”, Data Mining and Knowledge Discovery, 2(4), 325–344.
CUNNINGHAM, K.M., and OGILVIE, J.C. (1972), “Evaluation of Hierarchical Grouping Techniques: A Preliminary Study”, Computer Journal, 15(3), 209–213.
DUNN, J.C. (1974), “Well Separated Clusters and Optimal Fuzzy Partitions”, Journal of Cybernetics, 4, 95–104.
EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21(2), 362–375.
FISHER, R. A. (1936), “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179–188.
GOLUB, T.R., SLONIM, D.K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J.P., COLLER, H., LOH, M.L., DOWNING, J.R., CALIGIURI, M.A., BLOOMFIELD, C.D., and LANDER, E.S. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring”, Science, 286, 531–537.
GOODMAN, L., and KRUSKAL, W. (1954), “Measures of Association for Cross Classifications, Part 1”, Journal of the American Statistical Association, 49, 732–764.
GOWER, J.C. (1966), “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis”, Biometrika, 53(3,4), 325–338.
HANDL, J., KNOWLES, J., and KELL, D.B. (2005), “Computational Cluster Validation in Post-Genomic Data Analysis”, Bioinformatics, 21(15), 3201–3212.
HUBERT, L. (1973), “Monotone Invariant Clustering Procedures”, Psychometrika, 38(1), 47–62.
KAUFMAN, L., and ROUSSEEUW, P.J. (1990), Finding Groups in Data, New York: Wiley.
KENDALL, M.G. (1938), “A New Measure of Rank Correlation”, Biometrika, 30(1–2), 81–93.
MACNAUGHTON-SMITH, P., WILLIAMS, W.T., DALE, M.B., and MOCKETT, L.G. (1964), “Dissimilarity Analysis: A New Technique of Hierarchical Sub-Division”, Nature, 202, 1034–1035.
MOLLINEDA, R.A., and VIDAL, E. (2000), “A Relative Approach to Hierarchical Clustering”, in Pattern Recognition and Applications, eds. M.I. Torres and A. Sanfeliu, Amsterdam: IOS Press, pp 19–28.
MURTAGH, F., and LEGENDRE, P. (2014), “Ward’s Hierarchical Agglomerative Method: Which Algorithms Implement Ward’s Criterion?”, Journal of Classification, 31, 274–295.
REINERT, M. (1983), “Une Méthode de Classification Descendante Hiérarchique: Application à l'Analyse Lexicale par Contexte”, Les Cahiers de l'Analyse des Données, 8(2), 187–198.
ROUSSEEUW, P.J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis”, Journal of Computational and Applied Mathematics, 20, 53–65.
ROUX, M. (1991), “Basic Procedures in Hierarchical Cluster Analysis”, in Applied Multivariate Analysis in SAR and Environmental Studies, eds. J. Devillers and W. Karcher, Dordrecht: Kluwer Academic Publishers, pp 115–135.
ROUX, M. (1995), “About Divisive Methods in Hierarchical Clustering”, in Data Science and Its Applications, eds. Y. Escoufier, C. Hayashi, B. Fichet, N. Ohsumi, E. Diday, Y. Baba, and L. Lebart, Tokyo: Academic Press, pp 101–106.
SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy, San Francisco: W.H. Freeman and Co.
SOKAL, R.R., and ROHLF, F.J. (1962), “The Comparison of Dendrograms by Objective Methods”, Taxon, 11(2), 33–40.
STEINBACH, M., KARYPIS, G., and KUMAR, V. (2000), “A Comparison of Document Clustering Techniques”, Technical Report TR 00-034. University of Minnesota, Minneapolis, USA.
SZÉKELY, G.J., and RIZZO, M.L. (2005), “Hierarchical Clustering Via Joint Between-Within Distances: Extending Ward's Minimum Variance Method”, Journal of Classification, 22, 151–183.
TUBB, A., PARKER, N.J., and NICKLESS, G. (1980), “The Analysis of Romano-British Pottery by Atomic Absorption Spectrophotometry”, Archaeometry, 22, 153–171.
WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58, 236–244.
WILLIAMS, W.T., and LAMBERT, J.M. (1959), “Multivariate Methods in Plant Ecology. I. Association Analysis in Plant Communities”, Journal of Ecology, 47(1), 83–101.
Cite this article
Roux, M. A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms. J Classif 35, 345–366 (2018). https://doi.org/10.1007/s00357-018-9259-9
DOI: https://doi.org/10.1007/s00357-018-9259-9