A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms

Roux, Maurice

doi:10.1007/s00357-018-9259-9

Published: 07 August 2018

Volume 35, pages 345–366 (2018)

Journal of Classification

Maurice Roux¹

Abstract

A general scheme for divisive hierarchical clustering algorithms is proposed. It consists of three main steps: first, a splitting procedure that subdivides a cluster into two subclusters; second, a local evaluation of the bipartitions resulting from the tentative splits; and third, a formula for determining the node levels of the resulting dendrogram. A set of 12 such algorithms is presented and compared with their agglomerative counterparts (when available). These algorithms are evaluated using the Goodman-Kruskal correlation coefficient. As a global criterion, it is an internal goodness-of-fit measure based on the order induced by the hierarchy, compared with the order associated with the given dissimilarities. Applied to a hundred random data tables and to three real-life examples, these comparisons favor methods that use unusual ratio-type formulas to evaluate the intermediate bipartitions, namely the Silhouette formula, Dunn's formula and the Mollineda et al. formula. These formulas take into account both the within-cluster and the between-cluster mean dissimilarities. Used in divisive algorithms, they perform very well, and slightly better than in their agglomerative counterparts.
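The three-step divisive scheme and the Goodman-Kruskal evaluation described above can be illustrated with a small sketch. This is a hypothetical illustration, not the author's implementation: the exhaustive search over all bipartitions (feasible only for very small clusters), the use of the cluster diameter as the node level, and every function name (`silhouette_bipartition`, `best_split`, `diameter`, `divisive`, `goodman_kruskal_gamma`) are assumptions made here. Each tentative split is scored by the average silhouette width (Rousseeuw, 1987). A different ratio-type criterion, such as Dunn's or Mollineda et al.'s, could be passed through the `evaluate` parameter, but their exact forms are not reproduced here.

```python
from itertools import combinations


def silhouette_bipartition(d, left, right):
    """Average silhouette width (Rousseeuw, 1987) of a split into two subclusters."""
    total = 0.0
    for own, other in ((left, right), (right, left)):
        if len(own) == 1:
            continue  # by convention s(i) = 0 for a singleton cluster
        for i in own:
            a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)  # mean within-cluster
            b = sum(d[i][j] for j in other) / len(other)  # mean to the other subcluster
            m = max(a, b)
            total += (b - a) / m if m > 0 else 0.0
    return total / (len(left) + len(right))


def best_split(d, cluster, evaluate=silhouette_bipartition):
    """Step 1 + step 2 (assumed): try every bipartition, keep the best-scoring one."""
    first, rest = cluster[0], cluster[1:]
    best = None
    for k in range(len(rest)):  # `first` goes left; k < len(rest) keeps the right side non-empty
        for extra in combinations(rest, k):
            left = [first, *extra]
            right = [x for x in rest if x not in extra]
            score = evaluate(d, left, right)
            if best is None or score > best[0]:
                best = (score, left, right)
    return best[1], best[2]


def diameter(d, cluster):
    """Step 3 (hypothetical level formula): largest dissimilarity inside the cluster."""
    return max((d[i][j] for i, j in combinations(cluster, 2)), default=0.0)


def divisive(d):
    """Split clusters top-down; return the cophenetic (ultrametric) matrix of the hierarchy."""
    n = len(d)
    coph = [[0.0] * n for _ in range(n)]
    stack = [list(range(n))]
    while stack:
        cluster = stack.pop()
        if len(cluster) < 2:
            continue
        left, right = best_split(d, cluster)
        level = diameter(d, cluster)  # a subcluster's diameter never exceeds its parent's
        for i in left:  # pairs separated at this node receive the node's level
            for j in right:
                coph[i][j] = coph[j][i] = level
        stack.extend([left, right])
    return coph


def goodman_kruskal_gamma(x, y):
    """Gamma = (C - D) / (C + D) over pairs of observations; tied pairs are ignored."""
    conc = disc = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc) if conc + disc else 0.0


if __name__ == "__main__":
    pts = [0.0, 1.0, 10.0, 11.0]  # toy one-dimensional data
    d = [[abs(p - q) for q in pts] for p in pts]
    coph = divisive(d)
    pairs = list(combinations(range(len(pts)), 2))
    gamma = goodman_kruskal_gamma([d[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])
    print(gamma)  # 1.0 for this well-separated toy example
```

In the toy example the root is split into {0, 1} and {2, 3}, so every within-group pair gets a lower cophenetic level than every between-group pair. Consequently, no pair of pairs is discordant with the original dissimilarities and gamma equals 1. The diameter was chosen as the level only because it is monotone by construction, which guarantees a valid dendrogram.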

References

BOLEY, D. (1998), “Principal Directions Divisive Partitioning”, Data Mining and Knowledge Discovery, 2(4), 325–344.
CUNNINGHAM, K.M., and OGILVIE, J.C. (1972), “Evaluation Of Hierarchical Grouping Techniques: A Preliminary Study”, Computer Journal, 15(3), 209–213.
DUNN, J.C. (1974), “Well Separated Clusters and Optimal Fuzzy Partitions”, Journal of Cybernetics, 4, 95–104.
EDWARDS, A.W.F., and CAVALLI-SFORZA, L.L. (1965), “A Method for Cluster Analysis”, Biometrics, 21(2), 362–375.
FISHER, R.A. (1936), “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, 7, 179–188.
GOLUB, T.R., SLONIM, D.K., TAMAYO, P., HUARD, C., GAASENBEEK, M., MESIROV, J.P., COLLER, H., LOH, M.L., DOWNING, J.R., CALIGIURI, M.A., BLOOMFIELD, C.D., and LANDER, E.S. (1999), “Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring”, Science, 286, 531–537.
GOODMAN, L., and KRUSKAL, W. (1954), “Measures of Association for Cross Classifications, Part 1”, Journal of the American Statistical Association, 49, 732–764.
GOWER, J.C. (1966), “Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis”, Biometrika, 53(3,4), 325–338.
HANDL, J., KNOWLES, J., and KELL, D.B. (2005), “Computational Cluster Validation in Post-Genomic Data Analysis”, Bioinformatics, 21(15), 3201–3212.
HUBERT, L. (1973), “Monotone Invariant Clustering Procedures”, Psychometrika, 38(1), 47–62.
KAUFMAN, L., and ROUSSEEUW, P.J. (1990), Finding Groups in Data, New York: Wiley.
KENDALL, M.G. (1938), “A New Measure of Rank Correlation”, Biometrika, 30(1–2), 81–93.
MACNAUGHTON-SMITH, P., WILLIAMS, W.T., DALE, M.B., and MOCKETT, L.G. (1964), “Dissimilarity Analysis: A New Technique of Hierarchical Sub-Division”, Nature, 202, 1034–1035.
MOLLINEDA, R.A., and VIDAL, E. (2000), “A Relative Approach to Hierarchical Clustering”, in Pattern Recognition and Applications, eds. M.I. Torres and A. Sanfeliu, Amsterdam: IOS Press, pp. 19–28.
MURTAGH, F., and LEGENDRE, P. (2014), “Ward’s Hierarchical Agglomerative Method: Which Algorithms Implement Ward’s Criterion?”, Journal of Classification, 31, 274–295.
REINERT, M. (1983), “Une Méthode de Classification Descendante Hiérarchique: Application à l'Analyse Lexicale par Contexte”, Les Cahiers de l'Analyse des Données, 8(2), 187–198.
ROUSSEEUW, P.J. (1987), “Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis”, Journal of Computational and Applied Mathematics, 20, 53–65.
ROUX, M. (1991), “Basic Procedures in Hierarchical Cluster Analysis”, in Applied Multivariate Analysis in SAR and Environmental Studies, eds. J. Devillers and W. Karcher, Dordrecht: Kluwer Academic Publishers, pp. 115–135.
ROUX, M. (1995), “About Divisive Methods in Hierarchical Clustering”, in Data Science and Its Applications, eds. Y. Escoufier, C. Hayashi, B. Fichet, N. Ohsumi, E. Diday, Y. Baba, and L. Lebart, Tokyo: Academic Press, pp. 101–106.
SNEATH, P.H.A., and SOKAL, R.R. (1973), Numerical Taxonomy, San Francisco: W.H. Freeman and Co.
SOKAL, R.R., and ROHLF, F.J. (1962), “The Comparison of Dendrograms by Objective Methods”, Taxon, 11(2), 33–40.
STEINBACH, M., KARYPIS, G., and KUMAR, V. (2000), “A Comparison of Document Clustering Techniques”, Technical Report TR 00-034, University of Minnesota, Minneapolis, USA.
SZÉKELY, G.J., and RIZZO, M.L. (2005), “Hierarchical Clustering Via Joint Between-Within Distances: Extending Ward's Minimum Variance Method”, Journal of Classification, 22, 151–183.
TUBB, A., PARKER, N.J., and NICKLESS, G. (1980), “The Analysis of Romano-British Pottery by Atomic Absorption Spectrophotometry”, Archaeometry, 22, 153–171.
WARD, J.H. JR. (1963), “Hierarchical Grouping to Optimize an Objective Function”, Journal of the American Statistical Association, 58, 236–244.
WILLIAMS, W.T., and LAMBERT, J.M. (1959), “Multivariate Methods In Plant Ecology. I. Association Analysis in Plant Communities”, Journal of Ecology, 47(1), 83–101.

Author information

Authors and Affiliations

IMBE (Aix Marseille Université, CNRS, IRD, Univ Avignon), Faculté des Sciences de St-Jérôme, 13397, Marseille cedex 20, France
Maurice Roux

Corresponding author

Correspondence to Maurice Roux.

About this article

Cite this article

Roux, M. A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms. J Classif 35, 345–366 (2018). https://doi.org/10.1007/s00357-018-9259-9

Issue Date: July 2018
DOI: https://doi.org/10.1007/s00357-018-9259-9