Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints


Hierarchical agglomerative clustering (HAC) with Ward’s linkage has been widely used since its introduction by Ward (Journal of the American Statistical Association, 58(301), 236–244, 1963). This article reviews extensions of HAC to various input data and contiguity-constrained HAC, and provides applicability conditions. In addition, different versions of the graphical representation of the results as a dendrogram are presented and their properties are clarified. We clarify and complete the results already available in a heterogeneous literature using a uniform background. In particular, this study reveals an important distinction between a consistency property of the dendrogram and the absence of crossover within it. Finally, a simulation study shows that the constrained version of HAC can sometimes provide more relevant results than its unconstrained version, even though the constraint restricts the optimization of the objective criterion to a reduced set of solutions at each step. Overall, this article provides comprehensive recommendations, both for the use of HAC and constrained HAC depending on the input data, and for the representation of the results.
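To make the difference between the two procedures concrete, here is a minimal Python sketch of greedy Ward agglomeration on Euclidean data. It is an illustration under stated assumptions, not the implementation used in the article: the names `ward_linkage`, `ward_hac`, `adjacent_only` and the toy data are hypothetical, the contiguity constraint is assumed to be adjacency along the order in which the objects are given, and ties are broken by taking the first minimum (one convention among several).

```python
import numpy as np


def ward_linkage(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    diff = a.mean(axis=0) - b.mean(axis=0)
    return len(a) * len(b) / (len(a) + len(b)) * float(diff @ diff)


def ward_hac(points, adjacent_only=False):
    """Greedy Ward agglomeration; returns a list of (merged members, linkage value)."""
    x = np.asarray(points, dtype=float).reshape(len(points), -1)
    clusters = [[i] for i in range(len(x))]  # kept in the original object order
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        if adjacent_only:
            # Assumed constraint: only neighbouring clusters may be merged.
            pairs = [(i, i + 1) for i in range(n - 1)]
        else:
            # Unconstrained HAC: every pair of clusters is a candidate.
            pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
        values = [ward_linkage(x[clusters[i]], x[clusters[j]]) for i, j in pairs]
        best = int(np.argmin(values))  # ties: first minimum (a convention)
        i, j = pairs[best]
        merged = clusters[i] + clusters[j]
        merges.append((merged, values[best]))
        clusters[i] = merged  # the merged cluster takes the place of the left one
        del clusters[j]
    return merges


toy = [0.0, 10.0, 0.1, 10.3]  # hypothetical one-dimensional data
free = ward_hac(toy)
constrained = ward_hac(toy, adjacent_only=True)
print("unconstrained:", free)
print("constrained:  ", constrained)
```

In this toy example the unconstrained run first merges objects 0 and 2, which are close in value but not adjacent, whereas the constrained run can only choose among the three adjacent pairs. The constrained run also produces its second merge at a lower linkage value than its first, which is the kind of reversal that makes the choice of dendrogram height matter.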


  1. In the rare situation when the minimal linkage is achieved by more than one merger, a choice between these mergers has to be made. Different choices are made by different implementations of HAC.


  3. In some cases, similarity measures are also supposed to take non-negative values, but we will not make this assumption in the present article.

  4. The detailed analysis of all examples and counter-examples of this section is provided in Appendix 2.


  6. The pre-processed and normalized data have been downloaded from the authors’ website at (raw sequence data are also published on the GEO website, accession number GSE35156).


  • Ah-Pine, J., & Wang, X. (2016). Similarity based hierarchical clustering with an application to text collections. In Boström, H., Knobbe, A., Soares, C., & Papapetrou, P. (Eds.) Proceedings of the 15th International Symposium on Intelligent Data Analysis (IDA 2016), Lecture Notes in Computer Science (pp. 320–331). Stockholm.

  • Ambroise, C., Dehman, A., Neuvial, P., Rigaill, G., Vialaneix, N. (2019). Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics. Algorithms for Molecular Biology, 14, 22.

  • Arlot, S., Brault, V., Baudry, J.-P., Maugis, C., Michel, B. (2016). capushe: CAlibrating Penalities Using Slope HEuristics. R package version 1.1.1.

  • Arlot, S., Celisse, A., Harchaoui, Z. (2019). A kernel multiple change-point algorithm via model selection. arXiv:1202.3878v3. Now published in the Journal of Machine Learning Research.

  • Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3), 337–404.

  • Batagelj, V. (1981). Note on ultrametric hierarchical clustering algorithms. Psychometrika, 46(3), 351–352.

  • Bennett, K.D. (1996). Determination of the number of zones in a biostratigraphical sequence. New Phytologist, 132(1), 155–170.

  • Chavent, M., Kuentz-Simonet, V., Labenne, A., Saracco, J. (2018). ClustGeo: an R package for hierarchical clustering with spatial constraints. Computational Statistics, 33(4), 1799–1822.

  • Chen, J., & Ye, J. (2008). Training SVM with indefinite kernels. In Cohen, W., McCallum, A., & Roweis, S. (Eds.) Proceedings of the 25th International Conference on Machine Learning (ICML 2008) (pp. 136–146). New York: ACM.

  • Chen, Y., Garcia, E., Gupta, M., Rahimi, A., Cazzanti, L. (2009). Similarity-based classification: concepts and algorithms. Journal of Machine Learning Research, 10, 747–776.

  • Danon, L., Diaz-Guilera, A., Duch, J., Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005, P09008.

  • Dehman, A. (2015). Spatial clustering of linkage disequilibrium blocks for genome-wide association studies, PhD thesis, Université Paris Saclay.

  • Dixon, J., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J., Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376–380.

  • Ferligoj, A., & Batagelj, V. (1982). Clustering with relational constraint. Psychometrika, 47(4), 413–426.

  • Fraser, J., Ferrai, C., Chiariello, A.M., Schueler, M., Rito, T., Laudanno, G., Barbieri, M., Moore, B.L., Kraemer, D.C., Aitken, S., Xie, S.Q., Morris, K.J., Itoh, M., Kawaji, H., Jaeger, I., Hayashizaki, Y., Carninci, P., Forrest, A.R., The FANTOM Consortium, Semple, C.A., Dostie, J., Pombo, A., Nicodemi, M. (2015). Hierarchical folding and reorganization of chromosomes are linked to transcriptional changes in cellular differentiation. Molecular Systems Biology, 11, 852.

  • Gordon, A. (1996). A survey of constrained classification. Computational Statistics & Data Analysis, 21(1), 17–29.

  • Grimm, E.C. (1987). CONISS: A FORTRAN 77 program for stratigraphically constrained analysis by the method of incremental sum of squares. Computers & Geosciences, 13(1), 13–35.

  • Haddad, N., Vaillant, C., Jost, D. (2017). IC-Finder: inferring robustly the hierarchical organization of chromatin folding. Nucleic Acids Research, 45(10), e81–e81.

  • Hartigan, J.A. (1967). Representation of similarity matrices by trees. Journal of the American Statistical Association, 62(320), 1140–1158.

  • Imakaev, M., Fudenberg, G., McCord, R., Naumova, N., Goloborodko, A., Lajoie, B., Dekker, J., Mirny, L. (2012). Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003.

  • Johnson, S.C. (1967). Hierarchical clustering schemes. Psychometrika, 32(3), 241–254.

  • Krislock, N., & Wolkowicz, H. (2012). Handbook on semidefinite, conic and polynomial optimization, volume 166 of International Series in Operations Research & Management Science, chapter Euclidean distance matrices and applications, (pp. 879–914). New York: Springer.

  • Kruskal, J. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1–27.

  • Lance, G., & Williams, W. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. The Computer Journal, 9(4), 373–380.

  • Lebart, L. (1978). Programme d’agrégation avec contraintes. Les Cahiers de l’Analyse des Données, 3(3), 275–287.

  • Miyamoto, S., Abe, R., Endo, Y., Takeshita, J.-I. (2015). Ward method of hierarchical clustering for non-Euclidean similarity measures. In Proceedings of the VIIth International Conference of Soft Computing and Pattern Recognition (SoCPaR 2015). Fukuoka: IEEE.

  • Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? Journal of Classification, 31(3), 274–295.

  • Qin, J., Lewis, D.P., Noble, W.S. (2003). Kernel hierarchical gene clustering from microarray expression data. Bioinformatics, 19(16), 2097–2104.

  • Rammal, R., Toulouse, G., Virasoro, M.A. (1986). Ultrametricity for physicists. Reviews of Modern Physics, 58(3), 765–788.

  • Schleif, F.-M., & Tino, P. (2015). Indefinite proximity learning: a review. Neural Computation, 27(10), 2039–2096.

  • Schoenberg, I. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert”. Annals of Mathematics, 36, 724–732.

  • Schölkopf, B., & Smola, A.J. (2002). Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press.

  • Steinley, D., & Hubert, L. (2008). Order-constrained solutions in K-means clustering: even better than being globally optimal. Psychometrika, 73(4), 647–664.

  • Strauss, T., & von Maltitz, M.J. (2017). Generalising Ward’s method for use with Manhattan distances. PLoS ONE, 12, e0168288.

  • Székely, G.J., & Rizzo, M.L. (2005). Hierarchical clustering via joint between-within distances: extending Ward’s minimum variance method. Journal of Classification, 22(2), 151–183.

  • Ward, J.H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.

  • Wickham, H. (2016). ggplot2: elegant graphics for data analysis. New York: Springer.

  • Wishart, D. (1969). An algorithm for hierarchical classifications. Biometrics, 25(1), 165–170.

  • Young, G., & Householder, A. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika, 3, 19–22.

  • Zufferey, M., Tavernari, D., Oricchio, E., Ciriello, G. (2018). Comparison of computational methods for the identification of topologically associating domains. Genome Biology, 19(1), 217.


The authors would like to thank Marie Chavent for numerous instructive discussions on this paper. The authors are grateful to the GenoToul bioinformatics platform (INRAE Toulouse) and its staff for providing computing facilities.


The PhD thesis of N.R. is funded by the INRAE/Inria doctoral program 2018. This work was also supported by the SCALES project funded by CNRS (Mission “Osez l’interdisciplinarité”).

Author information


Corresponding author

Correspondence to Nathanaël Randriamihamison.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1. Proof of Proposition 2

We begin by noting that by Proposition 1, the only reversals that may occur are crossovers. With the notation of Proposition 2, a crossover at step t + 1 corresponds to the situation where:

$$ \begin{array}{@{}rcl@{}} \delta(G_{l} , G_{r})\geq \delta(G_{l} \cup G_{r}, G_{\bar{r}}) \textrm{ or } \delta(G_{l} , G_{r})\geq \delta(G_{l} \cup G_{r}, G_{\bar{l}}). \end{array} $$

By symmetry, we focus on the first case. With the notation of Proposition 2, and using the Lance-Williams formula (4), the first condition is equivalent to:

$$ \begin{array}{@{}rcl@{}} \delta(G_{l}, G_{r}) \geq \frac{g_{l\bar{r}} \delta(G_{l}, G_{\bar{r}}) + g_{r\bar{r}} \delta (G_{r}, G_{\bar{r}})}{g_{l\bar{r}} + g_{r\bar{r}}} \end{array} $$

while the second one is equivalent to:

$$ \begin{array}{@{}rcl@{}} \delta(G_{l}, G_{r}) \geq \frac{g_{\bar{l}l} \delta(G_{\bar{l}}, G_{l}) + g_{\bar{l}r} \delta (G_{\bar{l}}, G_{r})}{g_{\bar{l}l} + g_{\bar{l}r}} \end{array} $$

hence the result. □
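
The step from the crossover condition to the first inequality can be spelled out as follows. This is a sketch under two assumptions that are not restated in this appendix: that formula (4) is the usual Lance-Williams update for Ward’s linkage, with \(g_{u}\) the size (or weight) of cluster \(G_{u}\), and that a double subscript on a weight denotes a sum, \(g_{uv} = g_{u} + g_{v}\). Under these assumptions, the update reads:

$$ \delta(G_{l} \cup G_{r}, G_{\bar{r}}) = \frac{(g_{l} + g_{\bar{r}})\, \delta(G_{l}, G_{\bar{r}}) + (g_{r} + g_{\bar{r}})\, \delta(G_{r}, G_{\bar{r}}) - g_{\bar{r}}\, \delta(G_{l}, G_{r})}{g_{l} + g_{r} + g_{\bar{r}}} $$

Inserting this expression into \(\delta(G_{l}, G_{r}) \geq \delta(G_{l} \cup G_{r}, G_{\bar{r}})\), multiplying both sides by \(g_{l} + g_{r} + g_{\bar{r}} > 0\) and moving the term in \(\delta(G_{l}, G_{r})\) to the left-hand side gives:

$$ (g_{l} + g_{r} + 2 g_{\bar{r}})\, \delta(G_{l}, G_{r}) \geq (g_{l} + g_{\bar{r}})\, \delta(G_{l}, G_{\bar{r}}) + (g_{r} + g_{\bar{r}})\, \delta(G_{r}, G_{\bar{r}}) $$

Since \((g_{l} + g_{\bar{r}}) + (g_{r} + g_{\bar{r}}) = g_{l} + g_{r} + 2 g_{\bar{r}}\), dividing by this positive quantity yields the weighted-average form stated in the proof. The second case follows in the same way with \(G_{\bar{l}}\) in place of \(G_{\bar{r}}\).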

Appendix 2. Step-by-step Description of the Counter-Examples

In the following tables, bold values signal reversals. Italic values in Table 3 highlight the value of the objective function (\(\text{ESS}_{t}\)) for the clustering with 3 clusters.

Table 2 Details of Fig. 1
Table 3 Details of Fig. 2
Table 4 Details of Fig. 3
Table 5 Details of Fig. 4
Table 6 Details of Fig. 11

Appendix 3. Counter-Example of the Monotonicity of \(\bar {I}_{t}\) for Standard HAC in the Euclidean Case

Fig. 11

A reversal for Euclidean standard HAC with height defined as \(\bar {I}_{t}\). Top left: Configuration of the objects in \(\mathbb {R}^{2}\). Top right: Coordinates of the objects and Euclidean distance matrix corresponding to this configuration. Bottom left: Representation of the values of the dissimilarity (dark colors correspond to larger values, so distant objects). Bottom right: dendrogram obtained from standard HAC. Only the first 3 merges of the dendrogram are represented to ensure a comprehensive view of the sequence of heights

Cite this article

Randriamihamison, N., Vialaneix, N. & Neuvial, P. Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints. J Classif 38, 363–389 (2021).


  • Hierarchical agglomerative clustering
  • Ward’s linkage
  • Contiguity constraint
  • Dendrogram
  • Monotonicity