## Abstract

This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods that, based on the dissimilarity structure, output hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter. Our construction of hierarchical clustering methods is built around the concept of admissible methods, which are those that abide by the axioms of value—nodes in a network with two nodes are clustered together at the maximum of the two dissimilarities between them—and transformation—when dissimilarities are reduced, the network may become more clustered but not less. Two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Furthermore, alternative clustering methodologies and axioms are considered. In particular, modifying the axiom of value such that clustering in two-node networks occurs at the minimum of the two dissimilarities entails the existence of a unique admissible clustering method. Finally, the developed clustering methods are implemented to analyze the internal migration in the United States.


## Notes

- 1. MATLAB implementations of the clustering methods discussed here can be downloaded at http://www.mit.edu/~segarra/clustering_methods_ADAC.zip.

- 2. See Footnote 1.

- 3.

## Additional information

Work in this paper is supported by NSF CCF-1217963, NSF CAREER CCF-0952867, NSF IIS-1422400, NSF CCF-1526513, AFOSR FA9550-09-0-1-0531, AFOSR FA9550-09-1-0643, NSF DMS-0905823, and NSF DMS-0406992.

## Appendix: Proofs

### Proof of Proposition 2

We prove that any method \({{\mathcal H}}\) satisfying axioms (A1)–(A2) is idempotent by showing that it satisfies \({{\mathcal H}}(X, u_X) = (X,u_X)\), for all \((X, u_X) \in {{\mathcal U}}\). Consider the application of an admissible method \({{\mathcal H}}\) to the ultrametric network \(U_X=(X, u_X)\). Since \(U_X\) is symmetric, from (22) we have that \(u^{\text { NR}}_X=u^{\text { R}}_X\). Thus, if we show that \({{\mathcal H}}^{\text { R}}\) is idempotent, we know that \({{\mathcal H}}^{\text { NR}}\) is as well. Moreover, from (20) it would follow that every admissible method is idempotent. Consequently, we need to show that \({{\mathcal H}}^\text { R}(X, u_X) = (X,u_X)\) for all \((X, u_X) \in {{\mathcal U}}\).

Denoting by \((X, u^\text { R}_X)={{\mathcal H}}^\text { R}(X,u_X)\) the outcome of applying \({{\mathcal H}}^\text { R}\) to \(U_X\), we can write for all \(x,x'\in X\) [cf. (15)]

\[u^\text { R}_X(x,x') = \min _{C(x,x')} \, \max _{i | x_i\in C(x,x')} u_X(x_i,x_{i+1}), \tag{27}\]

where there is no need to take the maximum between \(u_X(x_i,x_{i+1})\) and \(u_X(x_{i+1},x_{i})\) since \(U_X\) is symmetric. Given a chain \(C(x,x')\) and using the fact that \(u_X\) is an ultrametric, it follows from the strong triangle inequality in (7) that \(u_X(x,x') \le \max _{i | x_i\in C(x,x')} u_X(x_i,x_{i+1})\). Since the previous inequality is valid for *all* chains and the value of \(u^\text { R}_X(x,x')\) in (27) comes from the cost of some chain, we have that \(u^\text { R}_X(x,x') \ge u_X(x,x')\), for all \(x, x' \in X\). Also, by considering the particular chain \(C(x,x')=[x,x']\) with cost \(u_X(x,x')\), it follows from (27) that \(u_X^\text { R}(x,x') \le u_X(x,x')\), for all \(x, x' \in X\). Combining these inequalities, we have that \(u_X^\text { R}(x,x') = u_X(x,x')\) for all \(x, x' \in X\), as wanted. \(\square \)
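The idempotency argument can be checked numerically on a small example. The sketch below assumes that the reciprocal ultrametric of (15) is the minimum over chains of the largest symmetrized link cost, and computes it with a Floyd-Warshall-style update. The function names and the sample ultrametric are illustrative choices, not taken from the authors' MATLAB code.

```python
from itertools import product


def minimax_closure(D):
    """Return U with U[i][j] = min over chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    # product varies its first index slowest, so k acts as the outer pivot loop.
    for k, i, j in product(range(n), repeat=3):
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def reciprocal(A):
    """Assumed form of (15): minimax chain cost over max(A[i][j], A[j][i])."""
    n = len(A)
    sym = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minimax_closure(sym)


# Hypothetical ultrametric on four nodes (it satisfies the strong triangle inequality).
u = [[0, 1, 3, 3],
     [1, 0, 3, 3],
     [3, 3, 0, 2],
     [3, 3, 2, 0]]

# The ultrametric is left unchanged, as the proposition states.
assert reciprocal(u) == u
```

Applying `reciprocal` twice to an arbitrary asymmetric matrix should likewise return the same result as applying it once.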

### Proof of Theorem 2

In proving Theorem 2, we make use of the following lemma.

### Lemma 1

Let \(N=(X, A_X)\) be any network and \(\delta \) any positive constant. Suppose that \(x,x' \in X\) are such that their associated minimum chain cost [cf. (3)] satisfies \({\tilde{u}}^*_X(x, x') \ge \delta \). Then, there exists a partition \(P_\delta (x,x')=\{B_\delta (x), B_\delta (x')\}\) of the node set *X* into blocks \(B_\delta (x)\) and \(B_\delta (x')\) with \(x \in B_\delta (x)\) and \(x' \in B_\delta (x')\) such that \(A_X(b, b') \ge \delta \), for all points \(b \in B_\delta (x)\) and \(b' \in B_\delta (x')\).

### Proof

We prove this by contradiction. If a partition \(P_\delta (x,x')=\{B_\delta (x), B_\delta (x')\}\) with \(x \in B_\delta (x)\) and \(x' \in B_\delta (x')\) satisfying Lemma 1 does not exist for all pairs of points \(x,x'\in X\) satisfying \({\tilde{u}}^*_X(x, x') \ge \delta \), then there is at least one pair of nodes \(x,x'\in X\) satisfying \({\tilde{u}}^*_X(x, x') \ge \delta \) such that for *all* partitions of *X* into two blocks \(P=\{B, B'\}\) with \(x \in B\) and \(x' \in B'\) we can find at least a pair of elements \(b_P \in B\) and \(b'_P \in B'\) for which

\[A_X(b_P, b'_P) < \delta . \tag{28}\]

Begin by considering the partition \(P_1=\{B_1, B'_1\}\) where \(B_1 = \{ x \}\) and \(B'_1 = X \backslash \{x\}\). Since (28) is true for all partitions having \(x\in B\) and \(x'\in B'\) and *x* is the unique element of \(B_1\), there must exist a node \(b'_{P_1} \in B'_1\) such that

\[A_X(x, b'_{P_1}) < \delta . \tag{29}\]

Hence, the chain \(C(x, b'_{P_1})= [x, b'_{P_1}]\) composed of these two nodes has cost smaller than \(\delta \). Moreover, since \({\tilde{u}}^*_X(x, b'_{P_1})\) represents the minimum cost among all chains \(C(x, b'_{P_1})\) linking *x* to \(b'_{P_1}\), we can assert that \({\tilde{u}}^*_X(x, b'_{P_1}) \le A_X(x, b'_{P_1}) < \delta \). Consider now the partition \(P_2=\{B_2, B'_2\}\) where \(B_2= \{ x, b'_{P_1} \}\) and \(B'_2=X \backslash B_2\). From (28), there must exist a node \(b'_{P_2} \in B'_2\) that satisfies at least one of the two following conditions: i) \(A_X(x, b'_{P_2}) < \delta \), or ii) \(A_X(b'_{P_1}, b'_{P_2}) < \delta \). If i) is true, the chain \(C(x, b'_{P_2})=[x, b'_{P_2}]\) has cost smaller than \(\delta \). If ii) is true, we combine the dissimilarity bound with the one in (29) to conclude that the chain \(C(x, b'_{P_2})=[x, b'_{P_1}, b'_{P_2}]\) has cost smaller than \(\delta \). In either case we conclude that there exists a chain \(C(x, b'_{P_2})\) linking *x* to \(b'_{P_2}\) whose cost is smaller than \(\delta \). Therefore, the minimum chain cost must satisfy \({\tilde{u}}^*_X(x, b'_{P_2}) < \delta \). We can repeat this process iteratively where, e.g., partition \(P_3\) is composed of \(B_3= \{ x, b'_{P_1}, b'_{P_2} \}\) and \(B'_3=X\backslash B_3\), to obtain partitions \(P_1, P_2, \ldots , P_{n-1}\) and corresponding nodes \(b'_{P_1}, b'_{P_2}, \dots , b'_{P_{n-1}}\) such that the associated minimum chain cost satisfies \({\tilde{u}}^*_X(x, b'_{P_i}) < \delta \), for all *i*. Observe that nodes \(b'_{P_i}\) are distinct by construction and distinct from *x*. Since there are *n* nodes in the network, it must be that \(x'=b'_{P_k}\) for some \(k \in \{1, \ldots , n-1\}\), entailing that \({\tilde{u}}^*_X(x, x') < \delta \), and reaching a contradiction. \(\square \)
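The block-growing process in this proof can also be run constructively. The sketch below is one possible reading, not the authors' implementation: starting from *x*, it keeps absorbing any node reachable through a link of cost smaller than \(\delta \). When the minimum chain cost from *x* to \(x'\) is at least \(\delta \), the node \(x'\) is never absorbed and the two resulting blocks satisfy the lemma. The function name and the sample matrix are hypothetical.

```python
def lemma1_partition(A, x, delta):
    """Split nodes into those reachable from x by links < delta, and the rest."""
    n = len(A)
    block = {x}
    frontier = [x]
    while frontier:
        b = frontier.pop()
        for c in range(n):
            # Absorb c if some node already in the block links to it below delta.
            if c not in block and A[b][c] < delta:
                block.add(c)
                frontier.append(c)
    return block, set(range(n)) - block


# Hypothetical asymmetric dissimilarity matrix: A[i][j] is the cost from i to j.
A = [[0, 1, 5],
     [4, 0, 6],
     [2, 3, 0]]

# The minimum chain cost from node 0 to node 2 is 5, so delta = 5 is admissible.
B_x, B_xp = lemma1_partition(A, x=0, delta=5)

# Every link from the first block to the second costs at least delta.
assert all(A[b][bp] >= 5 for b in B_x for bp in B_xp)
```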

Continuing with the proof of Theorem 2, to show that (A1)–(A2) imply (A1\('\))–(A2) let \({{\mathcal H}}\) be a method that satisfies (A1) and (A2) and denote by \((\{1, 2, \ldots , n\},u_{n, \alpha , \beta })= {{\mathcal H}}(\mathbf {\Delta }_n(\alpha ,\beta ,\varPi ))\). We want to prove that (A1\('\)) is satisfied which means that we have to show that for all indices \(n\in {{\mathbb N}}\), constants \(\alpha ,\beta >0\), permutations \(\varPi \) of \(\{1,\ldots ,n\}\), and points \(i\ne j\), we have \(u_{n, \alpha , \beta }(i,j)=\max (\alpha ,\beta )\). We will do so by showing both i) \(u_{n, \alpha , \beta }(i,j)\ \le \ \max (\alpha ,\beta )\), and ii) \(u_{n, \alpha , \beta }(i,j)\ \ge \ \max (\alpha ,\beta )\), for all \(n\in {{\mathbb N}}\), \(\alpha ,\beta >0\), \(\varPi \), and \(i\ne j\).

To prove (i), define the two-node network \(N_\mathrm {max}:= \mathbf {\Delta }_2(\max (\alpha , \beta ), \max (\alpha , \beta ))\) and define \(\big (\{p, q\}, u_{p,q}\big ) := {{\mathcal H}}(N_\mathrm {max})\). Since \({{\mathcal H}}\) abides by (A1),

\[u_{p,q}(p,q) = \max \big (\max (\alpha ,\beta ), \max (\alpha ,\beta )\big ) = \max (\alpha ,\beta ). \tag{30}\]

Consider now the map \(\phi _{i,j}:\{p,q\} \rightarrow \{1,\ldots ,n\}\) from \(N_\mathrm {max}\) to the permuted canonical network \(\mathbf {\Delta }_n(\alpha ,\beta ,\varPi )\) where \(\phi _{i,j}(p)=i\) and \(\phi _{i,j}(q)=j\). Since dissimilarities in \(\mathbf {\Delta }_n(\alpha ,\beta ,\varPi )\) are either \(\alpha \) or \(\beta \) and the dissimilarities in the two-node network are \(\max (\alpha ,\beta )\), it follows that the map \(\phi _{i,j}\) is dissimilarity reducing regardless of the particular values of *i* and *j*. Since the method \({{\mathcal H}}\) was assumed to satisfy (A2) as well, we must have \(u_{p,q} (p,q) \ge u_{n, \alpha , \beta }\big (\phi _{i,j}(p),\phi _{i,j}(q)\big ) = u_{n, \alpha , \beta }(i,j)\). Inequality (i) follows from substituting (30) into this last expression.

In order to show inequality (ii), pick two arbitrary distinct nodes \(i, j \in \{1,\ldots ,n\}\) in the node set of \(\mathbf {\Delta }_n(\alpha ,\beta ,\varPi )\). Denote by *C*(*i*, *j*) and *C*(*j*, *i*) two minimizing chains in the definition (3) of the directed minimum chain costs \({\tilde{u}}^*_{n, \alpha , \beta }(i, j)\) and \({\tilde{u}}^*_{n, \alpha , \beta }(j, i)\) respectively. Observe that at least one of the following two inequalities must be true \({\tilde{u}}^*_{n, \alpha , \beta }(i, j) \ge \max (\alpha , \beta )\) or \({\tilde{u}}^*_{n, \alpha , \beta }(j, i) \ge \max (\alpha , \beta )\). Indeed, if both inequalities were false, the concatenation of *C*(*i*, *j*) and *C*(*j*, *i*) would form a loop \(C(i,i)=C(i,j) \uplus C(j,i)\) of cost strictly less than \(\max (\alpha , \beta )\). This cannot be true because \(\max (\alpha , \beta )\) is the minimum loop cost of the network \(\mathbf {\Delta }_n(\alpha ,\beta ,\varPi )\).

Without loss of generality assume \({\tilde{u}}^*_{n, \alpha , \beta }(i, j) \ge \max (\alpha , \beta )\) is true and consider \(\delta = \max (\alpha , \beta )\). By Lemma 1 we are therefore guaranteed to find a partition of the node set \(\{1,\ldots ,n\}\) into two blocks \(B_\delta (i)\) and \(B_\delta (j)\) with \(i \in B_\delta (i)\) and \(j \in B_\delta (j)\) such that for all \(b \in B_\delta (i)\) and \(b' \in B_\delta (j)\) it holds that

\[\varPi (A_{n,\alpha ,\beta })(b,b') \ge \delta = \max (\alpha ,\beta ). \tag{31}\]

Define a two-node network \(N_\mathrm {min}:=\mathbf {\Delta }_2(\max (\alpha , \beta ), \min (\alpha , \beta ))=(\{r, s\}, A_{r,s})\) where \(A_{r,s}(r,s)=\max (\alpha , \beta )\) and \(A_{r,s}(s,r)=\min (\alpha , \beta )\) and define \((\{r,s\}, u_{r,s}) := {{\mathcal H}}(N_\mathrm {min})\). Since the method \({{\mathcal H}}\) satisfies (A1) we must have

\[u_{r,s}(r,s) = \max \big (\max (\alpha ,\beta ), \min (\alpha ,\beta )\big ) = \max (\alpha ,\beta ). \tag{32}\]

Consider the map \(\phi '_{i,j} : \{1,\ldots ,n\} \rightarrow \{r, s\}\) such that \(\phi '_{i,j}(b)=r\) for all \(b \in B_\delta (i)\) and \(\phi '_{i,j}(b')=s\) for all \(b' \in B_\delta (j)\). The map \(\phi '_{i,j}\) is dissimilarity reducing because

\[\varPi (A_{n,\alpha ,\beta })(k,l) \ge A_{r,s}\big (\phi '_{i,j}(k),\phi '_{i,j}(l)\big ), \tag{33}\]

for all \(k, l \in \{1,\ldots ,n\}\). To see the validity of (33) consider three different possible cases. If *k* and *l* belong both to the same block, i.e., either \(k,l \in B_\delta (i)\) or \(k,l \in B_\delta (j)\), then \(\phi '_{i,j}(k)=\phi '_{i,j}(l)\) and \(A_{r,s}(\phi '_{i,j}(k), \phi '_{i,j}(l))=0\), immediately satisfying (33). If \(k \in B_\delta (j)\) and \(l \in B_\delta (i)\) it holds that \(A_{r,s}(\phi '_{i,j}(k), \phi '_{i,j}(l)) = A_{r,s}(s, r)= \min (\alpha , \beta )\) which cannot exceed \(\varPi (A_{n,\alpha ,\beta })(k,l)\) which is either equal to \(\alpha \) or \(\beta \). If \(k \in B_\delta (i)\) and \(l \in B_\delta (j)\), then we have \(A_{r,s}(\phi '_{i,j}(k), \phi '_{i,j}(l)) = A_{r,s}(r, s)= \max (\alpha , \beta )\) but we also have \(\varPi (A_{n,\alpha ,\beta })(k,l)= \max (\alpha , \beta )\) as it follows by taking \(b=k\) and \(b'=l\) in (31), thus, again satisfying (33).

Since \({{\mathcal H}}\) fulfills the Axiom of Transformation (A2) we must have

\[u_{n, \alpha , \beta }(i,j) \ge u_{r,s}\big (\phi '_{i,j}(i),\phi '_{i,j}(j)\big ) = u_{r,s}(r,s). \tag{34}\]

Substituting (32) in (34) we obtain the inequality (ii). Combining both inequalities (i) and (ii), it follows that \(u_{n, \alpha , \beta }(i,j)= \max (\alpha ,\beta )\). Thus, admissibility with respect to (A1)–(A2) implies admissibility with respect to (A1\('\))–(A2). The opposite implication is immediate since (A1) is a particular case of (A1\('\)), concluding the proof. \(\square \)
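As a numerical sanity check of the Extended Axiom of Value for two particular methods, the sketch below builds an unpermuted canonical network and applies reciprocal and nonreciprocal clustering to it. It assumes the canonical dissimilarity equals \(\alpha \) from a lower to a higher index and \(\beta \) in the opposite direction (as used in the proof of Lemma 2), and that the two ultrametrics are the usual minimax chain costs. All function names are illustrative.

```python
from itertools import product


def canonical(n, alpha, beta):
    """Assumed Delta_n(alpha, beta): alpha for i < j, beta for i > j, 0 on the diagonal."""
    return [[0 if i == j else (alpha if i < j else beta) for j in range(n)]
            for i in range(n)]


def minimax_closure(D):
    """U[i][j] = min over directed chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    for k, i, j in product(range(n), repeat=3):  # k is the outer pivot loop
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def reciprocal(A):
    """Minimax chain cost over the symmetrized links max(A[i][j], A[j][i])."""
    n = len(A)
    return minimax_closure([[max(A[i][j], A[j][i]) for j in range(n)]
                            for i in range(n)])


def nonreciprocal(A):
    """Maximum of the two directed minimum chain costs."""
    n = len(A)
    U = minimax_closure(A)
    return [[max(U[i][j], U[j][i]) for j in range(n)] for i in range(n)]


for alpha, beta in [(2, 5), (5, 2)]:
    N = canonical(4, alpha, beta)
    expected = [[0 if i == j else max(alpha, beta) for j in range(4)]
                for i in range(4)]
    # Both methods merge every pair of nodes at resolution max(alpha, beta).
    assert reciprocal(N) == expected
    assert nonreciprocal(N) == expected
```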

### Proof of Theorem 3

We show that if a clustering method satisfies axioms (A1\('\)) and (A2) then it satisfies the Property of Influence (P1). Notice that this result, combined with Theorem 2, implies the statement of Theorem 3. The following lemma is instrumental in the ensuing proof.

### Lemma 2

Let \(N=(X, A_X)\) be an arbitrary network with *n* nodes and \(\mathbf {\Delta }_n(\alpha ,\beta )=(\{1,\ldots ,n\}, A_{n,\alpha ,\beta })\) be the canonical network with \(0 < \alpha \le \text { sep}(X, A_X)\) and \(\beta =\text { mlc}(X, A_X)\). Then, there exists a bijective dissimilarity-reducing map \(\phi :X\rightarrow \{1,\ldots ,n\}\), i.e. \(A_X(x, x') \ge A_{n, \alpha ,\beta }(\phi (x), \phi (x'))\), for all \(x, x' \in X\).

### Proof

To construct the map \(\phi \) consider the function \(P:X \rightarrow \mathcal {P}(X)\) from the node set *X* to its power set \(\mathcal {P}(X)\) such that \(P(x) :=\{ x' \in X \, | \, x' \ne x \,\, , \,\, A_X(x', x)<\beta \}\), for all \(x \in X\). Having \(r \in P(s)\) for some \(r,s \in X\) implies that \(A_X(r, s) < \beta =\text { mlc}(X,A_X)\). An important observation is that we must have a node \(x\in X\) whose *P*-image is empty. Otherwise, pick a node \(x_n\in X\) and construct the chain \([x_0, x_1, \ldots , x_n]\) where the *i*th element \(x_{i-1}\) of the chain is in the *P*-image of \(x_i\). From the definition of *P* it follows that all dissimilarities along this chain satisfy \(A_X(x_{i-1},x_{i})<\beta =\text { mlc}(X,A_X)\). But since the chain \([x_0, x_1, \ldots , x_n]\) contains \(n+1\) elements, at least one node must be repeated. Hence, we have found a loop for which all dissimilarities are strictly smaller than \(\beta =\text { mlc}(X,A_X)\), which is impossible because it contradicts the definition of the minimum loop cost in (5). We can then find a node \(x_{i_1}\) for which \(P(x_{i_1})=\emptyset \). Fix \(\phi (x_{i_1})=1\).

Select now a node \(x_{i_2}\ne x_{i_1}\) whose *P*-image is either \(\{x_{i_1}\}\) or \(\emptyset \), which we write jointly as \(P(x_{i_2}) \subseteq \{x_{i_1}\}\). Following a similar reasoning to the previous one, such a node must exist; fix \(\phi (x_{i_2})=2\). Repeat this process so that at step *k* we have \(\phi (x_{i_k})=k\) for a node \(x_{i_k} \not \in \{x_{i_1},x_{i_2},\ldots , x_{i_{k-1}}\}\) whose *P*-image is a subset of the nodes already picked, i.e., \(P(x_{i_k}) \subseteq \{x_{i_1}, \ldots , x_{i_{k-1}}\}\). Since all the nodes \(x_{i_k}\) are different, the map \(\phi \) with \(\phi (x_{i_k})=k\) is bijective. By construction, \(\phi \) is such that for all \(l>k\), \(x_{i_l}\notin P(x_{i_k})\). From the definition of *P*, this implies that the dissimilarity from \(x_{i_l}\) to \(x_{i_k}\) must satisfy \(A_X(x_{i_l},x_{i_k}) \ge \beta \), for all \(l>k\). Moreover, from the definition of the canonical matrix \(A_{n, \alpha ,\beta }\) we have that \(A_{n, \alpha ,\beta }(\phi (x_{i_l}), \phi (x_{i_k})) = A_{n, \alpha ,\beta }(l,k) = \beta \) for all \(l>k\). By combining these two expressions, we conclude that \(A_X(x, x') \ge A_{n, \alpha ,\beta }(\phi (x), \phi (x'))\) is true for all points with \(\phi (x)>\phi (x')\). When \(\phi (x)<\phi (x')\), we have \(A_{n, \alpha ,\beta }(\phi (x), \phi (x'))=\alpha \), which was assumed to be bounded above by the separation of the network \((X, A_X)\); thus, \(A_{n, \alpha ,\beta }(\phi (x), \phi (x'))\) is not greater than any positive dissimilarity in the range of \(A_X\). \(\square \)
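The greedy construction of \(\phi \) translates directly into a short procedure. The sketch below is illustrative only: it assumes the minimum loop cost in (5) equals the smallest, over pairs of distinct nodes, of the larger of the two directed minimum chain costs, and that the separation is the smallest off-diagonal dissimilarity. The names `min_loop_cost` and `lemma2_map` are hypothetical.

```python
from itertools import product


def minimax_closure(D):
    """U[i][j] = min over directed chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    for k, i, j in product(range(n), repeat=3):  # k is the outer pivot loop
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def min_loop_cost(A):
    """Assumed mlc: cheapest loop through two distinct nodes, cost = largest link."""
    n = len(A)
    U = minimax_closure(A)
    return min(max(U[i][j], U[j][i])
               for i in range(n) for j in range(n) if i != j)


def lemma2_map(A):
    """Greedily label nodes 1..n so that each P-image lies among earlier picks."""
    n = len(A)
    beta = min_loop_cost(A)
    # P[x] holds the nodes y != x with A[y][x] < beta.
    P = [{y for y in range(n) if y != x and A[y][x] < beta} for x in range(n)]
    picked = []
    while len(picked) < n:
        picked.append(next(x for x in range(n)
                           if x not in picked and P[x] <= set(picked)))
    return {x: k + 1 for k, x in enumerate(picked)}, beta


A = [[0, 1, 5],
     [4, 0, 6],
     [2, 3, 0]]
phi, beta = lemma2_map(A)
alpha = min(A[i][j] for i in range(3) for j in range(3) if i != j)  # separation

# phi is dissimilarity reducing with respect to the canonical network.
for x, y in product(range(3), repeat=2):
    if x != y:
        assert A[x][y] >= (alpha if phi[x] < phi[y] else beta)
```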

Continuing the main proof of Theorem 3, consider a given arbitrary network \(N=(X, A_X)\) with \(X=\{x_1, x_2, \ldots , x_n\}\) and define \((X, u_X) :={{\mathcal H}}(X,A_X)\). The method \({{\mathcal H}}\) is known to satisfy (A1\('\)) and (A2) and we want to show that it satisfies (P1) for which we need to show that \(u_X(x,x')\ge \text { mlc}(X,A_X)\) for all \(x\ne x'\).

Consider the canonical network \(\mathbf {\Delta }_n(\alpha ,\beta )=(\{1,\ldots ,n\}, A_{n,\alpha ,\beta })\) with \(\beta =\text { mlc}(X, A_X)\) being the minimum loop cost of the network *N* and \(\alpha >0\) a constant not exceeding the separation of the network. Thus, we have \(\alpha \le \text { sep}(X,A_X)\le \text { mlc}(X,A_X)=\beta \). Note that networks *N* and \(\mathbf {\Delta }_n(\alpha ,\beta )\) have equal number of nodes.

Defining \((\{1,\ldots ,n\}, u_{\alpha ,\beta }) := {{\mathcal H}}(\mathbf {\Delta }_n(\alpha ,\beta ))\), since \({{\mathcal H}}\) satisfies the Extended Axiom of Value (A1\('\)), then for all indices \(i, j \in \{1,\ldots ,n\}\) with \(i \ne j\) we have

\[u_{\alpha ,\beta }(i,j) = \max (\alpha ,\beta ) = \beta . \tag{35}\]

Further, focus on the bijective dissimilarity-reducing map considered in Lemma 2 and notice that since \({{\mathcal H}}\) satisfies (A2) it follows that for all \(x, x' \in X\)

\[u_X(x,x') \ge u_{\alpha ,\beta }\big (\phi (x),\phi (x')\big ). \tag{36}\]

Since the equality in (35) is true for all \(i\ne j\) and since all points \(x\ne x'\) are mapped to points \(\phi (x)\ne \phi (x')\) because \(\phi \) is bijective, (36) implies \(u_X(x,x') \ge \beta = \text { mlc}(X, A_X)\), for all distinct \(x, x' \in X\). \(\square \)

### Proof of Theorem 4

We prove the theorem by showing both inequalities in (20).

*Proof of* \({u^{\text { NR}}_X(x,x') \le u_X(x,x')}\). Recall that validity of (A1)–(A2) implies validity of (P1) by Theorem 3. Consider the nonreciprocal clustering equivalence relation \(\sim _{\text { NR}_X(\delta )}\) at resolution \(\delta \) according to which \(x \sim _{\text { NR}_X(\delta )} x'\) if and only if *x* and \(x'\) belong to the same nonreciprocal cluster at resolution \(\delta \). Notice that this is true if and only if \(u^{\text { NR}}_X(x,x')\le \delta \). Further consider the set \(Z := X \mod \sim _{\text { NR}_X(\delta )}\) of corresponding equivalence classes and the map \(\phi _{\delta }:X\rightarrow Z\) that maps each point of *X* to its equivalence class. Notice that *x* and \(x'\) are mapped to the same point *z* if and only if they belong to the same cluster at resolution \(\delta \).

We define the network \(N_Z:=(Z,A_Z)\) by endowing *Z* with the dissimilarity \(A_Z\) derived from the dissimilarity \(A_X\) as

\[A_Z(z,z') := \min _{x\in \phi _\delta ^{-1}(z), \, x'\in \phi _\delta ^{-1}(z')} A_X(x,x'). \tag{37}\]

The dissimilarity \(A_Z(z,z')\) compares all the dissimilarities \(A_X(x,x')\) between a member of the equivalence class *z* and a member of the equivalence class \(z'\) and sets \(A_Z(z,z')\) to the value corresponding to the least dissimilar pair; see Fig. 12. Notice that, by construction, the map \(\phi _\delta \) is dissimilarity reducing, i.e., \(A_X(x,x') \ge A_Z(\phi _\delta (x),\phi _\delta (x'))\), because we either have \(A_Z(\phi _\delta (x),\phi _\delta (x'))=0\) if *x* and \(x'\) are co-clustered at resolution \(\delta \), or \(A_X(x,x') \ge \min _{x\in \phi _\delta ^{-1}(z), x'\in \phi _\delta ^{-1}(z')} A_X(x,x') = A_Z(\phi _\delta (x),\phi _\delta (x'))\) if they are mapped to different equivalence classes.

Consider now an arbitrary method \({{\mathcal H}}\) satisfying axioms (A1)–(A2) and denote by \((Z,u_Z) = {{\mathcal H}}(N_Z)\) the outcome of \({{\mathcal H}}\) when applied to \(N_Z\). To apply Property (P1) to this outcome we determine the minimum loop cost of \(N_Z\) in the following claim.

### Claim 1

The minimum loop cost of the network \(N_Z\) is \(\text { mlc}(N_Z) > \delta \).

### Proof

Assume that Claim 1 is not true, denote by \(C(z, z) = [z, z', \ldots , z^{(l)}, z]\) a loop of cost not larger than \(\delta \) and consider arbitrary nodes \(x\in \phi _\delta ^{-1}(z)\) and \(x'\in \phi _\delta ^{-1}(z')\). By definition, given two nodes in the same equivalence class, we can always find a chain from one to the other of cost not larger than \(\delta \). Moreover, since we are assuming that \(A_Z(z,z') \le \delta \), this implies that there exists at least one node \(x_1\) belonging to class *z* and another node \(x_2\) belonging to \(z'\) such that \(A_X(x_1, x_2) \le \delta \). Combining these two facts, we can guarantee the existence of a chain from *x* to \(x'\) of cost not larger than \(\delta \), since we can go first from *x* to \(x_1\), then from \(x_1\) to \(x_2\), and finally from \(x_2\) to \(x'\) without encountering dissimilarities greater than \(\delta \). In a similar way, we can go from \(x'\) to *x* by constructing a chain that goes through all the equivalence classes in *C*(*z*, *z*), i.e., from \(z'\) to \(z''\) then to \(z^{(3)}\) and so on until we reach *z*. Since we can go from *x* to \(x'\) and back with chains of cost not exceeding \(\delta \), it follows that \(u^{\text { NR}}_X(x,x')\le \delta \), contradicting the assumption that *x* and \(x'\) belong to different equivalence classes. Therefore, the assumption that Claim 1 is false cannot hold. \(\square \)
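Claim 1 can be illustrated on a small network. The sketch below is a hedged example, not the authors' code: it groups nodes whose nonreciprocal ultrametric does not exceed \(\delta \), builds the class-level dissimilarity as the smallest cross-class dissimilarity (the least dissimilar pair described above), and checks that the resulting minimum loop cost exceeds \(\delta \). The minimum loop cost is assumed to be the smallest, over pairs of distinct nodes, of the larger of the two directed minimum chain costs.

```python
from itertools import product


def minimax_closure(D):
    """U[i][j] = min over directed chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    for k, i, j in product(range(n), repeat=3):  # k is the outer pivot loop
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def min_loop_cost(A):
    """Assumed mlc: smallest max of the two directed chain costs over node pairs."""
    n = len(A)
    U = minimax_closure(A)
    return min(max(U[i][j], U[j][i])
               for i in range(n) for j in range(n) if i != j)


def quotient_network(A, delta):
    """Nonreciprocal classes at resolution delta and their class-level network."""
    n = len(A)
    U = minimax_closure(A)
    classes = []
    for x in range(n):
        for c in classes:
            r = c[0]  # comparing with one representative suffices (ultrametric)
            if max(U[x][r], U[r][x]) <= delta:
                c.append(x)
                break
        else:
            classes.append([x])
    m = len(classes)
    AZ = [[0 if a == b else min(A[x][y] for x in classes[a] for y in classes[b])
           for b in range(m)] for a in range(m)]
    return classes, AZ


A = [[0, 1, 5],
     [4, 0, 6],
     [2, 3, 0]]
delta = 4
classes, AZ = quotient_network(A, delta)

# Nodes 0 and 1 merge at resolution 4; the loop between the two classes costs more.
assert min_loop_cost(AZ) > delta
```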

Continuing with the main proof, since the minimum loop cost of \(N_Z\) satisfies Claim 1 it follows from Property (P1) that \( u_Z(z,z') > \delta \) for all pairs of distinct equivalence classes \(z,z'\). Further note that, since \(\phi _\delta \) is dissimilarity reducing, Axiom (A2) implies that \(u_X(x,x') \ge u_Z(z,z')\). Combining these facts, we can conclude that when *x* and \(x'\) map to different equivalence classes it holds that \(u_X(x,x') \ge u_Z(z,z') >\delta \). Recall that *x* and \(x'\) mapping to different equivalence classes is equivalent to \( u^{\text { NR}}_X(x,x')>\delta \). Consequently, we can claim that \(u^{\text { NR}}_X(x,x')>\delta \) implies \(u_X(x,x')>\delta \), or, in set notation, that \(\{(x,x') : u^{\text { NR}}_X(x,x')>\delta \} \subseteq \{(x,x') : u_X(x,x')>\delta \}\). Since the previous expression is true for arbitrary \(\delta >0\), it implies that \(u^{\text { NR}}_X(x,x') \le u_X(x,x')\) for all \(x, x' \in X\) as in the first inequality in (20). \(\square \)

*Proof of* \({u_X(x,x') \le u^{\text { R}}_X(x,x')}\) To prove the second inequality in (20) consider points *x* and \(x'\) with reciprocal ultrametric \(u^{\text { R}}_X(x,x') = \delta \). Let \(C^*(x,x')=[x=x_0,\ldots , x_l=x']\) be a chain achieving the minimum in (15) so that we can write

\[u^{\text { R}}_X(x,x') = \max _{i | x_i\in C^*(x,x')} \max \big (A_X(x_i,x_{i+1}), A_X(x_{i+1},x_i)\big ) = \delta . \tag{38}\]

Turn attention to the symmetric two-node network \(\mathbf {\Delta }_2(\delta ,\delta )= (\{p,q\}, A_{p,q})\) with \(A_{p,q}(p,q)=A_{p,q}(q,p)=\delta \) and define \((\{p,q\}, u_{p,q}) := {{\mathcal H}}(\mathbf {\Delta }_2(\delta ,\delta ))\). Notice that according to Axiom (A1) we have \(u_{p,q}(p,q) = \max (\delta , \delta )=\delta \).

Focus now on transformations \(\phi _i:\{p,q\}\rightarrow X\) given by \(\phi _i(p)=x_i\), \(\phi _i(q)=x_{i+1}\) so as to map *p* and *q* to subsequent points in the chain \(C^*(x,x')\) used in (38). Since it follows from (38) that \(A_X(x_i,x_{i+1})\le \delta \) and \(A_X(x_{i+1},x_i) \le \delta \) for all *i*, it is just a simple matter of notation to observe that

\[A_{p,q}(p,q) \ge A_X\big (\phi _i(p),\phi _i(q)\big ), \qquad A_{p,q}(q,p) \ge A_X\big (\phi _i(q),\phi _i(p)\big ). \tag{39}\]

Since according to (39) transformations \(\phi _i\) are dissimilarity reducing, it follows from Axiom (A2) that \(u_X(x_i,x_{i+1}) = u_X(\phi _i(p),\phi _i(q)) \le u_{p,q}(p,q) = \delta \), for all *i*. To complete the proof we use the fact that since \(u_X\) is an ultrametric and \(C^*(x,x')=[x=x_0,\ldots , x_l=x']\) is a chain joining *x* and \(x'\) the strong triangle inequality dictates [cf. (7)] that \(u_X(x,x') \le \max _i u_X(x_i,x_{i+1}) \le \delta \). The proof of the second inequality in (20) follows by substituting \(\delta = u^{\text { R}}_X(x,x')\) [cf. (38)]. \(\square \)

Having shown both inequalities in (20), the proof is complete. \(\square \)
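The two bounds in (20) can be far apart. The sketch below, written under the assumption that both ultrametrics are the usual minimax chain costs, evaluates them on a hypothetical directed three-node cycle that is cheap in one direction and expensive in the other. Nonreciprocal clustering merges all nodes at the cheap cost, whereas reciprocal clustering waits for the expensive one.

```python
from itertools import product


def minimax_closure(D):
    """U[i][j] = min over directed chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    for k, i, j in product(range(n), repeat=3):  # k is the outer pivot loop
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def reciprocal(A):
    """Minimax chain cost over the symmetrized links max(A[i][j], A[j][i])."""
    n = len(A)
    return minimax_closure([[max(A[i][j], A[j][i]) for j in range(n)]
                            for i in range(n)])


def nonreciprocal(A):
    """Maximum of the two directed minimum chain costs."""
    n = len(A)
    U = minimax_closure(A)
    return [[max(U[i][j], U[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical cycle 0 -> 1 -> 2 -> 0 with cost 1; the reverse links cost 10.
A = [[0, 1, 10],
     [10, 0, 1],
     [1, 10, 0]]
u_nr = nonreciprocal(A)
u_r = reciprocal(A)

# The lower bound never exceeds the upper bound, entry by entry.
assert all(u_nr[i][j] <= u_r[i][j] for i in range(3) for j in range(3))
```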

### Proof of Theorem 5

Suppose there exists a clustering method \({{\mathcal H}}\) that satisfies axioms (A1\(''\)) and (A2) but does not satisfy Property (P1\('\)). This means that there exists a network \(N=(X, A_X)\) with output ultrametric \((X, u_X)={{\mathcal H}}(N)\) for which \(u_X(x_1, x_2) < \text { sep}(X, A_X)\) for at least one pair of nodes \(x_1 \ne x_2 \in X\). Focus on a symmetric two-node network \(\mathbf {\Delta }_2(s,s)=(\{p,q\}, A_{p,q})\) with \(A_{p,q}(p,q)=A_{p,q}(q,p)=s = \text { sep}(X, A_X)\) and define \((\{p,q\}, u_{p,q})={{\mathcal H}}(\mathbf {\Delta }_2(s,s))\). From Axiom (A1\(''\)), we must have that

\[u_{p,q}(p,q) = \min (s,s) = s = \text { sep}(X, A_X). \tag{40}\]

Construct the map \(\phi :X \rightarrow \{p,q\}\) from the network *N* to \(\mathbf {\Delta }_2(s,s)\) that takes node \(x_1\) to \(\phi (x_1)=p\) and every other node \(x \ne x_1\) to \(\phi (x)=q\). No dissimilarity can be increased when applying \(\phi \) since every dissimilarity is mapped either to zero or to \(\text { sep}(X, A_X)\), which is by definition the minimum dissimilarity in the original network [cf. (6)]. Hence, \(\phi \) is dissimilarity reducing and from Axiom (A2) it follows that \(u_X(x_1, x_2) \ge u_{p,q}(\phi (x_1), \phi (x_2)) = u_{p,q}(p,q)\). By substituting (40) into the previous expression, we contradict \(u_X(x_1, x_2) < \text { sep}(X, A_X)\), proving that such a method \({{\mathcal H}}\) cannot exist. \(\square \)

### Proof of Proposition 5

To show fulfillment of (A1\(''\)), consider the network \(\mathbf {\Delta }_2(\alpha , \beta )\) and define \((\{p,q\}, u^{\text { U}}_{p, q}) :={{\mathcal H}}^\text { U}(\mathbf {\Delta }_2(\alpha , \beta ))\). Since every chain connecting *p* and *q* must contain these two nodes as consecutive nodes, applying the definition in (25) yields \(u^{\text { U}}_{p, q}(p,q) = \min \big (A_{p,q}(p,q), A_{p,q}(q,p)\big ) = \min (\alpha ,\beta )\), and Axiom (A1\(''\)) is thereby satisfied. In order to show fulfillment of Axiom (A2), the proof is analogous to the one developed in Proposition 3. The proof only differs in the appearance of minimizations instead of maximizations to account for the difference in the definitions of unilateral and reciprocal ultrametrics [cf. (15) and (25)]. \(\square \)
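The two-node computation can be reproduced with a few lines of code. The sketch below assumes that the unilateral ultrametric in (25) is the minimum over chains of the largest link, where each link costs the smaller of the two directed dissimilarities. The function names and sample matrices are illustrative.

```python
from itertools import product


def minimax_closure(D):
    """U[i][j] = min over chains i -> j of the largest link in D."""
    n = len(D)
    U = [row[:] for row in D]
    for k, i, j in product(range(n), repeat=3):  # k is the outer pivot loop
        U[i][j] = min(U[i][j], max(U[i][k], U[k][j]))
    return U


def unilateral(A):
    """Assumed form of (25): minimax chain cost over min(A[i][j], A[j][i])."""
    n = len(A)
    sym = [[min(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minimax_closure(sym)


# Two-node network with dissimilarities alpha = 3 and beta = 7.
two_node = [[0, 3],
            [7, 0]]
u2 = unilateral(two_node)

# The nodes merge at min(alpha, beta), as the alternative axiom of value requires.
assert u2[0][1] == min(3, 7)

# A hypothetical three-node network: nodes 1 and 2 merge through node 0.
A = [[0, 1, 5],
     [4, 0, 6],
     [2, 3, 0]]
u3 = unilateral(A)
```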

### Proof of Theorem 6

Given an arbitrary network \((X, A_X)\), denote by \({{\mathcal H}}\) a clustering method that fulfills axioms (A1\(''\)) and (A2) and define \((X, u_X) := {{\mathcal H}}(X, A_X)\). Then, we show the theorem by proving the following inequalities for all nodes \(x, x' \in X\),

\[u^{\text { U}}_X(x,x') \le u_X(x,x') \le u^{\text { U}}_X(x,x'). \tag{41}\]

### Proof of leftmost inequality in (41)

Consider the unilateral clustering equivalence relation \(\sim _{\text { U}_X(\delta )}\) at resolution \(\delta \) according to which \(x \sim _{\text { U}_X(\delta )} x'\) if and only if *x* and \(x'\) belong to the same unilateral cluster at resolution \(\delta \). That is, \(x \sim _{\text { U}_X(\delta )} x' \iff u^{\text { U}}_X(x,x')\le \delta \). Further, as in the proof of Theorem 4, consider the set *Z* of equivalence classes at resolution \(\delta \). That is, \(Z := X \mod \sim _{\text { U}_X(\delta )}\). Also, consider the map \(\phi _{\delta }:X\rightarrow Z\) that maps each point of *X* to its equivalence class. Notice that *x* and \(x'\) are mapped to the same point *z* if and only if they belong to the same block at resolution \(\delta \), consequently \(\phi _\delta (x) = \phi _\delta (x') \iff u^{\text { U}}_X(x,x')\le \delta \). We define the network \(N_Z=(Z,A_Z)\) by endowing *Z* with the dissimilarity function \(A_Z\) derived from \(A_X\) as explained in (37). For further details on this construction, review the corresponding proof in Theorem 4 and see Fig. 12. We stress the fact that the map \(\phi _\delta \) is dissimilarity reducing for all \(\delta \).

### Claim 2

The separation of the equivalence class network \(N_Z\) is \(\text { sep}(N_Z) > \delta \).

### Proof

First, observe that by the definition of unilateral clustering (25), we know that

\[ u^{\text { U}}_X(x,x') \le \min \big (A_X(x,x'), A_X(x',x)\big ), \tag{42} \]

since a two-node chain between nodes *x* and \(x'\) is a particular chain joining the two nodes, whereas the ultrametric is calculated as the minimum over all chains. Now, assume for contradiction that \(\text { sep}(N_Z) \le \delta \). Then, by (37), there exists a pair of nodes *x* and \(x'\) that belong to different equivalence classes and have \(A_X(x,x')\le \delta \). However, if *x* and \(x'\) belong to different equivalence classes, they cannot be clustered at resolution \(\delta \); hence, \(u^{\text { U}}_X(x,x')>\delta \). The inequalities \(A_X(x,x')\le \delta \) and \(u^{\text { U}}_X(x,x')>\delta \) cannot hold simultaneously since they contradict (42). Thus, it must be that \(\text { sep}(N_Z) > \delta \). \(\square \)
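Claim 2 can be spot-checked on a small example. The sketch below rests on two assumptions about definitions that are stated elsewhere in the paper: that (37) sets \(A_Z(z,z')\) to the smallest \(A_X(x,x')\) over \(x\) in block \(z\) and \(x'\) in block \(z'\), and that \(\text { sep}(N_Z)\) is the smallest dissimilarity between distinct nodes of \(N_Z\). The helper names and the example matrix are hypothetical.

```python
def unilateral_ultrametric(A):
    """Min over chains of the max over links of min(A[i][j], A[j][i])."""
    n = len(A)
    u = [[0.0 if i == j else min(A[i][j], A[j][i]) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def quotient_separation(A, delta):
    """Separation of the equivalence-class network at resolution delta.

    Assumes A_Z(z, z') is the minimum of A[i][j] over i in z and j in z',
    so sep(N_Z) is the smallest dissimilarity between nodes lying in
    different unilateral blocks. Returns infinity for a single block.
    """
    n = len(A)
    u = unilateral_ultrametric(A)
    # Label each node by the smallest index in its block; this is well
    # defined because u <= delta is an equivalence relation.
    label = [min(j for j in range(n) if u[i][j] <= delta) for i in range(n)]
    cross = [A[i][j] for i in range(n) for j in range(n)
             if label[i] != label[j]]
    return min(cross) if cross else float("inf")


# Hypothetical three-node asymmetric network.
A = [[0, 1, 9],
     [7, 0, 8],
     [9, 2, 0]]
for delta in (0.5, 1.0, 1.5, 2.0):
    # Claim 2: the separation strictly exceeds the resolution.
    assert quotient_separation(A, delta) > delta
```

At \(\delta = 1\) the blocks are \(\{0,1\}\) and \(\{2\}\), and the smallest dissimilarity across blocks is 2, which exceeds \(\delta \) as the claim states.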

Define \((Z,u_Z) := {{\mathcal H}}(Z,A_Z)\). Since \(\text { sep}(N_Z)>\delta \) (cf. Claim 2), it follows from Property (P1\('\)) that \(u_Z(z,z') > \delta \) for all \(z \ne z'\). Further, recalling that \(\phi _\delta \) is a dissimilarity-reducing map, Axiom (A2) requires \(u_X(x,x') \ge u_Z(\phi _\delta (x), \phi _\delta (x'))\) for all \(x, x' \in X\). Combining these two facts, whenever *x* and \(x'\) belong to different equivalence classes, i.e., \(\phi _\delta (x) \ne \phi _\delta (x')\), we have \(u_X(x,x') \ge u_Z(\phi _\delta (x),\phi _\delta (x')) >\delta \). Notice now that *x* and \(x'\) belonging to different equivalence classes is equivalent to \( u^{\text { U}}_X(x,x')>\delta \). Hence, we can state that \(u^{\text { U}}_X(x,x')>\delta \) implies \(u_X(x,x')>\delta \) for any arbitrary \(\delta >0\). In set notation, \(\{(x,x') : u^{\text { U}}_X(x,x')>\delta \} \subseteq \{(x,x') : u_X(x,x')>\delta \}\). Since the previous expression is true for arbitrary \(\delta >0\), this implies that \(u^{\text { U}}_X(x,x') \le u_X(x,x')\): if instead \(u_X(x,x') < u^{\text { U}}_X(x,x')\) for some pair, choosing \(\delta = u_X(x,x')\) would violate the inclusion. This proves the left inequality in (41). \(\square \)

### Proof of rightmost inequality in (41)

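Consider two nodes *x* and \(x'\) with unilateral ultrametric value \(u^{\text { U}}_X(x,x') = \delta \). Let \(C^*(x,x')=[x=x_0,\ldots , x_l=x']\) be a minimizing chain in the definition (25) so that we can write

\[ \delta = u^{\text { U}}_X(x,x') = \max _{0 \le i \le l-1} \min \big (A_X(x_i,x_{i+1}), A_X(x_{i+1},x_i)\big ). \tag{43} \]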

Consider the two-node network \(\mathbf {\Delta }_2(\delta , M)=(\{p,q\}, A_{p,q})\) where \(M := \max _{x,x'} A_X(x,x') \) and define \((\{p, q\}, u_{p,q}) :={{\mathcal H}}(\{p,q\},A_{p,q})\). Notice that according to Axiom (A1\(''\)) we have \(u_{p,q}(p,q) = u_{p,q}(q,p) = \min ( \delta , M) = \delta \), where the last equality is enforced by the definition of *M*.

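Focus now on each link of the minimizing chain in (43). For every successive pair of nodes \(x_i\) and \(x_{i+1}\), we must have

\[ \max \big (A_X(x_i,x_{i+1}), A_X(x_{i+1},x_i)\big ) \le M, \tag{44} \]

\[ \min \big (A_X(x_i,x_{i+1}), A_X(x_{i+1},x_i)\big ) \le \delta . \tag{45} \]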

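Expression (44) is true since *M* is defined as the maximum dissimilarity in \(A_X\). Inequality (45) is justified by (43), since \(\delta \) is defined as the maximum, among the links of the chain, of the minimum dissimilarity over both directions of the link. This observation allows the construction of dissimilarity-reducing maps \(\phi _i:\{p,q\}\rightarrow X\),

\[ \phi _i := {\left\{ \begin{array}{ll} \phi _i(p) = x_i, \ \phi _i(q) = x_{i+1}, &{} \text {if } A_X(x_i,x_{i+1}) = \min \big (A_X(x_i,x_{i+1}), A_X(x_{i+1},x_i)\big ), \\ \phi _i(p) = x_{i+1}, \ \phi _i(q) = x_i, &{} \text {otherwise.} \end{array}\right. } \tag{46} \]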

In this way, we can map *p* and *q* to consecutive nodes in the chain \(C^*(x,x')\) used in (43). Inequalities (44) and (45) combined with the map definition in (46) guarantee that \(\phi _i\) is a dissimilarity-reducing map for every *i*. Since clustering method \({{\mathcal H}}\) satisfies Axiom (A2), it follows that

\[ u_X(\phi _i(p), \phi _i(q)) \le u_{p,q}(p,q) = \delta , \quad \text {for all } i. \tag{47} \]

Substituting \(\phi _i(p)\) and \(\phi _i(q)\) in (47) by the corresponding nodes given by the definition (46), we can write \(u_X(x_i, x_{i+1})= u_X(x_{i+1}, x_i) \le \delta \), for all *i*, where the symmetry property of ultrametrics was used. To complete the proof we invoke the strong triangle inequality (7) and apply it to \(C^*(x,x')=[x=x_0,\ldots , x_l=x']\), the minimizing chain in (43). As a consequence, \(u_X(x,x') \le \max _i u_X(x_i,x_{i+1}) \le \delta \). The proof of the right inequality in (41) is completed by substituting \(\delta = u^{\text { U}}_X(x,x')\) [cf. (43)] into this last expression. \(\square \)
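The construction used in this part of the proof can be traced on a concrete network. The Python sketch below finds a minimizing chain, orients each link so that *p* is sent to the source of the cheaper direction, and checks that the resulting map from \(\mathbf {\Delta }_2(\delta , M)\) is dissimilarity reducing. It assumes that \(\mathbf {\Delta }_2(\delta , M)\) has \(A_{p,q}(p,q)=\delta \) and \(A_{p,q}(q,p)=M\). The orientation rule is one choice that satisfies the requirements stated in the text, and all function names and the example matrix are hypothetical.

```python
from collections import deque


def link_cost(A, i, j):
    """Cost of a link: the smaller of the two directed dissimilarities."""
    return min(A[i][j], A[j][i])


def minimizing_chain(A, s, t):
    """Return (delta, chain): the unilateral value and a chain attaining it."""
    n = len(A)
    u = [[0.0 if i == j else link_cost(A, i, j) for j in range(n)]
         for i in range(n)]
    for k in range(n):  # (min, max) Floyd-Warshall recursion
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    delta = u[s][t]
    # Breadth-first search restricted to links of cost <= delta; any
    # chain found this way has maximum link cost exactly delta.
    parent = {s: None}
    queue = deque([s])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if j not in parent and link_cost(A, i, j) <= delta:
                parent[j] = i
                queue.append(j)
    chain = [t]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return delta, chain[::-1]


def link_maps(A, chain):
    """For each link, return (phi(p), phi(q)) with A[phi(p)][phi(q)] minimal."""
    return [(a, b) if A[a][b] <= A[b][a] else (b, a)
            for a, b in zip(chain, chain[1:])]


def is_dissimilarity_reducing(A, delta, M, phi_p, phi_q):
    """Check A_pq(p,q) = delta >= A[phi_p][phi_q] and A_pq(q,p) = M >= A[phi_q][phi_p]."""
    return A[phi_p][phi_q] <= delta and A[phi_q][phi_p] <= M


A = [[0, 1, 9],
     [7, 0, 8],
     [9, 2, 0]]
M = max(max(row) for row in A)
delta, chain = minimizing_chain(A, 0, 2)
maps = link_maps(A, chain)
assert all(is_dissimilarity_reducing(A, delta, M, fp, fq) for fp, fq in maps)
```

For this matrix the chain is `[0, 1, 2]` with \(\delta = 2\). The first link keeps its orientation and the second is reversed, because the dissimilarity from node 2 to node 1 is the smaller of the two directions.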

Having proved both inequalities in (41), we conclude that \(u_X(x,x') = u^{\text { U}}_X(x,x')\) for all \(x, x' \in X\). Hence, unilateral clustering is the only method that satisfies axioms (A1\(''\)) and (A2), completing the global proof. \(\square \)

### Proof of Theorem 7

The leftmost inequality in (26) can be proved using the same method of proof used for the leftmost inequality in (41) within the proof of Theorem 6. The proof of the rightmost inequality in (26) is equivalent to the proof of the rightmost inequality in Theorem 4. \(\square \)

## About this article

### Cite this article

Carlsson, G., Mémoli, F., Ribeiro, A. *et al.* Hierarchical clustering of asymmetric networks. *Adv Data Anal Classif* **12**, 65–105 (2018). https://doi.org/10.1007/s11634-017-0299-5

### Keywords

- Hierarchical clustering
- Asymmetric network
- Directed graph
- Axiomatic construction
- Reciprocal clustering
- Nonreciprocal clustering

### Mathematics Subject Classification

- 91C20
- 62H30