Robust dissimilarity comparisons with categorical outcomes

The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant non-ordered categorical outcomes. When the groups are similarly distributed, their members have equal chances to achieve any of the attainable outcomes. Otherwise, a form of dissimilarity between the groups' distributions prevails. We characterize axiomatically the dissimilarity partial order of multi-group distributions defined over categorical outcomes. The main result provides an equivalent representation of this partial order by the ranking of multi-group distributions originating from the inclusion of their zonotope representations. The zonotope inclusion criterion refines (that is, is implied by) majorization conditions that are widely adopted in mainstream approaches to multi-group segregation or to univariate and multivariate inequality analysis.


Introduction
The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant non-ordered categorical outcomes. For instance, residential segregation occurs in situations in which ethnic groups sort with different intensities across the neighborhoods of a city. Likewise, school or occupational segregation is concerned with the uneven distribution of ethnic groups across schools or jobs.
In other cases, the interest lies in the way shares of one or many attributes (such as income, wealth or consumption of different goods) are assigned across population units (such as countries, households or individuals) and in the extent to which these distributions differ from the distribution of a normatively relevant benchmark, such as the demographic weights of the units. In this case, the focus is on uni- or multidimensional inequality, and the often-invoked anonymity principle regards the way units are ordered as irrelevant.
All these examples are concerned with the extent of dissimilarity between two or more distributions defined over classes of non-ordered realizations. There is widespread agreement in the literature about what constitutes lack of segregation or equality: these are situations in which the groups are similarly distributed across the classes of realizations. The relevant notion of similarity dates back to Gini (1914), who argues that two (or more) groups are similarly distributed whenever "the populations of the two groups take the same values with the same frequency."[1] Although a well-established methodology exists for analyzing dissimilarity between two distributions, the literature disagrees about the way such comparisons can be extended to the multi-group setting.
This paper develops the axiomatic foundations for the measurement of multi-group dissimilarity and provides equivalent testable conditions. We adopt a convenient (and equivalent) way of representing empirical discrete distributions through matrix notation. Each distribution matrix displays, by row, the distributions of individuals belonging to one of several (at least two) groups across classes of realizations, by column. The two matrices in (1) describe the distributions of three groups across a variable number of classes. Entries of these matrices can be interpreted as frequencies so that, for instance, the share of group 2 in class 3 in one of the two matrices is 80%. In the context of school segregation analysis, each of the two matrices portrays the way students from each of three distinct ethnic groups are distributed across the schools of a schooling district: one matrix depicts a district with four schools, whereas the other depicts a district with three schools.

[1] Gini (1914, p. 189), translated from Italian, formalizes similarity as proportionality: "If n is the size of [one group], m is the size of [the other group], n_x the size of [the first group] assigned to class x and m_x the size of [the second group] assigned to the same class, then it should hold [under similarity] that, for any value of x, n_x/m_x = n/m."
Many existing criteria, such as dissimilarity indices or majorization conditions, can be employed to compare distribution matrices by the extent of dissimilarity displayed by their rows. The ranking produced by one or a few indices, however, can be challenged by the use of alternative, yet plausible, measures, whereas majorization conditions are robust to this criticism but can be empirically intractable. In this paper, we consider all dissimilarity orderings that are consistent with some normatively relevant axioms and we focus on the intersection of all such orderings as a robust criterion for dissimilarity analysis. It is well known (see Donaldson and Weymark 1998) that such a criterion leads to a partial order of distribution matrices, which captures unanimity in the way matrices are ordered by all underlying dissimilarity orderings satisfying the desirable axioms. The axioms that we consider characterize the ordering "B displays at most as much dissimilarity as A" by the possibility of obtaining B from A through sequences of elementary operations that either preserve dissimilarity between the matrices' rows (such as permuting the labels of groups and classes, adding and deleting empty classes, and proportionally splitting classes) or reduce it (such as merging group frequencies across two classes of the same distribution matrix), and by some consistency properties of the dissimilarity orderings (with respect to the possibility of producing convex mixtures of classes).
The axioms that we study allow us to compare only matrices with the same number of groups, but they extend dissimilarity comparisons to matrices with a different number of classes. Such an extension is relevant for empirical applications. Moreover, considering comparisons between distribution matrices with a different number of classes makes explicit the normative content of this approach, by highlighting the combination of operations that makes it possible to rank distribution matrices with the same number of classes.
The main result, in Theorem 1, establishes that the partial order of distribution matrices consistent with our axiomatic model is equivalent to the partial order of distribution matrices induced by the inclusion of their zonotope representations. A zonotope is a convex geometric representation of the data, defined in the space of group frequencies, which originates from the Minkowski sum of the line segments joining the origin to each column vector of a distribution matrix (i.e., the set of all element-by-element sums of fractions of these vectors). In inequality analysis, zonotopes have been used to derive interesting multivariate extensions of the Lorenz curve (Koshevoy and Mosler 1996; Mosler 2012). Theorem 1 shows that the zonotope inclusion criterion is also relevant for dissimilarity analysis, for at least three reasons.
First, we demonstrate that the zonotope inclusion criterion is consistent with the implications of operations that unambiguously preserve or reduce dissimilarity. Each of these operations has clear and intuitive consequences on the shape of the zonotope, which provides a normative justification for using the zonotope inclusion criterion. Second, the zonotope inclusion criterion is a refinement of relevant majorization conditions and is hence implied by them. We demonstrate that uniform and matrix majorization criteria, which are widely adopted in robust multivariate distributional analysis (Marshall et al. 2011), are related to the dissimilarity measurement model developed here. In the multi-group setting, some matrices that cannot be ranked by matrix majorization can still be robustly ranked by zonotope inclusion, whereas the two criteria coincide only in the two-group setting. As a counterexample, we use the two matrices in (1) to show in Appendix A.15 that the zonotope of one matrix is included in the zonotope of the other, even though the former is not matrix majorized by the latter. Third, the zonotope inclusion criterion can be empirically tested, whereas this is seldom the case for majorization conditions.
Theorem 1 and the corollaries implied by it contribute to the literature along the following lines. First, the results identify the differences between the zonotope inclusion criterion and majorization conditions. In particular, B being matrix majorized by A postulates the existence of a sequence of dissimilarity reducing or preserving transformations mapping the classes of A into those of B. The zonotope inclusion condition instead focuses separately on each class of B and requires that any such class could be obtained through dissimilarity reducing or preserving transformations of the classes of A. There is no guarantee that such operations can be organized into a sequence, which makes the zonotope inclusion criterion a refinement of matrix majorization (as observed in Dahl 1999).
Second, Theorem 1 provides an axiomatic justification for using zonotope inclusion as a multi-group criterion that is more robust than (i.e., implies) alternative refinements of matrix majorization based on sequential comparisons of two-group distributions or on specific dissimilarity indices. Although any two-group projection of a multi-group zonotope is itself a zonotope, assessing inclusion among all two-group projections is not sufficient to conclude about inclusion of the multi-group zonotopes. This is because the zonotope inclusion criterion fails to satisfy a consistency property of partial orders requiring that, if two distribution matrices differ only in terms of two group distributions (i.e., two rows), then the two matrices could be ranked by focusing on dissimilarity in the sub-matrices associated with these groups (Moulin 2016). Failing this property is desirable in a multi-group context, because the dissimilarity ranking of matrices that differ by two or a few group distributions should depend not only on dissimilarity between these distributions, but also on the extent to which these distributions are dissimilar from the rest. This feature makes the dissimilarity criterion robust against potential aggregation biases (such as those underlying Simpson's paradox; see Blyth 1972).
Third, Theorem 1 rationalizes the normative underpinnings of a variety of scattered and apparently unrelated results on the measurement of (multi-group) segregation and of multivariate and univariate inequality, which are shown to be embedded within the dissimilarity model.
The paper is organized as follows. Relevant notation and a definition of the zonotope inclusion ordering are provided in Section 2. Axioms and the main result are in Section 3. Section 4 describes the usefulness of our results for related orders. Section 5 concludes. All proofs are collected in a dedicated appendix.

Notation
A distribution matrix A of size d × n_A depicts a set of distributions (indexed by rows) of d ≥ 1 groups across n_A ≥ 2 disjoint non-ordered classes (indexed by columns), representing categories of realizations. We develop dissimilarity comparisons of distribution matrices with a fixed number d of groups but a variable number of classes. These matrices are collected in the set
$M_d := \{A = (a_{ij}) \in \mathbb{R}_+^{d \times n_A} : A\mathbf{1}_{n_A} = \mathbf{1}_d,\ n_A \ge 2\},$
where $a_{ij}$ is interpreted as the proportion of group i observed in class j and the column vector $\mathbf{a}_j$ collects the proportions of all groups attaining realization j. The matrices A, B ∈ M_3 in (1) offer two examples. The distribution matrices in M_d are row stochastic, meaning that a matrix A ∈ M_d represents a collection of d elements of the unit simplex Δ^{n_A}. We let $a_j := \sum_{i=1}^{d} a_{ij}$ denote the "size" of class j, obtained by weighting the groups occupying it uniformly. For instance, the size of class 1 of matrix A in (1) is a_1 = 19/28. We follow the convention of using boldface letters to indicate column vectors, so that $\mathbf{e}_j$ is the column vector corresponding to column j of the identity matrix I_n of size n × n, whereas $\mathbf{1}_n = (1, \dots, 1)'$ and $\mathbf{0}_n := (0, \dots, 0)'$ are the column vectors with all n entries equal to 1 or 0, respectively. The superscript ′ always denotes transposition.
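A minimal sketch, assuming a list-of-rows encoding and exact rational entries from Python's `fractions` module, of the membership condition for M_d (non-negative entries, rows summing to one, at least two classes) and of the class sizes a_j. The function names and the 2 × 4 matrix are hypothetical illustrations; the matrix is not one of the matrices in (1).

```python
from fractions import Fraction as F

def is_distribution_matrix(A):
    """Check that A (a list of d rows) is row stochastic with at least two classes."""
    if not A or len(A[0]) < 2:
        return False
    n = len(A[0])
    return all(len(row) == n and all(x >= 0 for x in row) and sum(row) == 1
               for row in A)

def class_sizes(A):
    """Size a_j = sum over groups i of a_ij, for every class (column) j."""
    return [sum(col) for col in zip(*A)]

# Hypothetical 2 x 4 matrix: groups by row, classes by column, exact fractions.
A = [[F(2, 5), F(1, 10), F(3, 10), F(1, 5)],
     [F(1, 10), F(2, 5), F(0), F(1, 2)]]

print(is_distribution_matrix(A))          # True
print([str(s) for s in class_sizes(A)])   # ['1/2', '1/2', '3/10', '7/10']
```

Exact fractions are used so that the row-sum test `sum(row) == 1` is not affected by floating-point rounding.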
Every distribution matrix lies between two extreme cases. The first is perfect similarity, occurring when the distributions of all groups coincide and can be represented by the same row vector in Δ^n; a perfect similarity matrix therefore has d identical rows. The second is maximal dissimilarity: a maximal dissimilarity matrix is a disjoint-row-support matrix, in which each class is occupied by at most one group, although one group may occupy several classes. If a distribution matrix has the structure of a perfect similarity matrix, then it is not possible to forecast group membership from knowledge of the realization. Conversely, if a distribution matrix has the structure of a maximal dissimilarity matrix, then it is always possible to forecast the group from knowledge of the class of the realization. Any other distribution matrix displays a structure that lies in between these two. We also consider transformation matrices that, when multiplied by a distribution matrix, affect the extent of dissimilarity between the rows of that matrix. Denote by R_{n,m} the set of n × m row stochastic matrices whose rows lie in Δ^m. Moreover, write R_n whenever m = n, while D_n ⊆ R_n is the set of doubly stochastic matrices whose rows and columns lie in Δ^n. The set collecting all n × n permutation matrices is denoted by P_n.
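The two extreme cases and the sets of transformation matrices can be checked mechanically. The following sketch (hypothetical helper names, exact fractions) encodes them as simple predicates: a perfect similarity matrix has identical rows, a maximal dissimilarity matrix has at most one positive entry per column, and membership in R_{n,m}, D_n and P_n reduces to conditions on row and column sums.

```python
from fractions import Fraction as F

def is_row_stochastic(X):
    """Non-negative entries and every row summing to one (rows in the unit simplex)."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in X)

def is_doubly_stochastic(X):
    """Square, with both X and its transpose row stochastic."""
    cols = [list(c) for c in zip(*X)]
    return len(X) == len(cols) and is_row_stochastic(X) and is_row_stochastic(cols)

def is_permutation(X):
    """Doubly stochastic with 0/1 entries only."""
    return is_doubly_stochastic(X) and all(x in (0, 1) for row in X for x in row)

def is_perfect_similarity(A):
    """All d group distributions (rows) coincide."""
    return all(row == A[0] for row in A)

def is_maximal_dissimilarity(A):
    """Disjoint row supports: every class (column) is occupied by at most one group."""
    return all(sum(1 for x in col if x > 0) <= 1 for col in zip(*A))

S = [[F(1, 4), F(3, 4)], [F(1, 4), F(3, 4)]]   # both groups share one distribution
M = [[1, 0, 0], [0, F(1, 2), F(1, 2)]]         # group 2 alone occupies two classes
print(is_perfect_similarity(S), is_maximal_dissimilarity(M))  # True True
```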

The zonotope set
Geometric representations of the data are useful to derive empirical tests for ranking uni- and multivariate distributions. Lorenz curves, for instance, are the workhorse of robust income inequality analysis. A Lorenz curve is obtained by arranging income observations in increasing order and then plotting the cumulative income shares against the cumulative sample frequencies. If the Lorenz curve of one income distribution lies above the Lorenz curve of another income distribution, then one can robustly rank the former distribution as less unequal than the latter.
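A minimal Lorenz-dominance check, assuming equally weighted units and samples of the same size. This is a simplification: comparing samples of different size would require evaluating both curves at the union of their kinks. Function names are illustrative.

```python
from fractions import Fraction as F

def lorenz_points(incomes):
    """Vertices (k/n, cumulative income share of the k poorest units), equal unit weights."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, cum = [(F(0), F(0))], 0
    for k, x in enumerate(xs, start=1):
        cum += x
        points.append((F(k, n), F(cum) / total))
    return points

def lorenz_dominates(x, y):
    """True if the Lorenz curve of x lies weakly above that of y (same number of units)."""
    if len(x) != len(y):
        raise ValueError("this sketch only compares samples of equal size")
    # Both curves are piecewise linear with kinks at k/n, so comparing vertices suffices.
    return all(px[1] >= py[1] for px, py in zip(lorenz_points(x), lorenz_points(y)))

print(lorenz_dominates([2, 3, 5], [1, 2, 7]))  # True
```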
Lorenz curves do not provide sufficient structure for ranking multivariate distributions, insofar as each Lorenz curve allows one to compare only one distribution at a time with a normatively relevant one, usually the distribution of population shares. Lorenz curves may be useful to compare the distributions of two groups across relevant units, as in school segregation analysis. In this case, Lorenz curves (known as segregation curves) portray the degree of dissimilarity between the distribution of one group's members across schools and the distribution of another group across the same schools. Multi-group extensions, however, are problematic even in this domain.
In this section, we consider using the zonotope representation of the data to implement multi-group comparisons and we investigate the robust and testable ordering of distribution matrices generated by the zonotope inclusion criterion.
The zonotope set Z(A) ⊆ [0, 1]^d of a matrix A ∈ M_d is a convex polytope lying in the hypercube [0, 1]^d that is symmetric with respect to the point $\tfrac{1}{2}\mathbf{1}_d$ (see McMullen 1971). It is defined as follows:
$Z(A) := \{\mathbf{z} \in [0,1]^d : \mathbf{z} = A\boldsymbol{\theta},\ \boldsymbol{\theta} \in [0,1]^{n_A}\}.$
Elements of Z(A) are identified by the Minkowski sum of the segments joining the origin to the vectors whose coordinates are given by A's classes. In Fig. 1a we represent the two-dimensional zonotope of a distribution matrix A ∈ M_2 with four classes.

The low dimensionality of the example helps visualize the way Z(A) is constructed. First, the vectors with coordinates corresponding to the classes of A are plotted in the unit square and connected to the origin with line segments. In the figure, these vectors are marked with different symbols; for instance, the black square represents the fourth class of A. Then, the resulting segments are tied together in every possible arrangement. In the figure, adding together the vectors corresponding to classes one and three gives the vector with coordinates (0.7, 0.1), while adding to it the vector representing the fourth class gives (0.9, 0.6). The resulting zonotope of A is the grey area in the figure, which contains all possible arrangements of these segments, or of portions of them. Panel b) of Fig. 1 represents instead the zonotope of one of the matrices in (1), which is defined in three-dimensional space. We report with solid lines only the visible edges of this zonotope. Its relevant facets, originated by the sequential sum of the matrix's classes, are coloured in light gray. The example above highlights two situations of interest. The maximum dissimilarity zonotope is the d-dimensional hypercube and corresponds to the zonotope of any maximal dissimilarity matrix. Its diagonal is the similarity zonotope, which corresponds to the zonotope of any perfect similarity matrix. All distribution matrices displaying some dissimilarity originate zonotopes that lie in the maximum dissimilarity zonotope and include the similarity zonotope. The shapes of these two extreme zonotopes do not depend on the data in the underlying perfect similarity and maximal dissimilarity matrices, thus highlighting the irrelevance of within-group heterogeneity for dissimilarity evaluations. More broadly, each matrix in M_d has only one zonotope representation, although the same zonotope may correspond to many distribution matrices.
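For d = 2 the construction described for Fig. 1a can be reproduced numerically. The sketch below rests on a standard property of planar zonotopes that the text does not spell out: adding the class vectors in order of increasing slope a_2j/a_1j traces the lower boundary of Z(A), and central symmetry about (1/2, 1/2) gives the upper boundary. The entries of the 2 × 4 matrix are hypothetical, chosen only to be consistent with the coordinates quoted in the text (classes one and three sum to (0.7, 0.1), and adding the fourth class gives (0.9, 0.6)); they need not coincide with the matrix behind Fig. 1a.

```python
from fractions import Fraction as F
from functools import cmp_to_key

def by_slope(u, v):
    """Order two non-zero columns by increasing slope u[1]/u[0], exactly (cross product)."""
    cross = u[0] * v[1] - u[1] * v[0]
    return -1 if cross > 0 else (1 if cross < 0 else 0)

def lower_boundary(A):
    """Vertices of the lower boundary of Z(A) for a 2 x n distribution matrix A."""
    cols = [c for c in zip(*A) if c != (0, 0)]   # empty classes add nothing
    cols.sort(key=cmp_to_key(by_slope))          # flattest segments first
    x, y = F(0), F(0)
    pts = [(x, y)]
    for cx, cy in cols:                          # chain the segments end to end
        x, y = x + cx, y + cy
        pts.append((x, y))
    return pts

def upper_boundary(A):
    """Upper boundary, obtained from the lower one by central symmetry about (1/2, 1/2)."""
    return [(1 - x, 1 - y) for x, y in reversed(lower_boundary(A))]

# Hypothetical entries, chosen only to match the coordinates quoted in the text.
A = [[F(2, 5), F(1, 10), F(3, 10), F(1, 5)],
     [F(1, 10), F(2, 5), F(0), F(1, 2)]]
for x, y in lower_boundary(A):
    print(float(x), float(y))
```

With these entries the lower boundary passes through (0.3, 0), (0.7, 0.1) and (0.9, 0.6). For the 2 × 2 identity matrix (a maximal dissimilarity matrix) the boundaries are two sides of the unit square, while for a perfect similarity matrix both collapse onto the diagonal. The exact cross-product comparison avoids dividing by a_1j, which may be zero.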

The zonotope inclusion criterion
Zonotopes can be used to compare matrices by the extent of dissimilarity they exhibit. In this paper, we study the ranking of distribution matrices such as A and B that is generated by the inclusion of their zonotope representations, that is, Z(B) ⊆ Z(A). Inclusion can be easily checked when d = 2 by inspecting the zonotope graphs. Figure 2 shows an example where Z(Ã) ⊆ Z(A), for a distribution matrix Ã obtained by summing, element by element, classes 2 and 3 of a two-group matrix A, thereby producing a central class of Ã that contains 40% of the population of both groups. This operation levels disparities in the distributions of groups 1 and 2, although similarity is not achieved. The inclusion criterion is more difficult to visualize when d = 3. For instance, panel b) of Fig. 1 reports relevant facets of the zonotope of one of the matrices in (1), obtained at fixed proportions of the group 1 distribution (in light gray). The figure also shows the corresponding facets of the zonotope of the other matrix (in dark gray). In this specific example, it is sufficient to check inclusion between corresponding facets to conclude about inclusion between the two zonotopes. For general distribution matrices A and B, the zonotope inclusion criterion Z(B) ⊆ Z(A) has an intuitive interpretation in terms of disproportionality of group frequencies, which helps draw conclusions about the dissimilarity displayed by A and B. To see this, define an isopopulation line (when d = 2) or (hyper)plane (when d ≥ 3) as the set of all vectors z of group proportions in a zonotope whose average equals p ∈ [0, 1], that is, $\frac{1}{d}\mathbf{1}_d'\mathbf{z} = p$. In other words, p is the average "size" of z, obtained by weighting all groups equally. Figure 2 depicts an example, based on the distribution matrices A and Ã, in which the dashed line segments p′, p′′ and p′′′ correspond to isopopulation lines.
In general, Z(B) ⊆ Z(A) holds if, for every p, the set of all group proportions on the isopopulation line p in Z(B) is included in (i.e., is less dispersed than) the corresponding set in Z(A). The criterion is robust, given that inclusion must hold for all p's, which implies that any disproportional allocation of groups attainable by merging classes of B can also be obtained by merging the classes of A, but not necessarily the reverse. Proportionality is always attained in the similarity zonotope, in which case there is only one attainable allocation z on any isopopulation line p, namely $\mathbf{z} = p\,\mathbf{1}_d$. Conversely, disproportionality is maximal in the maximum dissimilarity zonotope, in which case every allocation lying on the isopopulation line p can be obtained from the original data.
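When d = 2, checking inclusion on every isopopulation line reduces to a comparison of lower boundaries: both zonotopes join (0, 0) to (1, 1) and are centrally symmetric about (1/2, 1/2), so Z(B) ⊆ Z(A) holds exactly when the lower boundary of Z(B) never falls below that of Z(A), and it suffices to compare the two piecewise-linear boundaries at their kinks. The sketch below implements this shortcut for two-group matrices only; names and entries are hypothetical, with Ã obtained from A by merging classes 2 and 3, so that its central class holds 40% of both groups.

```python
from fractions import Fraction as F
from functools import cmp_to_key

def by_slope(u, v):
    """Exact comparison of slopes u[1]/u[0] and v[1]/v[0] via the cross product."""
    cross = u[0] * v[1] - u[1] * v[0]
    return -1 if cross > 0 else (1 if cross < 0 else 0)

def lower_boundary(A):
    """Vertices of the lower boundary of Z(A), 2 x n distribution matrix A."""
    cols = sorted((c for c in zip(*A) if c != (0, 0)), key=cmp_to_key(by_slope))
    pts = [(F(0), F(0))]
    for cx, cy in cols:
        pts.append((pts[-1][0] + cx, pts[-1][1] + cy))
    return pts

def height(pts, x):
    """Lowest point of the boundary chain at abscissa x (handles vertical pieces)."""
    ys = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 == x1 == x:
            ys += [y0, y1]
        elif x0 <= x <= x1 and x0 < x1:
            ys.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return min(ys)

def zonotope_included(B, A):
    """Z(B) subset of Z(A), for d = 2 only: compare lower boundaries at all kinks."""
    lb, la = lower_boundary(B), lower_boundary(A)
    kinks = {x for x, _ in lb} | {x for x, _ in la}
    return all(height(lb, x) >= height(la, x) for x in kinks)

A = [[F(2, 5), F(1, 10), F(3, 10), F(1, 5)],
     [F(1, 10), F(2, 5), F(0), F(1, 2)]]
A_tilde = [[F(2, 5), F(2, 5), F(1, 5)],     # classes 2 and 3 of A merged
           [F(1, 10), F(2, 5), F(1, 2)]]
print(zonotope_included(A_tilde, A), zonotope_included(A, A_tilde))  # True False
```

Swapping the arguments returns False: the merge removes disproportional allocations, such as the point (0.3, 0) on the lower boundary of Z(A), that Z(Ã) cannot reach.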
For any d and any A ∈ M_d, the zonotope inclusion criterion always places Z(A) between the two extreme cases: the similarity zonotope is included in Z(A), which is in turn included in the maximum dissimilarity zonotope. The inclusion criterion, however, entails only a partial order of distribution matrices: two matrices cannot be ordered if their zonotope representations intersect without either one being included in the other. In the following section, we characterize the normative content of the inclusion criterion in terms of unanimous agreement, by all orderings consistent with some basic dissimilarity axioms, in ranking B as displaying less dissimilarity than A.

The dissimilarity partial order
We investigate the possibility of ordering distribution matrices according to the dissimilarity they display. A dissimilarity ordering is a complete and transitive binary relation ≼ on the set M_d, with symmetric part ∼, that ranks B ≼ A whenever B is at most as dissimilar as A.[7] Given A ∈ M_d, any dissimilarity ordering should always rank every perfect similarity matrix as at most as dissimilar as A, and A as at most as dissimilar as every maximal dissimilarity matrix. These matrices are regarded as equivalent representations of perfect similarity or of maximal dissimilarity, respectively, the focus being on differences across group distributions and not on the degree of heterogeneity in the distribution of each group across realizations.[8]

Fig. 2 Zonotope inclusion
One direct implication of this feature of the dissimilarity orderings is that distribution matrices that differ in their number of classes (n_A ≠ n_B) can be regarded as indifferent from the perspective of the dissimilarity orderings, even if their group distributions differ across matrices, as long as the dissimilarity that they display within each matrix coincides across matrices.[9] For this reason, we focus on matrices in M_d, which must have the same number of rows but may differ in the number of classes.
Each dissimilarity ordering induces a complete ranking of distribution matrices. In this paper, we are interested in the robust ranking of distribution matrices generated by the intersection of the dissimilarity orderings satisfying some desirable properties. This is a partial order (Donaldson and Weymark 1998) that we characterize axiomatically and for which we provide equivalent representations.

Axioms and preliminary results
Axioms are based on elementary operations that, when applied to distribution matrices, can reduce or preserve dissimilarity among groups. We focus on the dissimilarity orderings that rank distribution matrices consistently with the effects of these operations. To ease understanding of the axioms, we interpret the consequences of such operations in terms of dissimilarity in the distribution of groups of students with different ethnic backgrounds across the schools of a schooling district. This is commonly referred to as a problem of schooling segregation.
The first axiom defines the context, introducing an anonymity property with respect to the labels (and hence the arrangement) of the classes of a distribution matrix.
Axiom 1 IPC (Independence from Permutations of Classes) For any A, B ∈ M_d with n_A = n_B = n, if B = A · Π_n for a permutation matrix Π_n ∈ P_n, then B ∼ A.
Axiom IPC restricts the focus to evaluations in which the classes of a matrix can be freely permuted without affecting the extent of dissimilarity it displays. In the context of schooling segregation, the axiom posits that the names of the schools are irrelevant for assessing dissimilarity in the distributions of students across these schools. This is arguably the case if the schools cannot or should not be ordered according to their performance, quality or budget. Another implication of axiom IPC is that any distribution matrix obtained by permuting the columns of a maximal dissimilarity matrix has to be regarded as an equivalent representation of maximal dissimilarity.

[7] For any A, B, C ∈ M_d, the relation ≼ is transitive if A ≼ B and B ≼ C imply A ≼ C, and complete if either A ≼ B or B ≼ A or both, in which case A ∼ B.
[8] Rao (1982) distinguishes the notion of dissimilarity from that of diversity. The former reflects differences or similarities between the heterogeneity of populations/groups, whereas the latter reflects heterogeneity within the same population/group (see also Nehring and Puppe 2002). To see this, consider two perfect similarity matrices, one whose common row is a uniform distribution across classes (high diversity) and one whose common row concentrates the mass in few or one realization (low diversity). The two matrices are regarded as equivalent by every dissimilarity ordering.
[9] We provide an example along the lines of the previous footnote. There are many matrices of different size displaying the same structure as a perfect similarity matrix whose common row has n entries, for instance a perfect similarity matrix whose common row has ñ > n entries. Yet, every dissimilarity ordering regards any two such matrices as equivalent.

Next, we consider two transformations that extend comparability to distribution matrices that differ in the number of classes. The first transformation concerns the insertion or elimination of empty classes, i.e., classes that are not occupied by any group. The operation consists in adding/eliminating column vectors of size d with only zero entries to/from the original distribution matrix. In the schooling segregation example, the operation corresponds to adding/eliminating schools with no students to/from the same school district. The presence of these "empty" schools in the district is irrelevant for assessing dissimilarity of group distributions across the remaining schools of the district.

Axiom 2 IEC (Independence from Empty Classes) For any A, B ∈ M_d, if B is obtained from A by adding an empty class $\mathbf{0}_d$, then B ∼ A.
The IEC axiom emphasizes dissimilarity originated from non-empty columns of a distribution matrix. If A and B differ only because of |n_A − n_B| empty classes in one of the two matrices, then the dissimilarity in B should be regarded as equivalent to that in A. When combined with IPC, the axiom IEC allows us to regard as indifferent all matrices obtained by adding or deleting an empty class in any position.
The second transformation considered increases the number of classes by proportionally splitting (the group frequencies in) a class into two new classes. This transformation requires replicating one column of a distribution matrix and then scaling the entries of the original and of the replicated columns by the splitting coefficients λ ∈ (0, 1) and 1 − λ, respectively. This operation guarantees that the resulting distribution matrix is row stochastic and that the degree of proportionality of the group frequencies in the new columns coincides with that in the original column. In the schooling segregation example, splitting a school would require randomly allocating its student population (i.e., irrespective of group membership) into two smaller institutes, so that ethnic proportions in the two new institutes are not altered. Frankel and Volij (2011) advocate a similar property (called composition invariance) in the study of multi-group school segregation (see also James and Taeuber 1985). The Independence from Split of Classes (ISC) axiom posits that the transformation described above is a source of indifference for every dissimilarity ordering.

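Axiom 3 ISC (Independence from Split of Classes) For any A, B ∈ M_d, if B is obtained from A by replacing a class $\mathbf{a}_j$ with the two classes $\lambda\mathbf{a}_j$ and $(1-\lambda)\mathbf{a}_j$, for some λ ∈ (0, 1), then B ∼ A.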
A split transformation increases the number of classes and modifies the shape of a distribution matrix, but it does not alter the proportionality of the groups. For this reason, it is regarded as dissimilarity preserving.
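The invariance operations behind axioms IPC, IEC and ISC can be illustrated on the two-group zonotope. The sketch below assumes d = 2, builds the lower boundary by chaining class vectors in order of increasing slope, and drops vertices lying in the middle of straight pieces so that equal zonotopes produce equal vertex lists (a split adds such a collinear vertex). All helper names and matrix entries are hypothetical.

```python
from fractions import Fraction as F
from functools import cmp_to_key

def by_slope(u, v):
    """Exact comparison of slopes u[1]/u[0] and v[1]/v[0] via the cross product."""
    cross = u[0] * v[1] - u[1] * v[0]
    return -1 if cross > 0 else (1 if cross < 0 else 0)

def lower_boundary(A):
    """Corner points of the lower boundary of Z(A) for a 2 x n distribution matrix."""
    cols = sorted((c for c in zip(*A) if c != (0, 0)), key=cmp_to_key(by_slope))
    pts = [(F(0), F(0))]
    for cx, cy in cols:
        pts.append((pts[-1][0] + cx, pts[-1][1] + cy))
    corners = [pts[0]]
    for p, q in zip(pts[1:], pts[2:]):
        (x0, y0), (x1, y1), (x2, y2) = corners[-1], p, q
        if (x1 - x0) * (y2 - y0) != (y1 - y0) * (x2 - x0):   # keep only true corners
            corners.append(p)
    return corners + [pts[-1]]

def same_zonotope(A, B):
    return lower_boundary(A) == lower_boundary(B)

def permute_classes(A, order):     # IPC: relabel the classes
    return [[row[j] for j in order] for row in A]

def add_empty_class(A, position):  # IEC: insert a column of zeros
    return [row[:position] + [0] + row[position:] for row in A]

def split_class(A, j, lam):        # ISC: a_j becomes lam * a_j and (1 - lam) * a_j
    return [row[:j] + [lam * row[j], (1 - lam) * row[j]] + row[j + 1:] for row in A]

A = [[F(2, 5), F(1, 10), F(3, 10), F(1, 5)],
     [F(1, 10), F(2, 5), F(0), F(1, 2)]]
merged = [[F(2, 5), F(2, 5), F(1, 5)],          # classes 2 and 3 merged
          [F(1, 10), F(2, 5), F(1, 2)]]
print(same_zonotope(A, permute_classes(A, [3, 1, 0, 2])),  # True
      same_zonotope(A, add_empty_class(A, 2)),             # True
      same_zonotope(A, split_class(A, 3, F(1, 3))),        # True
      same_zonotope(A, merged))                            # False
```

The last comparison shows the contrast with a merge of two non-proportional classes, which does change the zonotope.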
The merge of classes transformation complements the split operation. A merge of classes is implemented by vector summation of two adjacent columns of a distribution matrix, irrespective of the group composition of each column. The operation has an immediate interpretation in the schooling segregation example: it consists in merging all students from two neighboring schools into a single, larger school. Each ethnic group in the school of destination is increased by an amount equal to the proportion of the corresponding group in the school of departure, which is then emptied. If one or both schools are empty, segregation neither increases nor decreases. Consider, instead, the case of two ethnic groups that are similarly distributed across almost all schools in a district, apart from two schools, such that one group is over-represented compared to the other in one school, and under-represented in the other school. Merging each of these two schools with other schools in the district would reduce the compositional differences, without eliminating them. A merge of these two schools would, instead, establish proportionality in ethnic composition across all schools, leading to perfect similarity. The Dissimilarity Decreasing Merge of Classes (MC) axiom states that no merge of classes transformation can increase dissimilarity.

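Axiom 4 MC (Dissimilarity Decreasing Merge of Classes) For any A, B ∈ M_d, if B is obtained from A by merging two adjacent classes j and j + 1, that is, $\mathbf{b}_j = \mathbf{a}_j + \mathbf{a}_{j+1}$, $\mathbf{b}_k = \mathbf{a}_k$ for all k < j and $\mathbf{b}_k = \mathbf{a}_{k+1}$ for all k > j, then B ≼ A.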
Consider obtaining B from A with a merge transformation adding together, distribution by distribution, the group proportions observed in two classes j and j + 1 whenever $\mathbf{a}_{j+1} = \lambda\mathbf{a}_j$, λ > 0, such that $\mathbf{b}_j = (1+\lambda)\mathbf{a}_j$ and $\mathbf{b}_k = \mathbf{a}_k$ for all k < j, while $\mathbf{b}_k = \mathbf{a}_{k+1}$ for all k > j. This operation leaves dissimilarity unaffected. It is the opposite of a split, but supports the same indifference class, gathering all matrices A, B ∈ M_d such that A ∼ B for all orderings consistent with axiom ISC, even if not consistent with MC.
Axioms MC, IEC, ISC and IPC are independent. The intersection of all dissimilarity orderings satisfying these axioms defines a partial order of distribution matrices (see Donaldson and Weymark 1998), which is represented by the matrix majorization criterion. We refer to this partial order as ≼_R, indicating that matrix B is matrix majorized by A whenever there exists X ∈ R_{n_A,n_B} such that B = AX (Marshall et al. 2011; Dahl 1999).
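In matrix form, merges and splits are right multiplications by row-stochastic matrices, which is why sequences of them lead to a condition of the form B = AX with X ∈ R_{n_A,n_B}: a product of row-stochastic matrices is again row stochastic. The sketch below builds the 0/1 matrix that merges two adjacent classes and the matrix that splits one class; the helper names and entries are illustrative assumptions.

```python
from fractions import Fraction as F

def matmul(A, X):
    """Plain matrix product A X for matrices stored as lists of rows."""
    return [[sum(a * x for a, x in zip(row, col)) for col in zip(*X)] for row in A]

def is_row_stochastic(X):
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in X)

def merge_matrix(n, j):
    """n x (n - 1) 0/1 matrix sending classes j and j + 1 (0-based) to one class."""
    X = []
    for k in range(n):
        target = k if k <= j else k - 1
        X.append([1 if t == target else 0 for t in range(n - 1)])
    return X

def split_matrix(n, j, lam):
    """n x (n + 1) matrix splitting class j into shares lam and 1 - lam."""
    X = [[0] * (n + 1) for _ in range(n)]
    for k in range(n):
        if k < j:
            X[k][k] = 1
        elif k == j:
            X[k][k], X[k][k + 1] = lam, 1 - lam
        else:
            X[k][k + 1] = 1
    return X

A = [[F(2, 5), F(1, 10), F(3, 10), F(1, 5)],
     [F(1, 10), F(2, 5), F(0), F(1, 2)]]
X = merge_matrix(4, 1)                  # merge the second and third classes
B = matmul(A, X)
print(is_row_stochastic(X), B)
```

Splitting a class and then merging the two halves back returns the original matrix, consistently with the two operations supporting the same indifference class.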
Proposition 1 For any A, B ∈ M_d the following statements are equivalent: (i) B ≼ A for all orderings ≼ satisfying axioms MC, ISC, IEC and IPC; (ii) B ≼_R A.
The notion of matrix majorization has been investigated in a variety of contexts (see Marshall et al. 2011, p. 625, and the literature therein), the most relevant being comparisons of the informativeness of statistical experiments with a finite number of outcomes.[10] The characterization in Proposition 1, which is an alternative to Grant et al. (1998), Frankel and Volij (2011) and Lasso de la Vega and Volij (2014), shows that every informativeness comparison of matrices in M_d verifies the existence of dissimilarity preserving and reducing transformations mapping the most informative distribution matrix into the least informative one.
While appealing, matrix majorization makes it possible to rank distribution matrices only if there exists a sequence of relevant transformations of the data that can be represented by a row stochastic matrix. This requirement is very stringent in some cases. Consider, for instance, the two matrices in (1). It is shown in Appendix A.15 that every column of one of them can be obtained by splitting and merging the columns of the other. Yet, these operations cannot be arranged to form a sequence in this specific example. As a consequence, they cannot be represented in the form of a row stochastic matrix, so that the latter matrix does not matrix majorize the former. In fact, as shown in the appendix, there is a unique admissible transformation matrix with non-negative entries mapping the latter matrix into the former, and this transformation matrix is clearly not row stochastic (for a related example, see Koshevoy 1995).
We address the concerns raised by the example above by introducing a new class of dissimilarity axioms. These axioms relax the requirement of ranking distribution matrices exclusively by means of (sequences of) merge transformations, invoking instead a form of consistency of the dissimilarity orderings with respect to convex combinations of (columns of) distribution matrices. The first axiom, denoted Strong-MixC, states that if there are matrices A_1, …, A_m that are ranked as not more dissimilar than B, then the convex mix of such matrices yields a matrix that cannot display more dissimilarity than B. Before stating the axiom, recall that any element of the convex hull (denoted conv) of matrices A_j = (… The axiom name follows from the fact that every mix of matrices can be interpreted as a specific mix of classes that assigns uniform weights to the classes of the same matrix.

Axiom 5 Strong-MixC (Dissimilarity Consistency with Uniform Mixing of Classes)
Footnote 10 (continued): … is impossible to disentangle the underlying signal from observation of the experiment outcome, and it would be less informative than any other matrix.
The axiom Strong-MixC postulates a "betweenness" property (see Dekel 1986) for dissimilarity orderings: all orderings satisfying it regard a new distribution A, whose classes are obtained as convex combinations of the respective classes of A_1, …, A_m, as not more dissimilar than B. The fact that such a convex combination assigns the same weight w_j to each class of a matrix A_j, with ∑_j w_j = 1, guarantees that A ∈ M_d. Matrix A may be regarded as more dissimilar than some of the matrices A_j, but it cannot display more dissimilarity than B, given that A_j ≼ B ∀j.
The normative appeal of axiom Strong-MixC rests on its relation with operations that mix columns of different matrices. Such operations are regarded as unambiguously not increasing dissimilarity, given their relation with merge of classes operations. We illustrate this point in terms of school segregation analysis. The axiom Strong-MixC implies that every school, or merge of schools, from schooling district A (i.e., the columns of the distribution matrix) can always be obtained through a convex combination of schools, or merges of schools, from the schooling districts A_1, …, A_m. Formally, let V^01_n be the set of n × 1 vectors whose elements are either 0 or 1. The non-zero entries of a vector θ ∈ V^01_n identify one or more classes of a distribution matrix, so that Aθ yields a new school obtained by merging some of A's schools (or by keeping one of its schools). Any mixing operation underlying axiom Strong-MixC always guarantees that:
$$A\theta \in \operatorname{conv}\left\{A_j\theta_j : \theta_j \in V^{01}_{n_{A_j}},\ j = 1, \dots, m\right\} \quad \forall\, \theta \in V^{01}_{n_A}, \qquad (4)$$
where conv is the convex hull of such vectors.
Arguably, every merge of schools smooths ethnic disparities within a schooling district and can contribute to reducing segregation. Condition (4) implies that there are fewer opportunities to reduce segregation by merging some of the schools in schooling district A than by merging schools in districts A_1, …, A_m. In fact, the former district/matrix can always be obtained as a combination of the latter matrices, whereas the reverse may not be true. As an example, consider the (rather extreme) case in which A is the similarity matrix: in this case, there are no opportunities at all to reduce segregation by merging A's classes, since segregation is already at its minimum.
Condition (4) is always guaranteed by axiom Strong-MixC, which, however, imposes additional structure.¹¹ We consider a new axiom, denoted MixC, that regards every matrix A ∈ M_d satisfying condition (4) as displaying not more dissimilarity than B.

Axiom 6 MixC (Dissimilarity Consistency with Mixing of Classes) Consider a
If A is obtained as in Strong-MixC, then Aθ = ∑_{j=1}^m w_j A_jθ for all θ ∈ V^01_n, with ∑_j w_j = 1, which satisfies (4) because in this case θ_j = θ ∀j.
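The linearity argument behind the Strong-MixC case of condition (4) can be checked numerically: for a mix A = ∑_j w_j A_j of equally sized matrices, every merge Aθ equals ∑_j w_j A_jθ, so choosing θ_j = θ for all j is a valid witness. The sketch assumes NumPy; the helpers `merges`, `strong_mix` and `strong_mix_satisfies_condition_4` are hypothetical names, and the example matrices are our own.

```python
import itertools

import numpy as np


def merges(A):
    """All vectors A @ theta with theta in {0, 1}^n: each merges a subset of classes."""
    A = np.asarray(A, dtype=float)
    return [A @ np.array(theta, dtype=float)
            for theta in itertools.product((0, 1), repeat=A.shape[1])]


def strong_mix(matrices, weights):
    """Uniform-weight mix sum_j w_j * A_j of distribution matrices of equal size."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(w * np.asarray(M, dtype=float) for w, M in zip(weights, matrices))


def strong_mix_satisfies_condition_4(matrices, weights):
    """Check A @ theta == sum_j w_j * (A_j @ theta) for every 0-1 vector theta.

    This is the witness for condition (4) with theta_j = theta for all j.
    """
    A = strong_mix(matrices, weights)
    mats = [np.asarray(M, dtype=float) for M in matrices]
    for theta in itertools.product((0, 1), repeat=A.shape[1]):
        t = np.array(theta, dtype=float)
        witness = sum(w * (M @ t) for w, M in zip(weights, mats))
        if not np.allclose(A @ t, witness):
            return False
    return True
```

Because every row of each A_j sums to one and the weights sum to one, the mixed matrix is again a distribution matrix, mirroring the argument that A ∈ M_d.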

The axiom MixC values having fewer opportunities to reduce segregation in school district A than in districts A_1, …, A_m, in the sense that if the latter districts display less school segregation than B, then A should also display less segregation than B. Axiom MixC thus extends the orderings A_j ≼ B for every j = 1, …, m to A ≼ B. The axiom may allow us to compare cases such as the example related to the transformation matrix in (3): even if there is no sequence of merge of classes operations mapping B into A, it may be sufficient to establish that every class of A is obtained by merging classes of B to verify condition (4) and thus rank A ≼ B, as we show in Appendix A.15.
The axiom MixC does not explicitly state how A is constructed. One way to obtain it (which we rely upon in the proofs) is to consider weights w_j^k ∈ [0, 1], with ∑_{j=1}^m w_j^k = 1, that are specific to each class k, such that the k-th column of A is the mix of the k-th columns of A_1, …, A_m with weights w_1^k, …, w_m^k, for all k. These weights are more general than those considered by axiom Strong-MixC. When configuration A is obtained in this way and (4) is satisfied, the axiom MixC posits that mixing students across the schools of districts A_1, …, A_m that are not more segregated than B cannot generate a new schooling district A that is more segregated than B.
The fact that axiom Strong-MixC always yields a matrix satisfying condition (4) proves the next statement.
Remark 1 If ≼ is consistent with MixC then it is consistent with Strong-MixC.
As a consequence, the MixC axiom implies a ranking of distribution matrices that is less partial than the one characterized by axiom Strong-MixC: all dissimilarity orderings consistent with Strong-MixC unanimously rank only a subset of the matrices that are unanimously ranked by the orderings consistent with MixC.
We now investigate whether the partial orders of distribution matrices supported by matrix majorization and by the zonotope inclusion criterion are consistent with these new axioms. The next remark shows that matrix majorization ≼_R is consistent with Strong-MixC.
When axiom Strong-MixC is paired with the dissimilarity preserving axioms, it makes it possible to characterize matrix majorization without resorting to axiom MC, which is then implied by the axioms considered.

Remark 3
If ≼ satisfies IPC, ISC, IEC and Strong-MixC then it satisfies MC.
Axiom MC becomes redundant when characterizing dissimilarity partial orders if the Strong-MixC axiom is combined with all the dissimilarity preserving axioms. The axiom Strong-MixC yields a new characterization of ≼_R, alternative to the one presented in Proposition 1.
Proposition 2 For any A, B ∈ M_d, the following statements are equivalent: (i) A ≼_R B; (ii) A ≼ B for all orderings ≼ satisfying axioms Strong-MixC, ISC, IEC and IPC.
Proposition 2 highlights that, even though axiom MC is implied by Strong-MixC, ISC, IEC and IPC (see Remark 3), these axioms still lead to the partial order of matrix majorization, as in Proposition 1, and not to another partial order capable of ranking more matrices. Weakening the Strong-MixC axiom towards MixC may help characterize such a partial order. In the two-group case (d = 2), Proposition 2 can be reformulated by weakening Strong-MixC in favor of axiom MixC, since matrix majorization ≼_R is consistent with axiom MixC when d = 2.
Remark 4 For A, A_j, B ∈ M_2 such that A and A_j, j = 1, …, m, satisfy condition (4): A_j ≼_R B ∀j ⇒ A ≼_R B.
In the multi-group context (d ≥ 3), however, matrix majorization is not consistent with axiom MixC. A counterexample is given in Appendix A.15, where we use the matrices in (1) to identify matrices A_j ∈ M_d, j = 1, 2, 3, such that A_j ≼_R B, and we then show that A satisfies condition (4) but cannot be obtained as in axiom Strong-MixC, and hence A ⋠_R B. As a consequence, matrix majorization is only a sufficient, but not a necessary, condition for unanimity in the ranking by all dissimilarity orderings ≼ consistent with axioms MixC, IPC, IEC and ISC. Conversely, the zonotope inclusion criterion is always consistent with axiom MixC.
Remark 5 For A, A_j, B ∈ M_d such that A and A_j, j = 1, …, m, satisfy condition (4): Z(A_j) ⊆ Z(B) ∀j ⇒ Z(A) ⊆ Z(B).
The zonotope inclusion criterion is consistent with MixC and, by Remark 1, it is also consistent with Strong-MixC. However, Strong-MixC is not the axiom needed to characterize zonotope inclusion: the operations underlying Strong-MixC do not, alone, allow one to decompose the zonotope inclusion ordering of any pair A and B into simpler mixing transformations mapping one matrix into the other. The main result of the paper shows that the MixC axiom provides the structure needed to characterize the zonotope inclusion order. Dissimilarity orderings consistent with Strong-MixC but not with MixC can be represented by matrix majorization (Proposition 2), which, however, provides only a sufficient condition for zonotope inclusion.

Main result and discussion
Theorem 1 For any A, B ∈ M_d, the following statements are equivalent: (i) Z(A) ⊆ Z(B); (ii) A ≼ B for all orderings ≼ satisfying axioms MixC, ISC, IEC and IPC.

The theorem provides a novel and complete characterization of the zonotope inclusion criterion in terms of dissimilarity. The zonotope inclusion criterion induces a partial order of distribution matrices: if the inclusion test fails, then the dissimilarity orderings consistent with MixC, IPC, IEC and ISC cannot reach a consensus on the ranking. Nonetheless, this partial order is "less partial" than matrix majorization (that is, zonotope inclusion is a refinement of ≼_R) and is thus capable of ordering a larger set of cases.
Remark 6 shows that there are matrices that cannot be ranked by all dissimilarity orderings consistent with Strong-MixC (or, equivalently, with axiom MC) but that can be ordered by resorting to axiom MixC, which provides the additional structure needed to refine ≼_R. The rationale for this refinement is clarified in the proof of Theorem 1. There, we use the operations underlying axioms Strong-MixC, IEC, IPC and ISC to characterize the distribution matrices that form the basis of the set of all matrices that are matrix majorized by any given B. We also show that some of the classes in each of the basis matrices identify vertices of Z(B) (and that Z(B) is the convex hull of its vertices). While the convex hull of the basis matrices obtained using the weights implied by axiom Strong-MixC is sufficient to characterize the full set of matrices A such that A ≼_R B, the same operation is not flexible enough to characterize the entire set of matrices A such that Z(A) ⊆ Z(B). Instead, when condition (4) is applied to the permutations of B (all regarded as dissimilar as B itself), it identifies the vertices of Z(B), so that their convex mix gives Z(B). Some of the matrices identified in this way cannot be obtained using the weights implied by Strong-MixC, which demonstrates Remark 6.
The reverse implication of Remark 6 does not hold in general. The matrices A and B in (1) provide a counterexample in which Z(A) ⊆ Z(B) but the two matrices cannot be ranked by matrix majorization. Matrix majorization and zonotope inclusion may coincide only under specific circumstances, such as those identified in Theorem 2 in Koshevoy (1995), or when dissimilarity comparisons are limited to distributions with d = 2 (Dahl 1999; Lasso de la Vega and Volij 2014). We provide a new geometric proof of the latter statement in the appendix.
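Unlike a search for row stochastic transformation matrices, zonotope inclusion can be tested directly on the data. Assuming, as is standard, that Z(A) is the set of points Aθ with θ ∈ [0, 1]^{n_A}, Z(A) is the convex hull of the points Aθ with θ ∈ V^01_{n_A}, so Z(A) ⊆ Z(B) holds if and only if each such point can be written as Bλ with λ ∈ [0, 1]^{n_B}. The sketch below assumes NumPy and SciPy's `linprog`; the function names are hypothetical, the example matrices are ours, and the enumeration of 2^{n_A} points restricts it to small matrices.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def in_zonotope(z, B):
    """True if z = B @ lam for some lam in [0, 1]^nB, i.e. z is a point of Z(B)."""
    B = np.asarray(B, dtype=float)
    n_b = B.shape[1]
    res = linprog(
        c=np.zeros(n_b),
        A_eq=B,
        b_eq=np.asarray(z, dtype=float),
        bounds=[(0.0, 1.0)] * n_b,
        method="highs",
    )
    return res.status == 0


def zonotope_included(A, B):
    """True if Z(A) is a subset of Z(B).

    Z(A) is the convex hull of the points A @ theta with theta in {0, 1}^nA,
    so it suffices to test that each such point lies in Z(B). This costs
    2**nA small linear programs and is meant for small examples only.
    """
    A = np.asarray(A, dtype=float)
    n_a = A.shape[1]
    return all(
        in_zonotope(A @ np.array(theta, dtype=float), B)
        for theta in itertools.product((0, 1), repeat=n_a)
    )
```

The perfectly similar matrix [[1], [1]] passes this test against any B, since both the origin and B1_{n_B} = 1_d lie in the zonotope of every distribution matrix.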
The set of matrices that can be ordered in terms of dissimilarity can be further extended by considering the transitive closure generated by all binary relations ≼ satisfying the axioms in Theorem 1 and a new axiom, Independence from Permutations of Groups (IPG), which makes dissimilarity orderings invariant to the labeling of the groups.

Axiom 7 IPG (Independence from Permutations of Groups) For any
In the context of segregation analysis, the axiom provides a natural multi-group extension of the symmetry of types property (Hutchens 2015). Together with ISC, IEC, IPC and MixC, the axiom extends dissimilarity comparisons based on zonotope inclusion to orderings that are not concerned with the labeling of the groups. The proof of the following corollary rests on the properties of the zonotope.

Dissimilarity indices
The partial order of dissimilarity in Theorem 1 can be represented in terms of agreement among dissimilarity indices satisfying desirable properties. A dissimilarity index is a multivariate function D: M_d → ℝ_+ mapping a distribution matrix into a number, which can be interpreted as the level of dissimilarity among the d distributions represented in that matrix. These indices measure dissimilarity as the average within-class dispersion of group frequencies.
Consider first the dissimilarity orderings consistent with Strong-MixC. In this case, the dispersion of group frequencies within each class can be quantified by a function h in the class H of real-valued convex functions defined on Δ_d. Dispersion in class j contributes to overall dissimilarity in proportion to the size of class j, a_j. The dissimilarity index D_h with h ∈ H aggregates these evaluations as follows:¹²
$$D_h(A) := \frac{1}{d}\sum_{j=1}^{n_A} a_j\, h\!\left(\frac{a_{1j}}{a_j}, \dots, \frac{a_{dj}}{a_j}\right),$$
where a_ij/a_j is the proportion of group i relative to the size of class j when groups are uniformly weighted. Dissimilarity is minimized when a_ij/a_j = 1/d for each of the d groups in all classes. Hence, by normalizing h so that h((1/d)1_d) = 0, the index takes value 0 when perfect similarity is reached. Dissimilarity is instead maximal when, for every j, there exists an i such that a_ij = a_j. The following proposition sets out a dominance condition in terms of dissimilarity indices.
Proposition 3 For any A, B ∈ M_d, A ≼ B for all orderings ≼ satisfying axioms Strong-MixC, ISC, IEC and IPC if and only if D_h(A) ≤ D_h(B) for all h ∈ H.
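A minimal sketch of the index D_h, assuming the normalization D_h(A) = (1/d) ∑_j a_j h(a_1j/a_j, …, a_dj/a_j) with a_j the total size of class j (its column sum), and skipping empty classes. The two choices of h are our own examples of convex functions with h((1/d)1_d) = 0: the total absolute deviation from 1/d, which for d = 2 reproduces the Duncan and Duncan index (1/2)∑_j |a_1j − a_2j|, and h(s) = 1 − d(∏_i s_i)^{1/d}, for which D_h(A) reduces to 1 − ∑_j ∏_i a_ij^{1/d}. NumPy is assumed and the function names are hypothetical.

```python
import numpy as np


def dissimilarity_index(A, h):
    """D_h(A) = (1/d) * sum_j a_j * h(A[:, j] / a_j), with a_j the size of class j.

    Empty classes (a_j = 0) are skipped, since they carry no weight.
    """
    A = np.asarray(A, dtype=float)
    d = A.shape[0]
    total = 0.0
    for column in A.T:
        a_j = column.sum()
        if a_j > 0:
            total += a_j * h(column / a_j)
    return total / d


def h_abs(s):
    """Total absolute deviation of group proportions from 1/d (convex, 0 at 1/d)."""
    s = np.asarray(s, dtype=float)
    return float(np.abs(s - 1.0 / s.size).sum())


def h_geo(s):
    """1 - d * geometric mean of the proportions (convex, 0 at 1/d)."""
    s = np.asarray(s, dtype=float)
    return 1.0 - s.size * float(np.prod(s)) ** (1.0 / s.size)
```

Comparing D_h(A) and D_h(B) for a handful of such h can only refute dominance; Proposition 3 requires the inequality for every h ∈ H.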

Using Proposition 1, we conclude that matrix majorization is a necessary and sufficient condition for agreement in dissimilarity evaluations across all indices described in Proposition 3. Matrix majorization is refined by the zonotope inclusion criterion. We provide an equivalent representation of the latter criterion in terms of dissimilarity measures based on the so-called price dominance criterion (Kolm 1977; Koshevoy and Mosler 1996; Andreoli and Zoli 2020).¹³ Consider a vector of "prices" (or normative weights) p = (p_1, …, p_d)′, which take real values and could therefore also be negative. Prices allow us to evaluate the relative group composition of each class of a distribution matrix by the implied "budget" (or weighted average) p′A_{·j}/a_j, where A_{·j} denotes the j-th column of A. In a case of perfect similarity, A_{·j}/a_j = (1/d)1_d for all classes j. Therefore, lack of dissimilarity is indicated by the fact that, for each group i, the share a_ij/a_j is the same across all classes j. The same consideration applies to the budgets p′A_{·j}/a_j, which are then equal across classes irrespective of the weighting vector p. Each class contributes additively to dissimilarity, with evaluations φ(p′A_{·j}/a_j) given by a convex function φ: ℝ → ℝ, which is introduced to quantify the inequality across the budgets of all classes. If we quantify this aggregate inequality by (1/d)∑_{j=1}^{n_A} a_j φ(p′A_{·j}/a_j), then its minimum level can be reached at m := ( …
Proposition 4 provides the class of dissimilarity indices that are related to zonotope inclusion and defines a dominance condition that is weaker than the one implied by Proposition 3.
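One concrete instance links price evaluations to zonotopes. With φ(x) = max(0, x), class j contributes a_j φ(p′A_{·j}/a_j) = max(0, p′A_{·j}), and the sum over classes is the support function of Z(A) at p. Since a compact convex set is contained in another if and only if its support function is nowhere above the other's, Z(A) ⊆ Z(B) holds exactly when ∑_j max(0, p′A_{·j}) ≤ ∑_j max(0, p′B_{·j}) for every p. The sketch below, assuming NumPy, only samples random price vectors: a violation disproves inclusion, while finding none is evidence rather than proof. The function names and the sampling scheme are hypothetical choices of ours.

```python
import numpy as np


def support(A, p):
    """Support function of Z(A) at price vector p: sum_j max(0, p . A[:, j])."""
    budgets = np.asarray(p, dtype=float) @ np.asarray(A, dtype=float)
    return float(np.maximum(budgets, 0.0).sum())


def find_price_violation(A, B, n_draws=10_000, seed=0, tol=1e-12):
    """Search random prices p for support(A, p) > support(B, p).

    A returned p proves that Z(A) is not contained in Z(B). Returning None
    only means that no violation was found among the sampled prices.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(A).shape[0]
    for p in rng.standard_normal((n_draws, d)):
        if support(A, p) > support(B, p) + tol:
            return p
    return None
```

For example, with p = (1, −1) a completely segregated 2 × 2 matrix has support 1, whereas the perfectly similar matrix with two identical classes has support 0, which already rules out inclusion of the former's zonotope in the latter's.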

Related orders
This section highlights the relevance of the dissimilarity model for the analysis of multi-group segregation and multivariate inequality. First, we argue that the zonotope inclusion criterion can be meaningfully used in the analysis of segregation with many (more than two) groups. We also show that widely used multi-group segregation indices characterized in the literature are consistent with dissimilarity preserving or reducing axioms. Second, we analyze the implications of the dissimilarity model for multivariate orderings of dispersion, where the focus is on dissimilarity between the distributions of certain attributes and a normatively relevant benchmark distribution. In this case, the configuration that displays less dissimilarity among the underlying distributions and with respect to the benchmark distribution also exhibits less multivariate inequality/dispersion. We prove that the zonotope inclusion criterion weakens some of the most widely adopted robust criteria in the multivariate inequality literature. Third, we emphasize the relevance of the dissimilarity axioms for the analysis of univariate inequality.

Segregation
Segregation arises when individuals with different characteristics (such as their ethnic origin or gender) are distributed unevenly across the neighborhoods of a city, across the schools of a schooling district, or across jobs within a firm. In segregation analysis, the realizations of interest are categorical and non-ordered. Mainstream approaches to segregation focus on the two-group case and take consistency with the partial order generated by non-intersecting segregation curves (Duncan and Duncan 1955) as a baseline. A segregation curve is obtained by ordering the classes of A by increasing magnitude of the ratio a_2j/a_1j evaluated for each class j. It gives the proportions of group 1 and of group 2 that are observed in the classes where group 2 is relatively overrepresented. The graph of the segregation curve coincides with the lower boundary of the zonotope representing the cumulative shares of group 1 and group 2 across categories.
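The construction of the segregation curve can be sketched as follows, assuming NumPy, a 2 × n matrix whose rows are the distributions of groups 1 and 2, and cumulative shares of group 1 on the horizontal axis and of group 2 on the vertical axis (the axis orientation is our assumption). Sorting classes by the angle of (a_1j, a_2j) is equivalent to sorting by a_2j/a_1j while also handling a_1j = 0. As a check, the largest gap between the horizontal and vertical coordinates along the curve equals the Duncan and Duncan index (1/2)∑_j |a_1j − a_2j|, because the classes with a_2j < a_1j come first. Function names are hypothetical.

```python
import numpy as np


def segregation_curve(A):
    """Vertices (xs, ys) of the segregation curve of a 2 x n distribution matrix.

    Classes are sorted by increasing a_2j / a_1j (via the angle, which also
    handles a_1j = 0) and cumulative shares of group 1 (xs) and group 2 (ys)
    are accumulated from the origin. The path is the lower boundary of Z(A).
    """
    A = np.asarray(A, dtype=float)
    A = A[:, A.sum(axis=0) > 0]                 # empty classes add nothing
    order = np.argsort(np.arctan2(A[1], A[0]), kind="stable")
    steps = A[:, order]
    xs = np.concatenate([[0.0], np.cumsum(steps[0])])
    ys = np.concatenate([[0.0], np.cumsum(steps[1])])
    return xs, ys


def duncan_index(A):
    """Duncan and Duncan dissimilarity index: 0.5 * sum_j |a_1j - a_2j|."""
    A = np.asarray(A, dtype=float)
    return 0.5 * float(np.abs(A[0] - A[1]).sum())
```

Permuting the columns of A leaves the curve unchanged, consistently with the irrelevance of the labeling of classes.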
The ranking of two-group distributions generated by non-intersecting segregation curves can be characterized through elementary segregation-reducing operations (see Hutchens 1991) or, alternatively, by matrix majorization (Lasso de la Vega and Volij 2014; Hutchens 2015). When segregation curves cross, distributions can be ranked by segregation indices consistent with the segregation curve ordering (Reardon and Firebaugh 2002; Reardon 2009). Alternatively, segregation curves have been used to assess the dissimilarity between each group distribution and the population distribution, as in Alonso-Villar and del Río (2010).
We are not, however, aware of any ordering generated by a multi-group extension of the segregation curve ordering. Frankel and Volij (2011) have provided normative justifications for using matrix majorization as a robust segregation criterion for ranking multi-group distributions (see also Flückiger and Silber 1999; Chakravarty and Silber 2007). However, matrix majorization is a demanding condition that is not testable in the multi-group setting. The results in this paper make three contributions to this literature.
First, Proposition 1 clarifies that the operations of merge (or, equivalently, Strong-MixC), split, permutation and insertion/elimination of empty classes characterize the ranking produced by non-intersecting segregation curves when d = 2. The same axioms characterize matrix majorization when d ≥ 2, thus showing that every segregation comparison involves a dissimilarity comparison. The dissimilarity axiom MixC allows us to weaken this criterion to an ordering that is less partial than matrix majorization and that can also be interpreted in terms of segregation.
Second, we promote the zonotope inclusion criterion as a relevant multi-group extension of the segregation curve dominance criterion. The zonotope inclusion criterion is new in the segregation literature; it is testable and it makes it possible to deal with the multi-dimensional nature of the data. In the case d = 2, segregation curve dominance is always consistent with zonotope inclusion, insofar as the segregation curve can be understood as the lower boundary of a zonotope.¹⁵ Moreover, segregation curve dominance is also equivalent to matrix majorization.
In the multi-group setting (d ≥ 3), however, a dominance criterion based on comparisons of segregation curves across all pairs of groups provides only a necessary condition for zonotope inclusion. Furthermore, in this context the zonotope inclusion criterion is weaker than matrix majorization, thus providing a natural refinement of it. Theorem 1 characterizes it in terms of the MixC axiom, thus establishing the link between multi-group segregation and dissimilarity.
Third, Proposition 3 identifies and characterizes the class of multi-group segregation indices that are coherent with the family D_h. Below are some examples of well-known segregation indices belonging to this class.

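The Duncan and Duncan dissimilarity index for a matrix A ∈ M_2 is
$$D(A) := \frac{1}{2}\sum_{j=1}^{n_A}\left|a_{1j} - a_{2j}\right|.$$
It measures dissimilarity as the average absolute distance between the elements a_1j/a_j and a_2j/a_j in each class. By setting h(a_1j/a_j, a_2j/a_j) = |a_1j/a_j − a_2j/a_j|, it follows that D_h(A) = D(A).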
In the multi-group context (A ∈ M_d), segregation can be measured by the Atkinson-type segregation index, defined as (2016)

Multivariate majorization and the Lorenz zonotope
In this section, we focus on the multivariate majorization criteria adopted in robust inequality analysis. We argue that every inequality comparison involves the assessment of the dissimilarity between some relevant distributions and a benchmark distribution. A canonical example is that in which the distribution matrices A, B ∈ M_d represent multivariate distributions of commodities. A matrix A represents the way in which shares of each commodity (by row) are allocated to certain classes, which can represent the demographic units (e.g., families or individuals) that consume these commodities. Units are not ordered in any meaningful way.
Multidimensional inequality arises from the dissimilarity between the d distributions under analysis and the distribution of the demographic weight of the n units. It is common to assume that every unit receives a uniform weight equal to 1/n. Under these circumstances, the next corollary, which follows from Proposition 1, formalizes the relation between multivariate inequality analysis and dissimilarity.
When n_A = n_B = n, the transformation matrix in the corollary is doubly stochastic (it belongs to D_n). The condition A = BX with X ∈ D_n implied by (6), often referred to as uniform majorization, is widely adopted in robust univariate and multivariate inequality analysis (see p. 613 in Marshall et al. 2011). All social welfare functions that are increasing and Schur-concave (i.e., that display some degree of inequality aversion) would rank the two multivariate distributions accordingly.
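Uniform majorization adds column-sum constraints to the matrix majorization test: A = BX with X doubly stochastic. The sketch below assumes NumPy and SciPy's `linprog`; the helper `uniformly_majorized` is hypothetical, it requires both matrices to have the same number n of units, and the example data are our own.

```python
import numpy as np
from scipy.optimize import linprog


def uniformly_majorized(A, B):
    """Return True if A = B @ X for some doubly stochastic n x n matrix X."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d, n = A.shape
    if B.shape != (d, n):
        raise ValueError("uniform majorization compares two d x n matrices")
    # Unknowns x[k, j] stacked row by row at position k * n + j.
    fit = np.zeros((d * n, n * n))
    for i in range(d):
        for j in range(n):
            for k in range(n):
                fit[i * n + j, k * n + j] = B[i, k]
    row_sums = np.kron(np.eye(n), np.ones((1, n)))  # sum_j x[k, j] = 1
    col_sums = np.kron(np.ones((1, n)), np.eye(n))  # sum_k x[k, j] = 1
    res = linprog(
        c=np.zeros(n * n),
        A_eq=np.vstack([fit, row_sums, col_sums]),
        b_eq=np.concatenate([A.ravel(), np.ones(2 * n)]),
        bounds=[(0, None)] * (n * n),
        method="highs",
    )
    return res.status == 0
```

With d = 1, the equal-shares vector is uniformly majorized by any vector of income shares, but not conversely; with d = 2, averaging two units' bundles through a T-transform also passes the test.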
Uniform majorization is a demanding criterion, like matrix majorization, insofar as it posits that inequality can be reduced only when every row of a distribution matrix is obtained from the corresponding row of another distribution matrix through a common set of transformations implied by a matrix in D_n. The resulting ordering of distribution matrices is therefore partial. Koshevoy (1995) and Koshevoy and Mosler (1996) have studied a less partial order of multivariate distributions based on the Lorenz zonotope inclusion criterion. A Lorenz zonotope, denoted LZ(A) with A ∈ M_d, is the (d + 1)-dimensional zonotope of a distribution matrix augmented by the population distribution vector, that is LZ(A) := Z(Ã), with LZ(A) ⊂ ℝ^{d+1}_+ and Ã defined as in (6). The Lorenz zonotope inclusion criterion induces a partial order of distribution matrices that provides a testable refinement of uniform majorization.
The following chain of implications clarifies the relation between multivariate inequality and dissimilarity. The proof follows from previous results.

Remark 8
Let A, B ∈ M_d such that d ≥ 2 and n_A = n_B = n. Then:
The first implication, showing that uniform majorization implies matrix majorization, has been discussed in the previous section. The Lorenz zonotope inclusion criterion defines an inequality partial order of distribution matrices that is less partial than (i.e., is implied by) matrix majorization. It follows that the ranking of distribution matrices given by LZ(A) ⊆ LZ(B) is always consistent with the implications of a merge transformation (or, alternatively, of a convex combination underlying the Strong-MixC axiom) on matrices Ã and B̃. Any such transformation bears two implications for multidimensional inequality.
First, a merge transformation reduces the dissimilarity between the distribution of each dimension, taken separately, and the benchmark distribution, implying a reduction of inequality in each dimension. Second, a merge transformation reduces the dissimilarity across dimensions, implying an increase in the correlation between dimensions. This aspect is controversial, since the Lorenz zonotope inclusion criterion may fail to rank distribution matrices that are instead unanimously ordered by social welfare functions satisfying aversion to correlation increasing transfers (Epstein and Tanny 1980; Atkinson and Bourguignon 1982; Decancq 2012), a desirable property in multidimensional inequality analysis stating that any transformation that raises the degree of association between realizations is bound to decrease social welfare (Andreoli and Zoli 2020). The Extended Lorenz zonotope inclusion criterion proposed in Mosler (2012) addresses these concerns.
Although the Lorenz zonotope inclusion criterion may be problematic for multidimensional welfare analysis, it is still relevant for assessing dissimilarity between distributions. The criterion LZ(A) ⊆ LZ(B) can be weakened by looking at inclusions of the projections of the Lorenz zonotope onto the space of outcomes, that is Z(A) ⊆ Z(B). The latter is useful to analyze inequalities that arise from differences between distributions, regardless of the degree of inequality within each of these distributions. This feature is relevant, for instance, for constructing robust inequality of opportunity comparisons (see, for instance, Roemer and Trannoy 2016; Andreoli et al. 2019).¹⁶ Corollary 2 provides a characterization result that extends robust inequality assessments based on uniform majorization to matrices that may differ in size (n_A ≠ n_B) but have the same number d of dimensions.
There are other robust multivariate inequality criteria that are weaker than uniform majorization (Dahl 1999; Martínez Pería et al. 2005) and that imply the Lorenz zonotope inclusion criterion. These criteria are also nested within the dissimilarity model.

Income inequality
Corollary 2 also holds in the case d = 1. This case is of particular interest for social welfare analysis, as it rationalizes empirical comparisons of income inequality. In this section, we argue that every income inequality comparison involves a dissimilarity comparison, but not the reverse. Empirical comparisons of income inequality consist in assessing the way total income in a sample of n income recipient units (such as households or individuals) is split across these units. We can hence represent distributions of income shares by the n-variate vectors a′, b′ ∈ M_1, with a′1_n = b′1_n = 1. Each entry of the vectors corresponds to the income share allocated to a given unit. Anonymity is often invoked in the literature on income inequality measurement, implying that any permutation of the units does not affect the extent of inequality displayed by a′ or b′.
As per condition (6) in Corollary 2, every empirical income inequality comparison involves a dissimilarity comparison between the distribution of income shares across the n units and the units' weights. Furthermore, the ranking of distribution matrices induced by LZ inclusion is consistent with the Lorenz curve partial order. The chain of implications in Remark 8 hence runs in both directions when d = 1: Lorenz zonotope inclusion implies unanimity for all social welfare functions that are increasing and concave in income, which is equivalent to uniform majorization. All these conditions imply that income inequality analysis always subsumes a dissimilarity comparison.
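The Lorenz curve comparison mentioned above can be sketched as follows, assuming NumPy, equal unit weights 1/n and vectors of income shares that sum to one. By the classical result of Hardy et al. (1934), for vectors of equal length and equal total, weak Lorenz dominance is equivalent to uniform majorization. Function names and example shares are our own.

```python
import numpy as np


def lorenz_ordinates(shares):
    """Lorenz curve ordinates at population shares k/n, k = 0, ..., n.

    Assumes n units with equal weights 1/n and income shares summing to one.
    """
    s = np.sort(np.asarray(shares, dtype=float))        # poorest unit first
    return np.concatenate([[0.0], np.cumsum(s)])


def lorenz_dominates(a, b, tol=1e-12):
    """True if the Lorenz curve of a lies weakly above that of b (same n)."""
    la, lb = lorenz_ordinates(a), lorenz_ordinates(b)
    if la.shape != lb.shape:
        raise ValueError("this sketch compares distributions with the same n")
    return bool(np.all(la >= lb - tol))
```

Sorting the shares before accumulating them implements anonymity: relabeling the units does not change the outcome of the comparison.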
A similar result holds even when the benchmark distribution of population weights is not uniform. Ebert and Moyes (2003) analyze the relation between welfare evaluations, Lorenz dominance and equivalence scales for incomes when population weights may differ among units and across distribution matrices. In this case, the interest is in ranking matrices such as Ã := (λ, a)′, where λ = (λ_1, …, λ_n)′ and λ_j can be understood as individual j's weight. Using Corollary 2, every welfare-consistent measure of inequality can be written as an average of convex transformations of equivalized incomes, scaled by their demographic weights. This is formalized by the inequality index D_h(Ã) = ∑_{j=1}^n λ_j h(a_j/λ_j), with h ∈ H convex, where a_j/λ_j is j's equivalent income.¹⁷ From Proposition 3 (in combination with Remark 7), inclusion between the zonotopes of the weight-augmented matrices provides a sufficient test for welfare dominance.
Footnote 17: The result (see Lemma 1 in the Appendix) follows from the homogeneity and convexity of g: ℝ² → ℝ, yielding g(λ_j, a_j) = λ_j g(1, a_j/λ_j) = λ_j h(a_j/λ_j) with h convex. Here we assume that income and population weights have unit size. If this is not the case, then a_j/λ_j is proportional to j's equivalent income.
Footnote 18: Consider a distribution a′. A PD transfer consists in moving a mass ε > 0 from class j to class k, with a_j > a_k, yielding b_j = a_j − ε, b_k = a_k + ε and b_ℓ = a_ℓ for all ℓ ≠ j, k, such that b_j ≥ b_k. As a consequence of this transformation, b′ = a′X for some X ∈ D_n (Lorenz dominance) and ∑_j a_j = ∑_j b_j.
A well-known result in inequality measurement is that an income distribution displays less inequality than another if it can be obtained from the latter through a finite sequence of progressive (Pigou-Dalton, PD) transfers of income from rich donors to poor recipients that do not switch their relative positions in the income ranking (Hardy et al. 1934; Marshall et al. 2011).¹⁸ In the univariate case (d = 1), Corollary 2 implies that every sequence of PD transfers of incomes can be rationalized by a specific sequence of more fundamental dissimilarity preserving and reducing operations that are concerned with the way income shares and weights are shifted across units:

Corollary 3 Every PD transfer operation can be decomposed into a sequence of split of classes and merge of classes operations.
Split and merge operations can hence be seen as inequality non-increasing transformations that are more elementary than PD transfers. The proof of Corollary 3 rests on the fact that any T-transform, an equivalent matrix representation of a PD transfer (see p. 33 in Marshall et al. 2011), can be exactly decomposed into the product of matrices representing split and merge operations. It follows that any univariate inequality comparison based on uniform majorization can be seen as a dissimilarity comparison, but not the reverse, insofar as the dissimilarity preserving and reducing operations of split and merge, respectively, characterize matrix majorization, of which uniform majorization is a particular case.
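One way to see such a decomposition is the following construction of our own, which is not necessarily the one used in the proof of Corollary 3. A T-transform T = tI + (1 − t)Q, where Q swaps units j and k, equals the product SM of two row stochastic matrices: S keeps a share t of classes j and k in place and moves the remaining share 1 − t into two new classes (an insertion of two empty classes followed by two splits), and M merges each new piece into the other unit. The sketch assumes NumPy; the names and the parameter `t` are hypothetical.

```python
import numpy as np


def t_transform(n, j, k, t):
    """T = t * I + (1 - t) * Q, where Q permutes units j and k (0 <= t <= 1)."""
    Q = np.eye(n)
    Q[[j, k]] = Q[[k, j]]
    return t * np.eye(n) + (1 - t) * Q


def split_then_merge(n, j, k, t):
    """Return row stochastic S (n x (n + 2)) and M ((n + 2) x n) with S @ M = T.

    S keeps a share t of classes j and k in place and moves the remaining
    share 1 - t into two new classes n and n + 1 (insertion of empty classes
    followed by splits). M then merges class n into k and class n + 1 into j.
    """
    S = np.hstack([np.eye(n), np.zeros((n, 2))])
    S[j, j], S[j, n] = t, 1 - t
    S[k, k], S[k, n + 1] = t, 1 - t
    M = np.vstack([np.eye(n), np.zeros((2, n))])
    M[n, k] = 1.0        # piece split off class j is merged with class k
    M[n + 1, j] = 1.0    # piece split off class k is merged with class j
    return S, M
```

Applied to the income shares (0.1, 0.2, 0.3, 0.4) with j = 0, k = 3 and t = 0.7, the product moves 0.09 from the richest to the poorest unit, giving (0.19, 0.2, 0.3, 0.31). Since T is doubly stochastic, a uniform weights row is also left unchanged.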
The interesting new result provided by Corollary 2 is that there always exists a sequence of split and merge operations that supports uniform majorization even in the multidimensional case (d ≥ 2), although the same sequence cannot, in general, be rearranged to represent PD transfers (Kolm 1977).

Concluding remarks
A large and scattered literature on segregation and inequality measurement has proposed criteria for ranking multi-group distributions according to the dissimilarity they exhibit. This paper establishes the axiomatic foundations of the dissimilarity criterion. We do so by developing a parsimonious axiomatic model based on dissimilarity preserving operations and on a dissimilarity reducing operation, the merge, which consists in aggregating, distribution by distribution, the proportions of people observed in two separate classes. We study the partial order of distribution matrices generated by the transitive closure of all binary dissimilarity relations consistent with these operations and with a mixing axiom. This last axiom is crucial to justify an equivalent characterization of the "displays at most as much dissimilarity as" partial order. Our main theorem identifies a novel nonparametric criterion, based on inclusion of the zonotope set representations of the data, which is equivalent to the dissimilarity partial order thus identified.
The zonotope inclusion criterion is relevant in many contexts. One application could be the evaluation of the impact of certain educational policies on the patterns of dissimilarity between multi-ethnic distributions of students across the schools of a district, a problem commonly referred to as school segregation. We can use zonotopes to draw conclusions about robust changes in segregation when comparing the actual situation with a counterfactual distribution that would have emerged in the absence of the policy. While the education policy itself may have little to do with splitting, merging or mixing schools, zonotope inclusion signals that one can always move from the counterfactual to the actual allocation of students through operations that are unanimously understood as segregation-reducing.
In some cases, zonotope inclusion is rejected by the data. The dissimilarity indices analyzed in the paper then allow one to produce conclusive evaluations of the changes in dissimilarity, and the implied ranking is always consistent with the implications of the "elementary" transformations. Evaluations based on one or a few dissimilarity indicators, however, are not robust and can always be challenged from the perspective offered by alternative measures. The complete characterization of the dissimilarity indicators presented here is left for future research.

Appendix A.1 Useful additional results
The first result shows that matrix majorization admits an equivalent representation in terms of unanimous ranking for a well defined class of convex functions.

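Lemma 1 For any $A, B \in \mathcal{M}_d$, $B \preccurlyeq_R A$ if and only if
$$\sum_{j=1}^{n_B} g(\mathbf{b}_j) \leq \sum_{j=1}^{n_A} g(\mathbf{a}_j)$$
for all functions $g: \mathbb{R}^d \to \mathbb{R}$ that are convex and homogeneous such that $g(\mathbf{0}_d) = 0$.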
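Lemma 1 can be illustrated numerically for the "only if" direction. Matrix majorization is used in the proofs below in the form $B = AX$ with $X$ row stochastic, so the sketch builds such a $B$ and evaluates both sides of the inequality for a few convex, positively homogeneous functions with $g(\mathbf{0}_d) = 0$. The matrices and the particular functions are our own illustrative choices; a finite family of functions cannot certify majorization.

```python
import math


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def columns(M):
    """Columns of M (one vector per class) as tuples."""
    return [tuple(row[k] for row in M) for k in range(len(M[0]))]


def lemma1_sides(A, B, g):
    """Return (sum_j g(b_j), sum_j g(a_j)) for a candidate function g."""
    return sum(g(b) for b in columns(B)), sum(g(a) for a in columns(A))


# Convex, positively homogeneous functions with g(0) = 0 (illustrative choices).
TEST_FUNCTIONS = {
    "euclidean_norm": lambda v: math.sqrt(sum(x * x for x in v)),
    "max_entry": max,
    "l1_norm": lambda v: sum(abs(x) for x in v),
    "linear": lambda v: 2.0 * v[0] - v[1],
}

# d = 2 groups (rows), each row a distribution over n_A = 3 classes.
A = [[0.6, 0.3, 0.1],
     [0.1, 0.3, 0.6]]
# Row-stochastic X (n_A x n_B, n_B = 2): the middle class is split equally.
X = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]
B = matmul(A, X)  # B is matrix majorized by A by construction

for name, g in TEST_FUNCTIONS.items():
    lhs, rhs = lemma1_sides(A, B, g)
    print(f"{name}: sum g(b_j) = {lhs:.4f} <= sum g(a_j) = {rhs:.4f}")
```

Linear choices of $g$ hold with equality, since every column $\mathbf{a}_j$ is redistributed in full across the classes of $B$.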
The second result shows that the insertion of empty classes and the split and merge operations can be represented through linear transformations involving row stochastic matrices. An operation of insertion of empty classes transforms $A$ into $B$, with $n_B > n_A$, by augmenting $A$ with $n_B - n_A$ columns with zero entries. We denote by $\mathcal{R}^{IEC}_{n_A,n_B} \subset \mathcal{R}_{n_A,n_B}$ the set of all matrices reproducing an insertion of empty classes when post-multiplied to a distribution matrix $A$. Hence $X \in \mathcal{R}^{IEC}_{n_A,n_B}$ is an identity matrix of size $n_A$ augmented by $n_B - n_A$ columns with zero entries.
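The insertion of empty classes can be reproduced directly: post-multiplying $A$ by the identity of size $n_A$ padded with zero columns appends empty classes, and the padded identity is row stochastic. The helper names below (`iec_matrix`, `is_row_stochastic`) are hypothetical and only illustrate the construction.

```python
def iec_matrix(n_a, n_b):
    """Identity of size n_a augmented with n_b - n_a zero columns."""
    if n_b < n_a:
        raise ValueError("n_b must be at least n_a")
    return [[1.0 if k == j else 0.0 for k in range(n_b)] for j in range(n_a)]


def is_row_stochastic(X, tol=1e-12):
    """Non-negative entries and every row summing to one."""
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in X)


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


A = [[0.6, 0.3, 0.1],
     [0.1, 0.3, 0.6]]
X = iec_matrix(3, 5)
B = matmul(A, X)  # the same distributions with two empty classes appended
print(B)
```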
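Let $\mathcal{M}^0_d \subset \mathcal{M}_d$ denote the set of matrices exhibiting at least one column of zeroes. For $A \in \mathcal{M}^0_d$, let $J^0_A$ denote the index set of all columns of $A$ with all zeroes and $J_A$ the index set of all the other columns of $A$. Let $j \in J_A$ be such that $j+1 \in J^0_A$. The matrix $S^{[j]}$ incorporates an operation of split of classes applied to matrix $A \in \mathcal{M}^0_d$ that leads to matrix $B = A \cdot S^{[j]} \in \mathcal{M}_d$, with $\mathbf{b}_j = \lambda\mathbf{a}_j$ and $\mathbf{b}_{j+1} = (1-\lambda)\mathbf{a}_j$ for some $\lambda \in [0,1]$, and $\mathbf{b}_k = \mathbf{a}_k$ for all $k \neq j, j+1$. The set of all transformation matrices $S^{[j]}$ reproducing a split of classes is denoted by:
$$\mathcal{R}^{ISC}_n := \{S^{[j]} \in \mathcal{R}_n : s_{jj} = \lambda,\ s_{j\,j+1} = 1 - \lambda,\ \lambda \in [0,1],\ s_{kk} = 1\ \forall k \neq j,\ s_{ih} = 0 \text{ in all other cases}\}.$$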
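A split of class $j$ into the adjacent empty class can be sketched as a row-stochastic matrix that differs from the identity only in row $j$: class $j$ keeps a share $\lambda$ of its mass and class $j+1$ receives the remaining $1 - \lambda$. The symbol `lam`, the 0-based indexing and the function names are our own assumptions.

```python
def split_matrix(n, j, lam):
    """Row-stochastic n x n matrix splitting class j (0-based) into classes j and j+1.

    Row j keeps a share lam in class j and sends 1 - lam to class j + 1; every
    other row is the corresponding row of the identity. Class j + 1 is assumed
    to be empty in the matrix that is post-multiplied.
    """
    if not 0 <= j < n - 1 or not 0.0 <= lam <= 1.0:
        raise ValueError("need 0 <= j < n - 1 and 0 <= lam <= 1")
    S = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    S[j][j], S[j][j + 1] = lam, 1.0 - lam
    return S


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


# Class 1 (0-based) is empty, so class 0 can be split into classes 0 and 1.
A = [[0.6, 0.0, 0.4],
     [0.2, 0.0, 0.8]]
B = matmul(A, split_matrix(3, 0, 0.25))
print(B)
```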

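Also the merge of classes operation originates a distribution matrix $B = A \cdot M^{[j]}$, where the matrix $M^{[j]}$ performs a merge of class $j$ towards class $j+1$. Such a matrix belongs to the set:
$$\mathcal{R}^{MC}_n := \{M^{[j]} \in \mathcal{R}_n : m_{j\,j+1} = m_{kk} = 1\ \forall k \neq j,\ m_{ih} = 0 \text{ in all other cases}\}.$$
The third preliminary result provides an equivalent (algebraic and finite) condition for testing zonotope inclusion. Let us first denote the sets $\mathcal{V}_n := [0,1]^n$ and $\mathcal{V}^{01}_n := \{0,1\}^n$.
Lemma 2 For any $A, B \in \mathcal{M}_d$ of size $d \times n$, the following conditions are equivalent: i) $Z(B) \subseteq Z(A)$; ii) $\forall \boldsymbol{\theta} \in \mathcal{V}^{01}_n\ \exists \boldsymbol{\pi} \in \mathcal{V}_n : B\boldsymbol{\theta} = A\boldsymbol{\pi}$.
Proof Note that $Z(B) \subseteq Z(A)$ if and only if $\mathbf{z} \in Z(B)$ implies $\mathbf{z} \in Z(A)$, whereas for $Z(B) \subset Z(A)$ we also have that $\exists \tilde{\mathbf{z}} \in Z(A)$ such that $\tilde{\mathbf{z}} \notin Z(B)$. Using the properties of Minkowski sums, this means that
$$\forall \boldsymbol{\theta} \in \mathcal{V}_n\ \exists \boldsymbol{\pi} \in \mathcal{V}_n : \mathbf{z} := B\boldsymbol{\theta} = A\boldsymbol{\pi}. \quad (7)$$
Let us assume, for ease of notation, that $\boldsymbol{\theta} \in \mathcal{V}_n$ is ordered so that $0 \leq \theta_1 \leq \theta_2 \leq \dots \leq \theta_n \leq 1$. Denote $\delta_1 = \theta_1$ and $\delta_k = \theta_k - \theta_{k-1}$ for $k = 2, \dots, n$. Note that $\delta_k \in [0,1]$ $\forall k$. Recall that $\mathbf{e}_k$ is column vector $k$ of the identity matrix of size $n$, and denote further $\mathbf{e}_{(h,h+1,\dots,n)} := \sum_{k=h}^{n} \mathbf{e}_k$ for any $1 \leq h \leq n$, so that if $h = n-1$ then $\mathbf{e}_{(n-1,n)} = \mathbf{e}_{n-1} + \mathbf{e}_n$, and so on. We have that $\mathbf{e}_{(h,h+1,\dots,n)} \in \mathcal{V}^{01}_n$ $\forall h$. For any $\boldsymbol{\theta} \in \mathcal{V}_n$ there always exists a class $j$ such that $\boldsymbol{\theta} = \sum_{h=j}^{n} \delta_h \mathbf{e}_{(h,h+1,\dots,n)}$, so that $Z(B) \subseteq Z(A)$ is equivalent to
$$B \cdot \sum_{h=j}^{n} \delta_h \mathbf{e}_{(h,h+1,\dots,n)} = A\boldsymbol{\pi}$$
for $\boldsymbol{\theta}, \boldsymbol{\pi} \in \mathcal{V}_n$. We use this equivalence in the proof.
i) ⇒ ii). Immediate, since $\mathcal{V}^{01}_n \subseteq \mathcal{V}_n$. ii) ⇒ i). Assume that ii) holds, which can be equivalently stated as: for any $1 \leq h \leq n$ there exists $\boldsymbol{\pi}^{(h,h+1,\dots,n)} \in \mathcal{V}_n$ such that $B\mathbf{e}_{(h,h+1,\dots,n)} = A\boldsymbol{\pi}^{(h,h+1,\dots,n)}$. Substituting into $B \cdot \sum_{h=j}^{n} \delta_h \mathbf{e}_{(h,h+1,\dots,n)}$ yields, for any $j$:
$$B \cdot \sum_{h=j}^{n} \delta_h \mathbf{e}_{(h,\dots,n)} = \sum_{h=j}^{n} \delta_h A\boldsymbol{\pi}^{(h,\dots,n)} = A \cdot \sum_{h=j}^{n} \delta_h \boldsymbol{\pi}^{(h,\dots,n)}.$$
Since $\sum_{h=j}^{n} \delta_h \boldsymbol{\pi}^{(h,h+1,\dots,n)} \in \mathcal{V}_n$ for any $\delta_h \in [0,1]$ with $\sum_{h} \delta_h = \theta_n \leq 1$, and for any $j$, the latter equation implies (7) and hence i). ◻
The fourth and last preliminary result shows the equivalence between the zonotope inclusion criterion and price majorization. Although the proof largely draws on Koshevoy (1995), the setting we investigate is logically distinct, insofar as the price majorization we use invokes $d$-dimensional prices to evaluate the inclusion of $d$-dimensional zonotopes, whereas Koshevoy shows the equivalence with $(d+1)$-dimensional extensions of the zonotope (the Lorenz zonotope). Denote by $\mathcal{C}_n$ the set of column stochastic matrices, so that $X \in \mathcal{C}_n \Leftrightarrow X' \in \mathcal{R}_n$. Define: "$B$ is price majorized by $A$" whenever $\forall \mathbf{p} \in \mathbb{R}^d$, $\exists X \in \mathcal{D}_n$ such that $\mathbf{p}'B = \mathbf{p}'AX$. Given that we consider matrices in $\mathcal{M}_d$, the condition $\mathbf{p}'B\mathbf{1}_n = \mathbf{p}'A\mathbf{1}_n$ is satisfied by construction; therefore it suffices to consider only transformations $X \in \mathcal{C}_n$ to obtain price majorization.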
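Zonotope inclusion can be checked with finitely many points: $Z(A) = \{A\boldsymbol{\pi} : \boldsymbol{\pi} \in [0,1]^n\}$ is the convex hull of the images $A\boldsymbol{\theta}$ of the 0/1 vectors $\boldsymbol{\theta}$, so $Z(B) \subseteq Z(A)$ holds exactly when every $B\boldsymbol{\theta}$ lies in that hull, which is the finite condition behind Lemma 2. The sketch below assumes $d = 2$ (so the hull is a polygon, computed with Andrew's monotone chain), works with floating-point tolerances, and applies the test to a merge of classes; `merge_matrix` and the other helpers are hypothetical names.

```python
from itertools import product


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def merge_matrix(n, j):
    """Hypothetical helper: n x n matrix merging class j into class j + 1 (0-based)."""
    M = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    M[j][j], M[j][j + 1] = 0.0, 1.0
    return M


def zonotope_points(M):
    """Images M·theta for every theta in {0,1}^n; Z(M) is their convex hull."""
    d, n = len(M), len(M[0])
    return [tuple(sum(M[i][k] * th[k] for k in range(n)) for i in range(d))
            for th in product((0, 1), repeat=n)]


def cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def zonotope_included(B, A, tol=1e-9):
    """Test Z(B) ⊆ Z(A) for d = 2 through the finitely many points B·theta."""
    hull = convex_hull(zonotope_points(A))
    if len(hull) < 3:
        raise ValueError("Z(A) is degenerate; this sketch needs a 2-D zonotope")
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # Inside a counter-clockwise polygon means left of (or on) every edge.
    return all(cross(a, b, z) >= -tol for z in zonotope_points(B) for a, b in edges)


A = [[0.5, 0.3, 0.2],
     [0.1, 0.3, 0.6]]
B = matmul(A, merge_matrix(3, 0))  # class 0 merged into class 1
print(zonotope_included(B, A), zonotope_included(A, B))
```

The merged matrix passes the test while the reverse inclusion fails, in line with the merge being a dissimilarity reducing operation.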
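Proof i) ⇒ ii). Assume that i) holds. Then, from Lemma 2, we have that $\forall k \in \{1, \dots, n\}$ $\exists \lambda_{jk} \in [0,1]$, $j = 1, 2, \dots, n$, such that $\mathbf{b}_k = \sum_j \lambda_{jk}\mathbf{a}_j$. In compact notation: $B = A\Lambda$. Matrix $\Lambda$ may not be row stochastic, but it is guaranteed that $A\Lambda\mathbf{1}_n = A\mathbf{1}_n$. Using the fact that $\sum_i a_{ij} = \sum_i b_{ij} = d/n$ for any $j$, by definition of $A$ and $B$, we get $\sum_j \lambda_{jk} = 1$, which means that $\Lambda \in \mathcal{C}_n$. Hence i) ⇒ $B = A\Lambda$, $\Lambda \in \mathcal{C}_n$ ⇒ $\mathbf{p}'B = \mathbf{p}'A\Lambda$ for all $\mathbf{p} \in \mathbb{R}^d$ with $\Lambda \in \mathcal{C}_n$, which is ii). ii) ⇒ i). Assume that ii) holds, which implies that $\mathbf{p}'B$ Lorenz dominates $\mathbf{p}'A$ for any $\mathbf{p} \in \mathbb{R}^d$, that is
$$\sum_{j=1}^{k} \mathbf{p}'\mathbf{b}_{(j)} \geq \sum_{j=1}^{k} \mathbf{p}'\mathbf{a}_{(j)} \quad \forall k = 1, \dots, n,$$
where for each $\mathbf{p} \in \mathbb{R}^d$ the classes of $A$ have been ordered by increasing magnitude of $\mathbf{p}'\mathbf{a}_{(j)}$, so that $\mathbf{p}'\mathbf{a}_{(j)} \leq \mathbf{p}'\mathbf{a}_{(j+1)}$ $\forall j$ (and similarly for the classes of $B$). This is equivalent (see Marshall et al. 2011) to
$$\max_{j_1, \dots, j_k} \mathbf{p}'B(\mathbf{e}_{j_1} + \dots + \mathbf{e}_{j_k}) \leq \max_{j_1, \dots, j_k} \mathbf{p}'A(\mathbf{e}_{j_1} + \dots + \mathbf{e}_{j_k}) \quad \forall k,\ \forall \mathbf{p} \in \mathbb{R}^d,$$
where the max operator selects the $k$-tuple of columns of $A$ and $B$ that yields the largest value when multiplied by the vector of prices $\mathbf{p}$. Since this holds for every $\mathbf{p}$, the condition identifies a situation where $B \cdot (\mathbf{e}_{j_1} + \dots + \mathbf{e}_{j_k})$ lies in the convex hull of the points $A \cdot (\mathbf{e}_{j'_1} + \dots + \mathbf{e}_{j'_k})$, see Koshevoy (1995). Making use of the definition of convex hull inclusion, we can alternatively write:
$$B \cdot (\mathbf{e}_{j_1} + \dots + \mathbf{e}_{j_k}) = \sum_{\forall j'_1, \dots, j'_k} \lambda_{j'_1, \dots, j'_k}\, A \cdot (\mathbf{e}_{j'_1} + \dots + \mathbf{e}_{j'_k}), \quad \lambda_{j'_1, \dots, j'_k} \in [0,1],\ \sum \lambda_{j'_1, \dots, j'_k} = 1, \quad \forall k,\ \forall j_1, \dots, j_k, \quad (9)$$
or equivalently
$$\forall \boldsymbol{\theta} \in \mathcal{V}^{01}_n\ \exists \boldsymbol{\pi} \in \mathcal{V}_n : B\boldsymbol{\theta} = A\boldsymbol{\pi}.$$
From Lemma 2, the latter condition implies i). ◻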
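Condition ii) can be probed on a finite grid of prices when $d = 2$: for each price vector $\mathbf{p}$, compare the class values $\mathbf{p}'B$ and $\mathbf{p}'A$ through sorted partial sums (Lorenz dominance). Passing on a grid is only a necessary check, not a proof that the condition holds for every $\mathbf{p}$. The example uses our own matrix with equal column totals $d/n$ and a doubly stochastic averaging of two classes; all helper names are hypothetical.

```python
import math


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def price_values(p, M):
    """Row vector p'M: the value of each class at prices p."""
    return [sum(p[i] * M[i][k] for i in range(len(M))) for k in range(len(M[0]))]


def lorenz_dominates(x, y, tol=1e-12):
    """True if x Lorenz dominates y: equal totals and, after sorting both in
    increasing order, every partial sum of x is at least that of y."""
    if len(x) != len(y) or abs(sum(x) - sum(y)) > tol:
        return False
    cx = cy = 0.0
    for a, b in zip(sorted(x), sorted(y)):
        cx, cy = cx + a, cy + b
        if cx < cy - tol:
            return False
    return True


def price_majorized_on_grid(B, A, num_angles=720):
    """Check that p'B Lorenz dominates p'A for unit prices p = (cos t, sin t).

    Rescaling p by a positive constant does not change the comparison, so unit
    vectors suffice; a finite grid gives a necessary, not sufficient, check.
    """
    for s in range(num_angles):
        t = 2.0 * math.pi * s / num_angles
        p = (math.cos(t), math.sin(t))
        if not lorenz_dominates(price_values(p, B), price_values(p, A)):
            return False
    return True


# d = 2, n = 3: rows sum to one and every column sums to d/n = 2/3.
A = [[1 / 2, 1 / 3, 1 / 6],
     [1 / 6, 1 / 3, 1 / 2]]
D = [[0.5, 0.5, 0.0],   # doubly stochastic: averages classes 0 and 1
     [0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
B = matmul(A, D)
print(price_majorized_on_grid(B, A), price_majorized_on_grid(A, B))
```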

Appendix A.2 Proof of Proposition 1
Proof i) ⇒ ii). If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank $B$ at most as dissimilar as $A$ and that these orders are consistent with the operations underlying axioms MC, IPC, IEC and ISC. Matrix majorization $\preccurlyeq_R$ is one such partial order. In fact, as highlighted in the preliminary results section, every operation underlying axioms MC, ISC, IEC and IPC can be represented by a row stochastic matrix transformation: by a matrix in $\mathcal{R}^{MC}_{n_A}$ for a merge of classes operation, by a matrix in $\mathcal{R}^{ISC}_{n_A}$ for a split operation, by a matrix in $\mathcal{R}^{IEC}_{n_A,n}$ for the insertion/elimination of empty classes and by a matrix in $\mathcal{P}_{n_A}$ for permutations of classes. A sequence of these operations is represented by the product of the corresponding matrices, which belongs to $\mathcal{R}_{n_A,n}$ since the set of all row stochastic matrices is closed with respect to the product operation. Any $B$ obtained through these operations is therefore of the form $B = AX$ with $X$ row stochastic, thereby implying that ii) holds.
ii) ⇒ i). Assume that ii) holds, hence $B = AX$ with $X \in \mathcal{R}_{n_A,n_B}$. In shorthand notation,
$$\mathbf{b}_k = \sum_{j=1}^{n_A} x_j(k)\,\mathbf{a}_j, \quad k = 1, \dots, n_B, \quad (10)$$
where $x_j(k)$ denotes the generic element $x_{jk}$ of $X$. Each addend in (10) can be written as
$$x_j(k)\,\mathbf{a}_j = \alpha_{jk} \prod_{h=1}^{k-1} (1 - \alpha_{jh})\,\mathbf{a}_j, \quad (11)$$
where each $\alpha_{jk} \in [0,1]$. In fact, every sequence of $n_B$ numbers $\{x(k)\}_{k=1}^{n_B}$ with support in $[0,1]$ satisfying $\sum_k x(k) = 1$ can be written as
$$x(1) = \alpha_1 \in [0,1], \qquad x(k) = \alpha_k \Big(1 - \sum_{h=1}^{k-1} x(h)\Big), \quad \alpha_k \in [0,1], \quad k = 2, \dots, n_B. \quad (12)$$
The constraint $\sum_k x(k) = 1$ imposes that there must exist an index $k$ such that $\alpha_k = 1$. If $\alpha_k = 1$, then the series is completed and $\alpha_j = 0 = x(j)$ for any $j > k$. Note that $x(k) = 0$ also if $\alpha_k = 0$; thus the sequence of $x(k)$ may include elements equal to 0 even if it is not yet completed. Solving the sequence in (12) backward leads to (11), given that $x(k) = \alpha_k \cdot \prod_{j=1}^{k-1} (1 - \alpha_j)$ with $\alpha_j \in [0,1]$ $\forall j$ and $\alpha_k \in [0,1]$ $\forall k = 2, \dots, n_B$. Consider a sequence of matrices $S^{[k]} \in \mathcal{R}^{ISC}_{n_A}$.¹⁹ Matrix $S^{[1]}$ performs the first split of vector $\mathbf{a}_j$ according to proportion $\alpha_{j1}$. Matrix $S^{[2]}$ performs a split of the residual component $(1 - \alpha_{j1})\mathbf{a}_j$ according to proportion $\alpha_{j2}$. Iterating these arguments leads to matrix $S^{[n_B-1]}$, representing the last split of vector $\mathbf{a}_j$ out of a sequence of $n_B - 1$ splits. It follows that (11) can be equivalently written in terms of the splits $S^{[1]}, \dots, S^{[n_B-1]}$ (13). Extending the representation in (13) to all addends in (10) leads to a total of $n_A(n_B - 1) = n$ splits of the classes of $A$. The split operation preserves the number of classes; therefore it can be operationalized only if there exists a matrix $E \in \mathcal{R}^{IEC}_{n_A,n}$ adding a sufficient number of empty classes to $A$ to perform the $n$ splits. According to the summation operator in (10), the order of the classes of $A$ is irrelevant. Thus permutations of classes are admitted.²⁰ By combining all the operations in a single row we obtain $A \cdot \hat{X}$, where the $n_A \times n$ matrix $\hat{X}$ rewrites in terms of $E$ and the split matrices $S^{[k]}(j)$, indexed by $j$ to highlight the relation with class $j$ in $A$.
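The recursion in (11) and (12) can be checked mechanically: given one row of $X$, each split proportion is the share of the still-unassigned mass that the next class receives, and the convention $\alpha_j = 0$ applies once the sequence is completed. The function names are our own; exact fractions avoid rounding.

```python
from fractions import Fraction


def split_proportions(x):
    """Return alphas with x[k] = alpha_k * prod_{h<k} (1 - alpha_h).

    x must be a probability vector (entries in [0, 1] summing to one). Once the
    residual mass is exhausted the sequence is completed and the remaining
    alphas are set to 0, mirroring the convention alpha_j = 0 = x(j).
    """
    x = [Fraction(v) for v in x]
    if sum(x) != 1 or any(v < 0 for v in x):
        raise ValueError("x must be a probability vector")
    alphas, residual = [], Fraction(1)
    for v in x:
        if residual == 0:
            alphas.append(Fraction(0))
        else:
            alphas.append(v / residual)  # share of the residual split off now
            residual -= v
    return alphas


def rebuild(alphas):
    """Inverse map: x(k) = alpha_k * prod_{h<k} (1 - alpha_h)."""
    out, remaining = [], Fraction(1)
    for a in alphas:
        out.append(a * remaining)
        remaining *= 1 - a
    return out


row = [Fraction(1, 4), Fraction(0), Fraction(1, 2), Fraction(1, 4)]  # one row of X
alphas = split_proportions(row)
print(alphas)
```

The last non-trivial proportion equals one, which is the index $k$ with $\alpha_k = 1$ required by the constraint $\sum_k x(k) = 1$.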
Here $\tilde{S}^{[k]}(j) := \mathrm{diag}\big(I, S^{[k]}(j), I'\big)$, where $I$ and $I'$ are two identity matrices of size $(j-1)n_B$ and $(n_A - j)n_B$, respectively. Line (15) comes from the fact that every block diagonal matrix can be represented as the product of the matrices associated with each block, each obtained by substituting the remaining blocks with identity matrices.
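The factorization invoked for line (15) is easy to verify on a small case: a block diagonal matrix equals the product of the matrices that keep one block and replace every other block with an identity of the same size, in any order. Block sizes and entries below are arbitrary examples.

```python
from fractions import Fraction
from functools import reduce


def identity(n):
    """n x n identity with Fraction entries."""
    return [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]


def block_diag(*blocks):
    """Square block-diagonal matrix assembled from square blocks."""
    n = sum(len(b) for b in blocks)
    out = [[Fraction(0)] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for k, v in enumerate(row):
                out[offset + i][offset + k] = Fraction(v)
        offset += len(b)
    return out


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def one_block_factors(blocks):
    """For each block, the block-diagonal matrix keeping that block and
    replacing every other block with an identity of the same size."""
    return [block_diag(*(b if t == s else identity(len(b)) for t, b in enumerate(blocks)))
            for s in range(len(blocks))]


S1 = [[Fraction(1, 4), Fraction(3, 4)], [0, 1]]   # split-type blocks
S2 = [[Fraction(2, 3), Fraction(1, 3)], [0, 1]]
S3 = [[1, 0, 0], [0, Fraction(1, 2), Fraction(1, 2)], [0, 0, 1]]
blocks = [S1, S2, S3]
product_of_factors = reduce(matmul, one_block_factors(blocks))
print(product_of_factors == block_diag(*blocks))
```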
To conclude, it is possible to perform permutations of the $n_A n_B$ classes to rearrange the entries of $A \cdot \hat{X}$ so as to accommodate the definition of a merge of classes transformation through a matrix of size $n_A n_B$. A convenient permutation rearranges $n_B$ groups of $n_A$-tuples of classes of $A \cdot \hat{X}$, so that the $j$-th group consists of the sequence of classes $(x_{1j}\mathbf{a}_1, \dots, x_{n_A j}\mathbf{a}_{n_A})$.²¹ Consider a sequence of merges of classes, so that class 1 in the new configuration is merged with class 2, then the resulting class 2 is merged with class 3, and so on up to class $n_A$, which yields $\mathbf{b}_1$. Given the order of the classes, the same procedure can be extended to all the $n_B - 1$ remaining $n_A$-tuples of classes. This operation leaves many empty classes, which can be eliminated using a matrix $E'$ incorporating the elimination of empty classes operation. As a result, $B$ is obtained by post-multiplying $A$ by the sequence of matrices just described. All the matrices multiplying $A$ are row stochastic, and so is their product. Hence $X \in \mathcal{R}_{n_A,n_B}$ can always be decomposed into permutations and products of matrices originated exclusively by split, merge and insertion/deletion of empty classes. This is i), which concludes the proof. ◻
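The construction of this proof can be replayed on a small example: after inserting empty classes, every class $\mathbf{a}_j$ is cut into the $n_B$ pieces $x_{jk}\mathbf{a}_j$, and merging all pieces that share the destination $k$ returns $\mathbf{b}_k$. The sketch skips the explicit permutation step by tagging each piece with its destination; the matrices and function names are our own.

```python
def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def split_pieces(A, X):
    """Insertion of empty classes followed by splits: class j of A is cut into
    n_B pieces x_jk * a_j, each tagged with its destination class k."""
    d, n_a, n_b = len(A), len(X), len(X[0])
    return [(k, [X[j][k] * A[i][j] for i in range(d)])
            for j in range(n_a) for k in range(n_b)]


def merge_by_destination(pieces, d, n_b):
    """Group the pieces by destination and merge each group into one class."""
    B = [[0.0] * n_b for _ in range(d)]
    for k, col in pieces:
        for i in range(d):
            B[i][k] += col[i]
    return B


A = [[0.5, 0.3, 0.2],
     [0.1, 0.3, 0.6]]
X = [[0.7, 0.3],   # row stochastic, n_A = 3, n_B = 2
     [0.0, 1.0],
     [0.4, 0.6]]
pieces = split_pieces(A, X)
B = merge_by_destination(pieces, len(A), len(X[0]))
print(B)
```

The number of pieces is $n_A n_B$, so $n_A(n_B - 1)$ empty classes must be inserted before splitting, matching the count in the proof.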

Appendix A.3 Proof of Remark 2
Proof Assume $B_j \preccurlyeq_R A$; then (by Proposition 1) $\exists X_j \in \mathcal{R}_n$ such that $B_j = AX_j$, $\forall j$. Recall that axiom Strong-MixC requires that

Appendix A.4 Proof of Remark 3
Proof A direct verification of the remark can be obtained by considering matrices $A, \hat{A} \in \mathcal{M}_d$ such that $\hat{A}$ is obtained by permuting columns $j$ and $j+1$ of $A$. Thus, by IPC we have $A \sim \hat{A}$. Setting $m = 2$, with $A_1 := A$ and $A_2 := \hat{A}$ in the definition of Strong-MixC, and letting $w_1 = w_2 = 1/2$, we obtain $A_0 = 1/2 \cdot A_1 + 1/2 \cdot A_2 = 1/2 \cdot (A + \hat{A})$. That is, $A_0$ coincides with $A$ except for columns $j$ and $j+1$, which are identical and equal to $1/2 \cdot \mathbf{a}_j + 1/2 \cdot \mathbf{a}_{j+1}$. By Strong-MixC we have that $A_0 \preccurlyeq A$. Consider matrix $B \in \mathcal{M}_d$ that is identical to $A_0$ except for columns $j$ and $j+1$, where $\mathbf{b}_j = \mathbf{0}_d$ and $\mathbf{b}_{j+1} = \mathbf{a}_j + \mathbf{a}_{j+1}$. If we drop the empty column $j$ from $B$ and split the class/column $j+1$ with weights 1/2 and 1/2 into two adjacent classes, we obtain matrix $A_0$. Applying IEC and ISC we have that $B \sim A_0$. By transitivity of the dissimilarity relation, $B \sim A_0 \preccurlyeq A$ implies that $B \preccurlyeq A$. Note that by construction matrices $A$ and $B$ are those in the definition of axiom MC, which turns out to be satisfied. ◻
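The chain of equalities in this proof can be checked on a small example: averaging $A$ with the copy whose columns $j$ and $j+1$ are swapped gives the same matrix as merging class $j$ into $j+1$, dropping the empty class and splitting the merged class into two halves. Exact fractions avoid rounding; the example matrix and the helper names are ours.

```python
from fractions import Fraction as F


def swap_columns(A, j):
    """Copy of A with classes j and j + 1 exchanged (a permutation of classes)."""
    out = [row[:] for row in A]
    for row in out:
        row[j], row[j + 1] = row[j + 1], row[j]
    return out


def mix(A1, A2, w1):
    """Entrywise mixture w1 * A1 + (1 - w1) * A2."""
    return [[w1 * x + (1 - w1) * y for x, y in zip(r1, r2)] for r1, r2 in zip(A1, A2)]


def merge_into_next(A, j):
    """Merge class j into class j + 1, leaving class j empty."""
    out = [row[:] for row in A]
    for row in out:
        row[j + 1] += row[j]
        row[j] = 0
    return out


def drop_then_split_half(B, j):
    """Remove the empty class j, then split the merged class into two halves."""
    out = []
    for row in B:
        rest = row[:j] + row[j + 1:]
        half = rest[j] / 2
        out.append(rest[:j] + [half, half] + rest[j + 1:])
    return out


A = [[F(1, 2), F(1, 3), F(1, 6)],
     [F(1, 6), F(1, 3), F(1, 2)]]
A0 = mix(A, swap_columns(A, 0), F(1, 2))                      # Strong-MixC step
print(A0 == drop_then_split_half(merge_into_next(A, 0), 0))   # IEC and ISC steps
```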

Appendix A.5 Proof of Proposition 2
Proof i) ⇒ ii). The proof consists in showing that the set of all matrices $B$ such that $B \preccurlyeq_R A$ can be characterized using exclusively the operations underlying axioms Strong-MixC, IEC, ISC and IPC. If i) holds, it follows from Donaldson and Weymark (1998) that there exist partial orders that rank $B$ at most as dissimilar as $A$ and that are consistent with the operations underlying axioms ISC, IEC, IPC and Strong-MixC; matrix majorization $\preccurlyeq_R$ is one such partial order (by Proposition 1 and Remark 2). The result holds for any matrix in $\mathcal{M}_d$. Let $A, B \in \mathcal{M}_d$, with $n_A$ not necessarily equal to $n_B$. First, consider matrices $\tilde{A}, \tilde{B} \in \mathcal{M}_d$ of size $d \times n$, obtained from $A$ and $B$ through split and permutation of classes and deletion of empty classes, such that $\mathbf{1}_d'\tilde{A} = \mathbf{1}_d'\tilde{B} = \frac{d}{n}\mathbf{1}_n'$. Every ordering $\preccurlyeq$ satisfying axioms IEC, ISC and IPC ranks $\tilde{A} \sim A$ and $\tilde{B} \sim B$.
We now investigate the implications for the transitive closure of the orderings $\preccurlyeq$ (satisfying axioms IPC, IEC and ISC) deriving from the fact that it satisfies Strong-MixC. We do so by looking at all permutations of a matrix $\tilde{A}$, which are indifferent to it in terms of dissimilarity for all orderings satisfying IPC,²² and applying to them the mixing operations considered in Strong-MixC, thereby obtaining the set of matrices that are matrix majorized by $\tilde{A}$. The result holds for any initial matrix $A \in \mathcal{M}_d$. Denote by $J_k$, $k = 2, \dots, n$, a subset of the classes $\{1, \dots, n\}$ with cardinality $k$; there are $m_k$ such combinations for any $k$, with $m = \sum_k m_k$, collected in the set $\mathcal{J}_k$, so that $J_k \in \mathcal{J}_k$ and $J_k \subseteq \{1, \dots, n\}$. For any such $J_k$, define by $\mathcal{P}^{J_k}_n$ the set of $n \times n$ permutation matrices corresponding to all permutations of the indices in $J_k$ (there are $k!$ of them). The matrix $\tilde{A}P$, for $P \in \mathcal{P}^{J_k}_n$, is obtained by permuting the $k$ columns indexed by $J_k$. Given that $\preccurlyeq$ satisfies IPC, $\tilde{A}P \sim \tilde{A}$. Consider a mix operation as defined in axiom Strong-MixC that gives equal weight $w_h = 1/k!$ to each permutation $P \in \mathcal{P}^{J_k}_n$; the resulting matrix $\frac{1}{k!}\sum_{P \in \mathcal{P}^{J_k}_n} P \in \mathcal{R}_n$ can be denoted by $X_j$, for $j = 1, \dots, m$. Note that the matrices $X_j$ form the basis of the set $\mathcal{R}_n$ (see Proposition 3.1 in Dahl 1999).²³ We hence have that $\tilde{A}X_j \preccurlyeq \tilde{A}$ $\forall j$ for all orderings $\preccurlyeq$ satisfying Strong-MixC and IPC; moreover, $\tilde{A}X_j \preccurlyeq_R \tilde{A}$ given that all $X_j$ are in $\mathcal{R}_n$. Consider now obtaining $\tilde{B}$ by mixing (using the weights in axiom Strong-MixC) the matrices $\tilde{A}X_j$ $\forall j$, such as:
$$\tilde{B} = \sum_{j=1}^{m} w_j \tilde{A}X_j = \tilde{A} \cdot \sum_{j=1}^{m} w_j X_j.$$
Since $\sum_j w_j = 1$, the summation term identifies the convex hull of all row-stochastic matrices that form the basis for the set $\mathcal{R}_n$. Corollary 3.2 in Dahl (1999) then guarantees that $\tilde{B} \preccurlyeq_R \tilde{A}$.
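The matrices obtained by giving weight $1/k!$ to every permutation of the classes in a set $J_k$ have a simple form that one can verify directly: the entries indexed by $J_k \times J_k$ all equal $1/k$ and the rest is the identity, so post-multiplying by such a matrix replaces the columns in $J_k$ by their average. The sketch checks this with exact fractions; the index set, the matrix and the helper names are arbitrary examples.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def permutation_matrix(n, mapping):
    """P with P[i][mapping.get(i, i)] = 1: class i is sent to class mapping[i]."""
    return [[Fraction(int(mapping.get(i, i) == k)) for k in range(n)] for i in range(n)]


def average_over_permutations(n, J):
    """Equal-weight (1/k!) mixture of all permutation matrices permuting J."""
    J = list(J)
    avg = [[Fraction(0)] * n for _ in range(n)]
    w = Fraction(1, factorial(len(J)))
    for image in permutations(J):
        P = permutation_matrix(n, dict(zip(J, image)))
        for i in range(n):
            for k in range(n):
                avg[i][k] += w * P[i][k]
    return avg


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


X = average_over_permutations(3, [0, 2])
A = [[Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)],
     [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]]
AX = matmul(A, X)
print(X)
print(AX)
```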
Recall now (see the second set of preliminary results) that, because of the transformations of split and permutation of classes and deletion of empty classes that have generated matrices $\tilde{A}, \tilde{B} \in \mathcal{M}_d$ such that $\tilde{A} \sim A$ and $\tilde{B} \sim B$, it is possible to obtain $\tilde{A}$ from $A$ through a row stochastic transformation, and also the other way round by means of another row stochastic transformation, and similarly for $\tilde{B}$ and $B$. Thus $B \preccurlyeq_R \tilde{B}$ and $\tilde{A} \preccurlyeq_R A$, which, combined with $\tilde{B} \preccurlyeq_R \tilde{A}$, gives condition ii) by transitivity.
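ii) ⇒ i). Assume ii), which is equivalent to $B \preccurlyeq A$ for all dissimilarity orders satisfying MC, IPC, IEC and ISC (from Proposition 1) and implies i) by Remark 3. ◻
Strong-MixC by Remark 1). Moreover, note that for such matrices it is also verified that for any $\boldsymbol{\theta} \in \mathcal{V}_n$ there exists a $\boldsymbol{\pi} \in \mathcal{V}_n$ such that $\tilde{A}X_j\boldsymbol{\theta} = \tilde{A}\boldsymbol{\pi}$, which yields $\tilde{A}(X_j\boldsymbol{\theta} - \boldsymbol{\pi}) = \mathbf{0}_d$ $\forall j$. Using the fact that $X_j \in \mathcal{R}_n$ $\forall j$, then $X_j\boldsymbol{\theta} \in \mathcal{V}_n$, so that it is sufficient to set $\boldsymbol{\pi} = X_j\boldsymbol{\theta}$ for every $\boldsymbol{\theta} \in \mathcal{V}_n$ to guarantee that $\boldsymbol{\pi} \in \mathcal{V}_n$. As argued in the proof of Lemma 2, the previous condition implies that $Z(\tilde{A}X_j) \subseteq Z(\tilde{A})$ $\forall j$. Note that only some classes (i.e., columns) of $\tilde{A}X_j$ identify vertices of $Z(\tilde{A})$, and this holds $\forall j, k$. Any such vector is a vertex of the zonotope $Z(\tilde{A})$, since it can be written as $\tilde{A}\boldsymbol{\theta}$ with $\boldsymbol{\theta} \in \mathcal{V}^{01}_n$ such that $\theta_j = 1$ whenever $j \in J_k$, whereas $\theta_j = 0$ for any $j \notin J_k$. From Proposition 3.1 in Dahl (1999), the zonotope set $Z(\tilde{A})$ is the convex hull of these vertices. To complete the proof, consider now a matrix $\tilde{B} \in \mathcal{M}_d$ satisfying condition (4), where $A_j = \tilde{A}X_j$, $j = 1, \dots, m$. Such a matrix is ranked $\tilde{B} \preccurlyeq \tilde{A}$ by all dissimilarity orderings satisfying axioms MixC, IEC, IPC and ISC. For any such matrix, we have that $\forall \boldsymbol{\theta} \in \mathcal{V}^{01}_n$ $\exists \boldsymbol{\theta}_j \in \mathcal{V}^{01}_n$, $j = 1, \dots, m$: $\tilde{B}\boldsymbol{\theta} \in \mathrm{conv}\{\tilde{A}X_1\boldsymbol{\theta}_1, \dots, \tilde{A}X_m\boldsymbol{\theta}_m\}$. Any element of this convex hull can be written as
$$\sum_{j=1}^{m} \mu_j \tilde{A}X_j\boldsymbol{\theta}_j = \tilde{A}\sum_{j=1}^{m} \mu_j X_j\boldsymbol{\theta}_j = \tilde{A}\sum_{j=1}^{m} \mu_j \boldsymbol{\pi}_j = \tilde{A}\boldsymbol{\pi},$$
with $\boldsymbol{\pi}_j \in \mathcal{V}_n$ since $X_j \in \mathcal{R}_n$ $\forall j$, and also $\boldsymbol{\pi} \in \mathcal{V}_n$ given that $\mu_j \in [0,1]$ and $\sum_j \mu_j = 1$. It follows that $\forall \boldsymbol{\theta} \in \mathcal{V}^{01}_n$ $\exists \boldsymbol{\pi} \in \mathcal{V}_n$: $\tilde{B}\boldsymbol{\theta} = \tilde{A}\boldsymbol{\pi}$, which is equivalent to $Z(\tilde{B}) \subseteq Z(\tilde{A})$ from Lemma 2. The fact that $Z(\tilde{A}) = Z(A)$ and $Z(\tilde{B}) = Z(B)$ gives ii).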
To highlight the differences with the analogous steps in the proof of Proposition 2, note that, as shown there, the matrices $X_j$ form a basis for $\mathcal{R}_n$: every row-stochastic matrix is obtained as a convex combination (with weights as in axiom Strong-MixC) of the matrices $X_j$ for $j = 1, \dots, m$. However, the weighting schemes underlying axiom Strong-MixC are only a subset of the weights admissible under axiom MixC. In particular, these weights cannot generate the convex hull of all vertices of $Z(\tilde{A})$ and, as a consequence, they are not sufficient to characterize the full set of matrices $\tilde{B}$ such that $Z(\tilde{B}) \subseteq Z(\tilde{A})$, as opposed to the weights considered in MixC. Nonetheless, the weights underlying axiom Strong-MixC provide sufficient structure to characterize the set of matrices $\tilde{B}$ such that $\tilde{B} \preccurlyeq_R \tilde{A}$, as illustrated in Proposition 2.
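ii) ⇒ i). We show that the zonotope inclusion condition can always be rationalized by the existence of matrices that are ranked consistently by all dissimilarity orderings in i). Assume that ii) holds, and note that $Z(B) \subseteq Z(A) \Leftrightarrow Z(\tilde{B}) \subseteq Z(\tilde{A})$, where $\tilde{A}, \tilde{B} \in \mathcal{M}_d$ are $d \times n$ matrices obtained through split and permutation of classes and deletion of empty classes from $A$ and $B$ respectively, such that $\mathbf{1}_d'\tilde{A} = \mathbf{1}_d'\tilde{B} = \frac{d}{n}\mathbf{1}_n'$, as done in the first part of the proof of Proposition 2. Consider further matrices $\tilde{B}_j \in \mathcal{M}_d$, for $j = 1, \dots, m$ with $m = n$, of size $d \times n$, defined as
$$\tilde{B}_j := \Big(\tilde{\mathbf{b}}_j, \tfrac{1}{n-1}\mathbf{r}_j, \dots, \tfrac{1}{n-1}\mathbf{r}_j\Big) \cdot P_{1,j},$$
with $P_{i,j} \in \mathcal{P}_n$ being a permutation matrix permuting classes $i$ and $j$. Denote by $\mathbf{r}_j := \mathbf{1}_d - \tilde{\mathbf{b}}_j$ the "residual" vector, where we recall that $\tilde{\mathbf{b}}_j$ denotes column $j$ of matrix $\tilde{B}$. By construction, $Z(\tilde{B}_j) \subseteq Z(\tilde{B}) \subseteq Z(\tilde{A})$ $\forall j$. Considering that, because of the definition of zonotope inclusion, we have $\tilde{\mathbf{b}}_j = \tilde{A}\boldsymbol{\pi}$ for some $\boldsymbol{\pi} \in \mathcal{V}_n$, then $\mathbf{r}_j = \tilde{A}(\mathbf{1}_n - \boldsymbol{\pi})$. Notice that the vector $\boldsymbol{\pi}$ is specific to matrix $\tilde{B}_j$. It follows that
$$\tilde{B}_j = \Big(\tilde{A}\boldsymbol{\pi}, \tfrac{1}{n-1}\tilde{A}(\mathbf{1}_n - \boldsymbol{\pi}), \dots, \tfrac{1}{n-1}\tilde{A}(\mathbf{1}_n - \boldsymbol{\pi})\Big) P_{1,j} = \tilde{A}\Big(\boldsymbol{\pi}, \tfrac{1}{n-1}(\mathbf{1}_n - \boldsymbol{\pi}), \dots, \tfrac{1}{n-1}(\mathbf{1}_n - \boldsymbol{\pi})\Big) P_{1,j} = \tilde{A}X_j,$$
with $X_j \in \mathcal{R}_n$ for every $j = 1, \dots, m$. We conclude that condition ii) always implies that $\tilde{B}_j \preccurlyeq_R \tilde{A}$ $\forall j$ for the $n$ matrices $\tilde{B}_j$ obtained as above. Now consider the set of weights $w^k_j$ with the following features: $w^k_k = 1$ and $w^k_j = 0$ for any $j, k = 1, \dots, n$ such that $j \neq k$. By construction, $\tilde{\mathbf{b}}_k = \sum_j w^k_j \tilde{B}_j\mathbf{e}_k$ for all $k = 1, \dots, n$. From Proposition 1, $\tilde{B}_j \preccurlyeq \tilde{A}$ $\forall j$ for all orderings satisfying MC, ISC, IEC and IPC. These are also the orderings satisfying ISC, IEC, IPC and Strong-MixC, from Proposition 2. By Remark 1, this set of orderings includes those in i), because those considered in the set satisfy the stronger version of the MixC axiom. To conclude, we should verify that matrix $\tilde{B}$ is obtained through a weighting scheme that is consistent with condition (4).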
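This is immediate, since for $\boldsymbol{\theta} = (\theta_1, \dots, \theta_n)' \in \mathcal{V}^{01}_n$ we have that
$$\tilde{B}\boldsymbol{\theta} = \sum_{k=1}^{n} \theta_k \tilde{\mathbf{b}}_k = \sum_{k=1}^{n} \theta_k \tilde{B}_k\mathbf{e}_k,$$
where we recall that $\mathbf{e}_k$ is the $k$-th column of an $n \times n$ identity matrix and that the second equality follows from the choice of the weighting scheme. It is sufficient to set $\boldsymbol{\theta}_j := \theta_j\mathbf{e}_j \in \mathcal{V}^{01}_n$ to conclude that the condition $\forall \boldsymbol{\theta} \in \mathcal{V}^{01}_n$ $\exists \boldsymbol{\theta}_j \in \mathcal{V}^{01}_n$ such that $\tilde{B}\boldsymbol{\theta} = \sum_{j=1}^{n} \tilde{B}_j\boldsymbol{\theta}_j$ satisfies condition (4). As a result, for the obtained matrix $\tilde{B}$ we have that $\tilde{B} \preccurlyeq \tilde{A}$ for all orderings satisfying ISC, IEC, IPC and MixC. The same orderings rank $\tilde{A} \sim A$ and $\tilde{B} \sim B$, which implies i). ◻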
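The auxiliary matrices of this proof can be built on a small example. Assuming, as we read the definition, that the residual is $\mathbf{1}_d - \tilde{\mathbf{b}}_j$, each auxiliary matrix keeps column $j$ of $\tilde{B}$ in class $j$ (the role of the permutation $P_{1,j}$) and spreads the residual evenly over the other $n-1$ classes. The sketch verifies that each auxiliary matrix is a distribution matrix with column totals $d/n$, that its column $j$ equals $\tilde{\mathbf{b}}_j$, and the identity $\tilde{B}\boldsymbol{\theta} = \sum_k \theta_k \tilde{B}_k\mathbf{e}_k$. The helper `residual_matrix` and the numbers are hypothetical.

```python
from fractions import Fraction as F


def residual_matrix(Bt, j):
    """Hypothetical helper building the auxiliary matrix for class j (0-based):
    column j of Bt stays in class j, and the residual 1 - b_ij of every row i
    is spread evenly over the other n - 1 classes."""
    d, n = len(Bt), len(Bt[0])
    out = []
    for i in range(d):
        b = Bt[i][j]
        row = [(1 - b) / (n - 1)] * n
        row[j] = b
        out.append(row)
    return out


# d = 2, n = 4: rows sum to 1 and every column sums to d/n = 1/2.
Bt = [[F(3, 10), F(1, 5), F(2, 5), F(1, 10)],
      [F(1, 5), F(3, 10), F(1, 10), F(2, 5)]]
aux = [residual_matrix(Bt, j) for j in range(4)]

theta = [1, 0, 1, 1]  # a 0/1 vector selecting classes
lhs = [sum(Bt[i][k] * theta[k] for k in range(4)) for i in range(2)]
rhs = [sum(aux[k][i][k] * theta[k] for k in range(4)) for i in range(2)]
print(lhs == rhs)
```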

Appendix A.9 Proof of Remark 6
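Proof Recall the definition of zonotope set inclusion: for any $A, B \in \mathcal{M}_d$, $Z(B) \subseteq Z(A)$ if and only if $\forall \boldsymbol{\theta} \in \mathcal{V}_{n_B}$ $\exists \boldsymbol{\pi} \in \mathcal{V}_{n_A}$ such that $B\boldsymbol{\theta} = A\boldsymbol{\pi}$. Let $B \preccurlyeq_R A$. By Proposition 1, $B = AX$ with $X \in \mathcal{R}_{n_A,n_B}$, which after substituting in $B\boldsymbol{\theta} = A\boldsymbol{\pi}$ yields $A(X\boldsymbol{\theta} - \boldsymbol{\pi}) = \mathbf{0}_d$. Given that $X \in \mathcal{R}_{n_A,n_B}$, this implies that $X\boldsymbol{\theta} \in \mathcal{V}_{n_A}$ for any $\boldsymbol{\theta} \in \mathcal{V}_{n_B}$; it is therefore sufficient to set $\boldsymbol{\pi} = X\boldsymbol{\theta}$ for $\boldsymbol{\theta} \in \mathcal{V}_{n_B}$ to guarantee that $\boldsymbol{\pi} \in \mathcal{V}_{n_A}$, which thus implies $Z(B) \subseteq Z(A)$. ◻
from $\tilde{A}$ and aggregates these vectors with elementwise summations. For every $i = 1, 2$ we can equivalently write the condition as
$$\tilde{b}_{ik} = \sum_{j=1}^{\tilde{n}} \lambda_{jk}\,\tilde{a}_{ij}, \quad k = 1, \dots, \tilde{n}, \quad (17)$$
with $\lambda_{jk} \in [0,1]$ $\forall j, k$. It is necessary and sufficient that (17) holds for an arbitrary group $i = 1$ to guarantee that (17) also holds for $i = 2$, given that by construction $\tilde{a}_{2j} = \frac{2}{n} - \tilde{a}_{1j}$ and $\tilde{b}_{2j} = \frac{2}{n} - \tilde{b}_{1j}$ $\forall j = 1, \dots, \tilde{n}$. After rearranging the terms in (17) in increasing order, so that $\tilde{b}_{1(k)} \leq \tilde{b}_{1(k+1)}$ and $\tilde{a}_{1(k)} \leq \tilde{a}_{1(k+1)}$ for $k = 1, \dots, \tilde{n} - 1$, and using the fact that $\lambda_{jk} \in [0,1]$, it follows that $\sum_{k=1}^{h} \tilde{b}_{1(k)} \geq \sum_{j=1}^{h} \tilde{a}_{1(j)}$ for any $h = 1, \dots, \tilde{n}$. From Marshall et al. (2011), if $d = 2$, this condition is equivalent to uniform majorization, i.e. $\exists D \in \mathcal{D}_{\tilde{n}} : (\tilde{b}_{11}, \dots, \tilde{b}_{1\tilde{n}}) = (\tilde{a}_{11}, \dots, \tilde{a}_{1\tilde{n}})D$. Hence $\tilde{B} = \tilde{A}D$, since the same matrix guarantees that $(\tilde{b}_{21}, \dots, \tilde{b}_{2\tilde{n}}) = (\tilde{a}_{21}, \dots, \tilde{a}_{2\tilde{n}})D$. Furthermore, notice that the indifference class of $\preccurlyeq_R$ is also characterized by the existence of row stochastic matrices: for any $A, B \in \mathcal{M}_d$, $A \sim B$ for all orderings $\preccurlyeq$ satisfying axioms MC, ISC, IEC and IPC if and only if $\exists X \in \mathcal{R}_{n_A,n_B}$ and $\exists X' \in \mathcal{R}_{n_B,n_A}$ such that $B = A \cdot X$ and $A = B \cdot X'$. We can hence write $\tilde{A} = AY$ with $Y \in \mathcal{R}_{n_A,\tilde{n}}$ and $B = \tilde{B}W$ with $W \in \mathcal{R}_{\tilde{n},n_B}$. Thus $B = \tilde{B}W = \tilde{A}DW = AYDW = AT$ with $T \in \mathcal{R}_{n_A,n_B}$, since $\mathcal{D}_{\tilde{n}} \subseteq \mathcal{R}_{\tilde{n}}$ and the set $\mathcal{R}_n$ is closed with respect to the matrix product. This concludes the proof for $d = 2$.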
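The argument of Remark 6 can be checked by sampling: for a row-stochastic $X$ every entry of $X\boldsymbol{\theta}$ is a convex combination of entries of $\boldsymbol{\theta} \in [0,1]^{n_B}$, so $\boldsymbol{\pi} = X\boldsymbol{\theta}$ stays in $[0,1]^{n_A}$ and reproduces $B\boldsymbol{\theta}$ as $A\boldsymbol{\pi}$. The matrices, the seed and the function names are illustrative assumptions.

```python
import random


def matvec(M, v):
    """Matrix-vector product M·v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, X):
    """Product A·X of two matrices stored as lists of rows."""
    return [[sum(A[i][t] * X[t][k] for t in range(len(X))) for k in range(len(X[0]))]
            for i in range(len(A))]


def check_remark6(A, X, trials=1000, seed=0, tol=1e-12):
    """For random theta in [0,1]^{n_B}, set pi = X theta; report whether pi always
    stays in [0,1]^{n_A} and the largest gap between B theta and A pi."""
    rng = random.Random(seed)
    B = matmul(A, X)
    inside, max_gap = True, 0.0
    for _ in range(trials):
        theta = [rng.random() for _ in range(len(X[0]))]
        pi = matvec(X, theta)
        inside = inside and all(-tol <= p <= 1 + tol for p in pi)
        gap = max(abs(u - v) for u, v in zip(matvec(B, theta), matvec(A, pi)))
        max_gap = max(max_gap, gap)
    return inside, max_gap


A = [[0.5, 0.3, 0.2],
     [0.1, 0.3, 0.6]]
X = [[0.2, 0.8],   # row stochastic: n_A = 3 classes mapped to n_B = 2
     [1.0, 0.0],
     [0.5, 0.5]]
print(check_remark6(A, X))
```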
When $d > 2$, there is no guarantee that the row stochastic matrix satisfying (17)
that is, if both matrices $A, B \in \mathcal{M}_d$ are "expanded" by adding one more row whose elements are given by the sum of the elements of the associated column in the original matrix. Condition (18) hence rewrites as $\sum_j g(\mathbf{b}_j', b_j) \leq \sum_j g(\mathbf{a}_j', a_j)$, with $g$ defined on $\mathbb{R}^{d+1}$ and $a_j := \sum_i a_{ij}$ (similarly for $b_j$). Given that $g$ is convex and homogeneous, $g(\mathbf{a}_j', a_j) = a_j\, g(\mathbf{a}_j'/a_j, 1) = a_j\, h(\mathbf{a}_j'/a_j)$, where $h \in \mathcal{H}$, while for convenience empty classes receive weight $a_j = 0$. Moreover, adding $|n_A - n_B|$ empty classes preserves the relation in (18). We have therefore obtained the index $D_h$. Thus, $D_h(B) \leq D_h(A)$ $\forall h \in \mathcal{H}$ is equivalent to (18) and to condition ii). ◻
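The index $D_h$ weights each class by its column total $a_j$ and applies $h$ to the class composition $\mathbf{a}_j/a_j$. The sketch evaluates it for two convex choices of $h$ (absolute and squared deviations from the equal share $1/d$), which come from the convex homogeneous functions $g(\mathbf{v}', t) = t\,h(\mathbf{v}/t)$; whether these particular choices belong to the class $\mathcal{H}$ used in the paper is an assumption. Merging two classes, a row-stochastic transformation, does not increase either value.

```python
def column_totals(M):
    """Column sums a_j of a distribution matrix."""
    return [sum(row[k] for row in M) for k in range(len(M[0]))]


def dissimilarity_index(M, h):
    """D_h(M) = sum_j a_j * h(composition of class j); empty classes weigh 0."""
    d = len(M)
    value = 0.0
    for k, total in enumerate(column_totals(M)):
        if total > 0:
            value += total * h([M[i][k] / total for i in range(d)])
    return value


def h_abs(s):
    """Illustrative convex h: absolute deviations from the equal share 1/d."""
    return sum(abs(x - 1.0 / len(s)) for x in s)


def h_sq(s):
    """Illustrative convex h: squared deviations from the equal share 1/d."""
    return sum((x - 1.0 / len(s)) ** 2 for x in s)


A = [[0.5, 0.3, 0.2],
     [0.1, 0.3, 0.6]]
B = [[r[0] + r[1], r[2]] for r in A]  # classes 0 and 1 merged
for h in (h_abs, h_sq):
    print(h.__name__, dissimilarity_index(A, h), dissimilarity_index(B, h))
```

With absolute deviations the merge leaves the index unchanged here, while squared deviations register a strict reduction, which illustrates why unanimity over a whole class of functions is needed for robust comparisons.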