Abstract
The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant nonordered categorical outcomes. When the groups are similarly distributed, their members have equal chances of achieving any of the attainable outcomes. Otherwise, a form of dissimilarity between the group distributions prevails. We characterize axiomatically the dissimilarity partial order of multigroup distributions defined over categorical outcomes. The main result provides an equivalent representation of this partial order by the ranking of multigroup distributions originating from the inclusion of their zonotope representations. The zonotope inclusion criterion refines (that is, is implied by) majorization conditions that are largely adopted in mainstream approaches to multigroup segregation or to univariate and multivariate inequality analysis.
Introduction
The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant nonordered categorical outcomes. For instance, residential segregation occurs in situations in which ethnic groups sort with different intensity across the neighborhoods of a city. Likewise, school or occupational segregation is concerned with the uneven distribution of ethnic groups across schools or jobs.
In other cases, the interest lies in the way shares of one or many attributes (such as income, wealth or consumption of different goods) are assigned across population units (such as countries, households or individuals) and the extent to which these distributions differ from the distribution of a normatively relevant benchmark, such as the demographic weights of the units. In this case, the focus is on uni- or multidimensional inequality, and the often-invoked anonymity principle would regard the way units are ordered as irrelevant.
All these examples are concerned with the extent of dissimilarity between two or more distributions defined over classes of nonordered realizations. There is widespread agreement in the literature about what constitutes lack of segregation or equality: these are situations in which the groups are similarly distributed across the classes of realizations. The relevant notion of similarity we refer to dates back to Gini (1914), who argues that two (or more) groups are similarly distributed whenever “the populations of the two groups take the same values with the same frequency.”^{Footnote 1} Although a well-established methodology exists for analyzing dissimilarity between two distributions, the literature disagrees about the way such comparisons can be extended to the multigroup setting.
This paper develops the axiomatic foundations for the measurement of multigroup dissimilarity and provides equivalent testable conditions. We embrace a convenient (and equivalent) way of representing empirical discrete distributions through matrix notation. Each distribution matrix displays (by row) the distributions of individuals belonging to one of many (at least two) groups across classes of realizations (by column). The example below refers to distributions of three groups across a variable number of classes:
Entries of these matrices can be interpreted as frequencies so that, for instance, the share of group 2 in class 3 in \(\mathbf {A}\) is 80%. In the context of school segregation analysis, each of the two matrices above portrays the way students from each of three distinct ethnic groups are distributed across schools in a schooling district. Matrix \(\mathbf {A}\) depicts a district with four schools whereas matrix \(\mathbf {B}\) depicts another district with three schools.
Many existing criteria can be employed to compare distribution matrices \(\mathbf {A}\) and \(\mathbf {B}\) by the extent of dissimilarity displayed by their rows, such as dissimilarity indices or majorization conditions. The ranking produced by one or a few indices, however, can be challenged by the use of alternative, yet plausible, measures, whereas majorization conditions are robust to this criticism but can be empirically intractable. In this paper, we consider all dissimilarity orderings that are consistent with some normatively relevant axioms and we focus on the intersection of all such orderings as a robust criterion for dissimilarity analysis. It is well known (see Donaldson and Weymark 1998) that such a criterion leads to a partial order of distribution matrices which induces unanimity in the way matrices are ordered by all underlying dissimilarity orderings satisfying the desirable axioms. The axioms that we consider characterize the ordering \(\mathbf {B}\) “displays at most as much dissimilarity as” \(\mathbf {A}\) by the possibility of obtaining \(\mathbf {B}\) from \(\mathbf {A}\) through sequences of elementary operations that either preserve dissimilarity between the matrices’ rows (such as permuting the labels of groups and classes, adding and deleting empty classes, proportionally splitting classes) or reduce it (such as merging group frequencies across two classes of the same distribution matrix), and by some consistency properties of the dissimilarity orderings (with respect to the possibility of producing convex mixtures of classes).
The axioms that we study allow comparisons only between matrices with the same number of groups, but they extend dissimilarity comparisons to matrices with a different number of classes. Such an extension is relevant for empirical applications. Moreover, the possibility of comparing distribution matrices with a different number of classes will make explicit the normative content of this approach, by highlighting the combination of operations that makes it possible to rank distribution matrices with the same number of classes.
The main result, in Theorem 1, establishes that the partial order of distribution matrices consistent with our axiomatic model is equivalent to the partial order of distribution matrices induced by the inclusion of their zonotope representations. A zonotope is a convex geometric set representation of the data, defined in the space of groups frequencies, which is originated by the Minkowski sum (i.e., element by element sum of fractions) of all column vectors of a distribution matrix.^{Footnote 2} In inequality analysis, zonotopes have been used to derive interesting multivariate extensions of the Lorenz curve (Koshevoy and Mosler 1996; Mosler 2012). Theorem 1 shows that the zonotope inclusion criterion is also relevant for dissimilarity analysis, for at least three reasons.
First, we demonstrate that the zonotope inclusion criterion is consistent with the implications of operations that unambiguously preserve or reduce dissimilarity. Each of these operations is found to have clear and intuitive consequences on the shape of the zonotope, and hence provides a normative justification for using the zonotope inclusion criterion. Second, the zonotope inclusion criterion is a refinement of relevant majorization conditions and is hence implied by them. We demonstrate that uniform and matrix majorization criteria, which are widely adopted in robust multivariate distributional analysis (Marshall et al. 2011), are related to the dissimilarity measurement model developed here.^{Footnote 3} In the multigroup setting, some matrices that cannot be ranked by matrix majorization can still be robustly ranked by zonotope inclusion, whereas the two criteria coincide only in the two-group setting. As a counterexample, we use the matrices in (1) to show in Appendix A.15 that the zonotope of matrix \(\mathbf {B}\) is included in the zonotope of matrix \(\mathbf {A}\), whereas \(\mathbf {B}\) is not majorized by \(\mathbf {A}\). Third, the zonotope inclusion criterion can be empirically tested, whereas this is seldom the case for majorization conditions.
Theorem 1 and the corollaries implied by it contribute to the literature along the following lines. First, the results identify the differences between the zonotope inclusion criterion and majorization conditions. In particular, \(\mathbf {B}\) being matrix majorized by \(\mathbf {A}\) postulates the existence of a sequence of dissimilarity reducing or preserving transformations mapping the classes of \(\mathbf {A}\) into those of \(\mathbf {B}\). The zonotope inclusion condition focuses instead separately on each class of \(\mathbf {B}\) and requires that any such class could be obtained through dissimilarity reducing or preserving transformations of the classes of \(\mathbf {A}\). There is no guarantee that such operations can be organized into a sequence, making the zonotope inclusion criterion a refinement of matrix majorization (as observed in Dahl 1999).
Second, Theorem 1 provides an axiomatic justification for using zonotope inclusion as a multigroup criterion that is more robust than (i.e., implies) alternative refinements of matrix majorization based on sequential comparisons of two-group distributions or on specific dissimilarity indices. Although any two-group projection of a multigroup zonotope is itself a zonotope, assessing inclusion among all two-group projections is not sufficient to conclude about inclusion of the multigroup zonotopes. This is because the zonotope inclusion criterion fails to satisfy a consistency property of partial orders, requiring that if two distribution matrices differ only in terms of two group distributions (i.e., two rows), then the two matrices could be ranked by focusing on dissimilarity in the submatrices associated with these groups (Moulin 2016). Failing this property is desirable in a multigroup context, because the dissimilarity ranking of matrices that differ by two or a few group distributions should depend not only on the dissimilarity between these distributions, but also on the extent to which these distributions are dissimilar from the rest. This feature makes the dissimilarity criterion robust against potential aggregation biases (giving rise, for instance, to Simpson’s paradox; see Blyth 1972).
Third, Theorem 1 rationalizes the normative underpinnings of a variety of sparse and apparently unrelated results on the measurement of (multigroup) segregation and multivariate and univariate inequality, which are shown to be embedded within the dissimilarity model.
The paper is organized as follows. Relevant notation and a definition of the zonotope inclusion ordering is provided in Section 2. Axioms and the main result are in Section 3. Section 4 describes the usefulness of our results for related orders. Section 5 concludes. All proofs are collected in a dedicated appendix.
Using zonotopes to test dissimilarity
Notation
A distribution matrix of size \(d\times n\) depicts a set of distributions (indexed by rows) of \(d\ge 1\) groups across \(n\ge 2\) disjoint nonordered classes (indexed by columns), representing categories of realizations. We develop dissimilarity comparisons of distribution matrices with a fixed number d of groups but variable number of classes. These matrices are collected in the set
where \(a_{ij}\) is interpreted as the proportion of group i observed in class j and the column vector \(\mathbf {a}_{j}\) collects the proportions of all groups attaining realization j. Matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_3\) in (1) offer two examples. The distribution matrices in \(\mathcal {M}_{d}\) are row stochastic, meaning that matrix \(\mathbf {A}\in \mathcal {M}_{d}\) represents a collection of d elements of the unit simplex \(\Delta ^{n_{A}}\). We let \(\overline{a}_{j}:=\sum _{i=1}^d{a_{ij}}\) denote the "size" of class j, obtained by weighting uniformly the groups occupying it. For instance, the size of class 1 of matrix \(\mathbf {A}\) in (1) is \(\overline{a}_1=\frac{19}{28}\).
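As a computational aside, row stochasticity and the class sizes \(\overline{a}_j\) are straightforward to verify numerically. A minimal sketch in Python (the example matrix is illustrative, not the matrix \(\mathbf {A}\) of (1)):

```python
import numpy as np

def is_distribution_matrix(A, tol=1e-9):
    """Check that A is row stochastic: nonnegative entries, each row sums to 1."""
    A = np.asarray(A, dtype=float)
    return bool((A >= -tol).all() and np.allclose(A.sum(axis=1), 1.0, atol=tol))

def class_sizes(A):
    """Size of class j: the column sum over groups, weighting groups uniformly."""
    return np.asarray(A, dtype=float).sum(axis=0)

# Illustrative 2-group distribution matrix with three classes.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
print(is_distribution_matrix(A))  # True
print(class_sizes(A))             # [0.6 0.7 0.7]
```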
We follow the convention of using boldface letters to indicate column vectors, so that \(\mathbf {i}_j\) is a column vector corresponding to column j of an identity matrix \(\mathbf {I}_n\) of size \(n\times n\), whereas \(\mathbf {1}_{n}=(1,\ldots ,1)'\) and \(\mathbf {0}_{n}:=(0,\ldots ,0)'\) are the column vectors with all n entries equal to 1 or to 0, respectively. The superscript \('\) always denotes transposition.
Every distribution matrix lies in between two extreme cases. The first case is that of perfect similarity, occurring when the distributions of the groups coincide and can be represented by the same row vector \(\mathbf {s}'\in \Delta ^{n}\). This situation is depicted by matrix \(\mathbf {S}\), whose d rows are all equal to \(\mathbf {s}'\). A maximal dissimilarity matrix \(\mathbf {D}\) is a disjoint-row-support matrix where each class is occupied by at most one group, but one group may occupy different classes.^{Footnote 4}
If a distribution matrix displays the structure of \(\mathbf {S}\), then it is not possible to predict group membership from knowledge of the realization. Conversely, if a distribution matrix is of the form \(\mathbf {D}\), then it is always possible to predict the group from knowledge of the class of the realization. Any distribution matrix displays a structure which lies in between that of \(\mathbf {S}\) and \(\mathbf {D}\).
We also consider transformation matrices that, when multiplied by a distribution matrix, produce effects on the extent of dissimilarity between the rows of such matrix. Denote by \(\mathcal {R}_{n,m}\) the set of \(n\times m\) row stochastic matrices whose rows lie in \(\Delta ^{m}\).^{Footnote 5} Moreover, denote \(\mathcal {R}_{n}\) whenever \(m=n\), while \(\mathcal {D}_{n}\subseteq \mathcal {R}_{n}\) is the set of doubly stochastic matrices whose rows and columns lie in \(\Delta ^{n}\). The set collecting all \(n\times n\) permutation matrices is denoted by \(\mathcal {P}_n\).
The zonotope set
Geometric representations of the data are useful to derive empirical tests for ranking uni- and multivariate distributions. Lorenz curves, for instance, are the workhorse of robust income inequality analysis. A Lorenz curve is obtained by arranging income observations in increasing order and then plotting the cumulative sum of these income shares against the cumulative sample frequencies. If the Lorenz curve of one income distribution lies above the Lorenz curve of another income distribution, then one can robustly rank the former distribution as less unequal than the latter.
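The construction just described can be sketched in a few lines; the income data below are illustrative:

```python
import numpy as np

def lorenz_curve(incomes):
    """Sort incomes in increasing order; return the cumulative population
    shares and the cumulative income shares (both starting at 0)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    cum_income = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    cum_pop = np.linspace(0.0, 1.0, len(x) + 1)
    return cum_pop, cum_income

# Illustrative data: four income observations.
p, L = lorenz_curve([1, 2, 3, 10])
print(L)  # [0.     0.0625 0.1875 0.375  1.    ]
```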
Lorenz curves do not provide sufficient structure for ranking multivariate distributions, insofar as each Lorenz curve allows comparing only one distribution at a time with a normatively relevant one, usually the distribution of population shares. Lorenz curves may be useful to compare distributions of two groups across relevant units, such as in school segregation analysis. In this case, Lorenz curves (known as segregation curves) portray the degree of dissimilarity between the distribution of one group’s members across schools and the distribution of another group across the same schools. Multigroup extensions are, however, problematic even in this domain.
In this section, we consider using the zonotope representation of the data to implement multigroup comparisons and we investigate the robust and testable ordering of distribution matrices generated by the zonotope inclusion criterion.
The zonotope set \(Z(\mathbf {A})\subseteq [0,1]^d\) of a matrix \(\mathbf {A}\in \mathcal {M}_d\) is a convex polytope lying on the hypercube \([0,1]^d\) that is symmetric with respect to the point \(\frac{1}{2} \mathbf {1}_d\) (see McMullen 1971). It is defined as follows:
Elements in \(Z(\mathbf {A})\) are identified by the Minkowski sum of the vectors with coordinates given by \(\mathbf {A}\)’s classes. In Fig. 1a we represent the two-dimensional zonotope of the distribution matrix \(\mathbf {E}\in \mathcal {M}_2\), defined as follows:
The dimensionality of the example \(\mathbf {E}\) helps in visualizing the way \(Z(\mathbf {E})\) is constructed. First, vectors with coordinates corresponding to the classes of \(\mathbf {E}\) are plotted in the unit square and connected to the origin with line segments. In the figure, these vectors are marked with different symbols. For instance, the black square represents the fourth class of \(\mathbf {E}\). Then, the resulting segments are tied together in every possible arrangement. In the figure, adding together the vectors corresponding to classes one and three gives the vector with coordinates (0.7, 0.1), while adding this vector to the one representing the fourth class gives (0.9, 0.6). The resulting zonotope of \(\mathbf {E}\) is the grey area in the figure that contains all possible arrangements of these segments, or portions of them. Panel b) of Fig. 1 represents instead \(Z(\mathbf {A})\), taken from (1), which is defined on the three-dimensional space. We report with solid lines only the visible edges of \(Z(\mathbf {A})\). The relevant facets of \(Z(\mathbf {A})\), originated by the sequential sum of \(\mathbf {A}\)’s classes, are coloured in light gray.
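Since a zonotope is the convex hull of the subset sums of its generators (the class vectors), its construction can be sketched computationally. The entries of the stand-in matrix below are assumptions chosen only to be consistent with the coordinates mentioned in the text (classes one and three summing to (0.7, 0.1), class four adding up to (0.9, 0.6)); they are not the actual matrix \(\mathbf {E}\):

```python
import numpy as np

def zonotope_points(A):
    """All subset sums of the columns (the zonotope's generators).
    Z(A) is the convex hull of these points."""
    A = np.asarray(A, dtype=float)
    d, n = A.shape
    pts = []
    for mask in range(1 << n):  # enumerate every subset of classes
        idx = [j for j in range(n) if (mask >> j) & 1]
        pts.append(A[:, idx].sum(axis=1) if idx else np.zeros(d))
    return np.array(pts)

# Hypothetical stand-in for E (row stochastic, consistent with the text).
E = np.array([[0.40, 0.10, 0.30, 0.20],
              [0.05, 0.40, 0.05, 0.50]])
pts = zonotope_points(E)
# Classes 1 and 3 together reach (0.7, 0.1); adding class 4 gives (0.9, 0.6).
```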
The example above highlights two situations of interest. The maximum dissimilarity zonotope is the d-dimensional hypercube and corresponds to \(Z(\mathbf {D})\). Its diagonal is the similarity zonotope, which corresponds to \(Z(\mathbf {S})\). All distribution matrices displaying some dissimilarity originate zonotopes that lie in the maximum dissimilarity zonotope and that include the similarity zonotope. The shape of \(Z(\mathbf {D})\) and of \(Z(\mathbf {S})\) does not depend on the data in matrices \(\mathbf {S}\) and \(\mathbf {D}\), thus highlighting the irrelevance of within-group heterogeneity for dissimilarity evaluations. More broadly, for each matrix in \(\mathcal {M}_d\) there exists only one zonotope representation, although the same zonotope may correspond to many distribution matrices.
The zonotope inclusion criterion
Zonotopes can be used to compare matrices by the extent of dissimilarity they exhibit. In this paper, we study the ranking of distribution matrices such as \(\mathbf {A}\) and \(\mathbf {B}\) that is generated by the inclusion of the zonotope representations of the two matrices, that is, \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). Inclusion can easily be checked when \(d=2\) from inspection of the zonotope graphs. Figure 2 shows an example where \(Z(\tilde{\mathbf {E}})\subseteq Z(\mathbf {E})\), for a distribution matrix \(\tilde{\mathbf {E}}\) obtained after performing an element-to-element summation of classes 2 and 3 in \(\mathbf {E}\), thereby leading to the central class of \(\tilde{\mathbf {E}}\) that contains \(40\%\) of the population of both groups. This operation obviously levels disparities in the distributions of groups 1 and 2, although similarity is not achieved. The inclusion criterion is more difficult to visualize when \(d=3\). For instance, panel b) of Fig. 1 reports relevant facets of the zonotope \(Z(\mathbf {A})\) obtained at fixed proportions of the group 1 distribution (in light gray). The figure also shows the corresponding facets of the zonotope \(Z(\mathbf {B})\) (in dark gray). In this specific example, it is sufficient to check inclusion of the facets of \(Z(\mathbf {B})\) in the corresponding facets of \(Z(\mathbf {A})\) to conclude that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).^{Footnote 6} In more general situations, when \(d>3\), visualization of zonotope inclusion is not possible and the inclusion criterion must be tested algorithmically.
For general distribution matrices \(\mathbf {A}\) and \(\mathbf {B}\), the zonotope inclusion criterion \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) has an intuitive interpretation in terms of disproportionality of group frequencies, which leads to conclusions about the dissimilarity displayed by \(\mathbf {B}\) and \(\mathbf {A}\). To see this, define an iso-population line (when \(d=2\)) or (hyper)plane (when \(d\ge 3\)) as the set of all combinations of proportions of the groups collected in vectors \(\mathbf {z}\in Z(\mathbf {A})\) that add up to \(p\in [0,1]\), such that \(\frac{1}{d}\mathbf {1}_{d}'\cdot \mathbf {z}=p\). In other words, p is the average “size” of \(\mathbf {z}\), obtained by weighting all groups equally. Figure 2 depicts an example, based on distribution matrices \(\mathbf {E}\) and \(\tilde{\mathbf {E}}\), in which the dashed line segments \(p'\), \(p''\) and \(p'''\) correspond to iso-population lines. In general, \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) is verified if the set of all proportions of the groups adding up to p in \(\mathbf {B}\) is included in (i.e., is less dispersed than) the corresponding set of all proportions of the groups adding up to p in \(\mathbf {A}\). The criterion is robust, given that the inclusion must be verified for all p’s, thus implying that any disproportional allocation of groups attainable by merging classes of \(\mathbf {B}\) can also be obtained by merging the classes of \(\mathbf {A}\), but not the reverse. Proportionality is always attained in \(Z(\mathbf {S})\), in which case there is only one attainable allocation \(\mathbf {z}\in Z(\mathbf {S})\) lying on any iso-population line p, that is, \(\mathbf {z}=\mathbf {1}_d p\). Conversely, disproportionality is maximal in \(Z(\mathbf {D})\), in which case every attainable allocation lying on the iso-population line p can be obtained from the original data.
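Because the zonotope is the Minkowski sum of the segments \([0,\mathbf {a}_j]\), we have \(Z(\mathbf {A})=\{\mathbf {A}\mathbf {x}:\mathbf {x}\in [0,1]^{n_A}\}\), whose support function is \(h_A(\mathbf {u})=\sum _j \max (0,\mathbf {u}'\mathbf {a}_j)\); inclusion \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds exactly when \(h_B(\mathbf {u})\le h_A(\mathbf {u})\) in every direction \(\mathbf {u}\). The sketch below checks this inequality on randomly sampled directions: finding a violating direction disproves inclusion, while passing all samples is strong (but not conclusive) evidence; an exact algorithm would check the facet normals instead. The matrices are illustrative:

```python
import numpy as np

def support(A, u):
    """Support function of Z(A) = {Ax : x in [0,1]^n}: h(u) = sum_j max(0, u.a_j)."""
    return np.maximum(A.T @ u, 0.0).sum()

def zonotope_included(B, A, n_dirs=20000, seed=0):
    """Randomized check of Z(B) subset of Z(A) via h_B(u) <= h_A(u) on
    sampled directions u. Exact only up to the sampling of directions."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        if support(B, u) > support(A, u) + 1e-9:
            return False  # a separating direction disproves inclusion
    return True

# Illustrative 2-group example: merging classes 2 and 3 of A shrinks the zonotope.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = np.array([[0.5, 0.5],
              [0.1, 0.9]])
print(zonotope_included(B, A))  # True
print(zonotope_included(A, B))  # False
```

The first call is guaranteed to return True since \(\max (0,x+y)\le \max (0,x)+\max (0,y)\) implies \(h_B\le h_A\) whenever \(\mathbf {B}\) merges classes of \(\mathbf {A}\).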
The zonotope inclusion criterion always ranks \(Z(\mathbf {S})\subseteq Z(\mathbf {A}) \subseteq Z(\mathbf {D})\) for any \(\mathbf {A}\in \mathcal {M}_d\) and for any d. The inclusion criterion, however, entails only a partial order of distribution matrices: two matrices cannot be ordered if their respective zonotope representations intersect. In the following section, we characterize the normative content of the inclusion criterion in terms of unanimous agreement in ranking \(\mathbf {B}\) as displaying less dissimilarity than \(\mathbf {A}\) by all orderings consistent with some basic dissimilarity axioms.
Characterizations
The dissimilarity partial order
We investigate the possibility of ordering distribution matrices according to the dissimilarity they display. A dissimilarity ordering is a complete and transitive binary relation \(\preccurlyeq\) on the set \(\mathcal {M}_d\) with symmetric part \(\thicksim\), that ranks \(\mathbf {B}\preccurlyeq \mathbf {A}\) whenever \(\mathbf {B}\) is at most as dissimilar as \(\mathbf {A}\).^{Footnote 7} Given \(\mathbf {A}\in \mathcal {M}_d\), any dissimilarity ordering should always rank \(\mathbf {S}\preccurlyeq \mathbf {A}\preccurlyeq \mathbf {D}\) for any matrices of the form \(\mathbf {S}\) and \(\mathbf {D}\). These matrices are regarded, respectively, as equivalent representations of perfect similarity and of maximal dissimilarity, the focus being on differences across group distributions and not on the degree of heterogeneity in the distribution of each group across realizations.^{Footnote 8}
One direct implication of this feature of the dissimilarity orderings is that distribution matrices that differ in their number of classes (\(n_A\ne n_B\)) can be regarded as indifferent from the perspective of the dissimilarity orderings, provided the dissimilarity displayed within each matrix coincides across matrices, even though the group distributions themselves differ.^{Footnote 9} For this reason, we focus on matrices in \(\mathcal {M}_d\), which must have the same number of rows but may differ in the number of classes.
Each dissimilarity ordering induces a complete ranking of distribution matrices. In this paper, we are interested in the robust ranking of distribution matrices generated by the intersection of the dissimilarity orderings satisfying some desirable properties. This is a partial order (Donaldson and Weymark 1998), which we characterize axiomatically and for which we provide equivalent representations.
Axioms and preliminary results
Axioms are based on elementary operations that, when applied to distribution matrices, can reduce or preserve dissimilarity among groups. We focus on the dissimilarity orderings that rank distribution matrices consistently with the effects of these operations. To ease the understanding of the axioms, we contextualize the consequences of these operations in terms of dissimilarity in the distribution of groups of students with different ethnic backgrounds across the schools in a schooling district. This is commonly referred to as a problem of schooling segregation.
The first axiom defines the context, introducing an anonymity property with respect to the labels (and hence the arrangement) of the classes of a distribution matrix.
Axiom 1
IPC (Independence from Permutations of Classes) For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) with \(n_{A}=n_{B}=n\), if \(\mathbf {B}=\mathbf {A}\cdot \varvec{\Pi }_{n}\) for a permutation matrix \(\varvec{\Pi }_{n}\in \mathcal {P}_{n}\) then \(\mathbf {B}\thicksim \mathbf {A }\).
Axiom IPC restricts the focus to evaluations where the classes of a matrix can be freely permuted without affecting the extent of dissimilarity it displays. In the context of schooling segregation, the axiom posits that the names of the schools are irrelevant for conclusions about the dissimilarity in the distributions of students across these schools. This is arguably the case if the schools cannot or should not be ordered according to their performances, their quality or their budget. Another implication of axiom IPC is that any distribution matrix that is obtained by permuting the columns of matrix \(\mathbf {D}\) has to be regarded as an equivalent representation of maximal dissimilarity.
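The permutation operation in axiom IPC amounts to right-multiplying the distribution matrix by a permutation matrix, which simply reorders its columns. A minimal sketch (the matrix is illustrative):

```python
import numpy as np

def permute_classes(A, perm):
    """IPC: reorder the classes (columns) of A according to `perm`,
    i.e., right-multiply A by the permutation matrix Pi_n."""
    A = np.asarray(A, dtype=float)
    P = np.eye(A.shape[1])[:, perm]  # permutation matrix with columns e_{perm[j]}
    return A @ P

# Illustrative 2-group matrix; relabelling schools leaves dissimilarity unchanged.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = permute_classes(A, [2, 0, 1])  # same columns, new order
```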
Next, we consider two transformations that extend comparability to distribution matrices that differ in the number of classes. The first transformation has to do with the insertion or elimination of empty classes, i.e., classes that are not occupied by groups. The operation consists in adding/eliminating column vectors of size d with only zero entries to/from the original distribution matrix. In the schooling segregation example, the operation corresponds to adding/eliminating schools with no students to/from the same school district. The presence of these “empty” schools in the district is irrelevant for assessing dissimilarity of groups distributions across the remaining schools of the district.
Axiom 2
IEC (Independence from Empty Classes) For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), if \(\mathbf {B}=\left( \mathbf {A}, \mathbf {0}_{d}\right)\) then \(\mathbf {B}\thicksim \mathbf {A}\).
The IEC axiom emphasizes dissimilarity originated from nonempty columns of a distribution matrix. If \(\mathbf {A}\) and \(\mathbf {B}\) differ only because of \(n_A-n_B\) empty classes in one of the two matrices, then the dissimilarity in \(\mathbf {A}\) should be regarded as equivalent to that in \(\mathbf {B}\). When combined with IPC, the axiom IEC allows us to regard as indifferent all matrices obtained by adding or deleting an empty class in any position.
The second transformation considered increases the number of classes by proportionally splitting (the group frequencies in) a class into two new classes. This transformation requires replicating one column of a distribution matrix and then scaling the entries of the original and of the replicated columns by the splitting coefficients \(\beta \in (0,1)\) and \(1-\beta\), respectively. This operation guarantees that the resulting distribution matrix is row stochastic and that the degree of proportionality of the group frequencies in the new columns coincides with that in the original column. In the schooling segregation example, splitting a school would require randomly allocating its student population (i.e., irrespective of group assignment) into two smaller institutes, so that ethnic proportions in the two new institutes are not altered. Frankel and Volij (2011) advocate a similar property (called composition invariance) in the study of multigroup school segregation (see also James and Taeuber 1985). The Independence from Split of Classes (ISC) axiom posits that the transformation described above is a source of indifference for every dissimilarity ordering.
Axiom 3
ISC (Independence from Split of Classes) For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) with \(n_{B}=n_{A}+1\), if \(\exists \,j\) such that \(\mathbf {b}_{j}=\beta \mathbf {a}_{j}\) and \(\mathbf {b}_{j+1}=(1-\beta )\mathbf {a}_{j}\) with \(\beta \in (0,1)\), while \(\mathbf {b}_{k}=\mathbf {a}_{k}\) \(\forall k<j\) and \(\mathbf {b}_{k+1}=\mathbf {a}_{k}\) \(\forall k>j\), then \(\mathbf {B}\thicksim \mathbf {A}\).
A split transformation increases the number of classes and modifies the shape of a distribution matrix, but it does not alter the proportionality of the groups. For this reason, it is regarded as dissimilarity preserving.
The merge of classes transformation complements the split operation. A merge of classes is implemented by vector summation of two adjacent columns of a distribution matrix, irrespective of the group composition of each column. The operation has an immediate interpretation in the schooling segregation example: it consists in merging all students from two neighboring schools into a single, larger school. Each ethnic group in the school of destination is increased by an amount equal to the proportion of the corresponding group in the school of departure, which is then emptied. If one or both schools are empty, segregation neither increases nor decreases. Consider, instead, the case of two ethnic groups that are similarly distributed across almost all schools in a district, apart from two schools, such that one group is overrepresented compared to the other in one school, and underrepresented in the other. Merging each of these two schools with other schools in the district would reduce the compositional differences, without eliminating them. A merge of these two schools with each other would, instead, establish proportionality in ethnic composition across all schools, leading to perfect similarity. The Dissimilarity Decreasing Merge of Classes (MC) axiom states that a merge of classes transformation can never increase dissimilarity.
Axiom 4
MC (Dissimilarity Decreasing Merge of Classes) For any \(\mathbf {A},\,\mathbf {B}\in \mathcal {M}_{d}\) with \(n_{A}=n_{B}\), if \(\mathbf {b}_{j}=\mathbf {0}_{d}\), \(\mathbf {b}_{j+1}=\mathbf {a}_{j}+\mathbf {a}_{j+1}\) while \(\mathbf {b}_{k}=\mathbf {a}_{k}\ \forall k\ne j,j+1\), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
Consider obtaining \(\mathbf {B}\) from \(\mathbf {A}\) with a merge transformation adding together, distribution by distribution, the group proportions observed in two classes \(\mathbf {a}_j\) and \(\mathbf {a}_{j+1}\) whenever \(\mathbf {a}_{j+1}=\beta \mathbf {a}_{j}\), \(\beta >0\), such that \(\mathbf {b}_j=(1+\beta )\mathbf {a}_j\) and \(\mathbf {b}_{k}=\mathbf {a}_{k}\) \(\forall k<j\), while \(\mathbf {b}_{k}=\mathbf {a}_{k+1}\) \(\forall k>j\). This operation leaves dissimilarity unaffected. The operation is the opposite of a split, but it supports the same indifference class, gathering all matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) such that \(\mathbf {B}\thicksim \mathbf {A}\) for all orderings consistent with axiom ISC, even if not consistent with MC.
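The split and merge operations can be sketched directly (the matrix is illustrative; the merge below also drops the emptied class, which IEC allows). Merging a just-split class recovers the original matrix, consistent with the two operations supporting the same indifference class:

```python
import numpy as np

def split_class(A, j, beta):
    """ISC: split class j with coefficient beta in (0,1): column j becomes
    beta * a_j and a new column (1 - beta) * a_j is inserted right after it."""
    A = np.asarray(A, dtype=float).copy()
    new_col = (1.0 - beta) * A[:, j]
    A[:, j] = beta * A[:, j]
    return np.insert(A, j + 1, new_col, axis=1)

def merge_classes(A, j):
    """MC: merge class j into class j + 1 by vector summation, then drop
    the emptied column (an application of IEC)."""
    A = np.asarray(A, dtype=float).copy()
    A[:, j + 1] += A[:, j]
    return np.delete(A, j, axis=1)

# Illustrative 2-group matrix.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = split_class(A, 0, 0.3)   # dissimilarity preserved
C = merge_classes(B, 0)      # undoes the split
print(np.allclose(C, A))     # True
```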
Axioms MC, IEC, ISC and IPC are independent. The transitive closure of all dissimilarity orderings satisfying these axioms defines a partial order of distribution matrices (see Donaldson and Weymark 1998), which is represented by the matrix majorization criterion. We refer to this partial order as \(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\), indicating that matrix \(\mathbf {B}\) is matrix majorized by \(\mathbf {A}\) whenever there exists \(\mathbf {X}\in \mathcal {R}_{n_A,n_B}\) such that \(\mathbf {B}=\mathbf {A}\mathbf {X}\) (Marshall et al. 2011; Dahl 1999).
Proposition 1
For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) the following statements are equivalent:

(i)
\(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MC, ISC, IEC and IPC.

(ii)
\(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\).
The notion of matrix majorization has been investigated in a variety of contexts (see p. 625 in Marshall et al. (2011) and literature therein), the most relevant being the comparison of informativeness of statistical experiments with a finite number of outcomes.^{Footnote 10} The characterization in Proposition 1, which is alternative to those in Grant et al. (1998), Frankel and Volij (2011) and Lasso de la Vega and Volij (2014), shows that every informativeness comparison of matrices in \(\mathcal {M}_d\) amounts to the existence of dissimilarity preserving and reducing transformations mapping the most informative distribution matrix into the least informative one.
While appealing, matrix majorization allows distribution matrices to be ranked only if there exists a sequence of relevant transformations of the data that can be represented by a row stochastic matrix. This requirement is very stringent in some cases. Consider for instance matrices \(\mathbf {A}\) and \(\mathbf {B}\) in (1). It is shown in Appendix A.15 that every column of \(\mathbf {B}\) can be obtained by splitting and merging the columns of \(\mathbf {A}\). Yet, in this specific example these operations cannot be arranged to form a sequence. As a consequence, they cannot be represented in the form of a row stochastic matrix, so that \(\mathbf {A}\) does not matrix majorize \(\mathbf {B}\). In fact, as shown in the appendix, there is a unique admissible transformation matrix with nonnegative entries, denoted
yielding \(\mathbf {B}=\mathbf {A}\mathbf {X}\). The transformation matrix \(\mathbf {X}\) is, clearly, not row stochastic (for a related example, see Koshevoy 1995).
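As a small illustration (the matrices below are invented for the example, not those in (1)), a candidate certificate of matrix majorization can be verified by checking row stochasticity and the product \(\mathbf {B}=\mathbf {A}\mathbf {X}\):

```python
# Minimal sketch of checking a matrix-majorization certificate: X must be
# row stochastic (nonnegative entries, rows summing to 1) and satisfy B = A·X.

def matmul(A, X):
    return [[sum(A[i][k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for i in range(len(A))]

def is_row_stochastic(X, tol=1e-9):
    return all(x >= -tol for row in X for x in row) and \
           all(abs(sum(row) - 1.0) < tol for row in X)

def certifies_majorization(A, B, X, tol=1e-9):
    if not is_row_stochastic(X):
        return False
    AX = matmul(A, X)
    return all(abs(AX[i][j] - B[i][j]) < tol
               for i in range(len(B)) for j in range(len(B[0])))

# Example: X splits the first class of A evenly into two classes and keeps the
# second class, so B = A·X follows from a split transformation.
A = [[0.6, 0.4],
     [0.2, 0.8]]
X = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
B = matmul(A, X)   # [[0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]
```

By contrast, a matrix such as the non-row-stochastic \(\mathbf {X}\) of the example above would fail `is_row_stochastic`, so no certificate exists even though \(\mathbf {B}=\mathbf {A}\mathbf {X}\) holds.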
We address the concerns raised by the example above by introducing a new class of dissimilarity axioms. These axioms relax the requirement that distribution matrices be ranked exclusively by means of (sequences of) merge transformations, invoking instead a form of consistency of the dissimilarity orderings with respect to convex combinations of (columns of) distribution matrices. The first axiom, denoted StrongMixC, states that if matrices \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\) are each ranked as not more dissimilar than \(\mathbf {A}\), then the convex mix of these matrices yields a matrix \(\mathbf {B}\) that cannot display more dissimilarity than \(\mathbf {A}\). Before stating the axiom, recall that any element of the convex hull (denoted conv) of matrices \(\mathbf {B}^j=(\mathbf {b}^j_1,\ldots ,\mathbf {b}^j_{n})\in \mathcal {M}_d\) is a matrix \(\mathbf {B}=(\mathbf {b}_1,\ldots ,\mathbf {b}_{n})\in \mathcal {M}_d\) such that \(\mathbf {b}_k=\sum _{j=1}^{m}{w_j\mathbf {b}_k^j}\) \(\forall k\), for any set of weights \(w_j\in [0,1]\) satisfying \(\sum _{j=1}^{m}{w_j}=1\). The axiom's name follows from the fact that every mix of matrices can be interpreted as a specific mix of classes that assigns uniform weights to the classes of the same matrix.
Axiom 5
StrongMixC (Dissimilarity Consistency with Uniform Mixing of Classes) Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence \(j=1,\ldots ,m\), \(m\ge 2\) of \(d\times n\) matrices \(\mathbf {B}^j\in \mathcal {M}_d\) such that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\). If \(\mathbf {B} \in conv\{\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\}\), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
The axiom StrongMixC postulates a “betweenness” property (see Dekel 1986) for dissimilarity orderings: all orderings satisfying it regard a new distribution \(\mathbf {B}\), whose classes are obtained as convex combinations of the respective classes of \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\), as not more dissimilar than \(\mathbf {A}\). The fact that such a convex combination assigns the same weight \(w_j\) to each class of a matrix \(\mathbf {B}^j\), with \(\sum _j{w_j}=1\), guarantees that \(\mathbf {B}\in \mathcal {M}_d\). Matrix \(\mathbf {B}\) may be regarded as more dissimilar than some of the matrices \(\mathbf {B}^j\), but it cannot display more dissimilarity than \(\mathbf {A}\) given that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\).
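The uniform mixing underlying StrongMixC can be sketched as a column-wise convex combination with a single weight per matrix (the matrices and weights below are illustrative):

```python
# Sketch of the uniform mixing in axiom StrongMixC: B is the column-wise convex
# combination of matrices B^1,...,B^m, with one weight w_j per matrix applied
# to every class of that matrix.

def mix(Bs, w):
    assert abs(sum(w) - 1.0) < 1e-9          # weights on the simplex
    d, n = len(Bs[0]), len(Bs[0][0])
    return [[sum(w[j] * Bs[j][i][k] for j in range(len(Bs)))
             for k in range(n)] for i in range(d)]

B1 = [[0.7, 0.3], [0.1, 0.9]]
B2 = [[0.3, 0.7], [0.5, 0.5]]
B = mix([B1, B2], [0.5, 0.5])   # each class of B averages the same class of B1, B2
```

Because the same weight applies to every class of a given matrix, each row of `B` still sums to 1, i.e. \(\mathbf {B}\in \mathcal {M}_d\), as the text notes.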
The normative appeal of axiom StrongMixC rests in its relation with operations mixing columns of different matrices. Such operations are regarded as unambiguously not increasing dissimilarity, given their relation with merge of classes operations. We contextualize this point in terms of school segregation analysis. The axiom StrongMixC implies that every school or merge of schools from schooling district \(\mathbf {B}\) (i.e., columns of the distribution matrix) can always be obtained through a convex combination of schools or merges of schools drawn from the schooling districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\). Formally, let \(\mathcal {V}^{01}_n\) be the set of \(n\times 1\) vectors whose elements are either 0 or 1. Nonzero entries of a vector \(\varvec{\gamma }\in \mathcal {V}^{01}_n\) identify classes (or one class) of a distribution matrix, so that \(\mathbf {B}\varvec{\gamma }\) yields a new school obtained by merging some of \(\mathbf {B}\)’s schools (or by keeping one of its schools). Any mixing operation underlying axiom StrongMixC always grants that:
where conv is the convex hull of such vectors.
Arguably, every merge of schools smooths the extent of ethnic disparities within a schooling district and can contribute to reducing segregation. Condition (4) implies that there are fewer opportunities to reduce segregation by merging some of the schools in schooling district \(\mathbf {B}\) than there are by merging schools in districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\). In fact, the former district/matrix can always be obtained as a combination of the latter matrices, whereas the reverse may not be true. As an example, consider the (rather extreme) case in which \(\mathbf {B}\) is the similarity matrix: in this case there are no opportunities at all to reduce segregation by merging \(\mathbf {B}\)’s classes, since segregation is already at its minimum.
Condition (4) is always granted by axiom StrongMixC, which in fact imposes additional structure.^{Footnote 11} We consider a new axiom, denoted MixC, that regards every matrix \(\mathbf {B}\in \mathcal {M}_d\) satisfying condition (4) as displaying not more dissimilarity than \(\mathbf {A}\).
Axiom 6
MixC (Dissimilarity Consistency with Mixing of Classes) Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence \(j=1,\ldots ,m\), \(m\ge 2\) of \(d\times n\) matrices \(\mathbf {B}^j\in \mathcal {M}_d\) such that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\). If \(\mathbf {B}\in \mathcal {M}_d\) satisfies condition (4), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
The axiom MixC values the fact of having fewer opportunities for reducing segregation in school district \(\mathbf {B}\) than in districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\), in the sense that if the latter districts display less school segregation than \(\mathbf {A}\), then \(\mathbf {B}\) should also display less segregation than \(\mathbf {A}\). Axiom MixC extends the orderings \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) for every \(j=1,\ldots ,m\) to \(\mathbf {B}\preccurlyeq \mathbf {A}\). The axiom makes it possible to compare cases such as those in the example related to the transformation matrix in (3): even if no sequence of merge of classes operations maps \(\mathbf {A}\) into \(\mathbf {B}\), the fact that every class of \(\mathbf {B}\) is obtained by merging classes of \(\mathbf {A}\) may suffice to verify condition (4) and thus rank \(\mathbf {B}\preccurlyeq \mathbf {A}\), as we show in Appendix A.15.
The axiom MixC does not explicitly mention the way \(\mathbf {B}\) is constructed. One way to obtain it (which we rely upon in the proofs) is by considering weights \(w_j^k\in [0,1]\), with \(\sum _{j=1}^{m}{w_j^k}=1\), that are specific to each class k, such that \(\mathbf {b}_k=\sum _{j=1}^{m}{w_j^k\mathbf {b}_k^j}\) \(\forall k\). These weights are more general than those considered by axiom StrongMixC. When configuration \(\mathbf {B}\) is obtained in such a way and (4) is satisfied, the axiom MixC posits that mixing students across schools of districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\) that are not more segregated than \(\mathbf {A}\) cannot generate a new schooling district \(\mathbf {B}\) that is more segregated than \(\mathbf {A}\).
The fact that the mixing operations underlying axiom StrongMixC always satisfy condition (4) proves the next statement.
Remark 1
If \(\preccurlyeq\) is consistent with MixC then it is consistent with StrongMixC.
As a consequence, the MixC axiom implies a ranking of distribution matrices that is less partial than the one characterized by axiom StrongMixC, in the sense that the dissimilarity orderings consistent with StrongMixC unanimously rank only a subset of the matrices that can be ordered by the orderings consistent with MixC.
We now investigate whether the partial orders of distribution matrices supported by matrix majorization and by the zonotope inclusion criterion are consistent with these new axioms. The next remark shows that matrix majorization \(\preccurlyeq ^R\) is consistent with StrongMixC.
Remark 2
For \(\mathbf {A},\mathbf {B}^j\in \mathcal {M}_d\), \(j=1,\ldots ,m\), let \(\mathbf {B}\in conv\{\mathbf {B}^1,\ldots ,\mathbf {B}^m\}\): \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\) \(\forall j\) \(\Rightarrow\) \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\).
When axiom StrongMixC is paired with the dissimilarity preserving axioms, it allows matrix majorization to be characterized without resorting to axiom MC, which is hence implied by all the axioms considered.
Remark 3
If \(\preccurlyeq\) satisfies IPC, ISC, IEC and StrongMixC then it satisfies MC.
Axiom MC thus becomes redundant in characterizing dissimilarity partial orders once the StrongMixC axiom is combined with all the dissimilarity preserving axioms. The axiom StrongMixC yields a new characterization of \(\preccurlyeq ^R\), alternative to the one presented in Proposition 1.
Proposition 2
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\), the following statements are equivalent:

(i)
\(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms StrongMixC, ISC, IEC and IPC.

(ii)
\(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\).
Proposition 2 highlights that, even if axiom MC is implied by StrongMixC, ISC, IEC and IPC (see Remark 3), these axioms still lead to the matrix majorization partial order of Proposition 1, and not to another partial order capable of ranking more matrices. Weakening the StrongMixC axiom towards MixC may help characterize such a partial order. In the two-group case (\(d=2\)), Proposition 2 can be reformulated by weakening StrongMixC in favor of axiom MixC, since matrix majorization \(\preccurlyeq ^R\) is consistent with axiom MixC when \(d=2\).
Remark 4
For \(\mathbf {A},\mathbf {B}^j,\mathbf {B}\in \mathcal {M}_2\) such that \(\mathbf {B}\) and \(\mathbf {B}^j\), \(j=1,\ldots ,m\), satisfy condition (4): \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\) \(\forall j\) \(\Rightarrow\) \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\).
In the multigroup context (\(d\ge 3\)), however, matrix majorization is not consistent with axiom MixC. A counterexample is given in Appendix A.15, where we use the matrices in (1) to identify matrices \(\mathbf {B}^j\in \mathcal {M}_d\), \(j=1,2,3\), such that \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\), and we then show that \(\mathbf {B}\) satisfies condition (4) but cannot be obtained as in axiom StrongMixC, hence \(\mathbf {B}\not \preccurlyeq ^R\mathbf {A}\). As a consequence, matrix majorization is only sufficient, but not necessary, for unanimity in the ranking by all dissimilarity orderings \(\preccurlyeq\) consistent with axioms MixC, IPC, IEC and ISC. Conversely, the zonotope inclusion criterion is always consistent with axiom MixC.
Remark 5
For \(\mathbf {A},\mathbf {B}^j,\mathbf {B}\in \mathcal {M}_d\) such that \(\mathbf {B}\) and \(\mathbf {B}^j\), \(j=1,\ldots ,m\), satisfy condition (4): \(Z(\mathbf {B}^j)\subseteq Z(\mathbf {A})\) \(\forall j\) \(\Rightarrow\) \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).
The zonotope inclusion criterion is consistent with MixC and, from Remark 1, it is also consistent with StrongMixC. However, the StrongMixC axiom is not sufficient to characterize zonotope inclusion: the operations underlying StrongMixC do not, alone, allow the zonotope inclusion ordering of a pair \(\mathbf {A}\) and \(\mathbf {B}\) to be broken down into the existence of simpler mixing transformations mapping one matrix into the other. The main result of the paper shows that the MixC axiom provides the structure needed to establish a characterization of the zonotope inclusion order. Dissimilarity orderings consistent with StrongMixC but not with MixC can be represented by matrix majorization (Proposition 2), but this guarantees only a sufficient condition for zonotope inclusion.
Main result and discussion
Theorem 1
For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\), the following statements are equivalent:

(i)
\(\mathbf {B}\ \preccurlyeq \ \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC.

(ii)
\(Z(\mathbf {B}) \ \subseteq \ Z(\mathbf {A})\).
The theorem provides a novel complete characterization of the zonotope inclusion criterion in terms of dissimilarity. The zonotope inclusion criterion originates a partial order of distribution matrices. If the inclusion test fails, then consensus on the ranking of distribution matrices by all dissimilarity orderings consistent with MixC, IPC, IEC and ISC cannot be reached. Nonetheless, this partial order is “less partial” than matrix majorization (that is, zonotope inclusion is a refinement of \(\preccurlyeq ^R\)) and is thus capable of ordering a larger set of cases.
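The inclusion test in statement (ii) can be approached numerically through support functions: \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds if and only if the support function of \(Z(\mathbf {B})\) is dominated by that of \(Z(\mathbf {A})\) in every direction, and for a zonotope generated by the columns of a matrix the support function is the sum of the positive parts of the projected columns. The sketch below (not the paper's procedure; the matrices are illustrative) samples random directions, so it can refute inclusion but only heuristically support it:

```python
import random

# Randomized necessary-condition test of zonotope inclusion Z(B) ⊆ Z(A),
# with Z(A) = {A·theta : theta in [0,1]^n}. Inclusion of convex bodies is
# equivalent to dominance of support functions in every direction; sampling
# directions can refute inclusion but never certify it.

def support(M, p):
    """Support function of Z(M) in direction p: sum of positive parts of p·m_j."""
    d = len(M)
    return sum(max(0.0, sum(p[i] * M[i][j] for i in range(d)))
               for j in range(len(M[0])))

def maybe_included(B, A, trials=2000, tol=1e-9, seed=0):
    rng = random.Random(seed)
    d = len(A)
    for _ in range(trials):
        p = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if support(B, p) > support(A, p) + tol:
            return False        # a separating direction refutes inclusion
    return True                 # no refutation found (not a certificate)

A = [[0.6, 0.4], [0.2, 0.8]]
B = [[0.5, 0.5], [0.5, 0.5]]    # the similarity matrix: Z(B) is a diagonal segment
```

Here `maybe_included(B, A)` finds no separating direction, since the diagonal segment \(Z(\mathbf {B})\) lies inside the parallelogram \(Z(\mathbf {A})\), while `maybe_included(A, B)` is quickly refuted. An exact test would check only the facet normals of \(Z(\mathbf {A})\) rather than random directions.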
Remark 6
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\): \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A} \Rightarrow Z(\mathbf {B}) \subseteq Z(\mathbf {A})\).
The remark shows that there are matrices that cannot be ranked by dissimilarity orderings consistent with StrongMixC or, equivalently, with axiom MC, but that can be ordered by resorting to axiom MixC, which provides the additional structure needed to refine \(\preccurlyeq ^R\). The rationale for this refinement is clarified in the proof of Theorem 1. There, we make use of the operations underlying axioms StrongMixC, IEC, IPC and ISC to characterize the distribution matrices that form the basis of the set of all matrices that are matrix majorized by any given \(\mathbf {A}\). We also show that some of the classes in each of the basis matrices identify vertices of \(Z(\mathbf {A})\) (and that \(Z(\mathbf {A})\) is the convex hull of its vertices). While the convex hull of the basis matrices obtained by using the weights implied by axiom StrongMixC is sufficient to characterize the full set of matrices \(\mathbf {B}\) such that \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\), the same operation is not sufficiently flexible to characterize the entire set of matrices \(\mathbf {B}\) such that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). Instead, when condition (4) is applied to the permutations of \(\mathbf {A}\) (all regarded as dissimilar as \(\mathbf {A}\) itself), it identifies the vertices of \(Z(\mathbf {A})\), so that the convex mix of these vertices gives \(Z(\mathbf {A})\). Some of the matrices identified in this way cannot be obtained using the weights implied by StrongMixC, which proves Remark 6.
The reverse implication of Remark 6 is not true in general. The matrices \(\mathbf {A}\) and \(\mathbf {B}\) in (1) provide a counterexample in which \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) but the two matrices cannot be ranked by matrix majorization. Matrix majorization and zonotope inclusion orderings may coincide only under specific circumstances, such as those identified in Theorem 2 in Koshevoy (1995) or in cases where dissimilarity comparisons are limited to distributions where \(d=2\) (Dahl 1999; Lasso de la Vega and Volij 2014). We provide in the appendix a new geometric proof of the latter statement.
Remark 7
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_2\): \(Z(\mathbf {B})\subseteq Z(\mathbf {A}) \Rightarrow \mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) .
The set of matrices that can be ordered in terms of dissimilarity can be further extended by considering the transitive closure generated by all binary relations \(\preccurlyeq\) satisfying the axioms in Theorem 1 together with a new axiom, Independence from Permutations of Groups (IPG), which introduces invariance of the dissimilarity orderings with respect to the labeling of the groups.
Axiom 7
IPG (Independence from Permutations of Groups) For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\), if \(\mathbf {B}=\varvec{\Pi } _{d}\cdot \mathbf {A}\) for a permutation matrix \(\varvec{\Pi }_{d}\in \mathcal {P}_{d}\) then \(\mathbf {B}\thicksim \mathbf {A}\).
In the context of segregation analysis, the axiom provides a natural multigroup extension of the symmetry of types property of Hutchens (2015). Together with ISC, IEC, IPC and MixC, the axiom extends dissimilarity comparisons to zonotope inclusion orderings that are not concerned with the labeling of the groups. A proof of the following corollary rests on the properties of the zonotope.
Corollary 1
For any \(\mathbf {A}\in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC and IPG if and only if \(\exists \varvec{\Pi }_d\in \mathcal {P}_d\) such that \(Z(\mathbf {B})\subseteq Z(\varvec{\Pi }_d\mathbf {A}).\)
Dissimilarity indices
The partial order of dissimilarity in Theorem 1 can be represented in terms of agreement of dissimilarity indices satisfying desirable properties. A dissimilarity index is a multivariate function \(D:\mathcal {M}_d\rightarrow \mathbb {R}_+\) mapping a distribution matrix into a number, which can be interpreted as the level of dissimilarity among the d distributions represented in that matrix. These indices measure dissimilarity as the average within-class dispersion of group frequencies.
Consider first the dissimilarity orderings consistent with StrongMixC. In this case, the dispersion of group frequencies within each class can be quantified by a function h in the class \(\mathcal {H}\) of real valued convex functions defined on \(\Delta ^{d}\). Dispersion in class j contributes to the overall dissimilarity in proportion to the size \(\overline{a}_{j}\) of class j. The dissimilarity index \(D_{h}\) with \(h\in \mathcal {H}\) aggregates these evaluations as follows:^{Footnote 12}
where \(a_{ij}/\overline{a}_{j}\) is the proportion of group i relative to the size of class j when groups are uniformly weighted. Dissimilarity is minimized when \(a_{ij}/\overline{a}_{j}=1/d\) for each of the d groups in all classes. Hence, by normalizing h so that \({h\left( \frac{1}{d}\mathbf {1}_{d}'\right) =0}\), the index takes value 0 when perfect similarity is reached. Dissimilarity is instead maximal when for every j there exists an i such that \(a_{ij}=\overline{a}_{j}\). The following proposition sets out a dominance condition in terms of dissimilarity indices.
Proposition 3
For any \(\mathbf {A},\mathbf {B} \in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms StrongMixC, ISC, IEC, IPC if and only if \(D_h(\mathbf {B})\ \le \ D_h(\mathbf {A})\) for all \(h\in \mathcal {H}\).
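An index in this class can be sketched from the verbal description above: class-size weights \(\overline{a}_j=\sum _i a_{ij}\), a \(1/d\) normalization (consistent with the price-dominance index introduced below), and a quadratic h as one admissible convex choice. The normalization and the specific h are assumptions for illustration and may differ from the displayed formula:

```python
# Sketch of a dissimilarity index D_h: class-size-weighted within-class
# dispersion, with h convex on the simplex and normalized so h(1/d,...,1/d) = 0.
# The quadratic h below is one admissible choice, not the paper's.

def D_h(A, h):
    d, n = len(A), len(A[0])
    total = 0.0
    for j in range(n):
        size = sum(A[i][j] for i in range(d))          # class size ā_j
        if size > 0:
            total += size * h([A[i][j] / size for i in range(d)])
    return total / d                                   # assumed 1/d normalization

def h_quad(s):
    d = len(s)
    return sum((x - 1.0 / d) ** 2 for x in s)          # convex, zero at the uniform point

A_sim = [[0.5, 0.5], [0.5, 0.5]]                       # perfect similarity
A_seg = [[1.0, 0.0], [0.0, 1.0]]                       # complete dissimilarity
```

With this choice, `D_h(A_sim, h_quad)` is 0, reflecting the normalization \(h\left( \frac{1}{d}\mathbf {1}_{d}'\right) =0\), while the fully segregated configuration attains a strictly positive value.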
Using Proposition 1, we conclude that matrix majorization entails a necessary and sufficient condition for assessing agreement in dissimilarity evaluations for all indices described in Proposition 3. Matrix majorization is refined by the zonotope inclusion criterion. We provide an equivalent representation of the latter criterion in terms of dissimilarity measures based on the so-called price dominance criterion (Kolm 1977; Koshevoy and Mosler 1996; Andreoli and Zoli 2020).^{Footnote 13} Consider a set of “prices” (or normative weights) \(\mathbf {p}=(p_1,\ldots ,p_d)'\), which take real values and may therefore be negative, allowing the relative group composition of each class of a distribution matrix to be evaluated through the implied “budget” (or weighted average) \(\mathbf {p}'\mathbf {a}_j/\overline{a}_j\). In a case of perfect similarity, \(\mathbf {a}_j/\overline{a}_j=\frac{1}{d}\mathbf {1}_d\) for all classes j. Therefore, perfect equality within each group i for all average realizations \(a_{ij}/\overline{a}_{j}\) across all classes j indicates lack of dissimilarity. The same consideration applies if all realizations in each class are weighted in \(\mathbf {p}'\mathbf {a}_j/\overline{a}_j\), irrespective of the weighting vector \(\mathbf {p}\). Each class contributes additively to dissimilarity, with evaluations indexed by a convex function \(\phi : \mathbb {R}\rightarrow \mathbb {R}\) applied as \(\phi (\mathbf {p}'\mathbf {a}_j/\overline{a}_j)\), which quantifies the inequality across the realizations of all classes. If we quantify this aggregate inequality by \(\frac{1}{d}\sum _{j=1}^{n_A}{\overline{a}_j\phi (\mathbf {p}'\cdot \mathbf {a}_j/\overline{a}_j)}\), then its minimum level is reached at \(m:=\phi (\sum _{i=1}^d{p_i/d})\). The minimum bound for the dissimilarity index
is therefore 0.^{Footnote 14}
Proposition 4
For any \(\mathbf {A},\mathbf {B} \in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC if and only if \(D_{\mathbf {p},\phi }(\mathbf {B})\ \le \ D_{\mathbf {p},\phi }(\mathbf {A})\) for all \(\mathbf {p}\in \mathbb {R}^d\) and for every \(\phi\) convex.
Proposition 4 provides the class of dissimilarity indices that are related to zonotope inclusion and defines a dominance condition that is weaker than that implied by Proposition 3.
Related orders
This section highlights the relevance of the dissimilarity model for the analysis of multigroup segregation and multivariate inequality. First, we argue that the zonotope inclusion criterion can be meaningfully used in the analysis of segregation with many (more than two) groups. We also demonstrate that widely used multigroup segregation indices characterized in the literature are consistent with dissimilarity preserving or reducing axioms. Second, we analyze the implications of the dissimilarity model for multivariate orderings of dispersion, where the focus is on dissimilarity between the distributions of certain attributes and a normatively relevant benchmark distribution. In this case, the configuration that displays less dissimilarity among the underlying distributions and with respect to the benchmark distribution, also exhibits less multivariate inequality/dispersion. We prove that the zonotope inclusion criterion weakens some of the most widely adopted robust criteria in the multivariate inequality literature. Third, we emphasize the relevance of the dissimilarity axioms for the analysis of univariate inequality.
Segregation
Segregation arises when individuals with different characteristics (such as their ethnic origin or gender) are distributed unevenly across the neighborhoods of a city, the schools of a schooling district, or the jobs within a firm. In segregation analysis, the realizations of interest are categorical and not ordered. Mainstream approaches to segregation focus on the two-group case and postulate consistency with the partial order generated by nonintersecting segregation curves (Duncan and Duncan 1955) as a baseline.
A segregation curve is obtained by ordering the classes of \(\mathbf {A}\) by increasing magnitude of the ratio \(a_{2j}/a_{1j}\) evaluated for each class j. It gives the cumulative proportions of group 1 and of group 2 that are observed in the classes where group 2 is relatively overrepresented. The graph of the segregation curve coincides with the lower boundary of the zonotope representing the cumulative shares of group 1 and group 2 across categories.
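The construction just described can be sketched as follows (the two-group matrix is illustrative):

```python
# Sketch of the segregation-curve construction: order classes by the ratio
# a_2j / a_1j and accumulate the shares of the two groups.

def segregation_curve(A):
    assert len(A) == 2
    cols = list(zip(A[0], A[1]))
    # increasing ratio a2/a1; classes with a1 = 0 come last (infinite ratio)
    cols.sort(key=lambda c: c[1] / c[0] if c[0] > 0 else float("inf"))
    pts, x, y = [(0.0, 0.0)], 0.0, 0.0
    for a1, a2 in cols:
        x, y = x + a1, y + a2
        pts.append((x, y))
    return pts   # piecewise-linear curve from (0,0) to (1,1)

A = [[0.6, 0.4],
     [0.2, 0.8]]
curve = segregation_curve(A)   # [(0.0, 0.0), (0.6, 0.2), (1.0, 1.0)]
```

The returned vertices trace the lower boundary of \(Z(\mathbf {A})\): the curve lies below the diagonal, and it coincides with the diagonal only under perfect similarity.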
The ranking of two-group distributions generated by nonintersecting segregation curves can be characterized through elementary segregation-reducing operations (see Hutchens 1991) or, alternatively, by matrix majorization (Lasso de la Vega and Volij 2014; Hutchens 2015). When segregation curves cross, distributions can be ranked by segregation indices consistent with the segregation curve ordering (Reardon and Firebaugh 2002; Reardon 2009). Alternatively, segregation curves have been used to assess the dissimilarity between each group distribution and the population distribution, as in Alonso-Villar and del Rio (2010).
We are not, however, aware of any ordering generated by a multigroup expansion of the segregation curve ordering. Frankel and Volij (2011) have provided normative justifications for using matrix majorization as a robust segregation criterion for ranking multigroup distributions (see also Flückinger and Silber 1999; Chakravarty and Silber 2007). However, matrix majorization is a demanding condition that is not testable in the multigroup setting. Results in this paper deliver three contributions to this literature.
First, Proposition 1 clarifies that the operations of merge (or, equivalently, StrongMixC), split, permutation and insertion/elimination of empty classes characterize the ranking produced by nonintersecting segregation curves when \(d=2\). The same axioms characterize matrix majorization when \(d\ge 2\), thus showing that every segregation comparison involves a dissimilarity comparison. The dissimilarity axiom MixC allows this criterion to be weakened to an ordering that is less partial than matrix majorization, and which can also be interpreted in terms of segregation.
Second, we promote the zonotope inclusion criterion as a relevant multigroup extension of the segregation curve dominance criterion. The zonotope inclusion criterion is new in the segregation literature; it is testable and it can deal with the multidimensional nature of the data. In the case \(d=2\), segregation curve dominance is always consistent with zonotope inclusion, insofar as the segregation curve can be understood as the lower boundary of a zonotope.^{Footnote 15} Moreover, segregation curve dominance is also equivalent to matrix majorization.
In the multigroup setting (\(d\ge 3\)), however, a dominance criterion based on comparisons of segregation curves across all pairs of groups provides only a necessary condition for zonotope inclusion. Furthermore, in this context the zonotope inclusion criterion is weaker than matrix majorization, thus providing a natural refinement of it. Theorem 1 characterizes it in terms of the MixC axiom, thus establishing the link between multigroup segregation and dissimilarity.
Third, Proposition 3 identifies and characterizes the class of multigroup segregation indices that are coherent with the family \(D_h\). Below are some examples of wellknown segregation indices belonging to this class.
The Duncan and Duncan’s dissimilarity index for a matrix \(\mathbf {A}\in \mathcal {M}_{2}\) is \(D(\mathbf {A}):=\frac{1}{2}\sum _{j=1}^{n_{A}}\left| {a_{1j}-a_{2j}} \right|\). It measures dissimilarity as the average absolute distance between the elements \({a_{1j}/\overline{a}_{j}}\) and \({a_{2j}/\overline{a} _{j}}\) in each class. By setting
it follows that \(D_h(\mathbf {A})=D(\mathbf {A})\).
In the multigroup context (\(\mathbf {A}\in \mathcal {M}_{d}\)), segregation can be measured by the Atkinson-type segregation index, defined as \(A_{\omega }(\mathbf {A}):=1-\sum _{j=1}^{n_{A}}\prod \nolimits _{i=1}^{d}{\left( a_{ij}\right) }^{\omega _{i}}\) for \(\omega _{i}\ge 0\) such that \(\sum _{i=1}^{d}{\omega _{i}=1}\). By setting
it follows that \(D_h(\mathbf {A})=A_{\omega }(\mathbf {A})\). Convexity of h stems from the features of the weighting scheme.
The mutual information index characterized in Frankel and Volij (2011) and Moulin (2016) is \(M(\mathbf {A}):=\log _{2}(d)-\sum _{j=1}^{n_{A}}\left( \frac{\overline{a}_{j}}{d}\right) \sum _{i=1}^{d}{\frac{a_{ij}}{\overline{a}_{j}}}\cdot \log _{2}\left( \frac{\overline{a}_{j}}{a_{ij}}\right)\) with \({\frac{a_{ij}}{\overline{a}_{j}}}\cdot \log _{2}\left( \frac{\overline{a}_{j}}{a_{ij}}\right)\) set equal to 0 if \(a_{ij}=0\). By setting
it follows that \(D_h(\mathbf {A})=M(\mathbf {A})\). Convexity of h stems from the log operator.
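The three indices above can be computed directly from their definitions; the sketch below assumes each row of \(\mathbf {A}\) sums to 1 and uses \(\overline{a}_j=\sum _i a_{ij}\) for the class size:

```python
import math

# Sketches of the three segregation indices, computed from their definitions.

def duncan(A):
    """Duncan and Duncan dissimilarity index (two groups)."""
    return 0.5 * sum(abs(a1 - a2) for a1, a2 in zip(A[0], A[1]))

def atkinson(A, w):
    """Atkinson-type index with nonnegative weights w summing to 1."""
    d = len(A)
    return 1.0 - sum(math.prod(A[i][j] ** w[i] for i in range(d))
                     for j in range(len(A[0])))

def mutual_information(A):
    """Mutual information index, with 0·log2(·) conventions for empty cells."""
    d, n = len(A), len(A[0])
    m = math.log2(d)
    for j in range(n):
        size = sum(A[i][j] for i in range(d))          # class size ā_j
        for i in range(d):
            if A[i][j] > 0:
                m -= (size / d) * (A[i][j] / size) * math.log2(size / A[i][j])
    return m

A_sim = [[0.5, 0.5], [0.5, 0.5]]    # perfect similarity: all indices are 0
A_seg = [[1.0, 0.0], [0.0, 1.0]]    # complete segregation: all indices reach 1
```

All three indices vanish at perfect similarity and, in the two-group case shown, attain 1 at complete segregation (the maximum of \(M\) being \(\log _2(d)\)).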
Multivariate majorization and the Lorenz zonotope
In this section, the focus is on the multivariate majorization criteria that are adopted in robust inequality analysis. We argue that every inequality comparison involves the assessment of the dissimilarity between some relevant distributions and a benchmark distribution. A canonical example is that in which the distribution matrices \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) represent multivariate distributions of commodities. A matrix represents the way in which shares of each commodity (by row) are allocated to certain classes, which can represent the demographic units (e.g. families or individuals) that consume these commodities. Units are not ordered in any meaningful way.
Multidimensional inequality arises from the dissimilarity between the d distributions under analysis and the distribution of the demographic weight of the n units. It is common to assume that every unit receives a uniform weight equal to 1/n. Under these circumstances, the next corollary, which follows from Proposition 1, formalizes the relation between multivariate inequality analysis and dissimilarity.
Corollary 2
Let \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\). Then
for every dissimilarity ordering \(\preccurlyeq\) satisfying axioms MC (or StrongMixC), ISC, IEC and IPC if and only if there exists \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\) such that (i) \(\mathbf {B}=\mathbf {A} \mathbf {X}\) and (ii) \(\frac{n_A}{n_{B}}\mathbf {1}_{n_{B}}' = \mathbf {1}_{n_{A}}' \mathbf {X}\).
When \(n_A=n_B=n\), matrix \(\mathbf {X}\) in the corollary is doubly stochastic (\(\mathbf {X}\in \mathcal {D}_{n}\)). The condition \(\mathbf {B}=\mathbf {A}\cdot \mathbf {X}\) with \(\mathbf {X} \in \mathcal {D}_{n}\) implied by (6), often referred to as uniform majorization, is widely adopted in robust univariate and multivariate inequality analysis (see p. 613 in Marshall et al. 2011). All social welfare functions that are increasing and Schurconcave (i.e. display some degree of inequality aversion) would rank the two multivariate distributions accordingly.
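A minimal sketch of uniform majorization, with invented two-unit, two-attribute shares: a doubly stochastic \(\mathbf {X}\) (here a simple T-transform) is applied to every row at once, moving each attribute's distribution toward the uniform benchmark:

```python
# Sketch of uniform majorization: the same doubly stochastic X (a T-transform
# mixing units 1 and 2 with parameter lam) acts on every row of A.

def apply_X(A, X):
    return [[sum(row[k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for row in A]

lam = 0.25
X = [[1 - lam, lam],
     [lam, 1 - lam]]          # doubly stochastic for lam in [0, 1]

A = [[0.8, 0.2],              # shares of attribute 1 held by two units
     [0.6, 0.4]]              # shares of attribute 2 held by the same units
B = apply_X(A, X)             # each row moves toward the uniform benchmark (0.5, 0.5)
```

Because the same \(\mathbf {X}\) transforms every row, inequality falls simultaneously in both dimensions, which is the "common set of transformations" requirement that makes uniform majorization demanding.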
Uniform majorization is a demanding criterion, like matrix majorization, insofar as it posits that inequality can be reduced only when every row of a distribution matrix \(\mathbf {B}\) is obtained from the corresponding row of another distribution matrix \(\mathbf {A}\) through a common set of transformations implied by the matrix \(\mathbf {X}\in \mathcal {D}_n\). The resulting ordering of distribution matrices is therefore partial. Koshevoy (1995) and Koshevoy and Mosler (1996) have studied a less partial order of multivariate distributions that is based on the Lorenz zonotope inclusion criterion. A Lorenz zonotope, denoted \(LZ(\mathbf {A})\) with \(\mathbf {A}\in \mathcal {M}_d\), is a \((d+1)\)-dimensional zonotope of a distribution matrix \(\mathbf {A}\) augmented by the population distribution vector, that is, \(LZ(\mathbf {A}):= Z(\tilde{\mathbf {A}})\) with \(LZ(\mathbf {A})\subseteq \mathbb {R}_+^{d+1}\) and \(\tilde{\mathbf {A}}\) defined as in (6). The Lorenz zonotope inclusion criterion induces a partial order of distribution matrices that provides a testable refinement of uniform majorization.
The following chain of implications clarifies the relation between multivariate inequality and dissimilarity. The proof follows from previous results.
Remark 8
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) be such that \(d\ge 2\) and \(n_A=n_B=n\). Then:
The first implication, showing that uniform majorization implies matrix majorization, has been discussed in the previous section. The Lorenz zonotope inclusion criterion defines an inequality partial order of distribution matrices that is less partial than (i.e., is implied by) matrix majorization. It follows that the ranking of distribution matrices given by \(LZ(\mathbf {B})\subseteq LZ(\mathbf {A})\) is always consistent with the implications of a merge transformation (or, alternatively, a convex combination underlying StrongMixC axiom) on matrices \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\). Any such transformation bears two implications for multidimensional inequality.
First, a merge transformation reduces dissimilarity between the distribution of each dimension, taken separately, and the benchmark distribution, implying a reduction of inequality in each dimension. Second, a merge transformation reduces the dissimilarity across dimensions, implying an increase in correlation between dimensions. This aspect is controversial, since the Lorenz zonotope inclusion criterion may fail to rank distribution matrices that are instead unanimously ordered by social welfare functions satisfying aversion to correlation increasing transfers (Epstein and Tanny 1980; Atkinson and Bourguignon 1982; Decancq 2012), a desirable property in multidimensional inequality analysis stating that any transformation that raises the degree of association in realizations is bound to decrease social welfare (Andreoli and Zoli 2020). The Extended Lorenz zonotope inclusion criterion proposed in Mosler (2012) addresses these concerns.
Although the Lorenz zonotope inclusion criterion may be problematic for multidimensional welfare analysis, it remains a relevant criterion for assessing dissimilarity between distributions. The criterion \(LZ(\mathbf {B})\subseteq LZ(\mathbf {A})\) can be weakened by looking at inclusions of the projections of the Lorenz zonotope in the space of outcomes, that is \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). The latter is useful for analyzing inequalities that arise from differences between distributions, regardless of the degree of inequality of each of these distributions. This feature is relevant, for instance, for constructing robust inequality of opportunity comparisons (see, for instance, Roemer and Trannoy 2016; Andreoli et al. 2019).^{Footnote 16}
Corollary 2 provides a characterization result that extends robust inequality assessments based on uniform majorization to matrices that possibly differ in size (\(n_A\ne n_B\)) but with the same number d of dimensions.
Income inequality
Corollary 2 also holds in the case \(d=1\). This case is of particular interest for social welfare analysis, as it rationalizes empirical comparisons of income inequality. In this section, we argue that every income inequality comparison involves a dissimilarity comparison, but not the reverse.
Empirical comparisons of income inequality consist in assessing the way total income in a sample of n income recipient units (such as households or individuals) is split across these units. We can hence represent a distribution of income shares by the n-variate vectors \(\mathbf {a}', \mathbf {b}'\in \mathcal {M}_1\), with \(\mathbf {a}'\cdot \mathbf {1}_{n}=\mathbf {b}'\cdot \mathbf {1} _{n}=1\). Each entry of the vectors corresponds to an income share allocated to a given unit. Anonymity is often invoked by the literature addressing income inequality measurement, thereby implying that any permutation of the units does not affect the extent of inequality displayed by \(\mathbf {a}'\) or \(\mathbf {b}'\).
As per condition (6) in Corollary 2, every empirical income inequality comparison involves a dissimilarity comparison between the distribution of income shares owned by each of the n units and the units’ weights. Furthermore, the ranking of distribution matrices induced by the LZ inclusion is consistent with the Lorenz curve partial order. The chain of implications in Remark 8 hence runs in both directions when \(d=1\): Lorenz zonotope inclusion implies unanimity among all social welfare functions that are increasing and concave in income, which is in turn equivalent to uniform majorization. All these conditions imply that income inequality analysis always subsumes a dissimilarity comparison.
A similar result holds even when the benchmark distribution of population weights is not uniform. Ebert and Moyes (2003) analyze the relation between welfare evaluations, Lorenz dominance and equivalence scales for incomes when population weights may differ among units and across distribution matrices. In this case, the interest is in ranking matrices such as \(\tilde{\mathbf {A}}:=(\varvec{\omega },\mathbf {a})'\), where \(\varvec{\omega }=(\omega _1,\ldots ,\omega _n)'\) and \(\omega _j\) can be understood as individual j’s weight. Using Corollary 2, every welfare-consistent measure of inequality can be written as an average of convex transformations of equivalized incomes, scaled by their demographic weights. This is formalized by the inequality index \(D_{h}(\tilde{\mathbf {A}}) =\sum _{j=1}^n \omega _jh(a_{j}/\omega _j)\), where \(h\in \mathcal {H}\) is convex and \(a_{j}/\omega _j\) is j’s equivalent income.^{Footnote 17} From Proposition 3 (in combination with Remark 7), the zonotope inclusion criterion \(Z((\varvec{\omega },\mathbf {b})')\subseteq Z((\varvec{\omega },\mathbf {a})')\) provides a sufficient test for welfare dominance.
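A minimal sketch of how the index \(D_h\) can be computed (the weights, shares, and the entropy-type convex \(h\) below are illustrative choices, not prescribed by the text):

```python
import numpy as np

# Hypothetical weights and income shares for n = 4 units; both vectors sum to one.
omega = np.array([0.10, 0.20, 0.30, 0.40])   # demographic weights
a     = np.array([0.05, 0.15, 0.30, 0.50])   # income shares

def D(h, shares, weights):
    """Inequality index D_h = sum_j w_j * h(a_j / w_j), with h convex."""
    return float(np.sum(weights * h(shares / weights)))

# An entropy-type convex choice with h(1) = 0: under perfect proportionality
# (a_j = omega_j for all j) the index is zero, and by Jensen's inequality any
# departure from proportionality makes it positive.
h = lambda t: t * np.log(t)

assert abs(D(h, omega, omega)) < 1e-12   # zero under perfect similarity
value = D(h, a, omega)
assert value > 0                         # positive once shares and weights differ
```

Any other convex \(h\) normalized at \(h(1)=0\) would produce an admissible index of the same family.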
A well-known result in inequality measurement is that an income distribution \(\mathbf {b}'\) displays less inequality than another distribution \(\mathbf {a}'\) if it can be obtained from the latter through a finite sequence of progressive (Pigou-Dalton, PD) transfers of income from rich donors to poor recipients, without switching their relative positions in the income ranking (Hardy et al. 1934; Marshall et al. 2011).^{Footnote 18} In the univariate case (\(d=1\)), Corollary 2 implies that every sequence of PD transfers of incomes can be rationalized by a specific sequence of more fundamental dissimilarity preserving and reducing operations that are concerned with the way income shares and weights are shifted across units:
Corollary 3
Every PD transfer operation can be decomposed into a sequence of split of classes and merge of classes operations.
Split and merge operations can hence be seen as inequality reducing transformations that are more elementary than PD transfers. The proof of Corollary 3 rests on the fact that any T-transform, an equivalent matrix representation of a PD transfer (see p. 33 in Marshall et al. 2011), can be exactly decomposed into the product of matrices representing split and merge operations. It follows that any univariate inequality comparison based on uniform majorization can be seen as a dissimilarity comparison, but not the reverse, insofar as the dissimilarity preserving and reducing operations of split and merge, respectively, characterize matrix majorization, of which uniform majorization is a particular case.
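A minimal numerical sketch of this decomposition (the numbers and the four-step sequence below are illustrative, not the construction of the formal proof): a T-transform \(\mathbf {X}=\lambda \mathbf {I}+(1-\lambda )\mathbf {Q}\) acting on the columns of \(\tilde{\mathbf {A}}=(\varvec{\omega },\mathbf {a})'\) is reproduced by inserting two empty classes, splitting both original classes with the same \(\lambda\), merging the split-off pieces crosswise, and deleting the emptied classes:

```python
import numpy as np

lam = 0.8                          # T-transform parameter
A = np.array([[0.5, 0.5],          # uniform unit weights (benchmark row)
              [0.7, 0.3]])         # income shares row

# PD transfer as a T-transform: X = lam*I + (1-lam)*Q, with Q swapping classes.
Q = np.array([[0.0, 1.0], [1.0, 0.0]])
X = lam * np.eye(2) + (1 - lam) * Q
B_direct = A @ X

def split(M, j, k, lam):
    """Split class j: a share lam stays in j, the rest moves to empty class k."""
    M = M.copy()
    M[:, k] += (1 - lam) * M[:, j]
    M[:, j] *= lam
    return M

def merge(M, j, k):
    """Merge class j into class k; column j is left empty."""
    M = M.copy()
    M[:, k] += M[:, j]
    M[:, j] = 0.0
    return M

M = np.hstack([A, np.zeros((2, 2))])   # insert two empty classes
M = split(M, 0, 2, lam)                # columns: [lam*c0, c1, (1-lam)*c0, 0]
M = split(M, 1, 3, lam)                # [lam*c0, lam*c1, (1-lam)*c0, (1-lam)*c1]
M = merge(M, 3, 0)                     # class 0 collects (1-lam)*c1
M = merge(M, 2, 1)                     # class 1 collects (1-lam)*c0
B_ops = M[:, :2]                       # delete the (now empty) extra classes

assert np.allclose(B_direct, B_ops)    # same matrix either way
assert np.allclose(B_ops[0], A[0])     # unit weights are unchanged
```

Because both splits use the same \(\lambda\), the weights row is left untouched, which is why the composition reproduces a PD transfer on the income shares alone.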
The interesting and new result provided by Corollary 2 is that there always exists a sequence of split and merge operations that supports uniform majorization even in the multidimensional case (\(d\ge 2\)), although the same sequence cannot generally be rearranged to represent PD transfers (Kolm 1977).
Concluding remarks
A large but scattered literature on segregation and inequality measurement has proposed criteria for ranking multigroup distributions according to the dissimilarity they exhibit. This paper establishes the axiomatic foundations of the dissimilarity criterion. We do so by developing a parsimonious axiomatic model based on dissimilarity preserving operations and a dissimilarity reducing operation, the merge, which consists in aggregating, distribution by distribution, the proportion of people observed in two separate classes. We study the partial order of distribution matrices generated by the transitive closure of all binary dissimilarity relations consistent with the operations and with a mixing axiom. This last axiom is crucial to justify an equivalent characterization of the “displays at most as much dissimilarity as” partial order. Our main theorem identifies a novel nonparametric criterion, based on inclusion of the zonotope set representations of the data, which is equivalent to the dissimilarity partial order thus identified.
The zonotope inclusion criterion is relevant in many contexts. One application is the evaluation of the impact of educational policies on the patterns of dissimilarity between multiethnic distributions of students across schools in a district. This problem is commonly referred to as schooling segregation. Zonotopes can be used to draw robust conclusions about changes in segregation when comparing the actual situation with the counterfactual distribution that would have emerged in the absence of the policy. While the education policy itself may have little to do with splitting, merging or mixing schools, the zonotope inclusion signals that one can always move from the counterfactual to the actual allocation of students through operations that are unanimously understood as segregation-reducing.
In some cases, zonotope inclusion is rejected by the data. The dissimilarity indices analyzed in the paper make it possible to produce conclusive evaluations of changes in dissimilarity. The implied ranking is always consistent with the implications of the “elementary” transformations. Evaluations based on one or a few dissimilarity indicators, however, are not robust and can always be challenged from the perspective offered by alternative measures. The complete characterization of the dissimilarity indicators presented here is left for future research.
Notes
Gini (1914, p. 189), translated from Italian, formalizes similarity as proportionality: “If n is the size of group \(\alpha\), m is the size of group \(\beta\), \(n_{x}\) the size of group \(\alpha\) assigned to class x and \(m_{x}\) the size of group \(\beta\) assigned to the same class, then it should hold [under similarity] that, for any value of x, \(\frac{n_{x}}{m_{x}}=\frac{n}{m}\) .”
See McMullen (1971) and Ziegler (1995) for a formal analysis of the zonotopes geometric properties. Dahl (1999) examines the zonotope inclusion criterion in the context of information analysis, whereas a related zonotope inclusion criterion (based on Lorenz zonotopes) provides a nonparametric test for robust comparisons of multivariate inequality (Koshevoy 1995; Koshevoy and Mosler 1996; Andreoli and Zoli 2020).
Relevant applications of the uniform and matrix majorization criteria are found in linear algebra (Dahl 1999; Hasani and Radjabalipour 2007), in the comparison of statistical experiments (Blackwell 1953; Torgersen 1992), in information theory (Grant et al. 1998), in the study of bivariate dependence orderings for categorical variables (Giovagnoli et al. 2009), as well as in the analysis of inequality (see, for instance, Chapter 14 in Marshall et al. 2011; Tsui 1995; Gajdos and Weymark 2005; Weymark 2006) and segregation (see Frankel and Volij 2011).
The condition \(d\le n\) is necessary for \(\mathbf {D}\) to exist. If \(\mathbf {A}\) is such that \(d>n\), then it can display some dissimilarity, but never maximal dissimilarity.
The entries \(x_{ij}\) of matrix \(\mathbf {X}\in \mathcal {R}_{n,m}\) can be interpreted as the probability that the population in class i in the distribution of origin “migrates” to class j in the distribution of destination.
See Appendix A.15 for a formal proof.
For any \(\mathbf {A},\ \mathbf {B},\ \mathbf {C}\in \mathcal {M}_d\), the relation \(\preccurlyeq\) is transitive if \(\mathbf {C}\preccurlyeq \mathbf {B}\) and \(\mathbf {B}\preccurlyeq \mathbf {A}\) imply \(\mathbf {C}\preccurlyeq \mathbf { A}\), and complete if either \(\mathbf {A}\preccurlyeq \mathbf {B}\) or \(\mathbf {B}\preccurlyeq \mathbf {A}\) or both, in which case \(\mathbf {B} \thicksim \mathbf {A}\).
Rao (1982) distinguishes the notion of dissimilarity from that of diversity. The former reflects differences or similarities between populations/groups heterogeneity, the latter reflects instead heterogeneity within the same population/group (see also Nehring and Puppe 2002). To see this, let \(\mathbf {s}'\) be any row of a perfect similarity matrix \(\mathbf {S}\). Any two perfect similarity matrices \(\mathbf {S}\) and \(\widetilde{\mathbf {S}}\) such that \(\mathbf {s}'\) is a uniform distribution across classes (high diversity) whereas \(\widetilde{\mathbf {s}}'\) is a distribution concentrating the mass in few or one realization (low diversity) are regarded as equivalent by every dissimilarity ordering, i.e. \(\mathbf {S}\thicksim \widetilde{\mathbf {S}}\).
We provide an example along the lines of the previous footnote. There are many matrices of different size displaying the same structure as the perfect similarity matrix \(\mathbf {S}\), for instance \(\tilde{\mathbf {S}}\) such that \(\tilde{\mathbf {s}}'\) is of size \(\tilde{n}>n\), where n is the size of \(\mathbf {s}'\). Yet, for any of such matrices, \(\mathbf {S}\thicksim \tilde{\mathbf {S}}\).
Blackwell (1953) has formalized a precise condition for “\(\mathbf {A}\) is at least as informative as \(\mathbf {B}\)”, which coincides with \(\mathbf {B}\preccurlyeq ^R \mathbf {A}\), insofar as the observation of an experiment outcome (a class) in \(\mathbf {A}\) is more informative about the underlying signal (the group) than it is in \(\mathbf {B}\). If \(\mathbf {A}=\mathbf {D}\), then the experiment outcome identifies the signal and \(\mathbf {A}\) is at least as informative as any matrix \(\mathbf {B}\), whereas if \(\mathbf {B}=\mathbf {S}\), then it is impossible to disentangle the underlying signal from the observation of the experiment outcome and \(\mathbf {B}\) is less informative than any matrix \(\mathbf {A}\).
If \(\mathbf {B}\) is obtained as in StrongMixC, then \(\mathbf {B}\varvec{\gamma }=\sum _{j=1}^m{w_j\mathbf {B}^j\varvec{\gamma }}\), \(\forall \varvec{\gamma }\in \mathcal {V}^{01}_n\), with \(\sum _j{w_j}=1\), which satisfies (4) because in this case \(\varvec{\gamma }^j=\varvec{\gamma }\) \(\forall j\).
For notational convenience empty classes receive weight \({\overline{a}=0}\) and therefore do not contribute to the overall dissimilarity.
Lemma 3 in Appendix formalizes this equivalence within the setting of this paper.
Notice that the theoretical bound does not depend on the sizes of \(\mathbf {A}\) but only on its structure: if \(\mathbf {A}=\mathbf {S}\), then \(D_{\mathbf {p},\phi }(\mathbf {S})=0\) for any \(\mathbf {p}\in \mathbb {R}^d\) and \(\phi\) convex.
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{2}\), \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if the segregation curve of \(\mathbf {B}\) lies nowhere below the segregation curve of \(\mathbf {A}\).
The result (see Lemma 1 in the Appendix) follows from the homogeneity and convexity of \(g:\mathbb {R}^2\rightarrow \mathbb {R}\), yielding \(g(\omega _j,a_j)=\omega _jg(1,a_j/\omega _j)=\omega _jh(a_j/\omega _j)\) with h convex. Here we assume that income and population weights have unit size. If this is not the case, then \(a_j/\omega _j\) is proportional to j’s equivalent income.
Consider a distribution \(\mathbf {a}'\). A PD transfer consists in a movement of a mass \(\epsilon >0\) from class j to class k such that \(a_{j}>a_{k}\), yielding \(b_j=a_j-\epsilon\), \(b_k=a_k+\epsilon\) and \(b_\ell = a_\ell\) \(\forall \ell \ne j,k\) such that \(b_j\ge b_k\). As a consequence of this transformation, \(\mathbf {b}'=\mathbf {a}'\mathbf {X}\), \(\mathbf {X}\in \mathcal {D}_n\) (Lorenz dominance) and \(\sum _j{a_j}=\sum _j{b_j}\).
See the preliminary results in Appendix A.1.
The two operations of permutation and insertion of classes transform \(\mathbf {A}\) into
$$\begin{aligned} \mathbf {A}\cdot \varvec{\Pi }_{n_{A}}\cdot \mathbf {Y}:=(\mathbf {a}_{1}, \underbrace{\mathbf {0}_{d},\ldots ,\mathbf {0}_{d}}_{n_{B}-1\;\text {times} },\ldots ,\mathbf {a}_{n_{A}},\underbrace{\mathbf {0}_{d},\ldots ,\mathbf {0} _{d}}_{n_{B}-1\;\text {times}})\text {.} \end{aligned}$$Formally: \(\mathbf {A}\cdot \mathbf {X}\cdot \varvec{\Pi }_{n_{A}n_{B}}=\left( \lambda _{11}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}1}\mathbf {a} _{n_{A}},\ldots ,\lambda _{1n_{B}}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}\;n_{B}}\mathbf {a}_{n_{A}}\right)\).
Note that all orderings \(\preccurlyeq\) satisfying IEC, ISC and IPC rank at unanimity \(\tilde{\mathbf {B}}\thicksim \tilde{\mathbf {A}}\) if and only if \(\tilde{\mathbf {A}}=\tilde{\mathbf {B}}\mathbf {Y}\) and \(\mathbf {Y}\in \mathcal {P}_n\). In fact, any other admissible matrix \(\mathbf {Y}\in \mathcal {R}_{n}^{IEC}\cup \mathcal {R}_{n}^{ISC}\) would not guarantee comparability of \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\mathbf {Y}\), that is \(\mathbf {1}_d^{\prime }\tilde{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\) whereas \(\mathbf {1}_d^{\prime } \tilde{\mathbf {B}}\mathbf {Y}\ne \frac{d}{n}\mathbf {1}_{n}^{\prime }\).
Note further that matrix \(\tilde{\mathbf {A}}\) has uniform margins, implying that \(\mathbf {X}^{j}\) must be row and column stochastic.
In fact, zonotopes are convex hulls of the underlying vertices, being defined by the sequential cumulation of the classes of the distribution matrices (see also Koshevoy 1995).
References
Alonso-Villar O, del Río C (2010) Local versus overall segregation measures. Math Soc Sci 60:30–38
Andreoli F, Zoli C (2020) From unidimensional to multidimensional inequality: a review. METRON 78:5–42
Andreoli F, Havnes T, Lefranc A (2019) Robust inequality of opportunity comparisons: theory and application to early childhood policy evaluation. Rev Econ Stat 101(2):355–369. https://doi.org/10.1162/rest_a_00747
Atkinson AB, Bourguignon F (1982) The comparison of multidimensioned distributions of economic status. Rev Econ Stud 49(2):183–201
Blackwell D (1953) Equivalent comparisons of experiments. Ann Math Stat 24(2):265–272
Blyth CR (1972) On Simpson’s paradox and the sure-thing principle. J Am Stat Assoc 67(338):364–366
Chakravarty SR, Silber J (2007) A generalized index of employment segregation. Math Soc Sci 53(2):185–195
Dahl G (1999) Matrix majorization. Linear Algebra Appl 288:53–73
Decancq K (2012) Elementary multivariate rearrangements and stochastic dominance on a Fréchet class. J Econ Theory 147(4):1450–1459
Dekel E (1986) An axiomatic characterization of preferences under uncertainty: weakening the independence axiom. J Econ Theory 40(2):304–318
Donaldson D, Weymark JA (1998) A quasi ordering is the intersection of orderings. J Econ Theory 78(2):382–387
Duncan OD, Duncan B (1955) A methodological analysis of segregation indexes. Am Soc Rev 20(2):210–217
Ebert U, Moyes P (2003) Equivalence scales reconsidered. Econometrica 71(1):319–343. https://doi.org/10.1111/1468-0262.00397
Epstein LG, Tanny SM (1980) Increasing generalized correlation: a definition and some economic consequences. Can J Econ 13(1):16–34
Flückiger Y, Silber J (1999) The measurement of segregation in the labour force. Physica-Verlag, Heidelberg
Frankel DM, Volij O (2011) Measuring school segregation. J Econ Theory 146(1):1–38
Gajdos T, Weymark JA (2005) Multidimensional generalized Gini indices. Econ Theor 26(3):471–496. https://doi.org/10.1007/s00199-004-0529-x
Gini C (1914) Di una misura di dissomiglianza tra due gruppi di quantità e delle sue applicazioni allo studio delle relazioni statistiche. Atti del Regio Istituto Veneto di Scienze Lettere e Arti, LXXIII
Giovagnoli A, Marzialetti J, Wynn H (2009) Bivariate dependence orderings for unordered categorical variables. In: Pronzato L, Zhigljavsky A (eds) Optimal design and related areas in optimization and statistics, vol 28 of, optimization and its applications. Springer, New York
Grant S, Kajii A, Polak B (1998) Intrinsic preference for information. J Econ Theory 83(2):233–259
Hardy GH, Littlewood JE, Polya G (1934) Inequalities. Cambridge University Press, London
Hasani A, Radjabalipour M (2007) On linear preservers of (right) matrix majorization. Linear Algebra Appl 423(2–3):255–261
Hutchens RM (1991) Segregation curves, Lorenz curves, and inequality in the distribution of people across occupations. Math Soc Sci 21(1):31–51
Hutchens RM (2015) Symmetric measures of segregation, segregation curves and Blackwell’s criterion. Math Soc Sci 73:63–68
James DR, Taeuber KE (1985) Measures of segregation. Sociol Methodol 15:1–32
Kolm SC (1977) Multidimensional egalitarianisms. Q J Econ 91(1):1–13
Koshevoy G (1995) Multivariate Lorenz majorization. Soc Choice Welf 12:93–102. https://doi.org/10.1007/BF00182196
Koshevoy G, Mosler K (1996) The Lorenz zonoid of a multivariate distribution. J Am Stat Assoc 91(434):873–882
Lasso de la Vega C, Volij O (2014) Segregation, informativeness and Lorenz dominance. Soc Choice Welf 43(3):547–564. https://doi.org/10.1007/s00355-014-0801-3
Marshall AW, Olkin I, Arnold BC (2011) Inequalities: theory of majorization and its applications. Springer, Berlin
Martínez Pería FD, Massey PG, Silvestre LE (2005) Weak matrix majorization. Linear Algebra Appl 403:343–368
McMullen P (1971) On zonotopes. Trans Am Math Soc 159:91–109
Mosler K (2012) Multivariate dispersion, central regions, and depth: the lift zonoid approach, vol 165. Springer Science and Business Media, Berlin
Moulin H (2016) Entropy, desegregation, and proportional rationing. J Econ Theory 162:1–20
Nehring K, Puppe C (2002) A theory of diversity. Econometrica 70(3):1155–1198
Rao CR (1982) Diversity and dissimilarity coefficients: a unified approach. Theoret Popul Biol 21(1):24–43
Reardon SF (2009) Measures of ordinal segregation. In: Flückiger Y, Reardon SF, Silber J (eds) Occupational and residential segregation, research on economic inequality, vol 17. Emerald Group Publishing Limited, Bingley, pp 129–155
Reardon SF, Firebaugh G (2002) Measures of multigroup segregation. Sociol Methodol 32:33–67
Roemer JE, Trannoy A (2016) Equality of opportunity: theory and measurement. J Econ Lit 54(4):1288–1332
Torgersen E (1992) Comparison of statistical experiments, vol 36 of encyclopedia of mathematics and its applications. Cambridge University Press, Cambridge
Tsui KY (1995) Multidimensional generalizations of the relative and absolute inequality indices: the AtkinsonKolmSen approach. J Econ Theory 67(1):251–265
Weymark JA (2006) The normative approach to the measurement of multidimensional inequality. Inequality and economic integration. Routledge, London, p 26
Ziegler G (1995) Lectures on polytopes, number 152 in graduate texts in mathematics. Springer-Verlag, New York
Funding
Open access funding provided by Università degli Studi di Verona within the CRUI-CARE Agreement. This paper forms part of the research projects The Measurement of Ordinal and Multidimensional Inequalities (Contract No. ANR-16-CE41-0005-01, ORDINEQ) of the French National Agency for Research, the research projects MOBILIFE (grant RBVR17KFHX) and PREOPP (grant RBVR19FSFA) of the University of Verona Basic Research scheme and the NORFACE/DIAL project IMCHILD: The impact of childhood circumstances on individual outcomes over the life-course (grant INTER/NORFACE/16/11333934/IMCHILD) of the Luxembourg National Research Fund (FNR), whose financial support is gratefully acknowledged.
Appendix A: Proofs
Appendix A.1 Useful additional results
The first result shows that matrix majorization admits an equivalent representation in terms of unanimous rankings for a well-defined class of convex functions.
Lemma 1
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), \(\mathbf {B }\preccurlyeq ^{R}\mathbf {A}\) if and only if
$$\begin{aligned} \sum _{j=1}^{n_{B}}{g(\mathbf {b}_{j})}\le \sum _{j=1}^{n_{A}}{g(\mathbf {a}_{j})} \end{aligned}$$
for all functions \(g:\mathbb {R}^{d}\rightarrow \mathbb {R}\) that are convex and homogeneous such that \(g(\mathbf {0}_{d}^{\prime })=0\).
For a formal proof, see Lemma 15.C.11 in Marshall et al. (2011).
The second result shows that the insertion of empty classes, split and merge operations can be represented through linear transformations involving row stochastic matrices. An operation of insertion of empty classes transforms \(\mathbf {A}\) into \(\mathbf {B}\) with \(n_{B}>n_{A}\) by augmenting \(\mathbf {A}\) with \(n_{B}-n_{A}\) columns with zero entries. We denote by \(\mathcal {R}_{n_{A},n_{B}}^{IEC}\subset \mathcal {R}_{n_{A},n_{B}}\) the set of all matrices reproducing an insertion of empty classes when postmultiplied to a distribution matrix \(\mathbf {A}\). Hence \(\mathbf {Y}\in \mathcal {R}_{n_{A},n_{B}}^{IEC}\) is an identity matrix of size \(n_{A}\) augmented by \(n_{B}-n_{A}\) columns with zero entries.
Let \(\mathcal {M}_{d}^{0}\subset \mathcal {M}_{d}\) define the set of matrices exhibiting at least one column of zeroes. For \(\mathbf {A}\in \mathcal {M} _{d}^{0}\), let \(\mathcal {J}_{A}^{0}\) denote the index set of all columns in \(\mathbf {A}\) with all zeroes and \(\mathcal {J}_{A}\) denote the index set of all the other columns of \(\mathbf {A}\). Let \(j\in \mathcal {J}_{A}\) such that \(j+1\in \mathcal {J}_{A}^{0}\). The matrix \(\mathbf {Z}_{[j]}\) incorporates an operation of split of classes applied to matrix \(\mathbf {A}\in \mathcal {M}_{d}^{0}\) that leads to matrix \(\mathbf {B}\in \mathcal {M}_{d}\) with \(\mathbf {b}_{j}=\lambda \mathbf {a}_{j}\) and \(\mathbf {b}_{j+1}=\mathbf {a} _{j+1}+(1-\lambda )\mathbf {a}_{j}=(1-\lambda )\mathbf {a}_{j}\) for \(\lambda \in [0,1]\). Let \(k\ne j\); the set of all transformation matrices \(\mathbf {Z}_{[j]}\) reproducing a split of classes is denoted by:
The merge of classes operation likewise originates a distribution matrix \(\mathbf {B}=\mathbf {A}\cdot \mathbf {M}_{[j]}\), where the matrix \(\mathbf {M}_{[j]}\) performs a merge of class j towards \(j+1\). Such a matrix belongs to the set:
The third preliminary result provides an equivalent (algebraic and finite) condition for testing zonotope inclusion. Denote first the sets \(\mathcal { V}_{n}:=\{\mathbf {v}:v_{j}\in [0,1],j=1,\ldots ,n\}\) and \(\mathcal {V} _{n}^{01}:=\{\mathbf {v}:v_{j}\in \{0,1\},j=1,\ldots ,n\}\).
Lemma 2
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) of size \(d\times n\) such that \(\mathbf {1}_{d}^{\prime }{\mathbf {B}}=\mathbf {1} _{d}^{\prime }{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{{n}}^{\prime }\). i) \(Z( \mathbf {B})\subseteq Z(\mathbf {A})\) if and only if ii) \(\forall \varvec{ \gamma }\in \mathcal {V}_{n}^{01}\) \(\exists \varvec{\theta }\in \mathcal {V }_{n}\): \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\).
Proof
Note that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if for any \(\mathbf {z}\in Z(\mathbf {B})\) we have \(\mathbf {z}\in Z(\mathbf {A}),\) whereas for \(Z(\mathbf {B})\subset Z(\mathbf {A})\) we also have that \(\exists \tilde{ \mathbf {z}}\in Z(\mathbf {A})\) such that \(\tilde{\mathbf {z}}\not \in Z(\mathbf { B})\). Using the Minkowski sum properties, this means that \(\forall \varvec{\gamma }\in \mathcal {V}_{n}\) \(\exists \varvec{\theta }\in \mathcal {V}_{n}\): \(\mathbf {z}:=\sum _{j}{\gamma _{j}\mathbf {b}_{j}}=\sum _{j}{ \theta _{j}\mathbf {a}_{j},}\) that is \({\mathbf {B}}\varvec{\gamma }={ \mathbf {A}}\varvec{\theta }\) for \(\varvec{\gamma ,\theta }\in \mathcal {V}_{n}.\) Assume, for ease of notation, that \(\varvec{\gamma }\in \mathcal {V}_{n}\) is ordered so that \(0\le \gamma _{1}\le \gamma _{2}\le \ldots \le \gamma _{{n}}\le 1\). Denote \(\delta _{1}=\gamma _{1}\) and \(\delta _{k}=\gamma _{k}-\gamma _{k-1}\) for \(k=2,\ldots ,{n}\). Note that \(\delta _{k}\in [0,1]\) \(\forall k\). Recall that \({\mathbf {i}_{k}}\) is the column vector k of the identity matrix of size n and denote further \(\mathbf {1}_{(h,h+1,\ldots ,{n})}:=\sum _{k=h}^{{n}}{\mathbf {i}_{k}}\) for any \(1\le h\le {n}\), so that if \(h={n}-1\) then \(\mathbf {1}_{({n}-1,{n})}=\mathbf { i}_{{n}-1}+\mathbf {i}_{{n}}\) and so on. We have that \(\mathbf {1} _{(h,h+1,\ldots ,{n})}\in \mathcal {V}_{{n}}^{01}\) \(\forall h\). For any \(\varvec{\gamma }\in \mathcal {V}_{{n}}\) there always exists a class j such that \(\varvec{\gamma }=\sum _{h=j}^{{n}}{\delta _{h}\mathbf {1} _{(h,h+1,\ldots ,{n})}}\), so that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) is equivalent to
for \(\varvec{\gamma },\varvec{\theta }\in \mathcal {V}_{{n}}\). We use this equivalence in the proof.
i) \(\Rightarrow\) ii). Immediate, since \(\mathcal {V}_n^{01}\subseteq \mathcal {V }_n\).
ii) \(\Rightarrow\) i). Assume that ii) holds, which can be equivalently stated as: for any \(1\le h\le {n}\) there exists \(\varvec{\theta } _{(h,h+1,\ldots ,{n})}\in \mathcal {V}_{{n}}\): \({\mathbf {B}}\mathbf {1} _{(h,h+1,\ldots ,{n})}={\mathbf {A}}\varvec{\theta }_{(h,h+1,\ldots ,{n})}\). Consider substituting into \({\mathbf {B}}\cdot \sum _{h=j}^{{n}}{\delta _{h}\cdot \mathbf {1}_{(h,h+1,\ldots ,{n})}}\), yielding for any j:
Since \(\sum _{h=j}^{{n}}{\delta _{h}\varvec{\theta }_{(h,h+1,\ldots ,n)}} \in \mathcal {V}_{n}\) for any \(\delta _{h}\in [0,1]\) and for any j, the latter equation implies (7) and then i). \(\square\)
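Lemma 2 turns zonotope inclusion into a finite feasibility problem that can be checked numerically: for each of the \(2^n\) binary vectors \(\varvec{\gamma }\), solve a small linear program for \(\varvec{\theta }\). A minimal sketch (assuming SciPy is available; the matrices are hypothetical and satisfy the lemma's uniform-margins condition, with \(\mathbf {B}=\mathbf {A}\mathbf {X}\) for a doubly stochastic \(\mathbf {X}\)):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def zonotope_included(B, A):
    """Test Z(B) subset-of Z(A) via Lemma 2: for every 0/1 vector gamma there
    must exist theta in [0,1]^n with A @ theta = B @ gamma (LP feasibility)."""
    n = A.shape[1]
    for gamma in itertools.product((0.0, 1.0), repeat=n):
        z = B @ np.array(gamma)
        res = linprog(c=np.zeros(n), A_eq=A, b_eq=z,
                      bounds=[(0.0, 1.0)] * n, method="highs")
        if not res.success:       # infeasible: this vertex of Z(B) is outside Z(A)
            return False
    return True

# Hypothetical 2x3 matrices with uniform column margins d/n = 2/3.
A = np.array([[0.5, 0.3, 0.2],
              [1/6, 11/30, 7/15]])
X = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
B = A @ X

assert zonotope_included(B, A)        # holds: B = A X with X doubly stochastic
assert not zonotope_included(A, B)    # the reverse inclusion fails here
```

The enumeration over \(\{0,1\}^n\) makes the test exact but exponential in the number of classes; for large n one would restrict attention to sampled or structured \(\varvec{\gamma }\).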
The fourth and last preliminary result shows the equivalence between the zonotope inclusion criterion and price majorization. Although the proof largely draws on Koshevoy (1995), the setting we investigate is logically distinct, insofar as the price majorization we use invokes d-dimensional prices to evaluate inclusion of d-dimensional zonotopes, whereas Koshevoy shows the equivalence with \((d+1)\)-dimensional extensions of the zonotope (the Lorenz zonotope). Denote by \(\mathcal {C}_{n}\) the set of column stochastic matrices, so that \(\mathbf {Y}\in \mathcal {C}_{n}\) \(\Leftrightarrow\) \(\mathbf {Y}^{\prime }\in \mathcal {R}_{n}\). Define: "\(\mathbf {B}\) is price majorized by \(\mathbf {A}\)" whenever \(\forall \mathbf {p}\in \mathbb {R}^{d}\), \(\exists \mathbf {Y}\in \mathcal {D}_{n}\) such that \(\mathbf {p}^{\prime }\mathbf {B}=\mathbf {p}^{\prime }\mathbf {A} \mathbf {Y}\). Given that we consider matrices in \(\mathcal {M}_{d}\), the condition \(\mathbf {p}^{\prime }\mathbf {B1}_{n}=\mathbf {p}^{\prime }\mathbf {A} \mathbf {1}_{n}\) is satisfied by construction; it therefore suffices to consider only transformations \(\mathbf {Y}\in \mathcal {C}_{n}\) to obtain price majorization.
Lemma 3
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) of size \(d\times n\) such that \(\mathbf {1}_d^{\prime }{\mathbf {B}}=\mathbf {1}_d^{\prime }{ \mathbf {A}}=\frac{d}{n}\mathbf {1}_{{n}}^{\prime }\). i) \(Z(\mathbf {B} )\subseteq Z(\mathbf {A})\) if and only if ii) \(\forall {\mathbf {p}}\in \mathbb {R}^{d}\) \(\exists \mathbf {Y}\in \mathcal {C}_n\) such that \(\mathbf {p}^{\prime } \mathbf {B}=\mathbf {p}^{\prime }\mathbf {A}\mathbf {Y}\).
Proof
i) \(\Rightarrow\) ii). Assume that i) holds; then from Lemma 2 we have that \(\forall k\in \{1,\ldots ,{n}\}\) \(\exists \theta _{jk}\in [0,1]\) for \(j=1,2,\ldots ,{n}\) such that \({\mathbf {b}}_{k}=\sum _{j}{ \theta _{jk}{\mathbf {a}}_{j}}\). In compact notation:
Matrix \(\varvec{\Theta }\) may not be row stochastic but it is guaranteed that \({\mathbf {B}}\mathbf {1}_{{n}}={\mathbf {A}}\mathbf {1}_{{n}}\). Using the fact that \(\mathbf {1}_{d}^{\prime }{\mathbf {B}}=\mathbf {1}_{d}^{\prime }{ \mathbf {A}}\) we get from (8) that \(\mathbf {1}_{d}^{\prime }{\mathbf {A }}\varvec{\Theta }=\mathbf {1}_{d}^{\prime }{\mathbf {A}}\), which gives \(\sum _{i}{{a}_{ik}}=\sum _{i}{\sum _{j}{{a}_{ij}\theta _{jk}}}=\sum _{j}{\theta _{jk}\sum _{i}{{a}_{ij}}}\), \(\forall k\). Given that \(\sum _{i}{{a}_{ik}} =\sum _{i}{{a}_{ij}}=d/{n}\) for any \(k\ne j\), by definition of \(\mathbf {A}, \mathbf {B}\), we get \(\sum _{j}{\theta _{jk}}=1\), which means that \(\varvec{\Theta }\in \mathcal {C}_{n}\). Hence i) \(\Rightarrow {\mathbf {B}}= {\mathbf {A}}\varvec{\Theta },\ \varvec{\Theta }\in \mathcal {C} _{n}\Rightarrow \mathbf {p}^{\prime }{\mathbf {B}}=\mathbf {p}^{\prime }{ \mathbf {A}}\varvec{\Theta }\) for all \({\mathbf {p}}\in \mathbb {R}^{d}\) and \(\varvec{\Theta }\in \mathcal {C}_{n}\), which is ii).
ii) \(\Rightarrow\) i). Assume that ii) holds, which implies that \(\mathbf {p}^{\prime }\mathbf {B}\) Lorenz dominates \(\mathbf {p}^{\prime }\mathbf {A}\) for any \(\mathbf {p}\in \mathbb {R}^{d}\), that is, \(\sum _{j=1}^{k}{\mathbf {p}^{\prime }\mathbf {b}_{(j)}}\ge \sum _{j=1}^{k}{\mathbf {p}^{\prime }\mathbf {a}_{(j)}}\) \(\forall k=1,\ldots ,n\), where for each \(\mathbf {p}\in \mathbb {R}^{d}\) the classes of \(\mathbf {B}\) have been ordered by increasing magnitude of \(\mathbf {p}^{\prime }\mathbf {b}_{(j)}\), so that \(\mathbf {p}^{\prime }\mathbf {b}_{(j)}\le \mathbf {p}^{\prime }\mathbf {b}_{(j+1)}\) \(\forall j\) (and similarly for the classes of \(\mathbf {A}\)). This is equivalent (see Marshall et al. 2011) to:
where the max operator selects the ktuple of columns of \(\mathbf {A}\) and \(\mathbf {B}\) that yield the largest value when multiplied by any vector of prices \(\mathbf {p}\). An equivalent formulation is:
which implies
The previous condition identifies a situation where \({\mathbf {B}}\cdot ( \mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})\) lies in the convex hull of \({ \mathbf {A}}\cdot (\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})\) , see Koshevoy (1995). Making use of the definition of convex hull inclusion, we can alternatively write:
or equivalently \({\mathbf {B}}\cdot (\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})={\mathbf {A}}\cdot \sum _{\forall j_{1},...,j_{k}}{\alpha _{j_{1},...,j_{k}}(\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})}\), \(\forall k\), \(\forall j_{1},...,j_{k}\). Noticing that \(\bigcup _{k}\{(\mathbf {i}_{i_{1}}+...+\mathbf {i}_{i_{k}}): i_{1},...,i_{k}\in \{1,\ldots ,{n}\}\}=\mathcal {V}_{{n}}^{01}\) and that \(\bigcup _{k}\{\sum _{\forall i_{1},...,i_{k}}\alpha _{i_{1},...,i_{k}}(\mathbf {i}_{i_{1}}+...+\mathbf {i}_{i_{k}}):i_{1},...,i_{k}\in \{1,\ldots ,{n}\}\}=\mathcal {V}_{{n}}\), one has that (9) is equivalent to: \(\forall \varvec{\gamma }\in \mathcal {V}_{{n}}^{01}\) \(\exists \varvec{\theta }\in \mathcal {V}_{{n}}\): \({\mathbf {B}}\varvec{\gamma }={\mathbf {A}}\varvec{\theta }\). From Lemma 2, the latter condition implies i). \(\square\)
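The characterization through 0/1 mixing vectors used at the end of this proof can be tested mechanically for small matrices: \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds iff every 0/1 combination of \(\mathbf {B}\)'s columns equals \(\mathbf {A}\varvec{\theta }\) for some \(\varvec{\theta }\in [0,1]^{n}\), a linear feasibility problem per binary vector. A sketch under these assumptions (NumPy and SciPy assumed; the function name is ours, and the enumeration is exponential in the number of columns):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def zonotope_included(B, A):
    """Check Z(B) subseteq Z(A): every 0/1 combination of B's columns must
    be A @ theta for some theta in [0,1]^nA. Each instance is a linear
    feasibility problem; viable only for small matrices."""
    nA, nB = A.shape[1], B.shape[1]
    for bits in itertools.product([0.0, 1.0], repeat=nB):
        target = B @ np.array(bits)
        res = linprog(np.zeros(nA), A_eq=A, b_eq=target,
                      bounds=[(0.0, 1.0)] * nA, method="highs")
        if not res.success:
            return False
    return True

# Z(A) below is the unit square; every 0/1 combination of B's columns
# stays inside it, while the reverse inclusion fails.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
B = np.array([[0.5, 0.2],
              [0.3, 0.5]])
```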
Appendix A.2 Proof of Proposition 1
Proof
i) \(\Rightarrow\) ii). If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms MC, IPC, IEC and ISC. Matrix majorization \(\preccurlyeq ^{R}\) is one such partial order. In fact, as highlighted in the preliminary results section, every operation underlying axioms MC, ISC, IEC and IPC can be represented by a row stochastic matrix transformation: by \(\mathbf {M}\in \mathcal {R}_{n_{A}}^{MC}\) for a merge of classes, by \(\mathbf {S}\in \mathcal {R}_{n_{A}}^{ISC}\) for a split, by \(\mathbf {E}\in \mathcal {R}_{n_{A},n}^{IEC}\) for insertion/elimination of empty classes and by \(\mathbf {\Pi }\in \mathcal {P}_{n_{A}}\) for a permutation of classes. A sequence of these operations, for instance \(\mathbf {X}=\mathbf {M}\mathbf {S}\mathbf {E}\varvec{\Pi }\in \mathcal {R}_{n_{A},n}\), is itself row stochastic, since the set of all row stochastic matrices \(\mathcal {R}\) is closed with respect to the product operation. Any \(\mathbf {X}\) obtained through these operations is therefore row stochastic, thereby implying that ii) holds.
ii) \(\Rightarrow\) i). Assume that ii) holds, hence \(\mathbf {B}=\mathbf {A} \mathbf {\Theta }\) with \(\mathbf {\Theta }\in \mathcal {R}_{n_{A},n_{B}}\). In shorthand notation
where \(\theta _{j}(k)\) denotes the generic element \(\theta _{jk}\) of \(\mathbf {\Theta }\). Each addend in (10) can be written as:
where each \(\lambda \in [0,1]\). In fact, every sequence of \(n_{B}\) numbers \(\{\theta (k)\}_{k=1}^{n_{B}}\) with values in [0, 1] satisfying \(\sum _{k}\theta (k)=1\) can be written as:
The constraint \(\sum _{k}\theta (k)=1\) imposes that there must exist an index k such that \(\lambda _{k}=1\). If \(\lambda _{k}=1\), then the sequence is completed and \(\lambda _{j}=0=\theta (j)\) for any \(j>k\). Note that \(\theta (k)=0\) also if \(\lambda _{k}=0\), thus the sequence of \(\theta (k)\) may include elements equal to 0 even if it is not yet completed. Solving the sequence in (12) backward leads to (11), given that \(\theta (k)=\lambda _{k}\cdot \prod _{j=1}^{k-1}(1-\lambda _{j})\) with \(\lambda _{k}\in [0,1]\) \(\forall k=1,\ldots ,n_{B}\).
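The recursion in (11)-(12) is a stick-breaking representation: each \(\lambda _{k}\) splits off a fraction of the mass not yet allocated, and the last nonzero \(\theta (k)\) forces \(\lambda _{k}=1\). A minimal sketch (NumPy assumed; function names are ours) recovers the \(\lambda\)'s from a given sequence and reconstructs it:

```python
import numpy as np

def stick_breaking_lambdas(theta):
    """Recover split proportions lambda_k such that
    theta[k] = lambda_k * prod_{j<k} (1 - lambda_j)."""
    lambdas = []
    remaining = 1.0  # mass not yet split off
    for t in theta:
        lam = 1.0 if remaining <= 1e-12 else t / remaining
        lambdas.append(lam)
        remaining *= (1.0 - lam)
    return np.array(lambdas)

def reconstruct_theta(lambdas):
    """Invert the decomposition: rebuild theta from the lambda sequence."""
    out, remaining = [], 1.0
    for lam in lambdas:
        out.append(lam * remaining)
        remaining *= (1.0 - lam)
    return np.array(out)

theta = np.array([0.2, 0.0, 0.5, 0.3])   # sums to 1, zeros allowed mid-sequence
lam = stick_breaking_lambdas(theta)
```

Note how the zero in position 2 yields \(\lambda _{2}=0\) without terminating the sequence, exactly as observed in the proof.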
Consider a sequence of matrices \(\mathbf {Z}_{[k]}\in \mathcal {R}_{n_{A}}^{ISC}\).^{Footnote 19} Matrix \(\mathbf {Z}_{[1]}\) performs the first split of vector \(\mathbf {a}_{j}\) according to proportion \(\lambda _{j1}\). Matrix \(\mathbf {Z}_{[2]}\) performs a split on the residual component \((1-\lambda _{j1})\mathbf {a}_{j}\) according to the proportion \(\lambda _{j2}\). Iterating these arguments leads to matrix \(\mathbf {Z}_{[n_{B}-1]}\), representing the last split of vector \(\mathbf {a}_{j}\) after a sequence of \(n_{B}-2\) preceding splits. It follows that (11) can be equivalently written as:
Extending the representation in (13) to all addends in (10) leads to a total of \(n_{A}(n_{B}-1)=n\) splits of \(\mathbf {A}\)’s classes. The split operation preserves the number of classes, therefore it can be operationalized only if there exists a matrix \(\mathbf {Y}\in \mathcal {R}_{n_{A},n}^{IEC}\) adding a sufficient number of empty classes to \(\mathbf {A}\) to perform the n splits. According to the summation operator in (10), the order of the classes of \(\mathbf {A}\) is irrelevant, so that operations of permutation of classes are admitted.^{Footnote 20} By combining all the operations into a single expression we obtain \(\mathbf {A}\cdot \widehat{\mathbf {X}}\), where the \(n_{A}\times n\) matrix \(\widehat{\mathbf {X}}\) rewrites:
where \(\mathbf {Z}_{[k]}(j)\) is indexed by j to highlight the relation with class j of \(\mathbf {A}\). Here \(\widetilde{\mathbf {Z}}_{[k]}(j):=\text {diag}\left( \mathbf {I},\;\mathbf {Z}_{[k]}(j),\mathbf {I}^{\prime }\right)\), where \(\mathbf {I}\) and \(\mathbf {I}^{\prime }\) are two identity matrices of size \((j-1)n_{B}\) and \((n_{A}-j)n_{B}\) respectively. Line (15) comes from the fact that every block diagonal matrix can be represented as the product of the matrices associated with each block, obtained by substituting the remaining blocks with identity matrices.
To conclude, it is possible to perform permutations of the \(n_{A}n_{B}\) classes to rearrange the entries in \(\mathbf {A}\cdot \widehat{\mathbf {X}}\) so as to accommodate the definition of a merge of classes transformation through a matrix \(\varvec{\Pi }_{n_{A}n_{B}}\). A convenient permutation rearranges \(n_{B}\) groups of \(n_{A}\)-tuples of classes of \(\mathbf {A}\cdot \widehat{\mathbf {X}}\), so that the jth group consists of the sequence of classes \((\lambda _{1j}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}j}\mathbf {a}_{n_{A}},\ldots )\).^{Footnote 21} Consider a sequence of merges of classes, so that class 1 in the new configuration is merged with class 2, then the resulting class 2 is merged with class 3 and so on, up to the first \(n_{A}\) classes. This sequence of merge transformations can be modeled by matrices \(\mathbf {M}_{[1]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\), \(\mathbf {M}_{[2]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\) and so on, up to \(\mathbf {M}_{[n_{A}-1]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\). Given the order of the classes, the same procedure can be extended to all the \(n_{B}-1\) remaining \(n_{A}\)-tuples of classes. This operation leaves many empty classes, which can be eliminated using a matrix \(\mathbf {Y}^{\prime }\) incorporating the elimination of empty classes operation. As a result:
All the matrices multiplying \(\mathbf {A}\) are row stochastic, and so is their product. Hence \(\mathbf {\Theta }\in \mathcal {R}_{n_{A},n_{B}}\) can always be decomposed into permutation transformations and products of matrices originating exclusively from split, merge and insertion/deletion of empty classes operations. This is i), which concludes the proof. \(\square\)
Appendix A.3 Proof of Remark 2
Proof
Assume \(\mathbf {B}^{j}\preccurlyeq ^{R}\mathbf {A}\), then (by Proposition 1) \(\exists \mathbf {X}^{j}\in \mathcal {R}_{n}\) such that \(\mathbf {B}^{j}= \mathbf {A}\mathbf {X}^{j}\), \(\forall j\). Recall that axiom StrongMixC requires that \(\mathbf {B}=\sum _{j=1}^m{w_j\mathbf {B}^j}\), hence \({\mathbf {B}}=\sum _{j=1}^{m}{ w_{j}\mathbf {B}^{j}}=\sum _{j=1}^{m}{w_{j}\mathbf {A}\mathbf {X}^{j}}=\mathbf {A} \sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\preccurlyeq ^{R}\mathbf {A}\) since \(\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\in \mathcal {R}_{n}\) given that \(\sum _{j=1}^{m}{w_{j}}=1\). Thus, \({\mathbf {B}}\preccurlyeq ^{R}\mathbf {A.}\) \(\square\)
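The key step of this proof, that a convex combination of row stochastic matrices is itself row stochastic, can be checked numerically for a random instance. A sketch (NumPy assumed; names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_row_stochastic(n, rng):
    """Random n x n matrix with nonnegative entries and rows summing to 1."""
    X = rng.random((n, n))
    return X / X.sum(axis=1, keepdims=True)

d, n, m = 3, 4, 5
A = rng.random((d, n))
w = rng.dirichlet(np.ones(m))                     # mixing weights, sum to 1
Xs = [random_row_stochastic(n, rng) for _ in range(m)]
Bs = [A @ X for X in Xs]                          # each B^j = A X^j
B_mix = sum(wj * Bj for wj, Bj in zip(w, Bs))     # the StrongMixC mixture
X_mix = sum(wj * Xj for wj, Xj in zip(w, Xs))     # induced transformation
```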
Appendix A.4 Proof of Remark 3
Proof
A direct verification of the remark can be obtained by considering matrices \(\mathbf {A},\hat{\mathbf {A}}\in \mathcal {M}_{d}\) such that \(\hat{\mathbf {A}}\) is obtained by permuting columns j and \(j+1\) of \(\mathbf {A}\). Thus, by IPC we have \(\mathbf {A}\thicksim \hat{\mathbf {A}}\). Setting \(m=2\), with \(\mathbf {B}^{1}:=\mathbf {A}\) and \(\mathbf {B}^{2}:=\hat{\mathbf {A}}\) in the definition of StrongMixC and letting \(w_{1}=w_{2}=1/2\), we obtain \(\mathbf {B}^{0}=1/2\cdot \mathbf {B}^{1}+1/2\cdot \mathbf {B}^{2}=1/2\cdot (\mathbf {A}+\hat{\mathbf {A}})\). That is, \(\mathbf {B}^{0}\) coincides with \(\mathbf {A}\) except for columns j and \(j+1\), which are identical and coincide with \(1/2\cdot \mathbf {a}_{j}+1/2\cdot \mathbf {a}_{j+1}\). By StrongMixC we have that \(\mathbf {B}^{0}\preccurlyeq \mathbf {A}\). Consider the matrix \(\mathbf {B}\in \mathcal {M}_{d}\) that is identical to \(\mathbf {B}^{0}\) except for columns j and \(j+1\), where \(\mathbf {b}_{j}=\mathbf {0}_{d}\) and \(\mathbf {b}_{j+1}=\mathbf {a}_{j}+\mathbf {a}_{j+1}\). If we drop the empty column j from \(\mathbf {B}\) and split the class/column \(j+1\) with weights 1/2 and 1/2 into two adjacent classes, we obtain matrix \(\mathbf {B}^{0}\). Applying IEC and ISC we have that \(\mathbf {B}\thicksim \mathbf {B}^{0}\). By transitivity of the dissimilarity relation, \(\mathbf {B}\thicksim \mathbf {B}^{0}\preccurlyeq \mathbf {A}\) implies that \(\mathbf {B}\preccurlyeq \mathbf {A}\). Note that, by construction, matrices \(\mathbf {A}\) and \(\mathbf {B}\) are those appearing in the definition of axiom MC, which therefore turns out to be satisfied. \(\square\)
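The construction in this proof can be traced numerically for a concrete matrix: averaging \(\mathbf {A}\) with its column-permuted copy replaces columns j and \(j+1\) by their midpoint, which amounts to right multiplication by a doubly stochastic mixer. A sketch (NumPy assumed; the specific matrix is an illustrative choice of ours):

```python
import numpy as np

A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.2, 0.6]])
j = 0                                   # mix columns j and j+1

Pi = np.eye(3)
Pi[[j, j + 1]] = Pi[[j + 1, j]]         # permutation matrix swapping j, j+1
A_hat = A @ Pi                          # IPC: A_hat ~ A
B0 = 0.5 * A + 0.5 * A_hat              # StrongMixC with weights 1/2, 1/2

X = np.eye(3)
X[np.ix_([j, j + 1], [j, j + 1])] = 0.5  # induced doubly stochastic mixer
```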
Appendix A.5 Proof of Proposition 2
Proof
i) \(\Rightarrow\) ii). The proof consists of showing that the set of all matrices \(\mathbf {B}\) such that \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) can be characterized using exclusively operations underlying axioms StrongMixC, IEC, ISC and IPC. If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms ISC, IEC, IPC and StrongMixC; matrix majorization \(\preccurlyeq ^R\) is one such partial order (by Proposition 1 and Remark 2).
The result holds for any matrix in \(\mathcal {M}_{d}\). Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) with \({n_{A}}\) not necessarily equal to \({n_{B}.}\) First, consider matrices \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\) of size \(d\times n\) obtained from \(\mathbf {A},\mathbf {B}\) through split and permutation of classes and deletion of empty classes such that \(\mathbf {1} _{d}^{\prime }\tilde{\mathbf {A}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}= \frac{d}{n}\mathbf {1}_{n}^{\prime }\). Every ordering \(\preccurlyeq\) satisfying axioms IEC, ISC and IPC ranks \(\tilde{\mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\).
We now investigate the implications for the transitive closure of the orderings \(\preccurlyeq\) (satisfying axioms IPC, IEC, ISC) deriving from the fact that it satisfies StrongMixC. We do so by looking at all permutations of a matrix \(\tilde{\mathbf {A}}\), which turn out to be indifferent to it in terms of dissimilarity for all orderings satisfying IPC^{Footnote 22}, and applying to them the mixing operations considered in StrongMixC, thereby obtaining the set of matrices that are matrix majorized by \(\tilde{\mathbf {A}}\). The result holds for any initial matrix \(\mathbf {A}\in \mathcal {M}_d\). Denote by \(J_{k}\), \(k=2,\dots ,n\), a subset of the classes \(\{1,\dots ,n\}\) with cardinality \(|J_{k}|=k\). For instance, \(J_{2}=\{1,4\}\) when \(n\ge 4\). There are \(m_{k}=\left( {\begin{array}{c}n\\ k\end{array}}\right)\) such subsets for any k, with \(m=\sum _{k}{m_{k}}\); they are collected in the set \(\mathcal {J}_{k}\), so that \(J_{k}\in \mathcal {J}_{k}\) and \(J_{k}\subseteq \{1,\ldots ,n\}\). For any such subset \(J_{k}\), define \(\mathcal {P}_{n}^{J_{k}}\) as the set of \(n\times n\) permutation matrices corresponding to all permutations of the indices in \(J_{k}\) (there are k! of them). The matrix \(\tilde{\mathbf {A}}\varvec{\Pi }\), for \(\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}\) \(\forall k\), is obtained by permuting the k columns indexed as in \(J_{k}\). Given that \(\preccurlyeq\) satisfies IPC, then \(\tilde{\mathbf {A}}\varvec{\Pi }\thicksim \tilde{\mathbf {A}}\). Consider a mix operation as defined in the StrongMixC axiom, giving equal weight \(w_{h}=\frac{1}{k!}\) to each permutation \(h=1,\ldots ,k!\) of the indices in \(J_{k}\), yielding \(\sum _{h=1}^{k!}{w_h\tilde{\mathbf {A}}\varvec{\Pi }}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\tilde{\mathbf {A}}\varvec{\Pi }}=\tilde{\mathbf {A}}\cdot \sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\). 
There are m such matrices \(\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\in \mathcal {R}_{n}\), denoted by \(\mathbf {X}^{j}\), for \(j=1,\ldots ,m\). Note that the matrices \(\mathbf {X}^{j}\) form the basis of the set \(\mathcal {R}_{n}\) (see Proposition 3.1 in Dahl 1999).^{Footnote 23} We hence have that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings \(\preccurlyeq\) satisfying StrongMixC and IPC; moreover, \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq ^{R}\tilde{\mathbf {A}}\), given that all \(\mathbf {X}^{j}\) are in \(\mathcal {R}_{n}\). Consider now obtaining \(\tilde{\mathbf {B}}\) by mixing (using the weights in axiom StrongMixC) the matrices \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) \(\forall j\), namely: \(\tilde{\mathbf {B}}=\sum _{j=1}^{m}{w_{j}\tilde{\mathbf {A}}\mathbf {X}^{j}}=\tilde{\mathbf {A}}\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\). Since \(\sum _{j}{w_{j}}=1\), the summation term identifies the convex hull of the row stochastic matrices that form the basis for the set \(\mathcal {R}_{n}\). From Corollary 3.2 in Dahl (1999), the set of matrices \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\) obtained for all \(w_{j}\in [0,1]\) with \(\sum _j{w_j}=1\) identifies the whole set of matrices that are matrix majorized by \(\tilde{\mathbf {A}}\) (which is called a Markotope), thus guaranteeing that \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\). 
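The matrices \(\mathbf {X}^{j}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}\frac{1}{k!}\varvec{\Pi }\) have a simple closed form: they equal the identity off \(J_{k}\) and a constant \(1/k\) block on \(J_{k}\times J_{k}\), since each entry of the averaged permutations appears \((k-1)!\) times out of \(k!\). A sketch (NumPy assumed; the function name is ours):

```python
import numpy as np

def subset_mixer(n, J):
    """Average of all permutation matrices permuting only indices in J:
    identity off J, constant 1/|J| on the J x J block; doubly stochastic."""
    X = np.eye(n)
    J = list(J)
    X[np.ix_(J, J)] = 1.0 / len(J)
    return X

A = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.2, 0.3, 0.4]])
X = subset_mixer(4, [0, 2])
AX = A @ X        # columns 0 and 2 are replaced by their average
```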
Recall now (see the second set of preliminary results) that the split and permutation of classes and deletion of empty classes transformations generating \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\), with \(\tilde{\mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\), can be represented by row stochastic matrices in both directions: \(\tilde{\mathbf {A}}\) can be obtained from \(\mathbf {A}\) through a row stochastic transformation and conversely, and similarly for \(\tilde{\mathbf {B}}\) and \(\mathbf {B}\). Thus \(\mathbf {B}\preccurlyeq ^{R}\tilde{\mathbf {B}}\) and \(\tilde{\mathbf {A}}\preccurlyeq ^{R}\mathbf {A}\), which combined with \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\) gives condition ii) by transitivity.
ii) \(\Rightarrow\) i). Assume ii), which is equivalent to \(\mathbf {B} \preccurlyeq \mathbf {A}\) by all dissimilarity orders satisfying MC, IPC, IEC, ISC from Proposition 1 and implies i) by Remark 3. \(\square\)
Appendix A.6 Proof of Remark 4
Proof
The proof relies heavily on Remark 5 and Remark 7, which are demonstrated later. From Remark 7, \(\mathbf {B}\preccurlyeq ^R \mathbf {A}\) is equivalently represented by \(Z(\mathbf {B})\subseteq Z(\mathbf { A})\) when \(d=2\). By Remark 5, zonotope inclusion is consistent with the mixing operations underlying axiom MixC. \(\square\)
Appendix A.7 Proof of Remark 5
Proof
Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence of \(d\times n\) matrices \(\mathbf {B}^{j}\in \mathcal {M}_{d}\) for \(j=1,\ldots ,m\) such that \(\mathbf {1}_{d}^{\prime }\mathbf {A}=\mathbf {1}_{d}^{\prime }\mathbf {B}^{j}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\) \(\forall j\). Note that for any sequence \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}^j\in \mathcal {M}_d\) one can find matrices \(\mathbf {A},\mathbf {B}^j\) defined as above such that \(Z(\mathbf {A})=Z(\tilde{\mathbf {A}})\) and \(Z(\mathbf {B}^j)=Z(\tilde{\mathbf {B}}^j)\), \(\forall j\). Assume that \(Z(\mathbf {B}^{j})\subseteq Z(\mathbf {A}),\forall j\). By Lemma 2, \(\forall \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) \(\exists \varvec{\theta }^j\in \mathcal {V}_n\) such that \(\mathbf {B}^j\varvec{\gamma }^j=\mathbf {A}\varvec{\theta }^j\), \(\forall j\). Obtain \(\mathbf {B}\in \mathcal {M}_d\) as per (4); then \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) such that \(\mathbf {B}\varvec{\gamma }\in conv \{\mathbf {B}^1\varvec{\gamma }^1,\ldots ,\mathbf {B}^m\varvec{\gamma }^m\}\). There hence exist weights \(\omega _j^{\varvec{\gamma }}\), specific to the choice of the mixing \(\varvec{\gamma }\), satisfying \(\omega _j^{\varvec{\gamma }}\in [0,1]\) and \(\sum _{j=1}^m{\omega _j^{\varvec{\gamma }}}=1\), such that \(\mathbf {B}\varvec{\gamma }=\sum _j{\omega _j^{\varvec{\gamma }}\mathbf {B}^j\varvec{\gamma }^j}=\sum _j{\omega _j^{\varvec{\gamma }}\mathbf {A}\varvec{\theta }^j}=\mathbf {A}\sum _j{\omega _j^{\varvec{\gamma }}\varvec{\theta }^j}=\mathbf {A}\varvec{\theta }\) with \(\varvec{\theta }^j\in \mathcal {V}_n\) \(\forall j\). Since \(\mathcal {V}_n\) is closed under convex combinations, \(\varvec{\theta }\in \mathcal {V}_n\), which by Lemma 2 is a sufficient condition for \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). \(\square\)
Appendix A.8 Proof of Theorem 1
Proof
i) \(\Rightarrow\) ii). The proof consists of showing that the set of all matrices \(\mathbf {B}\) such that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) can be characterized using exclusively operations underlying axioms MixC, IEC, ISC and IPC. If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms ISC, IEC, IPC and MixC. The zonotope inclusion criterion is one such partial order (by Remark 5). The result must hold for any matrix \(\mathbf {A}\). Using the same procedure and notation as in part i) \(\Rightarrow\) ii) of the proof of Proposition 2, we can construct matrices \(\mathbf {X}^{j}\), for \(j=1,\ldots ,m\), such that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings \(\preccurlyeq\) consistent with axioms MixC, IPC, IEC, ISC (note that the set of orderings consistent with axiom MixC is a subset of those consistent with StrongMixC by Remark 1). Moreover, for such matrices it is also verified that for any \(\varvec{\gamma }\in \mathcal {V}_{n}\) there exists a \(\varvec{\theta }\in \mathcal {V}_{n}\) such that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\varvec{\gamma }=\tilde{\mathbf {A}}\varvec{\theta }\), which yields \(\tilde{\mathbf {A}}(\mathbf {X}^{j}\varvec{\gamma }-\varvec{\theta })=\mathbf {0}_{d}\) \(\forall j\). 
Using the fact that \(\mathbf {X}^{j}\in \mathcal {R}_{n}\) \(\forall j\), then \(\mathbf {X}^{j}\varvec{\gamma }\in \mathcal {V}_{n}\), so that it is sufficient to set \(\varvec{\theta }=\mathbf {X}^{j}\varvec{\gamma }\) for every \(\varvec{\gamma }\in \mathcal {V}_{n}\) to guarantee that \(\varvec{\theta }\in \mathcal {V}_{n}\). As argued in the proof of Lemma 2, the previous condition implies that \(Z(\tilde{\mathbf {A}}\mathbf {X}^{j})\subseteq Z(\tilde{\mathbf {A}})\) \(\forall j\). Note that only some classes (i.e., columns) of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) identify vertices of \(Z(\tilde{\mathbf {A}})\). To see this, note that for any matrix \(\mathbf {X}^{j}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\) the class h of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) has coordinates \(\mathbf {v}:=\sum _{\ell \in J_{k}}{\frac{(k-1)!}{k!}\tilde{\mathbf {a}}_{\ell }}\) for any \(h\in J_{k}\), whereas class \(h^{\prime }\) of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) is equal to \(\tilde{\mathbf {a}}_{h^{\prime }}\) for all \(h^{\prime }\not \in J_{k}\). This holds \(\forall j,k\). Any vector \(\mathbf {v}\) is a vertex of the zonotope \(Z(\tilde{\mathbf {A}})\), since \(k\mathbf {v}=\tilde{\mathbf {A}}\varvec{\gamma }\) with \(\varvec{\gamma }\in \mathcal {V}_{n}^{01}\) such that \(\gamma _{j}=1\) whenever \(j\in J_{k}\) whereas \(\gamma _{j}=0\) for any \(j\not \in J_k\). From Proposition 3.1 in Dahl (1999), the zonotope \(Z(\tilde{\mathbf {A}})\) is the convex hull of these vertices. To complete the proof, consider now a matrix \(\tilde{\mathbf {B}}\in \mathcal {M}_d\) satisfying condition (4), where \(\mathbf {B}^j=\tilde{\mathbf {A}}\mathbf {X}^j\), \(j=1,\ldots ,m\). Such a matrix is ranked \(\tilde{\mathbf {B}}\preccurlyeq \tilde{\mathbf {A}}\) by all dissimilarity orderings satisfying axioms MixC, IEC, IPC and ISC. 
For any such matrix, we have that \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\), \(j=1,\ldots ,m\): \(\tilde{\mathbf {B}}\varvec{\gamma }\in conv \{\tilde{\mathbf {A}}\mathbf {X}^1\varvec{\gamma }^1,\ldots ,\tilde{\mathbf {A}}\mathbf {X}^m\varvec{\gamma }^m\}\). Any element of the convex hull can be written as: \(\sum _{j=1}^m{\omega _j \tilde{\mathbf {A}}\mathbf {X}^j\varvec{\gamma }^j} = \tilde{\mathbf {A}}\sum _{j=1}^m{\omega _j\mathbf {X}^j\varvec{\gamma }^j}= \tilde{\mathbf {A}} \sum _{j=1}^m{\omega _j\varvec{\theta }^j} = \tilde{\mathbf {A}}\varvec{\theta }\), with \(\varvec{\theta }^j\in \mathcal {V}_n\) since \(\mathbf {X}^j\in \mathcal {R}_n\) \(\forall j\), and also \(\varvec{\theta }\in \mathcal {V}_n\) given that \(\omega _j\in [0,1]\), \(\sum _j{\omega _j}=1\). It follows that \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\theta }\in \mathcal {V}_n\): \(\tilde{\mathbf {B}}\varvec{\gamma }=\tilde{\mathbf {A}}\varvec{\theta }\), which is equivalent to \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\) from Lemma 2. The fact that \(Z(\tilde{\mathbf {B}})=Z(\mathbf {B})\) and \(Z(\tilde{\mathbf {A}})=Z(\mathbf {A})\) gives ii).
To highlight the differences with the analogous steps in the proof of Proposition 2, note that, as shown there, the matrices \(\mathbf {X}^{j}\) form a basis for \(\mathcal {R}_{n}\): every row stochastic matrix is obtained as a convex combination (with weights as in axiom StrongMixC) of the matrices \(\mathbf {X}^{j}\), \(j=1,\ldots ,m\). However, the weighting schemes underlying axiom StrongMixC are only a subset of the admissible weights according to axiom MixC. In particular, these weights are not capable of generating the convex hull of all vertices of \(Z(\tilde{\mathbf {A}})\) and, as a consequence, they are not sufficient to characterize the full set of matrices \(\tilde{\mathbf {B}}\) such that \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\), as opposed to the weights considered in MixC. Nonetheless, the weights underlying axiom StrongMixC provide sufficient structure to characterize the set of matrices \(\tilde{\mathbf {B}}\) such that \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\), as illustrated in Proposition 2.
ii) \(\Rightarrow\) i). We show that the zonotope inclusion condition can always be rationalized by the existence of matrices that are ranked consistently by all dissimilarity orderings in i). Assume that ii) holds, and note that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) \(\Leftrightarrow\) \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\), where \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\) are \(d\times n\) matrices obtained through split and permutation of classes and deletion of empty classes from \(\mathbf {A}\) and \(\mathbf {B}\) respectively, such that \(\mathbf {1}_{d}^{\prime }\tilde{\mathbf {A}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\), as done in the first part of the proof of Proposition 2. Consider further matrices \(\tilde{\mathbf {B}}^{j}\in \mathcal {M}_{d}\), for \(j=1,\dots ,m\) with \(m=n\), of size \(d\times n\) defined as: \(\tilde{\mathbf {B}}^{j}:=(\tilde{\mathbf {b}}_{j},\frac{1}{n-1}\mathbf {r}_{j},\ldots ,\frac{1}{n-1}\mathbf {r}_{j})\cdot \varvec{\Pi }_{1,j}\), with \(\varvec{\Pi }_{i,j}\in \mathcal {P}_{n}\) being the permutation matrix permuting classes i and j. The vector \(\mathbf {r}_{j}:=\mathbf {1}_{d}-\tilde{\mathbf {b}}_{j}\) is the “residual”, where we recall that \(\tilde{\mathbf {b}}_j\) denotes column j of matrix \(\tilde{\mathbf {B}}\). By construction, \(Z(\tilde{\mathbf {B}}^{j})\subseteq Z(\tilde{\mathbf {A}})\) \(\forall j\). Considering that, by the definition of zonotope inclusion, \(\tilde{\mathbf {b}}_{j}=\tilde{\mathbf {A}}\varvec{\theta }\) for some \(\varvec{\theta }\in \mathcal {V}_{n}\), then \(\mathbf {r}_{j}=\tilde{\mathbf {A}}(\mathbf {1}_{n}-\varvec{\theta })\). Notice that the vector \(\varvec{\theta }\) is specific to a matrix \(\tilde{\mathbf {B}}^j\). It follows that:
with \(\mathbf {X}^{j}\in \mathcal {R}_{n}\) for every \(j=1,\ldots ,m\). We conclude that condition ii) always implies that \(\tilde{\mathbf {B}}^{j}\preccurlyeq ^{R}\tilde{\mathbf {A}}\) \(\forall j\) for the n matrices \(\tilde{\mathbf {B}}^{j}\) obtained as above. Now consider the set of weights \(w_{j}^{k}\) with the following features: \(w_{j}^{j}=1\) for any \(j=1,\ldots ,n\) and \(w_{k}^{j}=0\) for any \(k\ne j\). By construction: \(\tilde{\mathbf {b}}_{k}=\sum _{j}{w_{j}^{k}\tilde{\mathbf {b}}_{k}^{j}}\) for all \(k=1,\ldots ,n\). From Proposition 1, \(\tilde{\mathbf {B}}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings satisfying MC, ISC, IEC, IPC. These are also the orderings satisfying ISC, IEC, IPC and StrongMixC from Proposition 2. By Remark 1, this set of orderings includes those in i), because those considered in the set satisfy the stronger version of the MixC axiom. To conclude, we should verify that matrix \(\tilde{\mathbf {B}}\) is obtained through a weighting scheme that is consistent with condition (4). This is immediate, since for \(\varvec{\gamma }=(\gamma _1,\ldots ,\gamma _n)'\in \mathcal {V}_n^{01}\) we have that \(\tilde{\mathbf {B}}\varvec{\gamma }= (\tilde{\mathbf {b}}_1,\ldots ,\tilde{\mathbf {b}}_n)\varvec{\gamma }= (\sum _j{w_j^1\tilde{\mathbf {b}}_1^j},\ldots ,\sum _j{w_j^n\tilde{\mathbf {b}}_n^j})\varvec{\gamma }= (\tilde{\mathbf {b}}_1^1,\ldots ,\tilde{\mathbf {b}}_n^n)\varvec{\gamma }= (\tilde{\mathbf {B}}^1\mathbf {i}_1,\ldots ,\tilde{\mathbf {B}}^n\mathbf {i}_n)\varvec{\gamma }= (\tilde{\mathbf {B}}^1\mathbf {i}_1\gamma _1,\ldots ,\tilde{\mathbf {B}}^n\mathbf {i}_n\gamma _n)\), where we recall that \(\mathbf {i}_k\) is the kth column of the \(n\times n\) identity matrix and that the second equality follows from the choice of the weighting scheme. 
It is sufficient to set \(\varvec{\gamma }^j:=\mathbf {i}_j\gamma _j\in \mathcal {V}_n^{01}\) to conclude that the condition \(\forall \varvec{\gamma }\in \mathcal {V}^{01}_n\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) such that \(\tilde{\mathbf {B}}\varvec{\gamma }= (\tilde{\mathbf {B}}^1\varvec{\gamma }^1,\ldots ,\tilde{\mathbf {B}}^n\varvec{\gamma }^n)\) satisfies condition (4).
As a result for the obtained matrix \(\tilde{\mathbf {B}}\) we have that \(\tilde{\mathbf {B}}\preccurlyeq \tilde{\mathbf {A}}\) for all orderings satisfying ISC, IEC, IPC and MixC. The same orderings rank \(\tilde{ \mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\), which implies i). \(\square\)
Appendix A.9 Proof of Remark 6
Proof
Recall the definition of zonotope set inclusion: for any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if \(\forall \varvec{\gamma }\in \mathcal {V}_{n_{B}}\) \(\exists \varvec{\theta }\in \mathcal {V}_{n_{A}}\) such that \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\).
Let \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\). By Proposition 1, \(\mathbf {B}=\mathbf {A}\mathbf {X}\) with \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), which after substituting into \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\) yields \(\mathbf {A}(\mathbf {X}\varvec{\gamma }-\varvec{\theta })=\mathbf {0}_{d}\). Given that \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), we have \(\mathbf {X}\varvec{\gamma }\in \mathcal {V}_{n_{A}}\) for any \(\varvec{\gamma }\in \mathcal {V}_{n_{B}}\); it is therefore sufficient to set \(\varvec{\theta }=\mathbf {X}\varvec{\gamma }\) for each \(\varvec{\gamma }\in \mathcal {V}_{n_{B}}\) to guarantee that \(\varvec{\theta }\in \mathcal {V}_{n_{A}}\), which thus implies \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). \(\square\)
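The witness \(\varvec{\theta }=\mathbf {X}\varvec{\gamma }\) can be verified numerically on a random instance (NumPy assumed; the helper name is ours, and checking the 0/1 vertices suffices for the inclusion by Lemma 2):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
d, nA, nB = 2, 4, 3
X = rng.random((nA, nB))
X /= X.sum(axis=1, keepdims=True)    # row stochastic, so B = A X gives B <=^R A
A = rng.random((d, nA))
B = A @ X

def witnesses_ok(A, B, X):
    """For every 0/1 vector gamma, theta = X gamma must lie in [0,1]^nA
    and satisfy B gamma = A theta, the condition for Z(B) in Z(A)."""
    for bits in itertools.product([0.0, 1.0], repeat=X.shape[1]):
        gamma = np.array(bits)
        theta = X @ gamma
        if np.any(theta < 0) or np.any(theta > 1 + 1e-12):
            return False
        if not np.allclose(B @ gamma, A @ theta):
            return False
    return True
```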
Appendix A.10 Proof of Remark 7
Proof
Let \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{2}\) denote two distribution matrices obtained respectively from \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{2}\) using split and permutation of classes and deletion of empty classes, such that \(Z(\mathbf {A})=Z(\tilde{\mathbf {A}})\), \(Z(\mathbf {B} )=Z(\tilde{\mathbf {B}})\), with \(n_{\tilde{A}}=n_{\tilde{B}}=\tilde{n}\), \(\sum _{j}{\mathbf {a}_{j}}=\sum _{j}{\tilde{\mathbf {a}_{j}}}=\sum _{j}{\tilde{ \mathbf {b}_{j}}}=\sum _{j}{\mathbf {b}_{j}}\) and \(\sum _{i}{\tilde{a}_{ij}} =\sum _{i}{\tilde{b}_{ij}}=\frac{1}{\tilde{n}}\) \(\forall j=1,\ldots ,\tilde{n}\). Assume that \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\) which, from Lemma 2, implies that \(\forall \varvec{\gamma }\in \mathcal {V}_{\tilde{n}}^{01},\ \exists \varvec{\theta }\in \mathcal {V}_{ \tilde{n}}:\ \tilde{\mathbf {B}}\varvec{\gamma }=\tilde{\mathbf {A}} \varvec{\theta }\). The vector \(\varvec{\gamma }\) selects column vectors from \(\tilde{\mathbf {B}}\) and aggregates these vectors with elementwise summations. For every \(i=1,2\) we can equivalently write the condition as:
with \(\theta _{jk}\in [0,1]\) \(\forall j,k\). It is necessary and sufficient that (17) holds for an arbitrary group \(i=1\) to guarantee that (17) also holds for \(i=2\), given that by construction \(\tilde{a}_{2j}=\frac{2}{\tilde{n}}-\tilde{a}_{1j}\) and \(\tilde{b}_{2j}=\frac{2}{\tilde{n}}-\tilde{b}_{1j}\) \(\forall j=1,\ldots ,\tilde{n}\). After rearranging the terms in (17) in increasing order, so that \(\tilde{b}_{1(k)}\le \tilde{b}_{1(k+1)}\) and \(\tilde{a}_{1(k)}\le \tilde{a}_{1(k+1)}\) for \(k=1,\ldots ,\tilde{n}-1\), and using the fact that \(\theta _{jk}\in [0,1]\), it follows that \(\sum _{k=1}^{h}{\tilde{b}_{1(k)}}\ge \sum _{j=1}^{h}{\tilde{a}_{1(j)}}\) for any \(h=1,\ldots ,\tilde{n}\). From Marshall et al. (2011), if \(d=2\), this condition is equivalent to uniform majorization, i.e. \(\exists \mathbf {X}\in \mathcal {D}_{\tilde{n}}:\ (\tilde{b}_{11},\ldots ,\tilde{b}_{1\tilde{n}})=(\tilde{a}_{11},\ldots ,\tilde{a}_{1\tilde{n}})\mathbf {X}\). Hence \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\), since the same matrix \(\mathbf {X}\) guarantees that \((\tilde{b}_{21},\ldots ,\tilde{b}_{2\tilde{n}})=(\tilde{a}_{21},\ldots ,\tilde{a}_{2\tilde{n}})\mathbf {X}\). Furthermore, notice that the indifference class of \(\preccurlyeq ^R\) is also characterized by the existence of row stochastic matrices: for any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\), \(\mathbf {B}\thicksim \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MC, ISC, IEC and IPC if and only if \(\exists \mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\) and \(\exists \mathbf {X^{\prime }}\in \mathcal {R}_{n_{B},n_{A}}\) such that \(\mathbf {B}=\mathbf {A}\cdot \mathbf {X}\) and \(\mathbf {A}=\mathbf {B}\cdot \mathbf {X^{\prime }}\). We can hence write \(\tilde{\mathbf {A}}=\mathbf {A}\mathbf {Z}\) with \(\mathbf {Z}\in \mathcal {R}_{n_{A},\tilde{n}}\) and \(\mathbf {B}=\tilde{\mathbf {B}}\mathbf {Y}\) with \(\mathbf {Y}\in \mathcal {R}_{\tilde{n},n_{B}}\). 
Thus \(\mathbf {B}=\tilde{\mathbf {B}}\mathbf {Y}=\tilde{\mathbf {A}}\mathbf {X}\mathbf {Y}=\mathbf {A}\mathbf {Z}\mathbf {X}\mathbf {Y}=\mathbf {A}\varvec{\Theta }\) with \(\varvec{\Theta }\in \mathcal {R}_{n_{A},n_{B}}\), since \(\mathcal {D}_{\tilde{n}}\subseteq \mathcal {R}_{\tilde{n}}\) and the set \(\mathcal {R}_{n}\) is closed under matrix multiplication. This concludes the proof for \(d=2\). When \(d>2\), there is no guarantee that the row stochastic matrix \(\mathbf {X}\) satisfying (17) for group \(i\) also verifies \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\). \(\square\)
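The majorization step in the two-group argument can be illustrated numerically: averaging a group's row through a doubly stochastic matrix produces a vector whose ordered partial sums dominate those of the original. A minimal sketch with illustrative vectors (not taken from the paper); the function `majorized_by` and the data are ours:

```python
import numpy as np

def majorized_by(b, a, tol=1e-12):
    """Check that the sum of the h smallest entries of b weakly exceeds
    that of a for every h; with equal totals, a then majorizes b."""
    b_sorted, a_sorted = np.sort(b), np.sort(a)
    return bool(np.all(np.cumsum(b_sorted) >= np.cumsum(a_sorted) - tol))

# Illustrative group-1 row (hypothetical data, total mass fixed).
a1 = np.array([0.05, 0.10, 0.35, 0.50])

# Post-multiplying by a doubly stochastic X averages the entries;
# here the extreme case of full uniform mixing.
X = np.full((4, 4), 0.25)
b1 = a1 @ X   # uniform vector with the same total

assert majorized_by(b1, a1)       # b1 is majorized by a1
assert not majorized_by(a1, b1)   # the reverse fails for non-uniform a1
```

This mirrors the proof's use of Marshall et al. (2011): the partial-sum condition is exactly what guarantees the existence of a doubly stochastic \(\mathbf {X}\) mapping the group-1 row of \(\tilde{\mathbf {A}}\) to that of \(\tilde{\mathbf {B}}\).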
Appendix A.11 Proof of Proposition 3
Proof
From Proposition 1, i) is equivalent to \(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\). From Lemma 1 in Appendix A.1, we have that \(\mathbf {B }\preccurlyeq ^{R}\mathbf {A}\) if and only if
for \(g:\mathbb {R}^{d}\rightarrow \mathbb {R}\) convex and homogeneous with \(g(\mathbf {0}_{d}^{\prime })=0\). Note that \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) is equivalent to \((\mathbf {B}^{\prime },\overline{\mathbf {b}})^{\prime }\preccurlyeq ^{R}(\mathbf {A}^{\prime },\overline{\mathbf {a}})^{\prime }\), where \(\overline{\mathbf {b}}^{\prime }=\mathbf {1}_{d}^{\prime }\cdot \mathbf {B}\) and \(\overline{\mathbf {a}}^{\prime }=\mathbf {1}_{d}^{\prime }\cdot \mathbf {A}\), that is, if both matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) are "expanded" by adding one more row whose elements are given by the sum of the elements of the associated column in the original matrix. Condition (18) hence rewrites as \(\sum _{j}{g(\mathbf {b}_{j}^{\prime },\overline{b}_{j})}\le \sum _{j}{g(\mathbf {a}_{j}^{\prime },\overline{a}_{j})}\) with \(g\) defined on \(\mathbb {R}^{d+1}\). Given that \(g\) is convex and homogeneous, \(g(\mathbf {a}_{j}^{\prime },\overline{a}_{j})=\overline{a}_{j}g(\mathbf {a}_{j}^{\prime }/\overline{a}_{j},1)=\overline{a}_{j}h(\mathbf {a}_{j}^{\prime }/\overline{a}_{j})\) where \(h\in \mathcal {H}\), while for convenience empty classes receive weight \(\overline{a}_{j}=0\). Moreover, adding \(n_{A}-n_{B}\) empty classes preserves the relation in (18). We have therefore obtained the index \(D_{h}\). Thus, \(D_{h}(\mathbf {B})\le D_{h}(\mathbf {A})\) \(\forall h\in \mathcal {H}\) is equivalent to (18) and to condition ii). \(\square\)
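The index \(D_{h}(\mathbf {A})=\sum _{j}\overline{a}_{j}h(\mathbf {a}_{j}/\overline{a}_{j})\) can be checked numerically: post-multiplying by a row-stochastic matrix cannot increase it, because the underlying \(g\) is convex and homogeneous (hence subadditive). A sketch with hypothetical matrices, taking \(h\) to be the Euclidean norm as an illustrative convex choice:

```python
import numpy as np

def D_h(A, h):
    """D_h(A) = sum_j abar_j * h(a_j / abar_j), skipping empty classes."""
    total = 0.0
    for j in range(A.shape[1]):
        col_sum = A[:, j].sum()
        if col_sum > 0:
            total += col_sum * h(A[:, j] / col_sum)
    return total

h = np.linalg.norm   # one convex h in H (illustrative choice)

# Hypothetical 2-group distribution matrix (rows sum to 1).
A = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]])
# A row-stochastic X: each column of B = A @ X mixes columns of A.
X = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
B = A @ X

assert np.allclose(X.sum(axis=1), 1.0)   # X is row stochastic
assert D_h(B, h) <= D_h(A, h) + 1e-12    # dissimilarity cannot increase
```

The inequality follows from \(g(\mathbf {b}_{k})\le \sum _{j}x_{jk}\,g(\mathbf {a}_{j})\) and the row sums of \(\mathbf {X}\) being one, exactly as in the proof.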
Appendix A.12 Proof of Proposition 4
Proof
Recall that from Theorem 1 for \(\mathbf {A,B}\in \mathcal {M}_{d}\) we have that \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC if and only if \(Z( \mathbf {B})\subseteq Z(\mathbf {A})\).
Consider matrices \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\) of dimension \(d\times n\) obtained from \(\mathbf {A},\mathbf {B}\) through split and permutation of classes and deletion of empty classes, so that \(\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\). Then, by construction, \(Z(\mathbf {B})\ \subseteq \ Z(\mathbf {A})\Leftrightarrow Z(\tilde{\mathbf {B}})\ \subseteq \ Z(\tilde{\mathbf {A}})\). According to Lemma 3, the latter zonotope inclusion condition is equivalent to requiring that, for any \({{\textbf {p}}}\in {\mathbb {R}^{d}}\), there exists \(\varvec{\Theta }\in \mathcal {C}_{n}\) such that \(n\mathbf {p}^{\prime }\tilde{\mathbf {B}}=n\mathbf {p}^{\prime }\tilde{\mathbf {A}}\varvec{\Theta }\). From Hardy et al. (1934), this condition is equivalent to \(\sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{j})}\le \sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {a}}_{j})}\) for all convex functions \(\phi :\mathbb {R}\rightarrow \mathbb {R}\). Consider a set of indices \(\mathcal {K}_{j}\) such that the classes of \(\tilde{\mathbf {B}}\) are obtained from those of \(\mathbf {B}\) by split transformations.
Thus for any class \(j\) in \(\mathbf {B}\) there is a nonempty set \(\mathcal {K}_{j}\) of associated classes in \(\tilde{\mathbf {B}}\). Let \(k\in \mathcal {K}_{j}\) denote a generic element of the set \(\mathcal {K}_{j}\) associated with class \(j\) in \(\mathbf {B}\), and consider a related "splitting" weight \({\alpha _{k}}\) such that \(\sum _{k\in \mathcal {K}_{j}}{\alpha _{k}}=1\) for any \(j=1,\ldots ,n_{B}\). We can then write \(\tilde{\mathbf {b}}_{k}=\alpha _{k}\mathbf {b}_{j}\), with by construction \(n=\frac{1}{\overline{\tilde{b}}_{k}}=\frac{1}{\alpha _{k}\overline{b}_{j}}\), where \(\overline{b}_{j}:=\mathbf {1}_{d}^{\prime }\mathbf {b}_{j}\) and similarly for \(\overline{\tilde{b}}_{k}\). It follows that the argument of the evaluation function \(\phi\) defined for matrix \(\tilde{\mathbf {B}}\) becomes \(\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{k})=\overline{\tilde{b}}_{k}\phi (\mathbf {p}\tilde{\mathbf {b}}_{k}/\overline{\tilde{b}}_{k})=\alpha _{k}\overline{b}_{j}\phi (\alpha _{k}\mathbf {p}\mathbf {b}_{j}/(\alpha _{k}\overline{b}_{j}))=\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\). It follows that \(\sum _{k}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{k})}=\sum _{j}{\sum _{k\in \mathcal {K}_{j}}{\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})}}=\sum _{j}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\).
An analogous sequence of transformations applies to \(\sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {a}}_{j})}\), leading to \(\sum _{j}\overline{a}_{j}\phi (\mathbf {p}\mathbf {a}_{j}/\overline{a}_{j})\). This yields \(D_{\mathbf {p},\phi }(\mathbf {B})\ \le \ D_{\mathbf {p},\phi }(\mathbf {A})\) for all \(\mathbf {p}\in \mathbb {R}^{d}\) and for every convex \(\phi\), which is equivalent to \(Z(\tilde{\mathbf {B}})\ \subseteq \ Z(\tilde{\mathbf {A}})\), which in turn is equivalent to \(Z(\mathbf {B})\ \subseteq \ Z(\mathbf {A})\), i.e. to unanimity for all orderings satisfying MixC, ISC, IEC, IPC. \(\square\)
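The key property the proof exploits is that \(D_{\mathbf {p},\phi }(\mathbf {A})=\sum _{j}\overline{a}_{j}\phi (\mathbf {p}\mathbf {a}_{j}/\overline{a}_{j})\) is invariant under split transformations, because a split rescales a column and its weight by the same factor. A numerical sketch with hypothetical data (the direction `p`, the convex `phi` and the matrix are ours):

```python
import numpy as np

def D_p_phi(A, p, phi):
    """D_{p,phi}(A) = sum_j abar_j * phi(p . a_j / abar_j)."""
    col_sums = A.sum(axis=0)
    keep = col_sums > 0                      # empty classes get weight 0
    return sum(s * phi(p @ A[:, j] / s)
               for j, s in zip(np.where(keep)[0], col_sums[keep]))

p = np.array([1.0, -0.5])                    # arbitrary direction p
phi = lambda t: t * t                        # one convex phi

A = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]])
# Split column 0 of A into two pieces with weights 0.3 and 0.7.
A_split = np.column_stack([0.3 * A[:, 0], 0.7 * A[:, 0], A[:, 1], A[:, 2]])

# The split leaves the index unchanged, for any p and any convex phi.
assert np.isclose(D_p_phi(A, p, phi), D_p_phi(A_split, p, phi))
```

The cancellation \(\alpha _{k}\overline{b}_{j}\phi (\alpha _{k}\mathbf {p}\mathbf {b}_{j}/(\alpha _{k}\overline{b}_{j}))=\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\) in the proof is exactly what the assertion verifies.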
Appendix A.13 Proof of Corollary 2
Proof
By Theorem 1, (6) is equivalent to \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\) for \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), which gives condition (i). Each entry in the first row of \(\tilde{\mathbf {A}}\) is a constant equal to \(1/n_{A}\), and each entry in the first row of \(\tilde{\mathbf {B}}\) equals \(1/n_{B}\); hence \(\frac{1}{n_{A}}\mathbf {1}_{n_{A}}^{\prime }\mathbf {X}=\frac{1}{n_{B}}\mathbf {1}_{n_{B}}^{\prime }\), which requires each column of \(\mathbf {X}\) to sum to \(n_{A}/n_{B}\), so (ii) must also hold. \(\square\)
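The column-sum restriction can be seen directly in a small numerical sketch: if the first rows of \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\) are uniform, a row-stochastic \(\mathbf {X}\) mapping one into the other must have columns summing to \(n_{A}/n_{B}\). The matrix below is a hypothetical example satisfying both conditions:

```python
import numpy as np

n_A, n_B = 4, 2
# Row-stochastic X whose columns each sum to n_A / n_B = 2 (illustrative).
X = np.array([[0.9, 0.1],
              [0.7, 0.3],
              [0.3, 0.7],
              [0.1, 0.9]])

uniform_A = np.full(n_A, 1.0 / n_A)   # first row of A-tilde
uniform_B = uniform_A @ X             # induced first row of B-tilde

assert np.allclose(X.sum(axis=1), 1.0)         # condition (i): row stochastic
assert np.allclose(X.sum(axis=0), n_A / n_B)   # condition (ii): column sums
assert np.allclose(uniform_B, 1.0 / n_B)       # uniformity is preserved
```

Conversely, if the column sums differed from \(n_{A}/n_{B}\), the product \(\frac{1}{n_{A}}\mathbf {1}^{\prime }\mathbf {X}\) could not be the constant vector \(\frac{1}{n_{B}}\mathbf {1}^{\prime }\).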
Appendix A.14 Proof of Corollary 3
Proof
A formal proof draws on the fact that any (inequality-reducing) PD transfer of an income share \(\lambda\) between classes \(j\) and \(k\) can be formalized through a linear transformation of vector \(\mathbf {a}^{\prime }\) towards \(\mathbf {b}^{\prime }\) involving a T-transform matrix \(\mathbf {T}(\lambda ,k,j)\), such that \(\mathbf {b}^{\prime }=\mathbf {a}^{\prime }\cdot \mathbf {T}(\lambda ,k,j)\), with \(\mathbf {T}(\lambda ,k,j):=\lambda \mathbf {I}_{n}+(1-\lambda )\varvec{\Pi }_{j,k}\), where \(\mathbf {I}_{n}\) is the identity matrix, \(\lambda \in [0,0.5]\) and \(\varvec{\Pi }_{j,k}\in \mathcal {P}_{n}\) is a permutation matrix obtained from \(\mathbf {I}_{n}\) by permuting columns \(j\) and \(k\). Given a matrix \(\mathbf {A}\in \mathcal {M}_{d}\) with \(n\) columns, let \(\mathbf {S}(\lambda ,k,j)\in \mathcal {R}_{n_{A},n_{B}}\) be a row-stochastic matrix that splits column \(k\) of \(\mathbf {A}\) and merges a share \((1-\lambda )\) of \(k\) with column \(j\). This row-stochastic matrix writes:
where \(\lambda \in [0,0.5]\) and \(\varvec{\Pi }_{n+1,k}\in \mathcal {P}_{n+1}\) is an \((n+1)\)-dimensional permutation matrix obtained from \(\mathbf {I}_{n+1}\) by permuting columns \(n+1\) and \(k\). Any T-transform involves a proportional movement of population masses between two classes, which amounts to repeating twice a sequence of splits and merges \(\mathbf {S}(\lambda ,k,j)\), so that \(\mathbf {T}(\lambda ,k,j):=\mathbf {S}(\lambda ^{\prime },k,j)\cdot \mathbf {S}(\lambda ^{\prime \prime },j,k)\), where the splitting parameters must satisfy \(\lambda ^{\prime \prime }=1-\lambda\) and \(\lambda ^{\prime }=\frac{1-2\lambda }{1-\lambda }\). \(\square\)
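The T-transform construction itself is easy to verify numerically: \(\mathbf {T}(\lambda ,k,j)=\lambda \mathbf {I}_{n}+(1-\lambda )\varvec{\Pi }_{j,k}\) is doubly stochastic, preserves total mass, and shrinks the gap between the two classes involved. A sketch with an illustrative share vector (the data and the function name are ours):

```python
import numpy as np

def T_transform(lam, n, j, k):
    """T(lam, k, j) = lam*I + (1 - lam)*Pi_{j,k}, lam in [0, 0.5]."""
    Pi = np.eye(n)
    Pi[:, [j, k]] = Pi[:, [k, j]]     # permute columns j and k of I_n
    return lam * np.eye(n) + (1 - lam) * Pi

a = np.array([0.1, 0.6, 0.2, 0.1])    # hypothetical income shares
T = T_transform(0.3, 4, 1, 2)         # PD transfer between classes 1 and 2
b = a @ T

# T is doubly stochastic, so the transfer preserves total mass.
assert np.allclose(T.sum(axis=0), 1.0) and np.allclose(T.sum(axis=1), 1.0)
assert np.isclose(b.sum(), a.sum())
# The gap between the two classes shrinks: inequality-reducing transfer.
assert abs(b[1] - b[2]) < abs(a[1] - a[2])
```

Classes not involved in the transfer (here 0 and 3) are left untouched, since the corresponding rows and columns of \(\mathbf {T}\) coincide with those of the identity.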
Appendix A.15 Proof that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) from example (1)
We first show that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) for distribution matrices \(\mathbf {A},\mathbf {B}\) in example (1). Note that after splitting equally column 1 of matrix \(\mathbf {B}\) into two columns, one obtains matrix \(\tilde{\mathbf {B}}=(\frac{1}{2}\mathbf {b}_{1},\frac{1}{2}\mathbf {b}_{1},\mathbf {b}_{2},\mathbf {b}_{3})\), so that both matrices \(\tilde{\mathbf {B}}\), \(\mathbf {A}\) share the same distribution for one of the groups (the first row), which is uniform. The zonotopes of these matrices are three-dimensional objects, but the inclusion \(Z(\mathbf {B})=Z(\tilde{\mathbf {B}})\subseteq Z(\mathbf {A})\) can be verified by fixing the dimension related to group 1 and then focusing on inclusion in terms of facets of the zonotope obtained by the intersection of each zonotope with the hyperplanes identified by levels of population of group 1 and parallel to the orthants of groups 2 and 3. Among these facets, only three are relevant: those corresponding to proportions of group 1 equal to \(\frac{1}{4}\), \(\frac{2}{4}\) and \(\frac{3}{4}\). If inclusion of the facets of \(Z(\mathbf {B})\) into those of \(Z(\mathbf {A})\) is verified for each relevant share of group 1, then it is verified for any share of group 1.^{Footnote 24}
The facets of \(Z(\mathbf {B})\) (dark gray) and \(Z(\mathbf {A})\) (light gray) at given proportions of group 1 are represented in Fig. 3. They are taken from figure 1(b), which represents the three-dimensional zonotope of \(\mathbf {A}\). The scaling of the axes eases the visual representation of the projections' coordinates. Zonotope inclusion is granted for each projection (moving northeast implies larger shares of group 1 population), implying \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).
Next, we show that every class of \(\mathbf {B}\) can be obtained through merge and split operations from the classes of \(\mathbf {A}\) and yet \(\mathbf {B}\not \preccurlyeq ^{R}\mathbf {A}\). Consider the following row-stochastic matrices:
Matrix \(\mathbf {X}^{1}\) is such that column \(\mathbf {x}_{1}^{1}\) displays a merge of classes 1 and 2 of matrix \(\mathbf {A}\) yielding \(\mathbf {b}_{1}\), whereas the remaining classes are merged and split uniformly into two classes. Every column \(\mathbf {x}_{j}^{j}\) is the result of split and merge operations that determine class \(\mathbf {b}_{j}\) of the distribution matrix \(\mathbf {B}\), whereas the remaining classes are merged and then split uniformly. The products \(\mathbf {A}\mathbf {X}^{j}\) give the following matrices \(\mathbf {B}^{j}\):
It is clear that \(\mathbf {B}^{j}\preccurlyeq ^{R}\mathbf {A}\), which implies \(Z(\mathbf {B}^{j})\subseteq Z(\mathbf {A})\) (from Remark 6) for any \(j=1,2,3\). Now consider obtaining classes of matrix \(\mathbf {B}\) as a weighted average of classes of \(\mathbf {B}^j\) using the weights \(w_{j}^{k}\), \(j=1,2,3\) and \(k=1,2,3\), such that \(w_{j}^{j}=1\) \(\forall j\) and \(w_{j}^{k}=0\) when \(k\ne j\). These weights are consistent with condition (4) (and hence with axiom MixC, see the second part of the proof of Theorem 1), but not with those implied by axiom StrongMixC (they are not constant for given \(j\)). Matrix \(\mathbf {B}\) can be obtained from \(\mathbf {B}^{j}\) using \(\mathbf {b}_{k}=\sum _{j}{w_{j}^{k}\mathbf {b}_{k}^{j}}=\mathbf {b}_{k}^{k}\) for any \(k=1,2,3\). This implies \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) (from Lemma 5). Yet, using the weighting scheme we obtain \(\mathbf {B}=\mathbf {A}\mathbf {X}\) with \(\mathbf {x}_{k}=\sum _{j}{w_{j}^{k}\mathbf {x}_{k}^{j}}=\mathbf {x}_{k}^{k}\) \(\forall k\), but \(\mathbf {X}\not \in \mathcal {R}_{4,3}\): the unique matrix with nonnegative entries that transforms \(\mathbf {A}\) into \(\mathbf {B}\) as in (3) is not row-stochastic, thus \(\mathbf {B}\not \preccurlyeq ^{R}\mathbf {A}\). This counterexample shows that matrix majorization is not consistent with the axiom MixC.
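Whether \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) holds, i.e. whether \(\mathbf {B}=\mathbf {A}\mathbf {X}\) for some row-stochastic \(\mathbf {X}\ge 0\), can be checked as a linear feasibility problem. A sketch using scipy; the matrices below are illustrative placeholders, not those of example (1), which are given in the main text:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_majorized(B, A):
    """Test whether B = A @ X for some row-stochastic X >= 0 (LP feasibility)."""
    d, nA = A.shape
    _, nB = B.shape
    # Unknowns: entries of X, flattened row by row (nA * nB variables).
    eq_rows, eq_rhs = [], []
    for i in range(d):                       # constraints A @ X = B
        for k in range(nB):
            row = np.zeros(nA * nB)
            for j in range(nA):
                row[j * nB + k] = A[i, j]
            eq_rows.append(row); eq_rhs.append(B[i, k])
    for j in range(nA):                      # each row of X sums to 1
        row = np.zeros(nA * nB)
        row[j * nB:(j + 1) * nB] = 1.0
        eq_rows.append(row); eq_rhs.append(1.0)
    res = linprog(np.zeros(nA * nB), A_eq=np.array(eq_rows),
                  b_eq=np.array(eq_rhs), bounds=[(0, None)] * (nA * nB))
    return res.status == 0                   # 0 = feasible, 2 = infeasible

A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
X = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
assert matrix_majorized(A @ X, A)            # feasible by construction
assert not matrix_majorized(A, A @ X)        # reverse direction fails here
```

Such a feasibility check makes the counterexample concrete: zonotope inclusion can hold while no row-stochastic \(\mathbf {X}\) exists, which is exactly the gap between \(\preccurlyeq ^{R}\) and the dissimilarity partial order.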
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Andreoli, F., Zoli, C. Robust dissimilarity comparisons with categorical outcomes. Soc Choice Welf (2022). https://doi.org/10.1007/s00355-022-01419-1
JEL Classification
 D63
 J71
 J62
 D30