Abstract
The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant nonordered categorical outcomes. When the groups are similarly distributed, their members have equal chances of achieving any of the attainable outcomes. Otherwise, a form of dissimilarity between the group distributions prevails. We characterize axiomatically the dissimilarity partial order of multigroup distributions defined over categorical outcomes. The main result provides an equivalent representation of this partial order by the ranking of multigroup distributions originating from the inclusion of their zonotope representations. The zonotope inclusion criterion refines (that is, is implied by) majorization conditions that are largely adopted in mainstream approaches to multigroup segregation or to univariate and multivariate inequality analysis.
Introduction
The analysis of many phenomena requires partitioning societies into groups and studying the extent to which these groups are distributed with different intensities across relevant nonordered categorical outcomes. For instance, residential segregation occurs in situations in which ethnic groups sort with different intensity across the neighborhoods of a city. Likewise, school or occupational segregation is concerned with the uneven distribution of ethnic groups across schools or jobs.
In other cases, the interest lies in the way shares of one or many attributes (such as income, wealth or consumption of different goods) are assigned across population units (such as countries, households or individuals) and the extent to which these distributions differ from the distribution of a normatively relevant benchmark, such as the demographic weights of the units. In this case, the focus is on uni- or multidimensional inequality, and the often-invoked anonymity principle would regard the way units are ordered as irrelevant.
All these examples are concerned with the extent of dissimilarity between two or more distributions defined over classes of nonordered realizations. There is widespread agreement in the literature about what constitutes lack of segregation or equality: these are situations in which the groups are similarly distributed across the classes of realizations. The relevant notion of similarity we refer to dates back to Gini (1914), who argues that two (or more) groups are similarly distributed whenever “the populations of the two groups take the same values with the same frequency.”^{Footnote 1} Although a well-established methodology exists for analyzing dissimilarity between two distributions, the literature disagrees about the way such comparisons can be extended to the multigroup setting.
This paper develops the axiomatic foundations for the measurement of multigroup dissimilarity and provides equivalent testable conditions. We embrace a convenient (and equivalent) way of representing empirical discrete distributions through matrix notation. Each distribution matrix displays (by row) the distributions of individuals belonging to one of many (at least two) groups across classes of realizations (by column). The example below refers to distributions of three groups across a variable number of classes:
Entries of these matrices can be interpreted as frequencies so that, for instance, the share of group 2 in class 3 in \(\mathbf {A}\) is 80%. In the context of school segregation analysis, each of the two matrices above portrays the way students from each of three distinct ethnic groups are distributed across schools in a schooling district. Matrix \(\mathbf {A}\) depicts a district with four schools whereas matrix \(\mathbf {B}\) depicts another district with three schools.
Many existing criteria can be employed to compare distribution matrices \(\mathbf {A}\) and \(\mathbf {B}\) by the extent of dissimilarity displayed by their rows, such as dissimilarity indices or majorization conditions. The ranking produced by one or a few indices, however, can be challenged by the use of alternative, yet plausible, measures, whereas majorization conditions are robust to this criticism but can be empirically intractable. In this paper, we consider all dissimilarity orderings that are consistent with some normatively relevant axioms and we focus on the intersection of all such orderings as a robust criterion for dissimilarity analysis. It is well known (see Donaldson and Weymark 1998) that such a criterion leads to a partial order of distribution matrices which induces unanimity in the way matrices are ordered by all underlying dissimilarity orderings satisfying the desirable axioms. The axioms that we consider characterize the ordering \(\mathbf {B}\) “displays at most as much dissimilarity as” \(\mathbf {A}\) by the possibility of obtaining \(\mathbf {B}\) from \(\mathbf {A}\) through sequences of elementary operations that either preserve dissimilarity between the matrices’ rows (such as permuting the labels of groups and classes, adding and deleting empty classes, proportionally splitting classes) or reduce it (such as merging group frequencies across two classes of the same distribution matrix), and by some consistency properties of the dissimilarity orderings (with respect to the possibility of producing convex mixtures of classes).
The axioms that we study allow comparisons only between matrices with the same number of groups, but they extend dissimilarity comparisons to matrices with a different number of classes. Such an extension is relevant for empirical applications. Moreover, the possibility of comparing distribution matrices with a different number of classes will make explicit the normative content of this approach, by highlighting the combination of operations that makes it possible to rank distribution matrices with the same number of classes.
The main result, in Theorem 1, establishes that the partial order of distribution matrices consistent with our axiomatic model is equivalent to the partial order of distribution matrices induced by the inclusion of their zonotope representations. A zonotope is a convex geometric set representation of the data, defined in the space of groups frequencies, which is originated by the Minkowski sum (i.e., element by element sum of fractions) of all column vectors of a distribution matrix.^{Footnote 2} In inequality analysis, zonotopes have been used to derive interesting multivariate extensions of the Lorenz curve (Koshevoy and Mosler 1996; Mosler 2012). Theorem 1 shows that the zonotope inclusion criterion is also relevant for dissimilarity analysis, for at least three reasons.
First, we demonstrate that the zonotope inclusion criterion is consistent with the implications of operations that unambiguously preserve or reduce dissimilarity. Each of these operations is found to have clear and intuitive consequences on the shape of the zonotope, and hence provides a normative justification for using the zonotope inclusion criterion. Second, the zonotope inclusion criterion is a refinement of relevant majorization conditions and is hence implied by them. We demonstrate that uniform and matrix majorization criteria, which are widely adopted in robust multivariate distributional analysis (Marshall et al. 2011), are related to the dissimilarity measurement model developed here.^{Footnote 3} In the multigroup setting, some matrices that cannot be ranked by matrix majorization can still be robustly ranked by zonotope inclusion, whereas the two criteria coincide only in the two-group setting. As a counterexample, we use the matrices in (1) to show in Appendix A.15 that the zonotope of matrix \(\mathbf {B}\) is included in the zonotope of matrix \(\mathbf {A}\), whereas \(\mathbf {B}\) is not majorized by \(\mathbf {A}\). Third, the zonotope inclusion criterion can be empirically tested, whereas this is seldom the case for majorization conditions.
Theorem 1 and the corollaries implied by it contribute to the literature along the following lines. First, the results identify the differences between the zonotope inclusion criterion and majorization conditions. In particular, \(\mathbf {B}\) being matrix majorized by \(\mathbf {A}\) postulates the existence of a sequence of dissimilarity reducing or preserving transformations mapping the classes of \(\mathbf {A}\) into those of \(\mathbf {B}\). The zonotope inclusion condition focuses instead separately on each class of \(\mathbf {B}\) and requires that any such class could be obtained through dissimilarity reducing or preserving transformations of the classes of \(\mathbf {A}\). There is no guarantee that such operations can be organized into a sequence, making the zonotope inclusion criterion a refinement of matrix majorization (as observed in Dahl 1999).
Second, Theorem 1 provides an axiomatic justification for using zonotope inclusion as a multigroup criterion that is more robust than (i.e., implies) alternative refinements of matrix majorization based on sequential comparisons of two-group distributions or on specific dissimilarity indices. Although any two-group projection of a multigroup zonotope is itself a zonotope, assessing inclusion among all two-group projections is not sufficient to conclude about inclusion of the multigroup zonotopes. This is because the zonotope inclusion criterion fails to satisfy a consistency property of partial orders, requiring that if two distribution matrices differ only in terms of two group distributions (i.e., two rows), then the two matrices could be ranked by focusing on dissimilarity in the submatrices associated with these groups (Moulin 2016). Failing this property is desirable in a multigroup context, because the dissimilarity ranking of matrices that differ by two or a few group distributions should depend not only on the dissimilarity between these distributions, but also on the extent to which these distributions are dissimilar from the rest. This feature makes the dissimilarity criterion robust against potential aggregation biases (giving rise, for instance, to Simpson’s paradox; see Blyth 1972).
Third, Theorem 1 rationalizes the normative underpinnings of a variety of sparse and apparently unrelated results on the measurement of (multigroup) segregation and multivariate and univariate inequality, which are shown to be embedded within the dissimilarity model.
The paper is organized as follows. Relevant notation and a definition of the zonotope inclusion ordering is provided in Section 2. Axioms and the main result are in Section 3. Section 4 describes the usefulness of our results for related orders. Section 5 concludes. All proofs are collected in a dedicated appendix.
Using zonotopes to test dissimilarity
Notation
A distribution matrix of size \(d\times n\) depicts a set of distributions (indexed by rows) of \(d\ge 1\) groups across \(n\ge 2\) disjoint nonordered classes (indexed by columns), representing categories of realizations. We develop dissimilarity comparisons of distribution matrices with a fixed number d of groups but variable number of classes. These matrices are collected in the set
where \(a_{ij}\) is interpreted as the proportion of group i observed in class j and the column vector \(\mathbf {a}_{j}\) collects the proportions of all groups attaining realization j. Matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_3\) in (1) offer two examples. The distribution matrices in \(\mathcal {M}_{d}\) are row stochastic, meaning that matrix \(\mathbf {A}\in \mathcal {M}_{d}\) represents a collection of d elements of the unit simplex \(\Delta ^{n_{A}}\). We let \(\overline{a}_{j}:=\sum _{i=1}^d{a_{ij}}\) denote the "size" of class j, obtained by weighting uniformly the groups occupying it. For instance, the size of class 1 of matrix \(\mathbf {A}\) in (1) is \(\overline{a}_1=\frac{19}{28}\).
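As a computational aside, row stochasticity and the class sizes \(\overline{a}_j\) are straightforward to verify numerically. A minimal sketch in Python (the example matrix is illustrative, not the matrix \(\mathbf {A}\) of (1)):

```python
import numpy as np

def is_distribution_matrix(A, tol=1e-9):
    """Check that A is row stochastic: nonnegative entries, each row sums to 1."""
    A = np.asarray(A, dtype=float)
    return bool((A >= -tol).all() and np.allclose(A.sum(axis=1), 1.0, atol=tol))

def class_sizes(A):
    """Size of class j: the column sum over groups, weighting groups uniformly."""
    return np.asarray(A, dtype=float).sum(axis=0)

# Illustrative 2-group distribution matrix with three classes.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
print(is_distribution_matrix(A))  # True
print(class_sizes(A))             # [0.6 0.7 0.7]
```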
We follow the convention of using boldface letters to indicate column vectors, so that \(\mathbf {i}_j\) is a column vector corresponding to column j of an identity matrix \(\mathbf {I}_n\) of size \(n\times n\), whereas \(\mathbf {1}_{n}=(1,\ldots ,1)'\) and \(\mathbf {0}_{n}:=(0,\ldots ,0)'\) are the column vectors with all n entries equal to 1 or to 0, respectively. The superscript \('\) always denotes transposition.
Every distribution matrix lies in between two extreme cases. The first case is that of perfect similarity, occurring when the distributions of the groups coincide and can be represented by the same row vector \(\mathbf {s}'\in \Delta ^{n}\). This situation is depicted by matrix \(\mathbf {S}\), whose d rows are all equal to \(\mathbf {s}'\). A maximal dissimilarity matrix \(\mathbf {D}\) is a disjoint-row-support matrix where each class is occupied by at most one group, but one group may occupy different classes.^{Footnote 4}
If a distribution matrix displays the structure of \(\mathbf {S}\), then it is not possible to predict group membership from knowledge of the realization. Conversely, if a distribution matrix is of the form \(\mathbf {D}\), then it is always possible to predict the group from knowledge of the class of the realization. Any distribution matrix displays a structure which lies in between that of \(\mathbf {S}\) and \(\mathbf {D}\).
We also consider transformation matrices that, when multiplied by a distribution matrix, produce effects on the extent of dissimilarity between the rows of such matrix. Denote by \(\mathcal {R}_{n,m}\) the set of \(n\times m\) row stochastic matrices whose rows lie in \(\Delta ^{m}\).^{Footnote 5} Moreover, denote \(\mathcal {R}_{n}\) whenever \(m=n\), while \(\mathcal {D}_{n}\subseteq \mathcal {R}_{n}\) is the set of doubly stochastic matrices whose rows and columns lie in \(\Delta ^{n}\). The set collecting all \(n\times n\) permutation matrices is denoted by \(\mathcal {P}_n\).
The zonotope set
Geometric representations of the data are useful to derive empirical tests for ranking uni- and multivariate distributions. Lorenz curves, for instance, are the workhorse of robust income inequality analysis. A Lorenz curve is obtained by arranging income observations in increasing order and then plotting the cumulative sum of these income shares against the cumulative sample frequencies. If the Lorenz curve of one income distribution lies above the Lorenz curve of another income distribution, then one can robustly rank the former distribution as less unequal than the latter.
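The construction just described can be sketched in a few lines; the income data below are illustrative:

```python
import numpy as np

def lorenz_curve(incomes):
    """Sort incomes in increasing order; return the cumulative population
    shares and the cumulative income shares (both starting at 0)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    cum_income = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    cum_pop = np.linspace(0.0, 1.0, len(x) + 1)
    return cum_pop, cum_income

# Illustrative data: four income observations.
p, L = lorenz_curve([1, 2, 3, 10])
print(L)  # [0.     0.0625 0.1875 0.375  1.    ]
```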
Lorenz curves do not provide sufficient structure for ranking multivariate distributions, insofar as each Lorenz curve allows comparing only one distribution at a time with a normatively relevant one, usually the distribution of population shares. Lorenz curves may be useful to compare distributions of two groups across relevant units, such as in school segregation analysis. In this case, Lorenz curves (known as segregation curves) portray the degree of dissimilarity between the distribution of one group’s members across schools and the distribution of another group across the same schools. Multigroup extensions are, however, problematic even in this domain.
In this section, we consider using the zonotope representation of the data to implement multigroup comparisons and we investigate the robust and testable ordering of distribution matrices generated by the zonotope inclusion criterion.
The zonotope set \(Z(\mathbf {A})\subseteq [0,1]^d\) of a matrix \(\mathbf {A}\in \mathcal {M}_d\) is a convex polytope lying on the hypercube \([0,1]^d\) that is symmetric with respect to the point \(\frac{1}{2} \mathbf {1}_d\) (see McMullen 1971). It is defined as follows:
Elements in \(Z(\mathbf {A})\) are identified by the Minkowski sum of the vectors with coordinates given by \(\mathbf {A}\)’s classes. In Fig. 1a we represent the two-dimensional zonotope of the distribution matrix \(\mathbf {E}\in \mathcal {M}_2\), defined as follows:
The dimensionality of the example \(\mathbf {E}\) helps in visualizing the way \(Z(\mathbf {E})\) is constructed. First, vectors with coordinates corresponding to the classes of \(\mathbf {E}\) are plotted in the unit square and connected to the origin with line segments. In the figure, these vectors are marked with different symbols. For instance, the black square represents the fourth class of \(\mathbf {E}\). Then, the resulting segments are tied together in every possible arrangement. In the figure, adding together the vectors corresponding to classes one and three gives the vector with coordinates (0.7, 0.1), while adding this vector to the one representing the fourth class gives (0.9, 0.6). The resulting zonotope of \(\mathbf {E}\) is the grey area in the figure that contains all possible arrangements of these segments, or portions of them. Panel b) of Fig. 1 represents instead \(Z(\mathbf {A})\), taken from (1), which is defined on the three-dimensional space. We report with solid lines only the visible edges of \(Z(\mathbf {A})\). The relevant facets of \(Z(\mathbf {A})\), originated by the sequential sum of \(\mathbf {A}\)’s classes, are coloured in light gray.
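Since a zonotope is the convex hull of the subset sums of its generators (the class vectors), its construction can be sketched computationally. The entries of the stand-in matrix below are assumptions chosen only to be consistent with the coordinates mentioned in the text (classes one and three summing to (0.7, 0.1), class four adding up to (0.9, 0.6)); they are not the actual matrix \(\mathbf {E}\):

```python
import numpy as np

def zonotope_points(A):
    """All subset sums of the columns (the zonotope's generators).
    Z(A) is the convex hull of these points."""
    A = np.asarray(A, dtype=float)
    d, n = A.shape
    pts = []
    for mask in range(1 << n):  # enumerate every subset of classes
        idx = [j for j in range(n) if (mask >> j) & 1]
        pts.append(A[:, idx].sum(axis=1) if idx else np.zeros(d))
    return np.array(pts)

# Hypothetical stand-in for E (row stochastic, consistent with the text).
E = np.array([[0.40, 0.10, 0.30, 0.20],
              [0.05, 0.40, 0.05, 0.50]])
pts = zonotope_points(E)
# Classes 1 and 3 together reach (0.7, 0.1); adding class 4 gives (0.9, 0.6).
```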
The example above highlights two situations of interest. The maximum dissimilarity zonotope is the d-dimensional hypercube and corresponds to \(Z(\mathbf {D})\). Its diagonal is the similarity zonotope, which corresponds to \(Z(\mathbf {S})\). All distribution matrices displaying some dissimilarity originate zonotopes that lie in the maximum dissimilarity zonotope and that include the similarity zonotope. The shape of \(Z(\mathbf {D})\) and of \(Z(\mathbf {S})\) does not depend on the data in matrices \(\mathbf {S}\) and \(\mathbf {D}\), thus highlighting the irrelevance of within-group heterogeneity for dissimilarity evaluations. More broadly, for each matrix in \(\mathcal {M}_d\) there exists only one zonotope representation, although the same zonotope may correspond to many distribution matrices.
The zonotope inclusion criterion
Zonotopes can be used to compare matrices by the extent of dissimilarity they exhibit. In this paper, we study the ranking of distribution matrices such as \(\mathbf {A}\) and \(\mathbf {B}\) that is generated by the inclusion of the zonotope representations of the two matrices, that is, \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). Inclusion can easily be checked when \(d=2\) from inspection of the zonotope graphs. Figure 2 shows an example where \(Z(\tilde{\mathbf {E}})\subseteq Z(\mathbf {E})\), for a distribution matrix \(\tilde{\mathbf {E}}\) obtained after performing an element-to-element summation of classes 2 and 3 in \(\mathbf {E}\), thereby leading to the central class of \(\tilde{\mathbf {E}}\) that contains \(40\%\) of the population of both groups. This operation obviously levels disparities in the distributions of groups 1 and 2, although similarity is not achieved. The inclusion criterion is more difficult to visualize when \(d=3\). For instance, panel b) of Fig. 1 reports relevant facets of the zonotope \(Z(\mathbf {A})\) obtained at fixed proportions of the group 1 distribution (in light gray). The figure also shows the corresponding facets of the zonotope \(Z(\mathbf {B})\) (in dark gray). In this specific example, it is sufficient to check inclusion of the facets of \(Z(\mathbf {B})\) in the corresponding facets of \(Z(\mathbf {A})\) to conclude that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).^{Footnote 6} In more general situations, when \(d>3\), visualization of zonotope inclusion is not possible and the inclusion criterion must be tested algorithmically.
For general distribution matrices \(\mathbf {A}\) and \(\mathbf {B}\), the zonotope inclusion criterion \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) has an intuitive interpretation in terms of disproportionality of group frequencies, which leads to conclusions about the dissimilarity displayed by \(\mathbf {B}\) and \(\mathbf {A}\). To see this, define an iso-population line (when \(d=2\)) or (hyper)plane (when \(d\ge 3\)) as the set of all combinations of proportions of the groups collected in vectors \(\mathbf {z}\in Z(\mathbf {A})\) that add up to \(p\in [0,1]\), such that \(\frac{1}{d}\mathbf {1}_{d}'\cdot \mathbf {z}=p\). In other words, p is the average “size” of \(\mathbf {z}\), obtained by weighting all groups equally. Figure 2 depicts an example, based on distribution matrices \(\mathbf {E}\) and \(\tilde{\mathbf {E}}\), in which the dashed line segments \(p'\), \(p''\) and \(p'''\) correspond to iso-population lines. In general, \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) is verified if the set of all proportions of the groups adding up to p in \(\mathbf {B}\) is included in (i.e., is less dispersed than) the corresponding set of all proportions of the groups adding up to p in \(\mathbf {A}\). The criterion is robust, given that the inclusion must be verified for all p’s, thus implying that any disproportional allocation of groups attainable by merging classes of \(\mathbf {B}\) can also be obtained by merging the classes of \(\mathbf {A}\), but not the reverse. Proportionality is always attained in \(Z(\mathbf {S})\), in which case there is only one attainable allocation \(\mathbf {z}\in Z(\mathbf {S})\) lying on any iso-population line p, that is, \(\mathbf {z}=\mathbf {1}_d p\). Conversely, disproportionality is maximal in \(Z(\mathbf {D})\), in which case every attainable allocation lying on the iso-population line p can be obtained from the original data.
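Because the zonotope is the Minkowski sum of the segments \([0,\mathbf {a}_j]\), we have \(Z(\mathbf {A})=\{\mathbf {A}\mathbf {x}:\mathbf {x}\in [0,1]^{n_A}\}\), whose support function is \(h_A(\mathbf {u})=\sum _j \max (0,\mathbf {u}'\mathbf {a}_j)\); inclusion \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds exactly when \(h_B(\mathbf {u})\le h_A(\mathbf {u})\) in every direction \(\mathbf {u}\). The sketch below checks this inequality on randomly sampled directions: finding a violating direction disproves inclusion, while passing all samples is strong (but not conclusive) evidence; an exact algorithm would check the facet normals instead. The matrices are illustrative:

```python
import numpy as np

def support(A, u):
    """Support function of Z(A) = {Ax : x in [0,1]^n}: h(u) = sum_j max(0, u.a_j)."""
    return np.maximum(A.T @ u, 0.0).sum()

def zonotope_included(B, A, n_dirs=20000, seed=0):
    """Randomized check of Z(B) subset of Z(A) via h_B(u) <= h_A(u) on
    sampled directions u. Exact only up to the sampling of directions."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    for _ in range(n_dirs):
        u = rng.standard_normal(d)
        if support(B, u) > support(A, u) + 1e-9:
            return False  # a separating direction disproves inclusion
    return True

# Illustrative 2-group example: merging classes 2 and 3 of A shrinks the zonotope.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = np.array([[0.5, 0.5],
              [0.1, 0.9]])
print(zonotope_included(B, A))  # True
print(zonotope_included(A, B))  # False
```

The first call is guaranteed to return True since \(\max (0,x+y)\le \max (0,x)+\max (0,y)\) implies \(h_B\le h_A\) whenever \(\mathbf {B}\) merges classes of \(\mathbf {A}\).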
The zonotope inclusion criterion always ranks \(Z(\mathbf {S})\subseteq Z(\mathbf {A}) \subseteq Z(\mathbf {D})\) for any \(\mathbf {A}\in \mathcal {M}_d\) and for any d. The inclusion criterion, however, entails only a partial order of distribution matrices: two matrices cannot be ordered if their respective zonotope representations intersect. In the following section, we characterize the normative content of the inclusion criterion in terms of unanimous agreement in ranking \(\mathbf {B}\) as displaying less dissimilarity than \(\mathbf {A}\) by all orderings consistent with some basic dissimilarity axioms.
Characterizations
The dissimilarity partial order
We investigate the possibility of ordering distribution matrices according to the dissimilarity they display. A dissimilarity ordering is a complete and transitive binary relation \(\preccurlyeq\) on the set \(\mathcal {M}_d\) with symmetric part \(\thicksim\), that ranks \(\mathbf {B}\preccurlyeq \mathbf {A}\) whenever \(\mathbf {B}\) is at most as dissimilar as \(\mathbf {A}\).^{Footnote 7} Given \(\mathbf {A}\in \mathcal {M}_d\), any dissimilarity ordering should always rank \(\mathbf {S}\preccurlyeq \mathbf {A}\preccurlyeq \mathbf {D}\) for any matrices of the form \(\mathbf {S}\) and \(\mathbf {D}\). These matrices are regarded, respectively, as equivalent representations of perfect similarity and of maximal dissimilarity, the focus being on differences across group distributions and not on the degree of heterogeneity in the distribution of each group across realizations.^{Footnote 8}
One direct implication of this feature of the dissimilarity orderings is that distribution matrices that differ in their number of classes (\(n_A\ne n_B\)) can be regarded as indifferent from the perspective of the dissimilarity orderings, provided the dissimilarity displayed within each matrix coincides across matrices, even though the group distributions themselves differ.^{Footnote 9} For this reason, we focus on matrices in \(\mathcal {M}_d\), which must have the same number of rows but may differ in the number of classes.
Each dissimilarity ordering induces a complete ranking of distribution matrices. In this paper, we are interested in the robust ranking of distribution matrices generated by the intersection of the dissimilarity orderings satisfying some desirable properties. This is a partial order (Donaldson and Weymark 1998), which we characterize axiomatically and for which we provide equivalent representations.
Axioms and preliminary results
Axioms are based on elementary operations that, when applied to distribution matrices, can reduce or preserve dissimilarity among groups. We focus on the dissimilarity orderings that rank distribution matrices consistently with the effects of these operations. To ease the understanding of the axioms, we contextualize the consequences of these operations in terms of dissimilarity in the distribution of groups of students with different ethnic backgrounds across the schools in a schooling district. This is commonly referred to as a problem of schooling segregation.
The first axiom defines the context, introducing an anonymity property with respect to the labels (and hence the arrangement) of the classes of a distribution matrix.
Axiom 1
IPC (Independence from Permutations of Classes) For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) with \(n_{A}=n_{B}=n\), if \(\mathbf {B}=\mathbf {A}\cdot \varvec{\Pi }_{n}\) for a permutation matrix \(\varvec{\Pi }_{n}\in \mathcal {P}_{n}\) then \(\mathbf {B}\thicksim \mathbf {A }\).
Axiom IPC restricts the focus to evaluations where the classes of a matrix can be freely permuted without affecting the extent of dissimilarity it displays. In the context of schooling segregation, the axiom posits that the names of the schools are irrelevant for conclusions about the dissimilarity in the distributions of students across these schools. This is arguably the case if the schools cannot or should not be ordered according to their performances, their quality or their budget. Another implication of axiom IPC is that any distribution matrix that is obtained by permuting the columns of matrix \(\mathbf {D}\) has to be regarded as an equivalent representation of maximal dissimilarity.
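The permutation operation in axiom IPC amounts to right-multiplying the distribution matrix by a permutation matrix, which simply reorders its columns. A minimal sketch (the matrix is illustrative):

```python
import numpy as np

def permute_classes(A, perm):
    """IPC: reorder the classes (columns) of A according to `perm`,
    i.e., right-multiply A by the permutation matrix Pi_n."""
    A = np.asarray(A, dtype=float)
    P = np.eye(A.shape[1])[:, perm]  # permutation matrix with columns e_{perm[j]}
    return A @ P

# Illustrative 2-group matrix; relabelling schools leaves dissimilarity unchanged.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = permute_classes(A, [2, 0, 1])  # same columns, new order
```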
Next, we consider two transformations that extend comparability to distribution matrices that differ in the number of classes. The first transformation has to do with the insertion or elimination of empty classes, i.e., classes that are not occupied by groups. The operation consists in adding/eliminating column vectors of size d with only zero entries to/from the original distribution matrix. In the schooling segregation example, the operation corresponds to adding/eliminating schools with no students to/from the same school district. The presence of these “empty” schools in the district is irrelevant for assessing dissimilarity of groups distributions across the remaining schools of the district.
Axiom 2
IEC (Independence from Empty Classes) For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), if \(\mathbf {B}=\left( \mathbf {A}, \mathbf {0}_{d}\right)\) then \(\mathbf {B}\thicksim \mathbf {A}\).
The IEC axiom emphasizes dissimilarity originated from nonempty columns of a distribution matrix. If \(\mathbf {A}\) and \(\mathbf {B}\) differ only because of \(n_A-n_B\) empty classes in one of the two matrices, then the dissimilarity in \(\mathbf {A}\) should be regarded as equivalent to that in \(\mathbf {B}\). When combined with IPC, the axiom IEC allows us to regard as indifferent all matrices obtained by adding or deleting an empty class in any position.
The second transformation considered increases the number of classes by proportionally splitting (the group frequencies in) a class into two new classes. This transformation requires replicating one column of a distribution matrix and then scaling the entries of the original and of the replicated columns by the splitting coefficients \(\beta \in (0,1)\) and \(1-\beta\), respectively. This operation guarantees that the resulting distribution matrix is row stochastic and that the degree of proportionality of the group frequencies in the new columns coincides with that in the original column. In the schooling segregation example, splitting a school would require randomly allocating its student population (i.e., irrespective of group assignment) into two smaller institutes, so that ethnic proportions in the two new institutes are not altered. Frankel and Volij (2011) advocate a similar property (called composition invariance) in the study of multigroup school segregation (see also James and Taeuber 1985). The Independence from Split of Classes (ISC) axiom posits that the transformation described above is a source of indifference for every dissimilarity ordering.
Axiom 3
ISC (Independence from Split of Classes) For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) with \(n_{B}=n_{A}+1\), if \(\exists \,j\) such that \(\mathbf {b}_{j}=\beta \mathbf {a}_{j}\) and \(\mathbf {b}_{j+1}=(1-\beta )\mathbf {a}_{j}\) with \(\beta \in (0,1)\), while \(\mathbf {b}_{k}=\mathbf {a}_{k}\) \(\forall k<j\) and \(\mathbf {b}_{k+1}=\mathbf {a}_{k}\) \(\forall k>j\), then \(\mathbf {B}\thicksim \mathbf {A}\).
A split transformation increases the number of classes and modifies the shape of a distribution matrix, but it does not alter the proportionality of the groups. For this reason, it is regarded as dissimilarity preserving.
The merge of classes transformation complements the split operation. A merge of classes is implemented by vector summation of two adjacent columns of a distribution matrix, irrespective of the group composition of each column. The operation has an immediate interpretation in the schooling segregation example: it consists in merging all students from two neighboring schools into a single, larger school. Each ethnic group in the school of destination is increased by an amount equal to the proportion of the corresponding group in the school of departure, which is then emptied. If one or both schools are empty, segregation neither increases nor decreases. Consider, instead, the case of two ethnic groups that are similarly distributed across almost all schools in a district, apart from two schools, such that one group is overrepresented compared to the other in one school, and underrepresented in the other. Merging each of these two schools with other schools in the district would reduce the compositional differences, without eliminating them. A merge of these two schools with each other would, instead, establish proportionality in ethnic composition across all schools, leading to perfect similarity. The Dissimilarity Decreasing Merge of Classes (MC) axiom states that a merge of classes transformation can never increase dissimilarity.
Axiom 4
MC (Dissimilarity Decreasing Merge of Classes) For any \(\mathbf {A},\,\mathbf {B}\in \mathcal {M}_{d}\) with \(n_{A}=n_{B}\), if \(\mathbf {b}_{j}=\mathbf {0}_{d}\), \(\mathbf {b}_{j+1}=\mathbf {a}_{j}+\mathbf {a}_{j+1}\) while \(\mathbf {b}_{k}=\mathbf {a}_{k}\ \forall k\ne j,j+1\), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
Consider obtaining \(\mathbf {B}\) from \(\mathbf {A}\) with a merge transformation adding together, distribution by distribution, the group proportions observed in two classes \(\mathbf {a}_j\) and \(\mathbf {a}_{j+1}\) whenever \(\mathbf {a}_{j+1}=\beta \mathbf {a}_{j}\), \(\beta >0\), such that \(\mathbf {b}_j=(1+\beta )\mathbf {a}_j\) and \(\mathbf {b}_{k}=\mathbf {a}_{k}\) \(\forall k<j\), while \(\mathbf {b}_{k}=\mathbf {a}_{k+1}\) \(\forall k>j\). This operation leaves dissimilarity unaffected. The operation is the opposite of a split, but it supports the same indifference class, gathering all matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) such that \(\mathbf {B}\thicksim \mathbf {A}\) for all orderings consistent with axiom ISC, even if not consistent with MC.
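The split and merge operations can be sketched directly (the matrix is illustrative; the merge below also drops the emptied class, which IEC allows). Merging a just-split class recovers the original matrix, consistent with the two operations supporting the same indifference class:

```python
import numpy as np

def split_class(A, j, beta):
    """ISC: split class j with coefficient beta in (0,1): column j becomes
    beta * a_j and a new column (1 - beta) * a_j is inserted right after it."""
    A = np.asarray(A, dtype=float).copy()
    new_col = (1.0 - beta) * A[:, j]
    A[:, j] = beta * A[:, j]
    return np.insert(A, j + 1, new_col, axis=1)

def merge_classes(A, j):
    """MC: merge class j into class j + 1 by vector summation, then drop
    the emptied column (an application of IEC)."""
    A = np.asarray(A, dtype=float).copy()
    A[:, j + 1] += A[:, j]
    return np.delete(A, j, axis=1)

# Illustrative 2-group matrix.
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.4, 0.5]])
B = split_class(A, 0, 0.3)   # dissimilarity preserved
C = merge_classes(B, 0)      # undoes the split
print(np.allclose(C, A))     # True
```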
Axioms MC, IEC, ISC and IPC are independent. The transitive closure of all dissimilarity orderings satisfying these axioms defines a partial order of distribution matrices (see Donaldson and Weymark 1998), which is represented by the matrix majorization criterion. We refer to this partial order as \(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\), indicating that matrix \(\mathbf {B}\) is matrix majorized by \(\mathbf {A}\) whenever there exists \(\mathbf {X}\in \mathcal {R}_{n_A,n_B}\) such that \(\mathbf {B}=\mathbf {A}\mathbf {X}\) (Marshall et al. 2011; Dahl 1999).
Proposition 1
For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) the following statements are equivalent:

(i)
\(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MC, ISC, IEC and IPC.

(ii)
\(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\).
The notion of matrix majorization has been investigated in a variety of contexts (see p. 625 in Marshall et al. (2011) and literature therein), the most relevant being the comparison of informativeness of statistical experiments with a finite number of outcomes.^{Footnote 10} The characterization in Proposition 1, which is alternative to those in Grant et al. (1998), Frankel and Volij (2011) and Lasso de la Vega and Volij (2014), shows that every informativeness comparison of matrices in \(\mathcal {M}_d\) amounts to the existence of dissimilarity preserving and reducing transformations mapping the most informative distribution matrix into the least informative one.
While appealing, matrix majorization allows distribution matrices to be ranked only if there exists a sequence of relevant transformations of the data that can be represented by a row stochastic matrix. This requirement is very stringent in some cases. Consider for instance matrices \(\mathbf {A}\) and \(\mathbf {B}\) in (1). It is shown in Appendix A.15 that every column of \(\mathbf {B}\) can be obtained by splitting and merging the columns of \(\mathbf {A}\). Yet, in this specific example these operations cannot be arranged to form a sequence. As a consequence, they cannot be represented in the form of a row stochastic matrix, so that \(\mathbf {A}\) does not matrix majorize \(\mathbf {B}\). In fact, as shown in the appendix, there is a unique admissible transformation matrix with nonnegative entries, denoted
yielding \(\mathbf {B}=\mathbf {A}\mathbf {X}\). The transformation matrix \(\mathbf {X}\) is, clearly, not row stochastic (for a related example, see Koshevoy 1995).
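As a small illustration (the matrices below are invented for the example, not those in (1)), a candidate certificate of matrix majorization can be verified by checking row stochasticity and the product \(\mathbf {B}=\mathbf {A}\mathbf {X}\):

```python
# Minimal sketch of checking a matrix-majorization certificate: X must be
# row stochastic (nonnegative entries, rows summing to 1) and satisfy B = A·X.

def matmul(A, X):
    return [[sum(A[i][k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for i in range(len(A))]

def is_row_stochastic(X, tol=1e-9):
    return all(x >= -tol for row in X for x in row) and \
           all(abs(sum(row) - 1.0) < tol for row in X)

def certifies_majorization(A, B, X, tol=1e-9):
    if not is_row_stochastic(X):
        return False
    AX = matmul(A, X)
    return all(abs(AX[i][j] - B[i][j]) < tol
               for i in range(len(B)) for j in range(len(B[0])))

# Example: X splits the first class of A evenly into two classes and keeps the
# second class, so B = A·X follows from a split transformation.
A = [[0.6, 0.4],
     [0.2, 0.8]]
X = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0]]
B = matmul(A, X)   # [[0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]
```

By contrast, a matrix such as the non-row-stochastic \(\mathbf {X}\) of the example above would fail `is_row_stochastic`, so no certificate exists even though \(\mathbf {B}=\mathbf {A}\mathbf {X}\) holds.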
We address the concerns raised by the example above by introducing a new class of dissimilarity axioms. These axioms relax the requirement that distribution matrices be ranked exclusively by means of (sequences of) merge transformations, invoking instead a form of consistency of the dissimilarity orderings with respect to convex combinations of (columns of) distribution matrices. The first axiom, denoted StrongMixC, states that if matrices \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\) are each ranked as not more dissimilar than \(\mathbf {A}\), then the convex mix of these matrices yields a matrix \(\mathbf {B}\) that cannot display more dissimilarity than \(\mathbf {A}\). Before stating the axiom, recall that any element of the convex hull (denoted conv) of matrices \(\mathbf {B}^j=(\mathbf {b}^j_1,\ldots ,\mathbf {b}^j_{n})\in \mathcal {M}_d\) is a matrix \(\mathbf {B}=(\mathbf {b}_1,\ldots ,\mathbf {b}_{n})\in \mathcal {M}_d\) such that \(\mathbf {b}_k=\sum _{j=1}^{m}{w_j\mathbf {b}_k^j}\) \(\forall k\), for any set of weights \(w_j\in [0,1]\) satisfying \(\sum _{j=1}^{m}{w_j}=1\). The axiom's name follows from the fact that every mix of matrices can be interpreted as a specific mix of classes that assigns uniform weights to the classes of the same matrix.
Axiom 5
StrongMixC (Dissimilarity Consistency with Uniform Mixing of Classes) Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence \(j=1,\ldots ,m\), \(m\ge 2\) of \(d\times n\) matrices \(\mathbf {B}^j\in \mathcal {M}_d\) such that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\). If \(\mathbf {B} \in conv\{\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\}\), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
The axiom StrongMixC postulates a “betweenness” property (see Dekel 1986) for dissimilarity orderings: all orderings satisfying it regard a new distribution \(\mathbf {B}\), whose classes are obtained as convex combinations of the respective classes of \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\), as not more dissimilar than \(\mathbf {A}\). The fact that such a convex combination assigns the same weight \(w_j\) to each class of a matrix \(\mathbf {B}^j\), with \(\sum _j{w_j}=1\), guarantees that \(\mathbf {B}\in \mathcal {M}_d\). Matrix \(\mathbf {B}\) may be regarded as more dissimilar than some of the matrices \(\mathbf {B}^j\), but it cannot display more dissimilarity than \(\mathbf {A}\) given that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\).
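The uniform mixing underlying StrongMixC can be sketched as a column-wise convex combination with a single weight per matrix (the matrices and weights below are illustrative):

```python
# Sketch of the uniform mixing in axiom StrongMixC: B is the column-wise convex
# combination of matrices B^1,...,B^m, with one weight w_j per matrix applied
# to every class of that matrix.

def mix(Bs, w):
    assert abs(sum(w) - 1.0) < 1e-9          # weights on the simplex
    d, n = len(Bs[0]), len(Bs[0][0])
    return [[sum(w[j] * Bs[j][i][k] for j in range(len(Bs)))
             for k in range(n)] for i in range(d)]

B1 = [[0.7, 0.3], [0.1, 0.9]]
B2 = [[0.3, 0.7], [0.5, 0.5]]
B = mix([B1, B2], [0.5, 0.5])   # each class of B averages the same class of B1, B2
```

Because the same weight applies to every class of a given matrix, each row of `B` still sums to 1, i.e. \(\mathbf {B}\in \mathcal {M}_d\), as the text notes.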
The normative appeal of axiom StrongMixC rests in its relation with operations mixing columns of different matrices. Such operations are regarded as unambiguously not increasing dissimilarity, given their relation with merge of classes operations. We contextualize this point in terms of school segregation analysis. The axiom StrongMixC implies that every school or merge of schools from schooling district \(\mathbf {B}\) (i.e., columns of the distribution matrix) can always be obtained through a convex combination of schools or merges of schools drawn from the schooling districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\). Formally, let \(\mathcal {V}^{01}_n\) be the set of \(n\times 1\) vectors whose elements are either 0 or 1. Nonzero entries of a vector \(\varvec{\gamma }\in \mathcal {V}^{01}_n\) identify classes (or one class) of a distribution matrix, so that \(\mathbf {B}\varvec{\gamma }\) yields a new school obtained by merging some of \(\mathbf {B}\)’s schools (or by keeping one of its schools). Any mixing operation underlying axiom StrongMixC always grants that:
where conv is the convex hull of such vectors.
Arguably, every merge of schools smooths the extent of ethnic disparities within a schooling district and can contribute to reducing segregation. Condition (4) implies that there are fewer opportunities to reduce segregation by merging some of the schools in schooling district \(\mathbf {B}\) than there are by merging schools in districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\). In fact, the former district/matrix can always be obtained as a combination of the latter matrices, whereas the reverse may not be true. As an example, consider the (rather extreme) case in which \(\mathbf {B}\) is the similarity matrix: in this case there are no opportunities at all to reduce segregation by merging \(\mathbf {B}\)’s classes, since segregation is already at its minimum.
Condition (4) is always granted by axiom StrongMixC, which in fact imposes additional structure.^{Footnote 11} We consider a new axiom, denoted MixC, that regards every matrix \(\mathbf {B}\in \mathcal {M}_d\) satisfying condition (4) as displaying not more dissimilarity than \(\mathbf {A}\).
Axiom 6
MixC (Dissimilarity Consistency with Mixing of Classes) Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence \(j=1,\ldots ,m\), \(m\ge 2\) of \(d\times n\) matrices \(\mathbf {B}^j\in \mathcal {M}_d\) such that \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) \(\forall j\). If \(\mathbf {B}\in \mathcal {M}_d\) satisfies condition (4), then \(\mathbf {B}\preccurlyeq \mathbf {A}\).
The axiom MixC values the fact of having fewer opportunities for reducing segregation in school district \(\mathbf {B}\) than in districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\), in the sense that if the latter districts display less school segregation than \(\mathbf {A}\), then \(\mathbf {B}\) should also display less segregation than \(\mathbf {A}\). Axiom MixC extends the orderings \(\mathbf {B}^j\preccurlyeq \mathbf {A}\) for every \(j=1,\ldots ,m\) to \(\mathbf {B}\preccurlyeq \mathbf {A}\). The axiom makes it possible to compare cases such as those in the example related to the transformation matrix in (3): even if no sequence of merge of classes operations maps \(\mathbf {A}\) into \(\mathbf {B}\), the fact that every class of \(\mathbf {B}\) is obtained by merging classes of \(\mathbf {A}\) may suffice to verify condition (4) and thus rank \(\mathbf {B}\preccurlyeq \mathbf {A}\), as we show in Appendix A.15.
The axiom MixC does not explicitly mention the way \(\mathbf {B}\) is constructed. One way to obtain it (which we rely upon in the proofs) is by considering weights \(w_j^k\in [0,1]\), with \(\sum _{j=1}^{m}{w_j^k}=1\), that are specific to each class k, such that \(\mathbf {b}_k=\sum _{j=1}^{m}{w_j^k\mathbf {b}_k^j}\) \(\forall k\). These weights are more general than those considered by axiom StrongMixC. When configuration \(\mathbf {B}\) is obtained in such a way and (4) is satisfied, the axiom MixC posits that mixing students across schools of districts \(\mathbf {B}^1,\ldots ,\mathbf {B}^{m}\) that are not more segregated than \(\mathbf {A}\) cannot generate a new schooling district \(\mathbf {B}\) that is more segregated than \(\mathbf {A}\).
The fact that the mixing operations underlying axiom StrongMixC always satisfy condition (4) proves the next statement.
Remark 1
If \(\preccurlyeq\) is consistent with MixC then it is consistent with StrongMixC.
As a consequence, the MixC axiom implies a ranking of distribution matrices that is less partial than the one characterized by axiom StrongMixC, in the sense that the dissimilarity orderings consistent with StrongMixC unanimously rank only a subset of the matrices that can be ordered by the orderings consistent with MixC.
We now investigate whether the partial orders of distribution matrices supported by matrix majorization and by the zonotope inclusion criterion are consistent with these new axioms. The next remark shows that matrix majorization \(\preccurlyeq ^R\) is consistent with StrongMixC.
Remark 2
For \(\mathbf {A},\mathbf {B}^j\in \mathcal {M}_d\), \(j=1,\ldots ,m\), let \(\mathbf {B}\in conv\{\mathbf {B}^1,\ldots ,\mathbf {B}^m\}\): \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\) \(\forall j\) \(\Rightarrow\) \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\).
When axiom StrongMixC is paired with the dissimilarity preserving axioms, it allows matrix majorization to be characterized without resorting to axiom MC, which is hence implied by all the axioms considered.
Remark 3
If \(\preccurlyeq\) satisfies IPC, ISC, IEC and StrongMixC then it satisfies MC.
Axiom MC thus becomes redundant in characterizing dissimilarity partial orders once the StrongMixC axiom is combined with all the dissimilarity preserving axioms. The axiom StrongMixC yields a new characterization of \(\preccurlyeq ^R\), alternative to the one presented in Proposition 1.
Proposition 2
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\), the following statements are equivalent:

(i)
\(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms StrongMixC, ISC, IEC and IPC.

(ii)
\(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\).
Proposition 2 highlights that, even if axiom MC is implied by StrongMixC, ISC, IEC and IPC (see Remark 3), these axioms still lead to the matrix majorization partial order of Proposition 1, and not to another partial order capable of ranking more matrices. Weakening the StrongMixC axiom towards MixC may help characterize such a partial order. In the two-group case (\(d=2\)), Proposition 2 can be reformulated by weakening StrongMixC in favor of axiom MixC, since matrix majorization \(\preccurlyeq ^R\) is consistent with axiom MixC when \(d=2\).
Remark 4
For \(\mathbf {A},\mathbf {B}^j,\mathbf {B}\in \mathcal {M}_2\) such that \(\mathbf {B}\) and \(\mathbf {B}^j\), \(j=1,\ldots ,m\), satisfy condition (4): \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\) \(\forall j\) \(\Rightarrow\) \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\).
In the multigroup context (\(d\ge 3\)), however, matrix majorization is not consistent with axiom MixC. A counterexample is given in Appendix A.15, where we use the matrices in (1) to identify matrices \(\mathbf {B}^j\in \mathcal {M}_d\), \(j=1,2,3\), such that \(\mathbf {B}^j\preccurlyeq ^R\mathbf {A}\), and we then show that \(\mathbf {B}\) satisfies condition (4) but cannot be obtained as in axiom StrongMixC, hence \(\mathbf {B}\not \preccurlyeq ^R\mathbf {A}\). As a consequence, matrix majorization is only sufficient, but not necessary, for unanimity in the ranking by all dissimilarity orderings \(\preccurlyeq\) consistent with axioms MixC, IPC, IEC and ISC. Conversely, the zonotope inclusion criterion is always consistent with axiom MixC.
Remark 5
For \(\mathbf {A},\mathbf {B}^j,\mathbf {B}\in \mathcal {M}_d\) such that \(\mathbf {B}\) and \(\mathbf {B}^j\), \(j=1,\ldots ,m\), satisfy condition (4): \(Z(\mathbf {B}^j)\subseteq Z(\mathbf {A})\) \(\forall j\) \(\Rightarrow\) \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).
The zonotope inclusion criterion is consistent with MixC and, from Remark 1, it is also consistent with StrongMixC. However, the StrongMixC axiom is not sufficient to characterize zonotope inclusion: the operations underlying StrongMixC do not, alone, allow the zonotope inclusion ordering of a pair \(\mathbf {A}\) and \(\mathbf {B}\) to be broken down into the existence of simpler mixing transformations mapping one matrix into the other. The main result of the paper shows that the MixC axiom provides the structure needed to establish a characterization of the zonotope inclusion order. Dissimilarity orderings consistent with StrongMixC but not with MixC can be represented by matrix majorization (Proposition 2), but this guarantees only a sufficient condition for zonotope inclusion.
Main result and discussion
Theorem 1
For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\), the following statements are equivalent:

(i)
\(\mathbf {B}\ \preccurlyeq \ \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC.

(ii)
\(Z(\mathbf {B}) \ \subseteq \ Z(\mathbf {A})\).
The theorem provides a novel complete characterization of the zonotope inclusion criterion in terms of dissimilarity. The zonotope inclusion criterion originates a partial order of distribution matrices. If the inclusion test fails, then consensus on the ranking of distribution matrices by all dissimilarity orderings consistent with MixC, IPC, IEC and ISC cannot be reached. Nonetheless, this partial order is “less partial” than matrix majorization (that is, zonotope inclusion is a refinement of \(\preccurlyeq ^R\)) and is thus capable of ordering a larger set of cases.
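The inclusion test in statement (ii) can be approached numerically through support functions: \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds if and only if the support function of \(Z(\mathbf {B})\) is dominated by that of \(Z(\mathbf {A})\) in every direction, and for a zonotope generated by the columns of a matrix the support function is the sum of the positive parts of the projected columns. The sketch below (not the paper's procedure; the matrices are illustrative) samples random directions, so it can refute inclusion but only heuristically support it:

```python
import random

# Randomized necessary-condition test of zonotope inclusion Z(B) ⊆ Z(A),
# with Z(A) = {A·theta : theta in [0,1]^n}. Inclusion of convex bodies is
# equivalent to dominance of support functions in every direction; sampling
# directions can refute inclusion but never certify it.

def support(M, p):
    """Support function of Z(M) in direction p: sum of positive parts of p·m_j."""
    d = len(M)
    return sum(max(0.0, sum(p[i] * M[i][j] for i in range(d)))
               for j in range(len(M[0])))

def maybe_included(B, A, trials=2000, tol=1e-9, seed=0):
    rng = random.Random(seed)
    d = len(A)
    for _ in range(trials):
        p = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if support(B, p) > support(A, p) + tol:
            return False        # a separating direction refutes inclusion
    return True                 # no refutation found (not a certificate)

A = [[0.6, 0.4], [0.2, 0.8]]
B = [[0.5, 0.5], [0.5, 0.5]]    # the similarity matrix: Z(B) is a diagonal segment
```

Here `maybe_included(B, A)` finds no separating direction, since the diagonal segment \(Z(\mathbf {B})\) lies inside the parallelogram \(Z(\mathbf {A})\), while `maybe_included(A, B)` is quickly refuted. An exact test would check only the facet normals of \(Z(\mathbf {A})\) rather than random directions.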
Remark 6
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\): \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A} \Rightarrow Z(\mathbf {B}) \subseteq Z(\mathbf {A})\).
The remark shows that there are matrices that cannot be ranked by dissimilarity orderings consistent with StrongMixC or, equivalently, with axiom MC, but that can be ordered by resorting to axiom MixC, which provides the additional structure needed to refine \(\preccurlyeq ^R\). The rationale for this refinement is clarified in the proof of Theorem 1. There, we make use of the operations underlying axioms StrongMixC, IEC, IPC and ISC to characterize the distribution matrices that form the basis of the set of all matrices that are matrix majorized by any given \(\mathbf {A}\). We also show that some of the classes in each of the basis matrices identify vertices of \(Z(\mathbf {A})\) (and that \(Z(\mathbf {A})\) is the convex hull of its vertices). While the convex hull of the basis matrices obtained by using the weights implied by axiom StrongMixC is sufficient to characterize the full set of matrices \(\mathbf {B}\) such that \(\mathbf {B}\preccurlyeq ^R\mathbf {A}\), the same operation is not sufficiently flexible to characterize the entire set of matrices \(\mathbf {B}\) such that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). Instead, when condition (4) is applied to the permutations of \(\mathbf {A}\) (all regarded as dissimilar as \(\mathbf {A}\) itself), it identifies the vertices of \(Z(\mathbf {A})\), so that the convex mix of these vertices gives \(Z(\mathbf {A})\). Some of the matrices identified in this way cannot be obtained using the weights implied by StrongMixC, which proves Remark 6.
The reverse implication of Remark 6 is not true in general. The matrices \(\mathbf {A}\) and \(\mathbf {B}\) in (1) provide a counterexample in which \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) but the two matrices cannot be ranked by matrix majorization. Matrix majorization and zonotope inclusion orderings may coincide only under specific circumstances, such as those identified in Theorem 2 in Koshevoy (1995) or in cases where dissimilarity comparisons are limited to distributions where \(d=2\) (Dahl 1999; Lasso de la Vega and Volij 2014). We provide in the appendix a new geometric proof of the latter statement.
Remark 7
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_2\): \(Z(\mathbf {B})\subseteq Z(\mathbf {A}) \Rightarrow \mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) .
The set of matrices that can be ordered in terms of dissimilarity can be further extended by considering the transitive closure generated by all binary relations \(\preccurlyeq\) satisfying the axioms in Theorem 1 together with a new axiom, Independence from Permutations of Groups (IPG), which introduces invariance of the dissimilarity orderings with respect to the labeling of the groups.
Axiom 7
IPG (Independence from Permutations of Groups) For any \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\), if \(\mathbf {B}=\varvec{\Pi } _{d}\cdot \mathbf {A}\) for a permutation matrix \(\varvec{\Pi }_{d}\in \mathcal {P}_{d}\) then \(\mathbf {B}\thicksim \mathbf {A}\).
In the context of segregation analysis, the axiom provides a natural multigroup extension of the symmetry of types property of Hutchens (2015). Together with ISC, IEC, IPC and MixC, the axiom extends dissimilarity comparisons to zonotope inclusion orderings that are not concerned with the labeling of the groups. A proof of the following corollary rests on the properties of the zonotope.
Corollary 1
For any \(\mathbf {A}\in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC and IPG if and only if \(\exists \varvec{\Pi }_d\in \mathcal {P}_d\) such that \(Z(\mathbf {B})\subseteq Z(\varvec{\Pi }_d\mathbf {A}).\)
Dissimilarity indices
The partial order of dissimilarity in Theorem 1 can be represented in terms of agreement of dissimilarity indices satisfying desirable properties. A dissimilarity index is a multivariate function \(D:\mathcal {M}_d\rightarrow \mathbb {R}_+\) mapping a distribution matrix into a number, which can be interpreted as the level of dissimilarity among the d distributions represented in that matrix. These indices measure dissimilarity as the average within-class dispersion of group frequencies.
Consider first the dissimilarity orderings consistent with StrongMixC. In this case, the dispersion of group frequencies within each class can be quantified by a function h in the class \(\mathcal {H}\) of real valued convex functions defined on \(\Delta ^{d}\). Dispersion in class j contributes to the overall dissimilarity in proportion to the size \(\overline{a}_{j}\) of class j. The dissimilarity index \(D_{h}\) with \(h\in \mathcal {H}\) aggregates these evaluations as follows:^{Footnote 12}
where \(a_{ij}/\overline{a}_{j}\) is the proportion of group i relative to the size of class j when groups are uniformly weighted. Dissimilarity is minimized when \(a_{ij}/\overline{a}_{j}=1/d\) for each of the d groups in all classes. Hence, by normalizing h so that \({h\left( \frac{1}{d}\mathbf {1}_{d}'\right) =0}\), the index takes value 0 when perfect similarity is reached. Dissimilarity is instead maximal when for every j there exists an i such that \(a_{ij}=\overline{a}_{j}\). The following proposition sets out a dominance condition in terms of dissimilarity indices.
Proposition 3
For any \(\mathbf {A},\mathbf {B} \in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms StrongMixC, ISC, IEC, IPC if and only if \(D_h(\mathbf {B})\ \le \ D_h(\mathbf {A})\) for all \(h\in \mathcal {H}\).
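An index in this class can be sketched from the verbal description above: class-size weights \(\overline{a}_j=\sum _i a_{ij}\), a \(1/d\) normalization (consistent with the price-dominance index introduced below), and a quadratic h as one admissible convex choice. The normalization and the specific h are assumptions for illustration and may differ from the displayed formula:

```python
# Sketch of a dissimilarity index D_h: class-size-weighted within-class
# dispersion, with h convex on the simplex and normalized so h(1/d,...,1/d) = 0.
# The quadratic h below is one admissible choice, not the paper's.

def D_h(A, h):
    d, n = len(A), len(A[0])
    total = 0.0
    for j in range(n):
        size = sum(A[i][j] for i in range(d))          # class size ā_j
        if size > 0:
            total += size * h([A[i][j] / size for i in range(d)])
    return total / d                                   # assumed 1/d normalization

def h_quad(s):
    d = len(s)
    return sum((x - 1.0 / d) ** 2 for x in s)          # convex, zero at the uniform point

A_sim = [[0.5, 0.5], [0.5, 0.5]]                       # perfect similarity
A_seg = [[1.0, 0.0], [0.0, 1.0]]                       # complete dissimilarity
```

With this choice, `D_h(A_sim, h_quad)` is 0, reflecting the normalization \(h\left( \frac{1}{d}\mathbf {1}_{d}'\right) =0\), while the fully segregated configuration attains a strictly positive value.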
Using Proposition 1, we conclude that matrix majorization entails a necessary and sufficient condition for assessing agreement in dissimilarity evaluations for all indices described in Proposition 3. Matrix majorization is refined by the zonotope inclusion criterion. We provide an equivalent representation of the latter criterion in terms of dissimilarity measures based on the so-called price dominance criterion (Kolm 1977; Koshevoy and Mosler 1996; Andreoli and Zoli 2020).^{Footnote 13} Consider a set of “prices” (or normative weights) \(\mathbf {p}=(p_1,\ldots ,p_d)'\), which take real values and may therefore be negative, allowing the relative group composition of each class of a distribution matrix to be evaluated through the implied “budget” (or weighted average) \(\mathbf {p}'\mathbf {a}_j/\overline{a}_j\). In a case of perfect similarity, \(\mathbf {a}_j/\overline{a}_j=\frac{1}{d}\mathbf {1}_d\) for all classes j. Therefore, perfect equality within each group i for all average realizations \(a_{ij}/\overline{a}_{j}\) across all classes j indicates lack of dissimilarity. The same consideration applies if all realizations in each class are weighted in \(\mathbf {p}'\mathbf {a}_j/\overline{a}_j\), irrespective of the weighting vector \(\mathbf {p}\). Each class contributes additively to dissimilarity, with evaluations indexed by a convex function \(\phi : \mathbb {R}\rightarrow \mathbb {R}\) applied as \(\phi (\mathbf {p}'\mathbf {a}_j/\overline{a}_j)\), which quantifies the inequality across the realizations of all classes. If we quantify this aggregate inequality by \(\frac{1}{d}\sum _{j=1}^{n_A}{\overline{a}_j\phi (\mathbf {p}'\cdot \mathbf {a}_j/\overline{a}_j)}\), then its minimum level is reached at \(m:=\phi (\sum _{i=1}^d{p_i/d})\). The minimum bound for the dissimilarity index
is therefore 0.^{Footnote 14}
Proposition 4
For any \(\mathbf {A},\mathbf {B} \in \mathcal {M}_d\), \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC if and only if \(D_{\mathbf {p},\phi }(\mathbf {B})\ \le \ D_{\mathbf {p},\phi }(\mathbf {A})\) for all \(\mathbf {p}\in \mathbb {R}^d\) and for every \(\phi\) convex.
Proposition 4 provides the class of dissimilarity indices that are related to zonotope inclusion and defines a dominance condition that is weaker than that implied by Proposition 3.
Related orders
This section highlights the relevance of the dissimilarity model for the analysis of multigroup segregation and multivariate inequality. First, we argue that the zonotope inclusion criterion can be meaningfully used in the analysis of segregation with many (more than two) groups. We also demonstrate that widely used multigroup segregation indices characterized in the literature are consistent with dissimilarity preserving or reducing axioms. Second, we analyze the implications of the dissimilarity model for multivariate orderings of dispersion, where the focus is on dissimilarity between the distributions of certain attributes and a normatively relevant benchmark distribution. In this case, the configuration that displays less dissimilarity among the underlying distributions and with respect to the benchmark distribution, also exhibits less multivariate inequality/dispersion. We prove that the zonotope inclusion criterion weakens some of the most widely adopted robust criteria in the multivariate inequality literature. Third, we emphasize the relevance of the dissimilarity axioms for the analysis of univariate inequality.
Segregation
Segregation arises when individuals with different characteristics (such as their ethnic origin or gender) are distributed unevenly across the neighborhoods of a city, the schools of a schooling district, or the jobs within a firm. In segregation analysis, the realizations of interest are categorical and not ordered. Mainstream approaches to segregation focus on the two-group case and postulate consistency with the partial order generated by nonintersecting segregation curves (Duncan and Duncan 1955) as a baseline.
A segregation curve is obtained by ordering the classes of \(\mathbf {A}\) by increasing magnitude of the ratio \(a_{2j}/a_{1j}\) evaluated for each class j. It gives the cumulative proportions of group 1 and of group 2 that are observed in the classes where group 2 is relatively overrepresented. The graph of the segregation curve coincides with the lower boundary of the zonotope representing the cumulative shares of group 1 and group 2 across categories.
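The construction just described can be sketched as follows (the two-group matrix is illustrative):

```python
# Sketch of the segregation-curve construction: order classes by the ratio
# a_2j / a_1j and accumulate the shares of the two groups.

def segregation_curve(A):
    assert len(A) == 2
    cols = list(zip(A[0], A[1]))
    # increasing ratio a2/a1; classes with a1 = 0 come last (infinite ratio)
    cols.sort(key=lambda c: c[1] / c[0] if c[0] > 0 else float("inf"))
    pts, x, y = [(0.0, 0.0)], 0.0, 0.0
    for a1, a2 in cols:
        x, y = x + a1, y + a2
        pts.append((x, y))
    return pts   # piecewise-linear curve from (0,0) to (1,1)

A = [[0.6, 0.4],
     [0.2, 0.8]]
curve = segregation_curve(A)   # [(0.0, 0.0), (0.6, 0.2), (1.0, 1.0)]
```

The returned vertices trace the lower boundary of \(Z(\mathbf {A})\): the curve lies below the diagonal, and it coincides with the diagonal only under perfect similarity.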
The ranking of two-group distributions generated by nonintersecting segregation curves can be characterized through elementary segregation-reducing operations (see Hutchens 1991) or, alternatively, by matrix majorization (Lasso de la Vega and Volij 2014; Hutchens 2015). When segregation curves cross, distributions can be ranked by segregation indices consistent with the segregation curve ordering (Reardon and Firebaugh 2002; Reardon 2009). Alternatively, segregation curves have been used to assess the dissimilarity between each group distribution and the population distribution, as in Alonso-Villar and del Rio (2010).
We are not, however, aware of any ordering generated by a multigroup expansion of the segregation curve ordering. Frankel and Volij (2011) have provided normative justifications for using matrix majorization as a robust segregation criterion for ranking multigroup distributions (see also Flückinger and Silber 1999; Chakravarty and Silber 2007). However, matrix majorization is a demanding condition that is not testable in the multigroup setting. Results in this paper deliver three contributions to this literature.
First, Proposition 1 clarifies that the operations of merge (or, equivalently, StrongMixC), split, permutation and insertion/elimination of empty classes characterize the ranking produced by nonintersecting segregation curves when \(d=2\). The same axioms characterize matrix majorization when \(d\ge 2\), thus showing that every segregation comparison involves a dissimilarity comparison. The dissimilarity axiom MixC allows this criterion to be weakened to an ordering that is less partial than matrix majorization, and which can also be interpreted in terms of segregation.
Second, we promote the zonotope inclusion criterion as a relevant multigroup extension of the segregation curve dominance criterion. The zonotope inclusion criterion is new in the segregation literature; it is testable and it can deal with the multidimensional nature of the data. In the case \(d=2\), segregation curve dominance is always consistent with zonotope inclusion, insofar as the segregation curve can be understood as the lower boundary of a zonotope.^{Footnote 15} Moreover, segregation curve dominance is also equivalent to matrix majorization.
In the multigroup setting (\(d\ge 3\)), however, a dominance criterion based on comparisons of segregation curves across all pairs of groups provides only a necessary condition for zonotope inclusion. Furthermore, in this context the zonotope inclusion criterion is weaker than matrix majorization, thus providing a natural refinement of it. Theorem 1 characterizes it in terms of the MixC axiom, thus establishing the link between multigroup segregation and dissimilarity.
Third, Proposition 3 identifies and characterizes the class of multigroup segregation indices that are coherent with the family \(D_h\). Below are some examples of wellknown segregation indices belonging to this class.
The Duncan and Duncan’s dissimilarity index for a matrix \(\mathbf {A}\in \mathcal {M}_{2}\) is \(D(\mathbf {A}):=\frac{1}{2}\sum _{j=1}^{n_{A}}\left| {a_{1j}-a_{2j}} \right|\). It measures dissimilarity as the average absolute distance between the elements \({a_{1j}/\overline{a}_{j}}\) and \({a_{2j}/\overline{a} _{j}}\) in each class. By setting
it follows that \(D_h(\mathbf {A})=D(\mathbf {A})\).
In the multigroup context (\(\mathbf {A}\in \mathcal {M}_{d}\)), segregation can be measured by the Atkinson-type segregation index, defined as \(A_{\omega }(\mathbf {A}):=1-\sum _{j=1}^{n_{A}}\prod \nolimits _{i=1}^{d}{\left( a_{ij}\right) }^{\omega _{i}}\) for \(\omega _{i}\ge 0\) such that \(\sum _{i=1}^{d}{\omega _{i}=1}\). By setting
it follows that \(D_h(\mathbf {A})=A_{\omega }(\mathbf {A})\). Convexity of h stems from the features of the weighting scheme.
The mutual information index characterized in Frankel and Volij (2011) and Moulin (2016) is \(M(\mathbf {A}):=\log _{2}(d)-\sum _{j=1}^{n_{A}}\left( \frac{\overline{a}_{j}}{d}\right) \sum _{i=1}^{d}{\frac{a_{ij}}{\overline{a}_{j}}}\cdot \log _{2}\left( \frac{\overline{a}_{j}}{a_{ij}}\right)\) with \({\frac{a_{ij}}{\overline{a}_{j}}}\cdot \log _{2}\left( \frac{\overline{a}_{j}}{a_{ij}}\right)\) set equal to 0 if \(a_{ij}=0\). By setting
it follows that \(D_h(\mathbf {A})=M(\mathbf {A})\). Convexity of h stems from the log operator.
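The three indices above can be computed directly from their definitions; the sketch below assumes each row of \(\mathbf {A}\) sums to 1 and uses \(\overline{a}_j=\sum _i a_{ij}\) for the class size:

```python
import math

# Sketches of the three segregation indices, computed from their definitions.

def duncan(A):
    """Duncan and Duncan dissimilarity index (two groups)."""
    return 0.5 * sum(abs(a1 - a2) for a1, a2 in zip(A[0], A[1]))

def atkinson(A, w):
    """Atkinson-type index with nonnegative weights w summing to 1."""
    d = len(A)
    return 1.0 - sum(math.prod(A[i][j] ** w[i] for i in range(d))
                     for j in range(len(A[0])))

def mutual_information(A):
    """Mutual information index, with 0·log2(·) conventions for empty cells."""
    d, n = len(A), len(A[0])
    m = math.log2(d)
    for j in range(n):
        size = sum(A[i][j] for i in range(d))          # class size ā_j
        for i in range(d):
            if A[i][j] > 0:
                m -= (size / d) * (A[i][j] / size) * math.log2(size / A[i][j])
    return m

A_sim = [[0.5, 0.5], [0.5, 0.5]]    # perfect similarity: all indices are 0
A_seg = [[1.0, 0.0], [0.0, 1.0]]    # complete segregation: all indices reach 1
```

All three indices vanish at perfect similarity and, in the two-group case shown, attain 1 at complete segregation (the maximum of \(M\) being \(\log _2(d)\)).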
Multivariate majorization and the Lorenz zonotope
In this section, the focus is on the multivariate majorization criteria that are adopted in robust inequality analysis. We argue that every inequality comparison involves the assessment of the dissimilarity between some relevant distributions and a benchmark distribution. A canonical example is that in which the distribution matrices \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\) represent multivariate distributions of commodities. A matrix represents the way in which shares of each commodity (by row) are allocated to certain classes, which can represent the demographic units (e.g. families or individuals) that consume these commodities. Units are not ordered in any meaningful way.
Multidimensional inequality arises from the dissimilarity between the d distributions under analysis and the distribution of the demographic weight of the n units. It is common to assume that every unit receives a uniform weight equal to 1/n. Under these circumstances, the next corollary, which follows from Proposition 1, formalizes the relation between multivariate inequality analysis and dissimilarity.
Corollary 2
Let \(\mathbf {A},\ \mathbf {B}\in \mathcal {M}_{d}\). Then
for every dissimilarity ordering \(\preccurlyeq\) satisfying axioms MC (or StrongMixC), ISC, IEC and IPC if and only if there exists \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\) such that (i) \(\mathbf {B}=\mathbf {A} \mathbf {X}\) and (ii) \(\frac{n_A}{n_{B}}\mathbf {1}_{n_{B}}' = \mathbf {1}_{n_{A}}' \mathbf {X}\).
When \(n_A=n_B=n\), matrix \(\mathbf {X}\) in the corollary is doubly stochastic (\(\mathbf {X}\in \mathcal {D}_{n}\)). The condition \(\mathbf {B}=\mathbf {A}\cdot \mathbf {X}\) with \(\mathbf {X} \in \mathcal {D}_{n}\) implied by (6), often referred to as uniform majorization, is widely adopted in robust univariate and multivariate inequality analysis (see p. 613 in Marshall et al. 2011). All social welfare functions that are increasing and Schurconcave (i.e. display some degree of inequality aversion) would rank the two multivariate distributions accordingly.
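A minimal sketch of uniform majorization, with invented two-unit, two-attribute shares: a doubly stochastic \(\mathbf {X}\) (here a simple T-transform) is applied to every row at once, moving each attribute's distribution toward the uniform benchmark:

```python
# Sketch of uniform majorization: the same doubly stochastic X (a T-transform
# mixing units 1 and 2 with parameter lam) acts on every row of A.

def apply_X(A, X):
    return [[sum(row[k] * X[k][j] for k in range(len(X)))
             for j in range(len(X[0]))] for row in A]

lam = 0.25
X = [[1 - lam, lam],
     [lam, 1 - lam]]          # doubly stochastic for lam in [0, 1]

A = [[0.8, 0.2],              # shares of attribute 1 held by two units
     [0.6, 0.4]]              # shares of attribute 2 held by the same units
B = apply_X(A, X)             # each row moves toward the uniform benchmark (0.5, 0.5)
```

Because the same \(\mathbf {X}\) transforms every row, inequality falls simultaneously in both dimensions, which is the "common set of transformations" requirement that makes uniform majorization demanding.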
Uniform majorization is a demanding criterion, like matrix majorization, insofar as it posits that inequality can be reduced only when every row of a distribution matrix \(\mathbf {B}\) is obtained from the corresponding row of another distribution matrix \(\mathbf {A}\) through a common set of transformations implied by the matrix \(\mathbf {X}\in \mathcal {D}_n\). The resulting ordering of distribution matrices is therefore partial. Koshevoy (1995) and Koshevoy and Mosler (1996) have studied a less partial order of multivariate distributions that is based on the Lorenz zonotope inclusion criterion. A Lorenz zonotope, denoted \(LZ(\mathbf {A})\) with \(\mathbf {A}\in \mathcal {M}_d\), is a \((d+1)\)-dimensional zonotope of a distribution matrix \(\mathbf {A}\) augmented by the population distribution vector, that is, \(LZ(\mathbf {A}):= Z(\tilde{\mathbf {A}})\) with \(LZ(\mathbf {A})\subseteq \mathbb {R}_+^{d+1}\) and \(\tilde{\mathbf {A}}\) defined as in (6). The Lorenz zonotope inclusion criterion induces a partial order of distribution matrices that provides a testable refinement of uniform majorization.
The following chain of implications clarifies the relation between multivariate inequality and dissimilarity. The proof follows from previous results.
Remark 8
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) be such that \(d\ge 2\) and \(n_A=n_B=n\). Then:
The first implication, showing that uniform majorization implies matrix majorization, has been discussed in the previous section. The Lorenz zonotope inclusion criterion defines an inequality partial order of distribution matrices that is less partial than (i.e., is implied by) matrix majorization. It follows that the ranking of distribution matrices given by \(LZ(\mathbf {B})\subseteq LZ(\mathbf {A})\) is always consistent with the implications of a merge transformation (or, alternatively, a convex combination underlying StrongMixC axiom) on matrices \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\). Any such transformation bears two implications for multidimensional inequality.
First, a merge transformation reduces dissimilarity between the distribution of each dimension, taken separately, and the benchmark distribution, implying a reduction of inequality in each dimension. Second, a merge transformation reduces the dissimilarity across dimensions, implying an increase in correlation between dimensions. This aspect is controversial, since the Lorenz zonotope inclusion criterion may fail to rank distribution matrices that are instead unanimously ordered by social welfare functions satisfying aversion to correlation increasing transfers (Epstein and Tanny 1980; Atkinson and Bourguignon 1982; Decancq 2012), a desirable property in multidimensional inequality analysis stating that any transformation that raises the degree of association in realizations is bound to decrease social welfare (Andreoli and Zoli 2020). The Extended Lorenz zonotope inclusion criterion proposed in Mosler (2012) addresses these concerns.
Although the Lorenz zonotope inclusion criterion may be problematic for multidimensional welfare analysis, it remains a relevant criterion for assessing dissimilarity between distributions. The criterion \(LZ(\mathbf {B})\subseteq LZ(\mathbf {A})\) can be weakened by looking at inclusions of the projections of the Lorenz zonotope in the space of outcomes, that is \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). The latter is useful for analyzing inequalities that arise from differences between distributions, regardless of the degree of inequality of each of these distributions. This feature is relevant, for instance, for constructing robust inequality of opportunity comparisons (see, for instance, Roemer and Trannoy 2016; Andreoli et al. 2019).^{Footnote 16}
Corollary 2 provides a characterization result that extends robust inequality assessments based on uniform majorization to matrices that possibly differ in size (\(n_A\ne n_B\)) but with the same number d of dimensions.
Income inequality
Corollary 2 also holds in the case \(d=1\). This case is of particular interest for social welfare analysis, as it rationalizes empirical comparisons of income inequality. In this section, we argue that every income inequality comparison involves a dissimilarity comparison, but not the reverse.
Empirical comparisons of income inequality consist in assessing the way total income in a sample of n income recipient units (such as households or individuals) is split across these units. We can hence represent a distribution of income shares by the n-variate vectors \(\mathbf {a}', \mathbf {b}'\in \mathcal {M}_1\), with \(\mathbf {a}'\cdot \mathbf {1}_{n}=\mathbf {b}'\cdot \mathbf {1} _{n}=1\). Each entry of the vectors corresponds to an income share allocated to a given unit. Anonymity is often invoked by the literature addressing income inequality measurement, thereby implying that any permutation of the units does not affect the extent of inequality displayed by \(\mathbf {a}'\) or \(\mathbf {b}'\).
As per condition (6) in Corollary 2, every empirical income inequality comparison involves a dissimilarity comparison between the distribution of income shares owned by each of the n units and the units’ weights. Furthermore, the ranking of distribution matrices induced by the LZ inclusion is consistent with the Lorenz curve partial order. The chain of implications in Remark 8 hence runs in both directions when \(d=1\): Lorenz zonotope inclusion implies unanimity among all social welfare functions that are increasing and concave in income, which is in turn equivalent to uniform majorization. All these conditions imply that income inequality analysis always subsumes a dissimilarity comparison.
A similar result holds even when the benchmark distribution of population weights is not uniform. Ebert and Moyes (2003) analyze the relation between welfare evaluations, Lorenz dominance and equivalence scales for incomes when population weights may differ among units and across distribution matrices. In this case, the interest is in ranking matrices such as \(\tilde{\mathbf {A}}:=(\varvec{\omega },\mathbf {a})'\), where \(\varvec{\omega }=(\omega _1,\ldots ,\omega _n)'\) and \(\omega _j\) can be understood as individual j’s weight. Using Corollary 2, every welfare-consistent measure of inequality can be written as an average of convex transformations of equivalized incomes, scaled by their demographic weights. This is formalized by the inequality index \(D_{h}(\tilde{\mathbf {A}}) =\sum _{j=1}^n \omega _jh(a_{j}/\omega _j)\), where \(h\in \mathcal {H}\) is convex and \(a_{j}/\omega _j\) is j’s equivalent income.^{Footnote 17} From Proposition 3 (in combination with Remark 7), the zonotope inclusion criterion \(Z((\varvec{\omega },\mathbf {b})')\subseteq Z((\varvec{\omega },\mathbf {a})')\) provides a sufficient test for welfare dominance.
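A minimal sketch of how the index \(D_h\) can be computed (the weights, shares, and the entropy-type convex \(h\) below are illustrative choices, not prescribed by the text):

```python
import numpy as np

# Hypothetical weights and income shares for n = 4 units; both vectors sum to one.
omega = np.array([0.10, 0.20, 0.30, 0.40])   # demographic weights
a     = np.array([0.05, 0.15, 0.30, 0.50])   # income shares

def D(h, shares, weights):
    """Inequality index D_h = sum_j w_j * h(a_j / w_j), with h convex."""
    return float(np.sum(weights * h(shares / weights)))

# An entropy-type convex choice with h(1) = 0: under perfect proportionality
# (a_j = omega_j for all j) the index is zero, and by Jensen's inequality any
# departure from proportionality makes it positive.
h = lambda t: t * np.log(t)

assert abs(D(h, omega, omega)) < 1e-12   # zero under perfect similarity
value = D(h, a, omega)
assert value > 0                         # positive once shares and weights differ
```

Any other convex \(h\) normalized at \(h(1)=0\) would produce an admissible index of the same family.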
A well-known result in inequality measurement is that an income distribution \(\mathbf {b}'\) displays less inequality than another distribution \(\mathbf {a}'\) if it can be obtained from the latter through a finite sequence of progressive (Pigou-Dalton, PD) transfers of income from rich donors to poor recipients, without switching their relative positions in the income ranking (Hardy et al. 1934; Marshall et al. 2011).^{Footnote 18} In the univariate case (\(d=1\)), Corollary 2 implies that every sequence of PD transfers of incomes can be rationalized by a specific sequence of more fundamental dissimilarity preserving and reducing operations that are concerned with the way income shares and weights are shifted across units:
Corollary 3
Every PD transfer operation can be decomposed into a sequence of split of classes and merge of classes operations.
Split and merge operations can hence be seen as inequality reducing transformations that are more elementary than PD transfers. The proof of Corollary 3 rests on the fact that any T-transform, an equivalent matrix representation of a PD transfer (see p. 33 in Marshall et al. 2011), can be exactly decomposed into the product of matrices representing split and merge operations. It follows that any univariate inequality comparison based on uniform majorization can be seen as a dissimilarity comparison, but not the reverse, insofar as the dissimilarity preserving and reducing operations of split and merge, respectively, characterize matrix majorization, of which uniform majorization is a particular case.
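A minimal numerical sketch of this decomposition (the numbers and the four-step sequence below are illustrative, not the construction of the formal proof): a T-transform \(\mathbf {X}=\lambda \mathbf {I}+(1-\lambda )\mathbf {Q}\) acting on the columns of \(\tilde{\mathbf {A}}=(\varvec{\omega },\mathbf {a})'\) is reproduced by inserting two empty classes, splitting both original classes with the same \(\lambda\), merging the split-off pieces crosswise, and deleting the emptied classes:

```python
import numpy as np

lam = 0.8                          # T-transform parameter
A = np.array([[0.5, 0.5],          # uniform unit weights (benchmark row)
              [0.7, 0.3]])         # income shares row

# PD transfer as a T-transform: X = lam*I + (1-lam)*Q, with Q swapping classes.
Q = np.array([[0.0, 1.0], [1.0, 0.0]])
X = lam * np.eye(2) + (1 - lam) * Q
B_direct = A @ X

def split(M, j, k, lam):
    """Split class j: a share lam stays in j, the rest moves to empty class k."""
    M = M.copy()
    M[:, k] += (1 - lam) * M[:, j]
    M[:, j] *= lam
    return M

def merge(M, j, k):
    """Merge class j into class k; column j is left empty."""
    M = M.copy()
    M[:, k] += M[:, j]
    M[:, j] = 0.0
    return M

M = np.hstack([A, np.zeros((2, 2))])   # insert two empty classes
M = split(M, 0, 2, lam)                # columns: [lam*c0, c1, (1-lam)*c0, 0]
M = split(M, 1, 3, lam)                # [lam*c0, lam*c1, (1-lam)*c0, (1-lam)*c1]
M = merge(M, 3, 0)                     # class 0 collects (1-lam)*c1
M = merge(M, 2, 1)                     # class 1 collects (1-lam)*c0
B_ops = M[:, :2]                       # delete the (now empty) extra classes

assert np.allclose(B_direct, B_ops)    # same matrix either way
assert np.allclose(B_ops[0], A[0])     # unit weights are unchanged
```

Because both splits use the same \(\lambda\), the weights row is left untouched, which is why the composition reproduces a PD transfer on the income shares alone.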
The interesting and new result provided by Corollary 2 is that there always exists a sequence of split and merge operations that supports uniform majorization even in the multidimensional case (\(d\ge 2\)), although the same sequence cannot generally be rearranged to represent PD transfers (Kolm 1977).
Concluding remarks
A large but scattered literature on segregation and inequality measurement has proposed criteria for ranking multigroup distributions according to the dissimilarity they exhibit. This paper establishes the axiomatic foundations of the dissimilarity criterion. We do so by developing a parsimonious axiomatic model based on dissimilarity preserving operations and a dissimilarity reducing operation, the merge, which consists in aggregating, distribution by distribution, the proportion of people observed in two separate classes. We study the partial order of distribution matrices generated by the transitive closure of all binary dissimilarity relations consistent with the operations and with a mixing axiom. This last axiom is crucial to justify an equivalent characterization of the “displays at most as much dissimilarity as” partial order. Our main theorem identifies a novel nonparametric criterion, based on inclusion of the zonotope set representations of the data, which is equivalent to the dissimilarity partial order thus identified.
The zonotope inclusion criterion is relevant in many contexts. One application is the evaluation of the impact of educational policies on the patterns of dissimilarity between multiethnic distributions of students across schools in a district. This problem is commonly referred to as schooling segregation. Zonotopes can be used to draw robust conclusions about changes in segregation when comparing the actual situation with the counterfactual distribution that would have emerged in the absence of the policy. While the education policy itself may have little to do with splitting, merging or mixing schools, the zonotope inclusion signals that one can always move from the counterfactual to the actual allocation of students through operations that are unanimously understood as segregation-reducing.
In some cases, zonotope inclusion is rejected by the data. The dissimilarity indices analyzed in the paper make it possible to produce conclusive evaluations of changes in dissimilarity. The implied ranking is always consistent with the implications of the “elementary” transformations. Evaluations based on one or a few dissimilarity indicators, however, are not robust and can always be challenged from the perspective offered by alternative measures. The complete characterization of the dissimilarity indicators presented here is left for future research.
Notes
Gini (1914, p. 189), translated from Italian, formalizes similarity as proportionality: “If n is the size of group \(\alpha\), m is the size of group \(\beta\), \(n_{x}\) the size of group \(\alpha\) assigned to class x and \(m_{x}\) the size of group \(\beta\) assigned to the same class, then it should hold [under similarity] that, for any value of x, \(\frac{n_{x}}{m_{x}}=\frac{n}{m}\) .”
See McMullen (1971) and Ziegler (1995) for a formal analysis of the zonotopes geometric properties. Dahl (1999) examines the zonotope inclusion criterion in the context of information analysis, whereas a related zonotope inclusion criterion (based on Lorenz zonotopes) provides a nonparametric test for robust comparisons of multivariate inequality (Koshevoy 1995; Koshevoy and Mosler 1996; Andreoli and Zoli 2020).
Relevant applications of the uniform and matrix majorization criteria are found in linear algebra (Dahl 1999; Hasani and Radjabalipour 2007), in the comparison of statistical experiments (Blackwell 1953; Torgersen 1992), in information theory (Grant et al. 1998), in the study of bivariate dependence orderings for categorical variables (Giovagnoli et al. 2009), as well as in the analysis of inequality (see, for instance, Chapter 14 in Marshall et al. 2011; Tsui 1995; Gajdos and Weymark 2005; Weymark 2006) and segregation (see Frankel and Volij 2011).
The condition \(d\le n\) is necessary for \(\mathbf {D}\) to exist. If \(\mathbf {A}\) is such that \(d>n\), then it can display some dissimilarity, but never maximal dissimilarity.
The entries \(x_{ij}\) of matrix \(\mathbf {X}\in \mathcal {R}_{n,m}\) can be interpreted as the probability that the population in class i in the distribution of origin “migrates” to class j in the distribution of destination.
See Appendix A.15 for a formal proof.
For any \(\mathbf {A},\ \mathbf {B},\ \mathbf {C}\in \mathcal {M}_d\), the relation \(\preccurlyeq\) is transitive if \(\mathbf {C}\preccurlyeq \mathbf {B}\) and \(\mathbf {B}\preccurlyeq \mathbf {A}\) imply \(\mathbf {C}\preccurlyeq \mathbf { A}\), and complete if either \(\mathbf {A}\preccurlyeq \mathbf {B}\) or \(\mathbf {B}\preccurlyeq \mathbf {A}\) or both, in which case \(\mathbf {B} \thicksim \mathbf {A}\).
Rao (1982) distinguishes the notion of dissimilarity from that of diversity. The former reflects differences or similarities between populations/groups heterogeneity, the latter reflects instead heterogeneity within the same population/group (see also Nehring and Puppe 2002). To see this, let \(\mathbf {s}'\) be any row of a perfect similarity matrix \(\mathbf {S}\). Any two perfect similarity matrices \(\mathbf {S}\) and \(\widetilde{\mathbf {S}}\) such that \(\mathbf {s}'\) is a uniform distribution across classes (high diversity) whereas \(\widetilde{\mathbf {s}}'\) is a distribution concentrating the mass in few or one realization (low diversity) are regarded as equivalent by every dissimilarity ordering, i.e. \(\mathbf {S}\thicksim \widetilde{\mathbf {S}}\).
We provide an example along the lines of the previous footnote. There are many matrices of different size displaying the same structure as the perfect similarity matrix \(\mathbf {S}\), for instance \(\tilde{\mathbf {S}}\) such that \(\tilde{\mathbf {s}}'\) is of size \(\tilde{n}>n\), where n is the size of \(\mathbf {s}'\). Yet, for any of such matrices, \(\mathbf {S}\thicksim \tilde{\mathbf {S}}\).
Blackwell (1953) has formalized a precise condition for “\(\mathbf {A}\) is at least as informative as \(\mathbf {B}\)”, which coincides with \(\mathbf {B}\preccurlyeq ^R \mathbf {A}\), insofar as the observation of an experiment outcome (a class) in \(\mathbf {A}\) is more informative about the underlying signal (the group) than it is in \(\mathbf {B}\). If \(\mathbf {A}=\mathbf {D}\), then the experiment outcome identifies the signal and \(\mathbf {A}\) is at least as informative as any matrix \(\mathbf {B}\), whereas if \(\mathbf {B}=\mathbf {S}\), then it is impossible to disentangle the underlying signal from the observation of the experiment outcome and \(\mathbf {B}\) is less informative than any matrix \(\mathbf {A}\).
If \(\mathbf {B}\) is obtained as in StrongMixC, then \(\mathbf {B}\varvec{\gamma }=\sum _{j=1}^m{w_j\mathbf {B}^j\varvec{\gamma }}\), \(\forall \varvec{\gamma }\in \mathcal {V}^{01}_n\), with \(\sum _j{w_j}=1\), which satisfies (4) because in this case \(\varvec{\gamma }^j=\varvec{\gamma }\) \(\forall j\).
For notational convenience empty classes receive weight \({\overline{a}=0}\) and therefore do not contribute to the overall dissimilarity.
Lemma 3 in Appendix formalizes this equivalence within the setting of this paper.
Notice that the theoretical bound does not depend on the sizes of \(\mathbf {A}\) but only on its structure: if \(\mathbf {A}=\mathbf {S}\), then \(D_{\mathbf {p},\phi }(\mathbf {S})=0\) for any \(\mathbf {p}\in \mathbb {R}^d\) and \(\phi\) convex.
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{2}\), \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if the segregation curve of \(\mathbf {B}\) lies nowhere below the segregation curve of \(\mathbf {A}\).
The result (see Lemma 1 in the Appendix) follows from the homogeneity and convexity of \(g:\mathbb {R}^2\rightarrow \mathbb {R}\), yielding \(g(\omega _j,a_j)=\omega _jg(1,a_j/\omega _j)=\omega _jh(a_j/\omega _j)\) with h convex. Here we assume that income and population weights have unit size. If this is not the case, then \(a_j/\omega _j\) is proportional to j’s equivalent income.
Consider a distribution \(\mathbf {a}'\). A PD transfer consists in a movement of a mass \(\epsilon >0\) from class j to class k such that \(a_{j}>a_{k}\), yielding \(b_j=a_j-\epsilon\), \(b_k=a_k+\epsilon\) and \(b_\ell = a_\ell\) \(\forall \ell \ne j,k\) such that \(b_j\ge b_k\). As a consequence of this transformation, \(\mathbf {b}'=\mathbf {a}'\mathbf {X}\), \(\mathbf {X}\in \mathcal {D}_n\) (Lorenz dominance) and \(\sum _j{a_j}=\sum _j{b_j}\).
See the preliminary results in Appendix A.1.
The two operations of permutation and insertion of classes transform \(\mathbf {A}\) into
$$\begin{aligned} \mathbf {A}\cdot \varvec{\Pi }_{n_{A}}\cdot \mathbf {Y}:=(\mathbf {a}_{1}, \underbrace{\mathbf {0}_{d},\ldots ,\mathbf {0}_{d}}_{n_{B}-1\;\text {times} },\ldots ,\mathbf {a}_{n_{A}},\underbrace{\mathbf {0}_{d},\ldots ,\mathbf {0} _{d}}_{n_{B}-1\;\text {times}})\text {.} \end{aligned}$$Formally: \(\mathbf {A}\cdot \mathbf {X}\cdot \varvec{\Pi }_{n_{A}n_{B}}=\left( \lambda _{11}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}1}\mathbf {a} _{n_{A}},\ldots ,\lambda _{1n_{B}}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}\;n_{B}}\mathbf {a}_{n_{A}}\right)\).
Note that all orderings \(\preccurlyeq\) satisfying IEC, ISC and IPC rank at unanimity \(\tilde{\mathbf {B}}\thicksim \tilde{\mathbf {A}}\) if and only if \(\tilde{\mathbf {A}}=\tilde{\mathbf {B}}\mathbf {Y}\) and \(\mathbf {Y}\in \mathcal {P}_n\). In fact, any other admissible matrix \(\mathbf {Y}\in \mathcal {R}_{n}^{IEC}\cup \mathcal {R}_{n}^{ISC}\) would not guarantee comparability of \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\mathbf {Y}\), that is \(\mathbf {1}_d^{\prime }\tilde{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\) whereas \(\mathbf {1}_d^{\prime } \tilde{\mathbf {B}}\mathbf {Y}\ne \frac{d}{n}\mathbf {1}_{n}^{\prime }\).
Note further that matrix \(\tilde{\mathbf {A}}\) has uniform margins, implying that \(\mathbf {X}^{j}\) must be row and column stochastic.
In fact, zonotopes are convex hulls of the underlying vertices, being defined by the sequential cumulation of the classes of the distribution matrices (see also Koshevoy 1995).
References
Alonso-Villar O, del Río C (2010) Local versus overall segregation measures. Math Soc Sci 60:30–38
Andreoli F, Zoli C (2020) From unidimensional to multidimensional inequality: a review. METRON 78:5–42
Andreoli F, Havnes T, Lefranc A (2019) Robust inequality of opportunity comparisons: theory and application to early childhood policy evaluation. Rev Econ Stat 101(2):355–369. https://doi.org/10.1162/rest_a_00747
Atkinson AB, Bourguignon F (1982) The comparison of multidimensioned distributions of economic status. Rev Econ Stud 49(2):183–201
Blackwell D (1953) Equivalent comparisons of experiments. Ann Math Stat 24(2):265–272
Blyth CR (1972) On Simpson’s paradox and the sure-thing principle. J Am Stat Assoc 67(338):364–366
Chakravarty SR, Silber J (2007) A generalized index of employment segregation. Math Soc Sci 53(2):185–195
Dahl G (1999) Matrix majorization. Linear Algebra Appl 288:53–73
Decancq K (2012) Elementary multivariate rearrangements and stochastic dominance on a Fréchet class. J Econ Theory 147(4):1450–1459
Dekel E (1986) An axiomatic characterization of preferences under uncertainty: weakening the independence axiom. J Econ Theory 40(2):304–318
Donaldson D, Weymark JA (1998) A quasi ordering is the intersection of orderings. J Econ Theory 78(2):382–387
Duncan OD, Duncan B (1955) A methodological analysis of segregation indexes. Am Soc Rev 20(2):210–217
Ebert U, Moyes P (2003) Equivalence scales reconsidered. Econometrica 71(1):319–343. https://doi.org/10.1111/1468-0262.00397
Epstein LG, Tanny SM (1980) Increasing generalized correlation: a definition and some economic consequences. Can J Econ 13(1):16–34
Flückiger Y, Silber J (1999) The measurement of segregation in the labour force. Physica-Verlag, Heidelberg
Frankel DM, Volij O (2011) Measuring school segregation. J Econ Theory 146(1):1–38
Gajdos T, Weymark JA (2005) Multidimensional generalized Gini indices. Econ Theor 26(3):471–496. https://doi.org/10.1007/s00199-004-0529-x
Gini C (1914) Di una misura di dissomiglianza tra due gruppi di quantità e delle sue applicazioni allo studio delle relazioni statistiche. Atti del Regio Istituto Veneto di Scienze Lettere e Arti, LXXIII
Giovagnoli A, Marzialetti J, Wynn H (2009) Bivariate dependence orderings for unordered categorical variables. In: Pronzato L, Zhigljavsky A (eds) Optimal design and related areas in optimization and statistics, vol 28 of, optimization and its applications. Springer, New York
Grant S, Kajii A, Polak B (1998) Intrinsic preference for information. J Econ Theory 83(2):233–259
Hardy GH, Littlewood JE, Polya G (1934) Inequalities. Cambridge University Press, London
Hasani A, Radjabalipour M (2007) On linear preservers of (right) matrix majorization. Linear Algebra Appl 423(2–3):255–261
Hutchens RM (1991) Segregation curves, Lorenz curves, and inequality in the distribution of people across occupations. Math Soc Sci 21(1):31–51
Hutchens RM (2015) Symmetric measures of segregation, segregation curves and Blackwell’s criterion. Math Soc Sci 73:63–68
James DR, Taeuber KE (1985) Measures of segregation. Sociol Methodol 15:1–32
Kolm SC (1977) Multidimensional egalitarianisms. Q J Econ 91(1):1–13
Koshevoy G (1995) Multivariate Lorenz majorization. Soc Choice Welf 12:93–102. https://doi.org/10.1007/BF00182196
Koshevoy G, Mosler K (1996) The Lorenz zonoid of a multivariate distribution. J Am Stat Assoc 91(434):873–882
Lasso de la Vega C, Volij O (2014) Segregation, informativeness and Lorenz dominance. Soc Choice Welf 43(3):547–564. https://doi.org/10.1007/s00355-014-0801-3
Marshall AW, Olkin I, Arnold BC (2011) Inequalities: theory of majorization and its applications. Springer, Berlin
Martínez Pería FD, Massey PG, Silvestre LE (2005) Weak matrix majorization. Linear Algebra Appl 403:343–368
McMullen P (1971) On zonotopes. Trans Am Math Soc 159:91–109
Mosler K (2012) Multivariate dispersion, central regions, and depth: the lift zonoid approach, vol 165. Springer Science and Business Media, Berlin
Moulin H (2016) Entropy, desegregation, and proportional rationing. J Econ Theory 162:1–20
Nehring K, Puppe C (2002) A theory of diversity. Econometrica 70(3):1155–1198
Rao CR (1982) Diversity and dissimilarity coefficients: a unified approach. Theoret Popul Biol 21(1):24–43
Reardon SF (2009) Measures of ordinal segregation. In: Flückiger Y, Reardon SF, Silber J (eds) Occupational and residential segregation, research on economic inequality, vol 17. Emerald Group Publishing Limited, Bingley, pp 129–155
Reardon SF, Firebaugh G (2002) Measures of multigroup segregation. Sociol Methodol 32:33–67
Roemer JE, Trannoy A (2016) Equality of opportunity: theory and measurement. J Econ Lit 54(4):1288–1332
Torgersen E (1992) Comparison of statistical experiments, vol 36 of encyclopedia of mathematics and its applications. Cambridge University Press, Cambridge
Tsui KY (1995) Multidimensional generalizations of the relative and absolute inequality indices: the AtkinsonKolmSen approach. J Econ Theory 67(1):251–265
Weymark JA (2006) The normative approach to the measurement of multidimensional inequality. Inequality and economic integration. Routledge, London, p 26
Ziegler G (1995) Lectures on polytopes, number 152 in graduate texts in mathematics. Springer-Verlag, New York
Funding
Open access funding provided by Università degli Studi di Verona within the CRUI-CARE Agreement. This paper forms part of the research projects The Measurement of Ordinal and Multidimensional Inequalities (Contract No. ANR-16-CE41-0005-01, ORDINEQ) of the French National Agency for Research, the research projects MOBILIFE (grant RBVR17KFHX) and PREOPP (grant RBVR19FSFA) of the University of Verona Basic Research scheme and the NORFACE/DIAL project IMCHILD: The impact of childhood circumstances on individual outcomes over the life-course (grant INTER/NORFACE/16/11333934/IMCHILD) of the Luxembourg National Research Fund (FNR), whose financial support is gratefully acknowledged.
Appendix A: Proofs
Appendix A.1 Useful additional results
The first result shows that matrix majorization admits an equivalent representation in terms of unanimous rankings for a well-defined class of convex functions.
Lemma 1
For any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), \(\mathbf {B }\preccurlyeq ^{R}\mathbf {A}\) if and only if
$$\begin{aligned} \sum _{j=1}^{n_{B}}{g(\mathbf {b}_{j})}\le \sum _{j=1}^{n_{A}}{g(\mathbf {a}_{j})} \end{aligned}$$
for all functions \(g:\mathbb {R}^{d}\rightarrow \mathbb {R}\) that are convex and homogeneous such that \(g(\mathbf {0}_{d}^{\prime })=0\).
For a formal proof, see Lemma 15.C.11 in Marshall et al. (2011).
The second result shows that the insertion of empty classes, split and merge operations can be represented through linear transformations involving row stochastic matrices. An operation of insertion of empty classes transforms \(\mathbf {A}\) into \(\mathbf {B}\) with \(n_{B}>n_{A}\) by augmenting \(\mathbf {A}\) with \(n_{B}-n_{A}\) columns with zero entries. We denote by \(\mathcal {R}_{n_{A},n_{B}}^{IEC}\subset \mathcal {R}_{n_{A},n_{B}}\) the set of all matrices reproducing an insertion of empty classes when postmultiplied to a distribution matrix \(\mathbf {A}\). Hence \(\mathbf {Y}\in \mathcal {R}_{n_{A},n_{B}}^{IEC}\) is an identity matrix of size \(n_{A}\) augmented by \(n_{B}-n_{A}\) columns with zero entries.
Let \(\mathcal {M}_{d}^{0}\subset \mathcal {M}_{d}\) define the set of matrices exhibiting at least one column of zeroes. For \(\mathbf {A}\in \mathcal {M} _{d}^{0}\), let \(\mathcal {J}_{A}^{0}\) denote the index set of all columns in \(\mathbf {A}\) with all zeroes and \(\mathcal {J}_{A}\) denote the index set of all the other columns of \(\mathbf {A}\). Let \(j\in \mathcal {J}_{A}\) such that \(j+1\in \mathcal {J}_{A}^{0}\). The matrix \(\mathbf {Z}_{[j]}\) incorporates an operation of split of classes applied to matrix \(\mathbf {A}\in \mathcal {M}_{d}^{0}\) that leads to matrix \(\mathbf {B}\in \mathcal {M}_{d}\) with \(\mathbf {b}_{j}=\lambda \mathbf {a}_{j}\) and \(\mathbf {b}_{j+1}=\mathbf {a} _{j+1}+(1-\lambda )\mathbf {a}_{j}=(1-\lambda )\mathbf {a}_{j}\) for \(\lambda \in [0,1]\). Let \(k\ne j\); the set of all transformation matrices \(\mathbf {Z}_{[j]}\) reproducing a split of classes is denoted by:
The merge of classes operation likewise originates a distribution matrix \(\mathbf {B}=\mathbf {A}\cdot \mathbf {M}_{[j]}\), where the matrix \(\mathbf {M}_{[j]}\) performs a merge of class j towards \(j+1\). Such a matrix belongs to the set:
The third preliminary result provides an equivalent (algebraic and finite) condition for testing zonotope inclusion. Denote first the sets \(\mathcal { V}_{n}:=\{\mathbf {v}:v_{j}\in [0,1],j=1,\ldots ,n\}\) and \(\mathcal {V} _{n}^{01}:=\{\mathbf {v}:v_{j}\in \{0,1\},j=1,\ldots ,n\}\).
Lemma 2
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) of size \(d\times n\) such that \(\mathbf {1}_{d}^{\prime }{\mathbf {B}}=\mathbf {1} _{d}^{\prime }{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{{n}}^{\prime }\). i) \(Z( \mathbf {B})\subseteq Z(\mathbf {A})\) if and only if ii) \(\forall \varvec{ \gamma }\in \mathcal {V}_{n}^{01}\) \(\exists \varvec{\theta }\in \mathcal {V }_{n}\): \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\).
Proof
Note that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if for any \(\mathbf {z}\in Z(\mathbf {B})\) we have \(\mathbf {z}\in Z(\mathbf {A}),\) whereas for \(Z(\mathbf {B})\subset Z(\mathbf {A})\) we also have that \(\exists \tilde{ \mathbf {z}}\in Z(\mathbf {A})\) such that \(\tilde{\mathbf {z}}\not \in Z(\mathbf { B})\). Using the Minkowski sum properties, this means that \(\forall \varvec{\gamma }\in \mathcal {V}_{n}\) \(\exists \varvec{\theta }\in \mathcal {V}_{n}\): \(\mathbf {z}:=\sum _{j}{\gamma _{j}\mathbf {b}_{j}}=\sum _{j}{ \theta _{j}\mathbf {a}_{j},}\) that is \({\mathbf {B}}\varvec{\gamma }={ \mathbf {A}}\varvec{\theta }\) for \(\varvec{\gamma ,\theta }\in \mathcal {V}_{n}.\) Assume, for ease of notation, that \(\varvec{\gamma }\in \mathcal {V}_{n}\) is ordered so that \(0\le \gamma _{1}\le \gamma _{2}\le \ldots \le \gamma _{{n}}\le 1\). Denote \(\delta _{1}=\gamma _{1}\) and \(\delta _{k}=\gamma _{k}-\gamma _{k-1}\) for \(k=2,\ldots ,{n}\). Note that \(\delta _{k}\in [0,1]\) \(\forall k\). Recall that \({\mathbf {i}_{k}}\) is the column vector k of the identity matrix of size n and denote further \(\mathbf {1}_{(h,h+1,\ldots ,{n})}:=\sum _{k=h}^{{n}}{\mathbf {i}_{k}}\) for any \(1\le h\le {n}\), so that if \(h={n}-1\) then \(\mathbf {1}_{({n}-1,{n})}=\mathbf { i}_{{n}-1}+\mathbf {i}_{{n}}\) and so on. We have that \(\mathbf {1} _{(h,h+1,\ldots ,{n})}\in \mathcal {V}_{{n}}^{01}\) \(\forall h\). For any \(\varvec{\gamma }\in \mathcal {V}_{{n}}\) there always exists a class j such that \(\varvec{\gamma }=\sum _{h=j}^{{n}}{\delta _{h}\mathbf {1} _{(h,h+1,\ldots ,{n})}}\), so that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) is equivalent to
for \(\varvec{\gamma },\varvec{\theta }\in \mathcal {V}_{{n}}\). We use this equivalence in the proof.
i) \(\Rightarrow\) ii). Immediate, since \(\mathcal {V}_n^{01}\subseteq \mathcal {V }_n\).
ii) \(\Rightarrow\) i). Assume that ii) holds, which can be equivalently stated as: for any \(1\le h\le {n}\) there exists \(\varvec{\theta } _{(h,h+1,\ldots ,{n})}\in \mathcal {V}_{{n}}\): \({\mathbf {B}}\mathbf {1} _{(h,h+1,\ldots ,{n})}={\mathbf {A}}\varvec{\theta }_{(h,h+1,\ldots ,{n})}\). Consider substituting into \({\mathbf {B}}\cdot \sum _{h=j}^{{n}}{\delta _{h}\cdot \mathbf {1}_{(h,h+1,\ldots ,{n})}}\), yielding for any j:
Since \(\sum _{h=j}^{{n}}{\delta _{h}\varvec{\theta }_{(h,h+1,\ldots ,n)}} \in \mathcal {V}_{n}\) for any \(\delta _{h}\in [0,1]\) and for any j, the latter equation implies (7) and then i). \(\square\)
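Lemma 2 turns zonotope inclusion into a finite feasibility problem that can be checked numerically: for each of the \(2^n\) binary vectors \(\varvec{\gamma }\), solve a small linear program for \(\varvec{\theta }\). A minimal sketch (assuming SciPy is available; the matrices are hypothetical and satisfy the lemma's uniform-margins condition, with \(\mathbf {B}=\mathbf {A}\mathbf {X}\) for a doubly stochastic \(\mathbf {X}\)):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def zonotope_included(B, A):
    """Test Z(B) subset-of Z(A) via Lemma 2: for every 0/1 vector gamma there
    must exist theta in [0,1]^n with A @ theta = B @ gamma (LP feasibility)."""
    n = A.shape[1]
    for gamma in itertools.product((0.0, 1.0), repeat=n):
        z = B @ np.array(gamma)
        res = linprog(c=np.zeros(n), A_eq=A, b_eq=z,
                      bounds=[(0.0, 1.0)] * n, method="highs")
        if not res.success:       # infeasible: this vertex of Z(B) is outside Z(A)
            return False
    return True

# Hypothetical 2x3 matrices with uniform column margins d/n = 2/3.
A = np.array([[0.5, 0.3, 0.2],
              [1/6, 11/30, 7/15]])
X = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
B = A @ X

assert zonotope_included(B, A)        # holds: B = A X with X doubly stochastic
assert not zonotope_included(A, B)    # the reverse inclusion fails here
```

The enumeration over \(\{0,1\}^n\) makes the test exact but exponential in the number of classes; for large n one would restrict attention to sampled or structured \(\varvec{\gamma }\).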
The fourth and last preliminary result shows the equivalence between the zonotope inclusion criterion and price majorization. Although the proof largely draws on Koshevoy (1995), the setting we investigate is logically distinct, insofar as the price majorization we use invokes d-dimensional prices to evaluate inclusion of d-dimensional zonotopes, whereas Koshevoy shows the equivalence with \((d+1)\)-dimensional extensions of the zonotope (the Lorenz zonotope). Denote by \(\mathcal {C}_{n}\) the set of column stochastic matrices, so that \(\mathbf {Y}\in \mathcal {C}_{n}\) \(\Leftrightarrow\) \(\mathbf {Y}^{\prime }\in \mathcal {R}_{n}\). Define: "\(\mathbf {B}\) is price majorized by \(\mathbf {A}\)" whenever \(\forall \mathbf {p}\in \mathbb {R}^{d}\), \(\exists \mathbf {Y}\in \mathcal {D}_{n}\) such that \(\mathbf {p}^{\prime }\mathbf {B}=\mathbf {p}^{\prime }\mathbf {A} \mathbf {Y}\). Given that we consider matrices in \(\mathcal {M}_{d}\), the condition \(\mathbf {p}^{\prime }\mathbf {B1}_{n}=\mathbf {p}^{\prime }\mathbf {A} \mathbf {1}_{n}\) is satisfied by construction; it therefore suffices to consider only transformations \(\mathbf {Y}\in \mathcal {C}_{n}\) to obtain price majorization.
Lemma 3
Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\) of size \(d\times n\) such that \(\mathbf {1}_d^{\prime }{\mathbf {B}}=\mathbf {1}_d^{\prime }{ \mathbf {A}}=\frac{d}{n}\mathbf {1}_{{n}}^{\prime }\). i) \(Z(\mathbf {B} )\subseteq Z(\mathbf {A})\) if and only if ii) \(\forall {\mathbf {p}}\in \mathbb {R}^{d}\) \(\exists \mathbf {Y}\in \mathcal {C}_n\) such that \(\mathbf {p}^{\prime } \mathbf {B}=\mathbf {p}^{\prime }\mathbf {A}\mathbf {Y}\).
Proof
i) \(\Rightarrow\) ii). Assume that i) holds; then from Lemma 2 we have that \(\forall k\in \{1,\ldots ,{n}\}\) \(\exists \theta _{jk}\in [0,1]\) for \(j=1,2,\ldots ,{n}\) such that \({\mathbf {b}}_{k}=\sum _{j}{ \theta _{jk}{\mathbf {a}}_{j}}\). In compact notation:
Matrix \(\varvec{\Theta }\) may not be row stochastic but it is guaranteed that \({\mathbf {B}}\mathbf {1}_{{n}}={\mathbf {A}}\mathbf {1}_{{n}}\). Using the fact that \(\mathbf {1}_{d}^{\prime }{\mathbf {B}}=\mathbf {1}_{d}^{\prime }{ \mathbf {A}}\) we get from (8) that \(\mathbf {1}_{d}^{\prime }{\mathbf {A }}\varvec{\Theta }=\mathbf {1}_{d}^{\prime }{\mathbf {A}}\), which gives \(\sum _{i}{{a}_{ik}}=\sum _{i}{\sum _{j}{{a}_{ij}\theta _{jk}}}=\sum _{j}{\theta _{jk}\sum _{i}{{a}_{ij}}}\), \(\forall k\). Given that \(\sum _{i}{{a}_{ik}} =\sum _{i}{{a}_{ij}}=d/{n}\) for any \(k\ne j\), by definition of \(\mathbf {A}, \mathbf {B}\), we get \(\sum _{j}{\theta _{jk}}=1\), which means that \(\varvec{\Theta }\in \mathcal {C}_{n}\). Hence i) \(\Rightarrow {\mathbf {B}}= {\mathbf {A}}\varvec{\Theta },\ \varvec{\Theta }\in \mathcal {C} _{n}\Rightarrow \mathbf {p}^{\prime }{\mathbf {B}}=\mathbf {p}^{\prime }{ \mathbf {A}}\varvec{\Theta }\) for all \({\mathbf {p}}\in \mathbb {R}^{d}\) and \(\varvec{\Theta }\in \mathcal {C}_{n}\), which is ii).
ii) \(\Rightarrow\) i). Assume that ii) holds, which implies that \(\mathbf {p}^{\prime }\mathbf {B}\) Lorenz dominates \(\mathbf {p}^{\prime }\mathbf {A}\) for any \(\mathbf {p}\in \mathbb {R}^{d}\), that is, \(\sum _{j=1}^{k}{\mathbf {p}^{\prime }\mathbf {b}_{(j)}}\ge \sum _{j=1}^{k}{\mathbf {p}^{\prime }\mathbf {a}_{(j)}}\) \(\forall k=1,\ldots ,n\), where for each \(\mathbf {p}\in \mathbb {R}^{d}\) the classes of \(\mathbf {B}\) have been ordered by increasing magnitude of \(\mathbf {p}^{\prime }\mathbf {b}_{(j)}\), so that \(\mathbf {p}^{\prime }\mathbf {b}_{(j)}\le \mathbf {p}^{\prime }\mathbf {b}_{(j+1)}\) \(\forall j\) (and similarly for the classes of \(\mathbf {A}\)). This is equivalent (see Marshall et al. 2011) to:
where the max operator selects the ktuple of columns of \(\mathbf {A}\) and \(\mathbf {B}\) that yield the largest value when multiplied by any vector of prices \(\mathbf {p}\). An equivalent formulation is:
which implies
The previous condition identifies a situation where \({\mathbf {B}}\cdot ( \mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})\) lies in the convex hull of \({ \mathbf {A}}\cdot (\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})\) , see Koshevoy (1995). Making use of the definition of convex hull inclusion, we can alternatively write:
or equivalently \({\mathbf {B}}\cdot (\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})={\mathbf {A}}\cdot \sum _{\forall j_{1},...,j_{k}}{\alpha _{j_{1},...,j_{k}}(\mathbf {i}_{j_{1}}+...+\mathbf {i}_{j_{k}})}\), \(\forall k\), \(\forall j_{1},...,j_{k}\). Noticing that \(\bigcup _{k}\{(\mathbf {i}_{i_{1}}+...+\mathbf {i}_{i_{k}}): i_{1},...,i_{k}\in \{1,\ldots ,{n}\}\}=\mathcal {V}_{{n}}^{01}\) and that \(\bigcup _{k}\{\sum _{\forall i_{1},...,i_{k}}\alpha _{i_{1},...,i_{k}}(\mathbf {i}_{i_{1}}+...+\mathbf {i}_{i_{k}}):i_{1},...,i_{k}\in \{1,\ldots ,{n}\}\}=\mathcal {V}_{{n}}\), one has that (9) is equivalent to: \(\forall \varvec{\gamma }\in \mathcal {V}_{{n}}^{01}\) \(\exists \varvec{\theta }\in \mathcal {V}_{{n}}\): \({\mathbf {B}}\varvec{\gamma }={\mathbf {A}}\varvec{\theta }\). From Lemma 2, the latter condition implies i). \(\square\)
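The characterization through 0/1 mixing vectors used at the end of this proof can be tested mechanically for small matrices: \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) holds iff every 0/1 combination of \(\mathbf {B}\)'s columns equals \(\mathbf {A}\varvec{\theta }\) for some \(\varvec{\theta }\in [0,1]^{n}\), a linear feasibility problem per binary vector. A sketch under these assumptions (NumPy and SciPy assumed; the function name is ours, and the enumeration is exponential in the number of columns):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def zonotope_included(B, A):
    """Check Z(B) subseteq Z(A): every 0/1 combination of B's columns must
    be A @ theta for some theta in [0,1]^nA. Each instance is a linear
    feasibility problem; viable only for small matrices."""
    nA, nB = A.shape[1], B.shape[1]
    for bits in itertools.product([0.0, 1.0], repeat=nB):
        target = B @ np.array(bits)
        res = linprog(np.zeros(nA), A_eq=A, b_eq=target,
                      bounds=[(0.0, 1.0)] * nA, method="highs")
        if not res.success:
            return False
    return True

# Z(A) below is the unit square; every 0/1 combination of B's columns
# stays inside it, while the reverse inclusion fails.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
B = np.array([[0.5, 0.2],
              [0.3, 0.5]])
```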
Appendix A.2 Proof of Proposition 1
Proof
i) \(\Rightarrow\) ii). If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms MC, IPC, IEC and ISC. Matrix majorization \(\preccurlyeq ^{R}\) is one such partial order. In fact, as highlighted in the preliminary results section, every operation underlying axioms MC, ISC, IEC and IPC can be represented by a row stochastic matrix transformation: by \(\mathbf {M}\in \mathcal {R}_{n_{A}}^{MC}\) for a merge of classes, by \(\mathbf {S}\in \mathcal {R}_{n_{A}}^{ISC}\) for a split, by \(\mathbf {E}\in \mathcal {R}_{n_{A},n}^{IEC}\) for insertion/elimination of empty classes and by \(\mathbf {\Pi }\in \mathcal {P}_{n_{A}}\) for a permutation of classes. A sequence of these operations, for instance \(\mathbf {X}=\mathbf {M}\mathbf {S}\mathbf {E}\varvec{\Pi }\in \mathcal {R}_{n_{A},n}\), is itself row stochastic, since the set of all row stochastic matrices \(\mathcal {R}\) is closed with respect to the product operation. Any \(\mathbf {X}\) obtained through these operations is therefore row stochastic, thereby implying that ii) holds.
ii) \(\Rightarrow\) i). Assume that ii) holds, hence \(\mathbf {B}=\mathbf {A} \mathbf {\Theta }\) with \(\mathbf {\Theta }\in \mathcal {R}_{n_{A},n_{B}}\). In shorthand notation
where \(\theta _{j}(k)\) denotes the generic element \(\theta _{jk}\) of \(\mathbf {\Theta }\). Each addend in (10) can be written as:
where each \(\lambda \in [0,1]\). In fact, every sequence of \(n_{B}\) numbers \(\{\theta (k)\}_{k=1}^{n_{B}}\) with values in [0, 1] satisfying \(\sum _{k}\theta (k)=1\) can be written as:
The constraint \(\sum _{k}\theta (k)=1\) imposes that there must exist an index k such that \(\lambda _{k}=1\). If \(\lambda _{k}=1\), then the sequence is completed and \(\lambda _{j}=0=\theta (j)\) for any \(j>k\). Note that \(\theta (k)=0\) also if \(\lambda _{k}=0\), thus the sequence of \(\theta (k)\) may include elements equal to 0 even if it is not yet completed. Solving the sequence in (12) backward leads to (11), given that \(\theta (k)=\lambda _{k}\cdot \prod _{j=1}^{k-1}(1-\lambda _{j})\) with \(\lambda _{k}\in [0,1]\) \(\forall k=1,\ldots ,n_{B}\).
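The recursion in (11)-(12) is a stick-breaking representation: each \(\lambda _{k}\) splits off a fraction of the mass not yet allocated, and the last nonzero \(\theta (k)\) forces \(\lambda _{k}=1\). A minimal sketch (NumPy assumed; function names are ours) recovers the \(\lambda\)'s from a given sequence and reconstructs it:

```python
import numpy as np

def stick_breaking_lambdas(theta):
    """Recover split proportions lambda_k such that
    theta[k] = lambda_k * prod_{j<k} (1 - lambda_j)."""
    lambdas = []
    remaining = 1.0  # mass not yet split off
    for t in theta:
        lam = 1.0 if remaining <= 1e-12 else t / remaining
        lambdas.append(lam)
        remaining *= (1.0 - lam)
    return np.array(lambdas)

def reconstruct_theta(lambdas):
    """Invert the decomposition: rebuild theta from the lambda sequence."""
    out, remaining = [], 1.0
    for lam in lambdas:
        out.append(lam * remaining)
        remaining *= (1.0 - lam)
    return np.array(out)

theta = np.array([0.2, 0.0, 0.5, 0.3])   # sums to 1, zeros allowed mid-sequence
lam = stick_breaking_lambdas(theta)
```

Note how the zero in position 2 yields \(\lambda _{2}=0\) without terminating the sequence, exactly as observed in the proof.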
Consider a sequence of matrices \(\mathbf {Z}_{[k]}\in \mathcal {R}_{n_{A}}^{ISC}\).^{Footnote 19} Matrix \(\mathbf {Z}_{[1]}\) performs the first split of vector \(\mathbf {a}_{j}\) according to proportion \(\lambda _{j1}\). Matrix \(\mathbf {Z}_{[2]}\) performs a split on the residual component \((1-\lambda _{j1})\mathbf {a}_{j}\) according to the proportion \(\lambda _{j2}\). Iterating these arguments leads to matrix \(\mathbf {Z}_{[n_{B}-1]}\), representing the last split of vector \(\mathbf {a}_{j}\) after a sequence of \(n_{B}-2\) preceding splits. It follows that (11) can be equivalently written as:
Extending the representation in (13) to all addends in (10) leads to a total of \(n_{A}(n_{B}-1)=n\) splits of \(\mathbf {A}\)’s classes. The split operation preserves the number of classes, therefore it can be operationalized only if there exists a matrix \(\mathbf {Y}\in \mathcal {R}_{n_{A},n}^{IEC}\) adding a sufficient number of empty classes to \(\mathbf {A}\) to perform the n splits. According to the summation operator in (10), the order of the classes of \(\mathbf {A}\) is irrelevant, so that operations of permutation of classes are admitted.^{Footnote 20} By combining all the operations into a single expression we obtain \(\mathbf {A}\cdot \widehat{\mathbf {X}}\), where the \(n_{A}\times n\) matrix \(\widehat{\mathbf {X}}\) rewrites:
where \(\mathbf {Z}_{[k]}(j)\) is indexed by j to highlight the relation with class j of \(\mathbf {A}\). Here \(\widetilde{\mathbf {Z}}_{[k]}(j):=\text {diag}\left( \mathbf {I},\;\mathbf {Z}_{[k]}(j),\mathbf {I}^{\prime }\right)\), where \(\mathbf {I}\) and \(\mathbf {I}^{\prime }\) are two identity matrices of size \((j-1)n_{B}\) and \((n_{A}-j)n_{B}\) respectively. Line (15) comes from the fact that every block diagonal matrix can be represented as the product of the matrices associated with each block, obtained by substituting the remaining blocks with identity matrices.
To conclude, it is possible to perform permutations of the \(n_{A}n_{B}\) classes to rearrange the entries in \(\mathbf {A}\cdot \widehat{\mathbf {X}}\) so as to accommodate the definition of a merge of classes transformation through a matrix \(\varvec{\Pi }_{n_{A}n_{B}}\). A convenient permutation rearranges \(n_{B}\) groups of \(n_{A}\)-tuples of classes of \(\mathbf {A}\cdot \widehat{\mathbf {X}}\), so that the jth group consists of the sequence of classes \((\lambda _{1j}\mathbf {a}_{1},\ldots ,\lambda _{n_{A}j}\mathbf {a}_{n_{A}},\ldots )\).^{Footnote 21} Consider a sequence of merges of classes, so that class 1 in the new configuration is merged with class 2, then the resulting class 2 is merged with class 3 and so on, up to the first \(n_{A}\) classes. This sequence of merge transformations can be modeled by matrices \(\mathbf {M}_{[1]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\), \(\mathbf {M}_{[2]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\) and so on, up to \(\mathbf {M}_{[n_{A}-1]}\in \mathcal {R}_{n_{A}n_{B}}^{MC}\). Given the order of the classes, the same procedure can be extended to all the \(n_{B}-1\) remaining \(n_{A}\)-tuples of classes. This operation leaves many empty classes, which can be eliminated using a matrix \(\mathbf {Y}^{\prime }\) incorporating the elimination of empty classes operation. As a result:
All the matrices multiplying \(\mathbf {A}\) are row stochastic, and so is their product. Hence \(\mathbf {\Theta }\in \mathcal {R}_{n_{A},n_{B}}\) can always be decomposed into permutation transformations and products of matrices originating exclusively from split, merge and insertion/deletion of empty classes operations. This is i), which concludes the proof. \(\square\)
Appendix A.3 Proof of Remark 2
Proof
Assume \(\mathbf {B}^{j}\preccurlyeq ^{R}\mathbf {A}\), then (by Proposition 1) \(\exists \mathbf {X}^{j}\in \mathcal {R}_{n}\) such that \(\mathbf {B}^{j}= \mathbf {A}\mathbf {X}^{j}\), \(\forall j\). Recall that axiom StrongMixC requires that \(\mathbf {B}=\sum _{j=1}^m{w_j\mathbf {B}^j}\), hence \({\mathbf {B}}=\sum _{j=1}^{m}{ w_{j}\mathbf {B}^{j}}=\sum _{j=1}^{m}{w_{j}\mathbf {A}\mathbf {X}^{j}}=\mathbf {A} \sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\preccurlyeq ^{R}\mathbf {A}\) since \(\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\in \mathcal {R}_{n}\) given that \(\sum _{j=1}^{m}{w_{j}}=1\). Thus, \({\mathbf {B}}\preccurlyeq ^{R}\mathbf {A.}\) \(\square\)
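The key step of this proof, that a convex combination of row stochastic matrices is itself row stochastic, can be checked numerically for a random instance. A sketch (NumPy assumed; names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_row_stochastic(n, rng):
    """Random n x n matrix with nonnegative entries and rows summing to 1."""
    X = rng.random((n, n))
    return X / X.sum(axis=1, keepdims=True)

d, n, m = 3, 4, 5
A = rng.random((d, n))
w = rng.dirichlet(np.ones(m))                     # mixing weights, sum to 1
Xs = [random_row_stochastic(n, rng) for _ in range(m)]
Bs = [A @ X for X in Xs]                          # each B^j = A X^j
B_mix = sum(wj * Bj for wj, Bj in zip(w, Bs))     # the StrongMixC mixture
X_mix = sum(wj * Xj for wj, Xj in zip(w, Xs))     # induced transformation
```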
Appendix A.4 Proof of Remark 3
Proof
A direct verification of the remark can be obtained by considering matrices \(\mathbf {A},\hat{\mathbf {A}}\in \mathcal {M}_{d}\) such that \(\hat{\mathbf {A}}\) is obtained by permuting columns j and \(j+1\) of \(\mathbf {A}\). Thus, by IPC we have \(\mathbf {A}\thicksim \hat{\mathbf {A}}\). Setting \(m=2\), with \(\mathbf {B}^{1}:=\mathbf {A}\) and \(\mathbf {B}^{2}:=\hat{\mathbf {A}}\) in the definition of StrongMixC and letting \(w_{1}=w_{2}=1/2\), we obtain \(\mathbf {B}^{0}=1/2\cdot \mathbf {B}^{1}+1/2\cdot \mathbf {B}^{2}=1/2\cdot (\mathbf {A}+\hat{\mathbf {A}})\). That is, \(\mathbf {B}^{0}\) coincides with \(\mathbf {A}\) except for columns j and \(j+1\), which are identical and coincide with \(1/2\cdot \mathbf {a}_{j}+1/2\cdot \mathbf {a}_{j+1}\). By StrongMixC we have that \(\mathbf {B}^{0}\preccurlyeq \mathbf {A}\). Consider the matrix \(\mathbf {B}\in \mathcal {M}_{d}\) that is identical to \(\mathbf {B}^{0}\) except for columns j and \(j+1\), where \(\mathbf {b}_{j}=\mathbf {0}_{d}\) and \(\mathbf {b}_{j+1}=\mathbf {a}_{j}+\mathbf {a}_{j+1}\). If we drop the empty column j from \(\mathbf {B}\) and split the class/column \(j+1\) with weights 1/2 and 1/2 into two adjacent classes, we obtain matrix \(\mathbf {B}^{0}\). Applying IEC and ISC we have that \(\mathbf {B}\thicksim \mathbf {B}^{0}\). By transitivity of the dissimilarity relation, \(\mathbf {B}\thicksim \mathbf {B}^{0}\preccurlyeq \mathbf {A}\) implies that \(\mathbf {B}\preccurlyeq \mathbf {A}\). Note that, by construction, matrices \(\mathbf {A}\) and \(\mathbf {B}\) are those appearing in the definition of axiom MC, which therefore turns out to be satisfied. \(\square\)
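The construction in this proof can be traced numerically for a concrete matrix: averaging \(\mathbf {A}\) with its column-permuted copy replaces columns j and \(j+1\) by their midpoint, which amounts to right multiplication by a doubly stochastic mixer. A sketch (NumPy assumed; the specific matrix is an illustrative choice of ours):

```python
import numpy as np

A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.2, 0.6]])
j = 0                                   # mix columns j and j+1

Pi = np.eye(3)
Pi[[j, j + 1]] = Pi[[j + 1, j]]         # permutation matrix swapping j, j+1
A_hat = A @ Pi                          # IPC: A_hat ~ A
B0 = 0.5 * A + 0.5 * A_hat              # StrongMixC with weights 1/2, 1/2

X = np.eye(3)
X[np.ix_([j, j + 1], [j, j + 1])] = 0.5  # induced doubly stochastic mixer
```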
Appendix A.5 Proof of Proposition 2
Proof
i) \(\Rightarrow\) ii). The proof consists of showing that the set of all matrices \(\mathbf {B}\) such that \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) can be characterized using exclusively operations underlying axioms StrongMixC, IEC, ISC and IPC. If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms ISC, IEC, IPC and StrongMixC; matrix majorization \(\preccurlyeq ^R\) is one such partial order (by Proposition 1 and Remark 2).
The result holds for any matrix in \(\mathcal {M}_{d}\). Let \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) with \({n_{A}}\) not necessarily equal to \({n_{B}.}\) First, consider matrices \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\) of size \(d\times n\) obtained from \(\mathbf {A},\mathbf {B}\) through split and permutation of classes and deletion of empty classes such that \(\mathbf {1} _{d}^{\prime }\tilde{\mathbf {A}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}= \frac{d}{n}\mathbf {1}_{n}^{\prime }\). Every ordering \(\preccurlyeq\) satisfying axioms IEC, ISC and IPC ranks \(\tilde{\mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\).
We now investigate the implications for the transitive closure of the orderings \(\preccurlyeq\) (satisfying axioms IPC, IEC, ISC) deriving from the fact that it satisfies StrongMixC. We do so by looking at all permutations of a matrix \(\tilde{\mathbf {A}}\), which turn out to be indifferent to it in terms of dissimilarity for all orderings satisfying IPC^{Footnote 22}, and applying to them the mixing operations considered in StrongMixC, thereby obtaining the set of matrices that are matrix majorized by \(\tilde{\mathbf {A}}\). The result holds for any initial matrix \(\mathbf {A}\in \mathcal {M}_d\). Denote by \(J_{k}\), \(k=2,\dots ,n\), a subset of the classes \(\{1,\dots ,n\}\) with cardinality \(|J_{k}|=k\). For instance, \(J_{2}=\{1,4\}\) when \(n\ge 4\). There are \(m_{k}=\left( {\begin{array}{c}n\\ k\end{array}}\right)\) such subsets for any k, with \(m=\sum _{k}{m_{k}}\); they are collected in the set \(\mathcal {J}_{k}\), so that \(J_{k}\in \mathcal {J}_{k}\) and \(J_{k}\subseteq \{1,\ldots ,n\}\). For any such subset \(J_{k}\), define \(\mathcal {P}_{n}^{J_{k}}\) as the set of \(n\times n\) permutation matrices corresponding to all permutations of the indices in \(J_{k}\) (there are k! of them). The matrix \(\tilde{\mathbf {A}}\varvec{\Pi }\), for \(\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}\) \(\forall k\), is obtained by permuting the k columns indexed as in \(J_{k}\). Given that \(\preccurlyeq\) satisfies IPC, then \(\tilde{\mathbf {A}}\varvec{\Pi }\thicksim \tilde{\mathbf {A}}\). Consider a mix operation as defined in the StrongMixC axiom, giving equal weight \(w_{h}=\frac{1}{k!}\) to each permutation \(h=1,\ldots ,k!\) of the indices in \(J_{k}\), yielding \(\sum _{h=1}^{k!}{w_h\tilde{\mathbf {A}}\varvec{\Pi }}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\tilde{\mathbf {A}}\varvec{\Pi }}=\tilde{\mathbf {A}}\cdot \sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\). 
There are m such matrices \(\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\in \mathcal {R}_{n}\), denoted by \(\mathbf {X}^{j}\), for \(j=1,\ldots ,m\). Note that the matrices \(\mathbf {X}^{j}\) form the basis of the set \(\mathcal {R}_{n}\) (see Proposition 3.1 in Dahl 1999).^{Footnote 23} We hence have that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings \(\preccurlyeq\) satisfying StrongMixC and IPC; moreover, \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq ^{R}\tilde{\mathbf {A}}\), given that all \(\mathbf {X}^{j}\) are in \(\mathcal {R}_{n}\). Consider now obtaining \(\tilde{\mathbf {B}}\) by mixing (using the weights in axiom StrongMixC) the matrices \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) \(\forall j\), namely: \(\tilde{\mathbf {B}}=\sum _{j=1}^{m}{w_{j}\tilde{\mathbf {A}}\mathbf {X}^{j}}=\tilde{\mathbf {A}}\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\). Since \(\sum _{j}{w_{j}}=1\), the summation term identifies the convex hull of the row stochastic matrices that form the basis for the set \(\mathcal {R}_{n}\). From Corollary 3.2 in Dahl (1999), the set of matrices \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\sum _{j=1}^{m}{w_{j}\mathbf {X}^{j}}\) obtained for all \(w_{j}\in [0,1]\) with \(\sum _j{w_j}=1\) identifies the whole set of matrices that are matrix majorized by \(\tilde{\mathbf {A}}\) (which is called a Markotope), thus guaranteeing that \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\). 
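The matrices \(\mathbf {X}^{j}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}\frac{1}{k!}\varvec{\Pi }\) have a simple closed form: they equal the identity off \(J_{k}\) and a constant \(1/k\) block on \(J_{k}\times J_{k}\), since each entry of the averaged permutations appears \((k-1)!\) times out of \(k!\). A sketch (NumPy assumed; the function name is ours):

```python
import numpy as np

def subset_mixer(n, J):
    """Average of all permutation matrices permuting only indices in J:
    identity off J, constant 1/|J| on the J x J block; doubly stochastic."""
    X = np.eye(n)
    J = list(J)
    X[np.ix_(J, J)] = 1.0 / len(J)
    return X

A = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.2, 0.3, 0.4]])
X = subset_mixer(4, [0, 2])
AX = A @ X        # columns 0 and 2 are replaced by their average
```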
Recall now (see the second set of preliminary results) that the split and permutation of classes and deletion of empty classes transformations generating \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\), with \(\tilde{\mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\), can be represented by row stochastic matrices in both directions: \(\tilde{\mathbf {A}}\) can be obtained from \(\mathbf {A}\) through a row stochastic transformation and conversely, and similarly for \(\tilde{\mathbf {B}}\) and \(\mathbf {B}\). Thus \(\mathbf {B}\preccurlyeq ^{R}\tilde{\mathbf {B}}\) and \(\tilde{\mathbf {A}}\preccurlyeq ^{R}\mathbf {A}\), which combined with \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\) gives condition ii) by transitivity.
ii) \(\Rightarrow\) i). Assume ii), which is equivalent to \(\mathbf {B} \preccurlyeq \mathbf {A}\) by all dissimilarity orders satisfying MC, IPC, IEC, ISC from Proposition 1 and implies i) by Remark 3. \(\square\)
Appendix A.6 Proof of Remark 4
Proof
The proof relies heavily on Remark 5 and Remark 7, which are demonstrated later. From Remark 7, \(\mathbf {B}\preccurlyeq ^R \mathbf {A}\) is equivalently represented by \(Z(\mathbf {B})\subseteq Z(\mathbf { A})\) when \(d=2\). By Remark 5, zonotope inclusion is consistent with the mixing operations underlying axiom MixC. \(\square\)
Appendix A.7 Proof of Remark 5
Proof
Consider a \(d\times n\) matrix \(\mathbf {A}\in \mathcal {M}_{d}\) and a sequence of \(d\times n\) matrices \(\mathbf {B}^{j}\in \mathcal {M}_{d}\) for \(j=1,\ldots ,m\) such that \(\mathbf {1}_{d}^{\prime }\mathbf {A}=\mathbf {1}_{d}^{\prime }\mathbf {B}^{j}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\) \(\forall j\). Note that for any sequence \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}^j\in \mathcal {M}_d\) one can find matrices \(\mathbf {A},\mathbf {B}^j\) defined as above such that \(Z(\mathbf {A})=Z(\tilde{\mathbf {A}})\) and \(Z(\mathbf {B}^j)=Z(\tilde{\mathbf {B}}^j)\), \(\forall j\). Assume that \(Z(\mathbf {B}^{j})\subseteq Z(\mathbf {A}),\forall j\). By Lemma 2, \(\forall \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) \(\exists \varvec{\theta }^j\in \mathcal {V}_n\) such that \(\mathbf {B}^j\varvec{\gamma }^j=\mathbf {A}\varvec{\theta }^j\), \(\forall j\). Obtain \(\mathbf {B}\in \mathcal {M}_d\) as per (4); then \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) such that \(\mathbf {B}\varvec{\gamma }\in conv \{\mathbf {B}^1\varvec{\gamma }^1,\ldots ,\mathbf {B}^m\varvec{\gamma }^m\}\). There hence exist weights \(\omega _j^{\varvec{\gamma }}\), specific to the choice of the mixing \(\varvec{\gamma }\), satisfying \(\omega _j^{\varvec{\gamma }}\in [0,1]\) and \(\sum _{j=1}^m{\omega _j^{\varvec{\gamma }}}=1\), such that \(\mathbf {B}\varvec{\gamma }=\sum _j{\omega _j^{\varvec{\gamma }}\mathbf {B}^j\varvec{\gamma }^j}=\sum _j{\omega _j^{\varvec{\gamma }}\mathbf {A}\varvec{\theta }^j}=\mathbf {A}\sum _j{\omega _j^{\varvec{\gamma }}\varvec{\theta }^j}=\mathbf {A}\varvec{\theta }\) with \(\varvec{\theta }^j\in \mathcal {V}_n\) \(\forall j\). Since \(\mathcal {V}_n\) is closed under convex combinations, \(\varvec{\theta }\in \mathcal {V}_n\), which by Lemma 2 is a sufficient condition for \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). \(\square\)
Appendix A.8 Proof of Theorem 1
Proof
i) \(\Rightarrow\) ii). The proof consists of showing that the set of all matrices \(\mathbf {B}\) such that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) can be characterized using exclusively operations underlying axioms MixC, IEC, ISC and IPC. If i) holds, then it follows from Donaldson and Weymark (1998) that there exist partial orders that rank \(\mathbf {B}\) at most as dissimilar as \(\mathbf {A}\) and that are consistent with the operations underlying axioms ISC, IEC, IPC and MixC. The zonotope inclusion criterion is one such partial order (by Remark 5). The result must hold for any matrix \(\mathbf {A}\). Using the same procedure and notation as in part i) \(\Rightarrow\) ii) of the proof of Proposition 2, we can construct matrices \(\mathbf {X}^{j}\), for \(j=1,\ldots ,m\), such that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings \(\preccurlyeq\) consistent with axioms MixC, IPC, IEC, ISC (note that the set of orderings consistent with axiom MixC is a subset of those consistent with StrongMixC by Remark 1). Moreover, for such matrices it is also verified that for any \(\varvec{\gamma }\in \mathcal {V}_{n}\) there exists a \(\varvec{\theta }\in \mathcal {V}_{n}\) such that \(\tilde{\mathbf {A}}\mathbf {X}^{j}\varvec{\gamma }=\tilde{\mathbf {A}}\varvec{\theta }\), which yields \(\tilde{\mathbf {A}}(\mathbf {X}^{j}\varvec{\gamma }-\varvec{\theta })=\mathbf {0}_{d}\) \(\forall j\). 
Using the fact that \(\mathbf {X}^{j}\in \mathcal {R}_{n}\) \(\forall j\), then \(\mathbf {X}^{j}\varvec{\gamma }\in \mathcal {V}_{n}\), so that it is sufficient to set \(\varvec{\theta }=\mathbf {X}^{j}\varvec{\gamma }\) for every \(\varvec{\gamma }\in \mathcal {V}_{n}\) to guarantee that \(\varvec{\theta }\in \mathcal {V}_{n}\). As argued in the proof of Lemma 2, the previous condition implies that \(Z(\tilde{\mathbf {A}}\mathbf {X}^{j})\subseteq Z(\tilde{\mathbf {A}})\) \(\forall j\). Note that only some classes (i.e., columns) of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) identify vertices of \(Z(\tilde{\mathbf {A}})\). To see this, note that for any matrix \(\mathbf {X}^{j}=\sum _{\varvec{\Pi }\in \mathcal {P}_{n}^{J_{k}}}{\frac{1}{k!}\varvec{\Pi }}\) the class h of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) has coordinates \(\mathbf {v}:=\sum _{\ell \in J_{k}}{\frac{(k-1)!}{k!}\tilde{\mathbf {a}}_{\ell }}\) for any \(h\in J_{k}\), whereas class \(h^{\prime }\) of \(\tilde{\mathbf {A}}\mathbf {X}^{j}\) is equal to \(\tilde{\mathbf {a}}_{h^{\prime }}\) for all \(h^{\prime }\not \in J_{k}\). This holds \(\forall j,k\). Any vector \(\mathbf {v}\) is a vertex of the zonotope \(Z(\tilde{\mathbf {A}})\), since \(k\mathbf {v}=\tilde{\mathbf {A}}\varvec{\gamma }\) with \(\varvec{\gamma }\in \mathcal {V}_{n}^{01}\) such that \(\gamma _{j}=1\) whenever \(j\in J_{k}\) whereas \(\gamma _{j}=0\) for any \(j\not \in J_k\). From Proposition 3.1 in Dahl (1999), the zonotope \(Z(\tilde{\mathbf {A}})\) is the convex hull of these vertices. To complete the proof, consider now a matrix \(\tilde{\mathbf {B}}\in \mathcal {M}_d\) satisfying condition (4), where \(\mathbf {B}^j=\tilde{\mathbf {A}}\mathbf {X}^j\), \(j=1,\ldots ,m\). Such a matrix is ranked \(\tilde{\mathbf {B}}\preccurlyeq \tilde{\mathbf {A}}\) by all dissimilarity orderings satisfying axioms MixC, IEC, IPC and ISC. 
For any such matrix, we have that \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\), \(j=1,\ldots ,m\): \(\tilde{\mathbf {B}}\varvec{\gamma }\in conv \{\tilde{\mathbf {A}}\mathbf {X}^1\varvec{\gamma }^1,\ldots ,\tilde{\mathbf {A}}\mathbf {X}^m\varvec{\gamma }^m\}\). Any element of the convex hull can be written as: \(\sum _{j=1}^m{\omega _j \tilde{\mathbf {A}}\mathbf {X}^j\varvec{\gamma }^j} = \tilde{\mathbf {A}}\sum _{j=1}^m{\omega _j\mathbf {X}^j\varvec{\gamma }^j}= \tilde{\mathbf {A}} \sum _{j=1}^m{\omega _j\varvec{\theta }^j} = \tilde{\mathbf {A}}\varvec{\theta }\), with \(\varvec{\theta }^j\in \mathcal {V}_n\) since \(\mathbf {X}^j\in \mathcal {R}_n\) \(\forall j\), and also \(\varvec{\theta }\in \mathcal {V}_n\) given that \(\omega _j\in [0,1]\), \(\sum _j{\omega _j}=1\). It follows that \(\forall \varvec{\gamma }\in \mathcal {V}_n^{01}\) \(\exists \varvec{\theta }\in \mathcal {V}_n\): \(\tilde{\mathbf {B}}\varvec{\gamma }=\tilde{\mathbf {A}}\varvec{\theta }\), which is equivalent to \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\) from Lemma 2. The fact that \(Z(\tilde{\mathbf {B}})=Z(\mathbf {B})\) and \(Z(\tilde{\mathbf {A}})=Z(\mathbf {A})\) gives ii).
To highlight the differences with the analogous steps in the proof of Proposition 2, note that, as shown there, the matrices \(\mathbf {X}^{j}\) form a basis for \(\mathcal {R}_{n}\): every row stochastic matrix is obtained as a convex combination (with weights as in axiom StrongMixC) of the matrices \(\mathbf {X}^{j}\), \(j=1,\ldots ,m\). However, the weighting schemes underlying axiom StrongMixC are only a subset of the admissible weights according to axiom MixC. In particular, these weights are not capable of generating the convex hull of all vertices of \(Z(\tilde{\mathbf {A}})\) and, as a consequence, they are not sufficient to characterize the full set of matrices \(\tilde{\mathbf {B}}\) such that \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\), as opposed to the weights considered in MixC. Nonetheless, the weights underlying axiom StrongMixC provide sufficient structure to characterize the set of matrices \(\tilde{\mathbf {B}}\) such that \(\tilde{\mathbf {B}}\preccurlyeq ^{R}\tilde{\mathbf {A}}\), as illustrated in Proposition 2.
ii) \(\Rightarrow\) i). We show that the zonotope inclusion condition can always be rationalized by the existence of matrices that are ranked consistently by all dissimilarity orderings in i). Assume that ii) holds, and note that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) \(\Leftrightarrow\) \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\), where \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{d}\) are \(d\times n\) matrices obtained through split and permutation of classes and deletion of empty classes from \(\mathbf {A}\) and \(\mathbf {B}\) respectively, such that \(\mathbf {1}_{d}^{\prime }\tilde{\mathbf {A}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\), as done in the first part of the proof of Proposition 2. Consider further matrices \(\tilde{\mathbf {B}}^{j}\in \mathcal {M}_{d}\), for \(j=1,\dots ,m\) with \(m=n\), of size \(d\times n\) defined as: \(\tilde{\mathbf {B}}^{j}:=(\tilde{\mathbf {b}}_{j},\frac{1}{n-1}\mathbf {r}_{j},\ldots ,\frac{1}{n-1}\mathbf {r}_{j})\cdot \varvec{\Pi }_{1,j}\), with \(\varvec{\Pi }_{i,j}\in \mathcal {P}_{n}\) being the permutation matrix permuting classes i and j. The vector \(\mathbf {r}_{j}:=\mathbf {1}_{d}-\tilde{\mathbf {b}}_{j}\) is the “residual”, where we recall that \(\tilde{\mathbf {b}}_j\) denotes column j of matrix \(\tilde{\mathbf {B}}\). By construction, \(Z(\tilde{\mathbf {B}}^{j})\subseteq Z(\tilde{\mathbf {A}})\) \(\forall j\). Considering that, by the definition of zonotope inclusion, \(\tilde{\mathbf {b}}_{j}=\tilde{\mathbf {A}}\varvec{\theta }\) for some \(\varvec{\theta }\in \mathcal {V}_{n}\), then \(\mathbf {r}_{j}=\tilde{\mathbf {A}}(\mathbf {1}_{n}-\varvec{\theta })\). Notice that the vector \(\varvec{\theta }\) is specific to a matrix \(\tilde{\mathbf {B}}^j\). It follows that:
with \(\mathbf {X}^{j}\in \mathcal {R}_{n}\) for every \(j=1,\ldots ,m\). We conclude that condition ii) always implies that \(\tilde{\mathbf {B}}^{j}\preccurlyeq ^{R}\tilde{\mathbf {A}}\) \(\forall j\) for the n matrices \(\tilde{\mathbf {B}}^{j}\) obtained as above. Now consider the set of weights \(w_{j}^{k}\) with the following features: \(w_{j}^{j}=1\) for any \(j=1,\ldots ,n\) and \(w_{k}^{j}=0\) for any \(k\ne j\). By construction: \(\tilde{\mathbf {b}}_{k}=\sum _{j}{w_{j}^{k}\tilde{\mathbf {b}}_{k}^{j}}\) for all \(k=1,\ldots ,n\). From Proposition 1, \(\tilde{\mathbf {B}}^{j}\preccurlyeq \tilde{\mathbf {A}}\) \(\forall j\) for all orderings satisfying MC, ISC, IEC, IPC. These are also the orderings satisfying ISC, IEC, IPC and StrongMixC from Proposition 2. By Remark 1, this set of orderings includes those in i), because those considered in the set satisfy the stronger version of the MixC axiom. To conclude, we should verify that matrix \(\tilde{\mathbf {B}}\) is obtained through a weighting scheme that is consistent with condition (4). This is immediate, since for \(\varvec{\gamma }=(\gamma _1,\ldots ,\gamma _n)'\in \mathcal {V}_n^{01}\) we have that \(\tilde{\mathbf {B}}\varvec{\gamma }= (\tilde{\mathbf {b}}_1,\ldots ,\tilde{\mathbf {b}}_n)\varvec{\gamma }= (\sum _j{w_j^1\tilde{\mathbf {b}}_1^j},\ldots ,\sum _j{w_j^n\tilde{\mathbf {b}}_n^j})\varvec{\gamma }= (\tilde{\mathbf {b}}_1^1,\ldots ,\tilde{\mathbf {b}}_n^n)\varvec{\gamma }= (\tilde{\mathbf {B}}^1\mathbf {i}_1,\ldots ,\tilde{\mathbf {B}}^n\mathbf {i}_n)\varvec{\gamma }= (\tilde{\mathbf {B}}^1\mathbf {i}_1\gamma _1,\ldots ,\tilde{\mathbf {B}}^n\mathbf {i}_n\gamma _n)\), where we recall that \(\mathbf {i}_k\) is the kth column of the \(n\times n\) identity matrix and that the second equality follows from the choice of the weighting scheme. 
It is sufficient to set \(\varvec{\gamma }^j:=\mathbf {i}_j\gamma _j\in \mathcal {V}_n^{01}\) to conclude that the condition \(\forall \varvec{\gamma }\in \mathcal {V}^{01}_n\) \(\exists \varvec{\gamma }^j\in \mathcal {V}_n^{01}\) such that \(\tilde{\mathbf {B}}\varvec{\gamma }= (\tilde{\mathbf {B}}^1\varvec{\gamma }^1,\ldots ,\tilde{\mathbf {B}}^n\varvec{\gamma }^n)\) satisfies condition (4).
As a result for the obtained matrix \(\tilde{\mathbf {B}}\) we have that \(\tilde{\mathbf {B}}\preccurlyeq \tilde{\mathbf {A}}\) for all orderings satisfying ISC, IEC, IPC and MixC. The same orderings rank \(\tilde{ \mathbf {A}}\thicksim \mathbf {A}\) and \(\tilde{\mathbf {B}}\thicksim \mathbf {B}\), which implies i). \(\square\)
Appendix A.9 Proof of Remark 6
Proof
Recall the definition of zonotope set inclusion: for any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\), \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) if and only if \(\forall \varvec{\gamma }\in \mathcal {V}_{n_{B}}\) \(\exists \varvec{\theta }\in \mathcal {V}_{n_{A}}\) such that \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\).
Let \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\). By Proposition 1, \(\mathbf {B}=\mathbf {A}\mathbf {X}\) with \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), which after substituting into \(\mathbf {B}\varvec{\gamma }=\mathbf {A}\varvec{\theta }\) yields \(\mathbf {A}(\mathbf {X}\varvec{\gamma }-\varvec{\theta })=\mathbf {0}_{d}\). Given that \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), we have \(\mathbf {X}\varvec{\gamma }\in \mathcal {V}_{n_{A}}\) for any \(\varvec{\gamma }\in \mathcal {V}_{n_{B}}\); it is therefore sufficient to set \(\varvec{\theta }=\mathbf {X}\varvec{\gamma }\) for each \(\varvec{\gamma }\in \mathcal {V}_{n_{B}}\) to guarantee that \(\varvec{\theta }\in \mathcal {V}_{n_{A}}\), which thus implies \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\). \(\square\)
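The witness \(\varvec{\theta }=\mathbf {X}\varvec{\gamma }\) can be verified numerically on a random instance (NumPy assumed; the helper name is ours, and checking the 0/1 vertices suffices for the inclusion by Lemma 2):

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
d, nA, nB = 2, 4, 3
X = rng.random((nA, nB))
X /= X.sum(axis=1, keepdims=True)    # row stochastic, so B = A X gives B <=^R A
A = rng.random((d, nA))
B = A @ X

def witnesses_ok(A, B, X):
    """For every 0/1 vector gamma, theta = X gamma must lie in [0,1]^nA
    and satisfy B gamma = A theta, the condition for Z(B) in Z(A)."""
    for bits in itertools.product([0.0, 1.0], repeat=X.shape[1]):
        gamma = np.array(bits)
        theta = X @ gamma
        if np.any(theta < 0) or np.any(theta > 1 + 1e-12):
            return False
        if not np.allclose(B @ gamma, A @ theta):
            return False
    return True
```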
Appendix A.10 Proof of Remark 7
Proof
Let \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\in \mathcal {M}_{2}\) denote two distribution matrices obtained respectively from \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{2}\) using split and permutation of classes and deletion of empty classes, such that \(Z(\mathbf {A})=Z(\tilde{\mathbf {A}})\), \(Z(\mathbf {B} )=Z(\tilde{\mathbf {B}})\), with \(n_{\tilde{A}}=n_{\tilde{B}}=\tilde{n}\), \(\sum _{j}{\mathbf {a}_{j}}=\sum _{j}{\tilde{\mathbf {a}_{j}}}=\sum _{j}{\tilde{ \mathbf {b}_{j}}}=\sum _{j}{\mathbf {b}_{j}}\) and \(\sum _{i}{\tilde{a}_{ij}} =\sum _{i}{\tilde{b}_{ij}}=\frac{1}{\tilde{n}}\) \(\forall j=1,\ldots ,\tilde{n}\). Assume that \(Z(\tilde{\mathbf {B}})\subseteq Z(\tilde{\mathbf {A}})\) which, from Lemma 2, implies that \(\forall \varvec{\gamma }\in \mathcal {V}_{\tilde{n}}^{01},\ \exists \varvec{\theta }\in \mathcal {V}_{ \tilde{n}}:\ \tilde{\mathbf {B}}\varvec{\gamma }=\tilde{\mathbf {A}} \varvec{\theta }\). The vector \(\varvec{\gamma }\) selects column vectors from \(\tilde{\mathbf {B}}\) and aggregates these vectors with elementwise summations. For every \(i=1,2\) we can equivalently write the condition as:
with \(\theta _{jk}\in [0,1]\) \(\forall j,k\). It is necessary and sufficient that (17) holds for an arbitrary group \(i=1\) to guarantee that (17) also holds for \(i=2\), given that by construction \(\tilde{a}_{2j}=\frac{2}{\tilde{n}}-\tilde{a}_{1j}\) and \(\tilde{b}_{2j}=\frac{2}{\tilde{n}}-\tilde{b}_{1j}\) \(\forall j=1,\ldots ,\tilde{n}\). After rearranging the terms in (17) in increasing order, so that \(\tilde{b}_{1(k)}\le \tilde{b}_{1(k+1)}\) and \(\tilde{a}_{1(k)}\le \tilde{a}_{1(k+1)}\) for \(k=1,\ldots ,\tilde{n}-1\), and using the fact that \(\theta _{jk}\in [0,1]\), it follows that \(\sum _{k=1}^{h}{\tilde{b}_{1(k)}}\ge \sum _{j=1}^{h}{\tilde{a}_{1(j)}}\) for any \(h=1,\ldots ,\tilde{n}\). From Marshall et al. (2011), if \(d=2\), this condition is equivalent to uniform majorization, i.e. \(\exists \mathbf {X}\in \mathcal {D}_{\tilde{n}}:\ (\tilde{b}_{11},\ldots ,\tilde{b}_{1\tilde{n}})=(\tilde{a}_{11},\ldots ,\tilde{a}_{1\tilde{n}})\mathbf {X}\). Hence \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\), since the same matrix \(\mathbf {X}\) guarantees that \((\tilde{b}_{21},\ldots ,\tilde{b}_{2\tilde{n}})=(\tilde{a}_{21},\ldots ,\tilde{a}_{2\tilde{n}})\mathbf {X}\). Furthermore, notice that the indifference class of \(\preccurlyeq ^R\) is also characterized by the existence of row stochastic matrices: for any \(\mathbf {A},\mathbf {B}\in \mathcal {M}_d\), \(\mathbf {B}\thicksim \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MC, ISC, IEC and IPC if and only if \(\exists \mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\) and \(\exists \mathbf {X^{\prime }}\in \mathcal {R}_{n_{B},n_{A}}\) such that \(\mathbf {B}=\mathbf {A}\cdot \mathbf {X}\) and \(\mathbf {A}=\mathbf {B}\cdot \mathbf {X^{\prime }}\). We can hence write \(\tilde{\mathbf {A}}=\mathbf {A}\mathbf {Z}\) with \(\mathbf {Z}\in \mathcal {R}_{n_{A},\tilde{n}}\) and \(\mathbf {B}=\tilde{\mathbf {B}}\mathbf {Y}\) with \(\mathbf {Y}\in \mathcal {R}_{\tilde{n},n_{B}}\). 
Thus \(\mathbf {B}=\tilde{\mathbf {B}}\mathbf {Y}=\tilde{\mathbf {A}}\mathbf {X}\mathbf {Y}=\mathbf {A}\mathbf {Z}\mathbf {X}\mathbf {Y}=\mathbf {A}\varvec{\Theta }\) with \(\varvec{\Theta }\in \mathcal {R}_{n_{A},n_{B}}\), since \(\mathcal {D}_{\tilde{n}}\subseteq \mathcal {R}_{\tilde{n}}\) and the set \(\mathcal {R}_{n}\) is closed under matrix multiplication. This concludes the proof for \(d=2\). When \(d>2\), there is no guarantee that the row stochastic matrix \(\mathbf {X}\) satisfying (17) for group \(i\) also verifies \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\). \(\square\)
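The majorization step in the two-group argument can be illustrated numerically: averaging a group's row through a doubly stochastic matrix produces a vector whose ordered partial sums dominate those of the original. A minimal sketch with illustrative vectors (not taken from the paper); the function `majorized_by` and the data are ours:

```python
import numpy as np

def majorized_by(b, a, tol=1e-12):
    """Check that the sum of the h smallest entries of b weakly exceeds
    that of a for every h; with equal totals, a then majorizes b."""
    b_sorted, a_sorted = np.sort(b), np.sort(a)
    return bool(np.all(np.cumsum(b_sorted) >= np.cumsum(a_sorted) - tol))

# Illustrative group-1 row (hypothetical data, total mass fixed).
a1 = np.array([0.05, 0.10, 0.35, 0.50])

# Post-multiplying by a doubly stochastic X averages the entries;
# here the extreme case of full uniform mixing.
X = np.full((4, 4), 0.25)
b1 = a1 @ X   # uniform vector with the same total

assert majorized_by(b1, a1)       # b1 is majorized by a1
assert not majorized_by(a1, b1)   # the reverse fails for non-uniform a1
```

This mirrors the proof's use of Marshall et al. (2011): the partial-sum condition is exactly what guarantees the existence of a doubly stochastic \(\mathbf {X}\) mapping the group-1 row of \(\tilde{\mathbf {A}}\) to that of \(\tilde{\mathbf {B}}\).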
Appendix A.11 Proof of Proposition 3
Proof
From Proposition 1, i) is equivalent to \(\mathbf {B}\preccurlyeq ^{R} \mathbf {A}\). From Lemma 1 in Appendix A.1, we have that \(\mathbf {B }\preccurlyeq ^{R}\mathbf {A}\) if and only if
for \(g:\mathbb {R}^{d}\rightarrow \mathbb {R}\) convex and homogeneous with \(g(\mathbf {0}_{d}^{\prime })=0\). Note that \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) is equivalent to \((\mathbf {B}^{\prime },\overline{\mathbf {b}})^{\prime }\preccurlyeq ^{R}(\mathbf {A}^{\prime },\overline{\mathbf {a}})^{\prime }\), where \(\overline{\mathbf {b}}^{\prime }=\mathbf {1}_{d}^{\prime }\cdot \mathbf {B}\) and \(\overline{\mathbf {a}}^{\prime }=\mathbf {1}_{d}^{\prime }\cdot \mathbf {A}\), that is, if both matrices \(\mathbf {A},\mathbf {B}\in \mathcal {M}_{d}\) are "expanded" by adding one more row whose elements are given by the sum of the elements of the associated column in the original matrix. Condition (18) hence rewrites as \(\sum _{j}{g(\mathbf {b}_{j}^{\prime },\overline{b}_{j})}\le \sum _{j}{g(\mathbf {a}_{j}^{\prime },\overline{a}_{j})}\) with \(g\) defined on \(\mathbb {R}^{d+1}\). Given that \(g\) is convex and homogeneous, \(g(\mathbf {a}_{j}^{\prime },\overline{a}_{j})=\overline{a}_{j}g(\mathbf {a}_{j}^{\prime }/\overline{a}_{j},1)=\overline{a}_{j}h(\mathbf {a}_{j}^{\prime }/\overline{a}_{j})\) where \(h\in \mathcal {H}\), while for convenience empty classes receive weight \(\overline{a}_{j}=0\). Moreover, adding \(n_{A}-n_{B}\) empty classes preserves the relation in (18). We have therefore obtained the index \(D_{h}\). Thus, \(D_{h}(\mathbf {B})\le D_{h}(\mathbf {A})\) \(\forall h\in \mathcal {H}\) is equivalent to (18) and to condition ii). \(\square\)
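The index \(D_{h}(\mathbf {A})=\sum _{j}\overline{a}_{j}h(\mathbf {a}_{j}/\overline{a}_{j})\) can be checked numerically: post-multiplying by a row-stochastic matrix cannot increase it, because the underlying \(g\) is convex and homogeneous (hence subadditive). A sketch with hypothetical matrices, taking \(h\) to be the Euclidean norm as an illustrative convex choice:

```python
import numpy as np

def D_h(A, h):
    """D_h(A) = sum_j abar_j * h(a_j / abar_j), skipping empty classes."""
    total = 0.0
    for j in range(A.shape[1]):
        col_sum = A[:, j].sum()
        if col_sum > 0:
            total += col_sum * h(A[:, j] / col_sum)
    return total

h = np.linalg.norm   # one convex h in H (illustrative choice)

# Hypothetical 2-group distribution matrix (rows sum to 1).
A = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]])
# A row-stochastic X: each column of B = A @ X mixes columns of A.
X = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
B = A @ X

assert np.allclose(X.sum(axis=1), 1.0)   # X is row stochastic
assert D_h(B, h) <= D_h(A, h) + 1e-12    # dissimilarity cannot increase
```

The inequality follows from \(g(\mathbf {b}_{k})\le \sum _{j}x_{jk}\,g(\mathbf {a}_{j})\) and the row sums of \(\mathbf {X}\) being one, exactly as in the proof.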
Appendix A.12 Proof of Proposition 4
Proof
Recall that from Theorem 1 for \(\mathbf {A,B}\in \mathcal {M}_{d}\) we have that \(\mathbf {B}\preccurlyeq \mathbf {A}\) for all orderings \(\preccurlyeq\) satisfying axioms MixC, ISC, IEC, IPC if and only if \(Z( \mathbf {B})\subseteq Z(\mathbf {A})\).
Consider matrices \(\tilde{\mathbf {A}},\tilde{\mathbf {B}}\) of dimension \(d\times n\) obtained from \(\mathbf {A},\mathbf {B}\) through split and permutation of classes and deletion of empty classes, so that \(\mathbf {1}_{d}^{\prime }\tilde{\mathbf {B}}=\mathbf {1}_{d}^{\prime }\tilde{\mathbf {A}}=\frac{d}{n}\mathbf {1}_{n}^{\prime }\). Then, by construction, \(Z(\mathbf {B})\ \subseteq \ Z(\mathbf {A})\Leftrightarrow Z(\tilde{\mathbf {B}})\ \subseteq \ Z(\tilde{\mathbf {A}})\). According to Lemma 3, the latter zonotope inclusion condition is equivalent to requiring that, for any \({{\textbf {p}}}\in {\mathbb {R}^{d}}\), there exists \(\varvec{\Theta }\in \mathcal {C}_{n}\) such that \(n\mathbf {p}^{\prime }\tilde{\mathbf {B}}=n\mathbf {p}^{\prime }\tilde{\mathbf {A}}\varvec{\Theta }\). From Hardy et al. (1934), this condition is equivalent to \(\sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{j})}\le \sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {a}}_{j})}\) for all convex functions \(\phi :\mathbb {R}\rightarrow \mathbb {R}\). Consider a set of indices \(\mathcal {K}_{j}\) such that the classes of \(\tilde{\mathbf {B}}\) are obtained from those of \(\mathbf {B}\) by split transformations.
Thus for any class \(j\) in \(\mathbf {B}\) there is a nonempty set \(\mathcal {K}_{j}\) of associated classes in \(\tilde{\mathbf {B}}\). Let \(k\in \mathcal {K}_{j}\) denote a generic element of the set \(\mathcal {K}_{j}\) associated with class \(j\) in \(\mathbf {B}\), and consider a related "splitting" weight \({\alpha _{k}}\) such that \(\sum _{k\in \mathcal {K}_{j}}{\alpha _{k}}=1\) for any \(j=1,\ldots ,n_{B}\). We can then write \(\tilde{\mathbf {b}}_{k}=\alpha _{k}\mathbf {b}_{j}\), with by construction \(n=\frac{1}{\overline{\tilde{b}}_{k}}=\frac{1}{\alpha _{k}\overline{b}_{j}}\), where \(\overline{b}_{j}:=\mathbf {1}_{d}^{\prime }\mathbf {b}_{j}\) and similarly for \(\overline{\tilde{b}}_{k}\). It follows that the argument of the evaluation function \(\phi\) defined for matrix \(\tilde{\mathbf {B}}\) becomes \(\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{k})=\overline{\tilde{b}}_{k}\phi (\mathbf {p}\tilde{\mathbf {b}}_{k}/\overline{\tilde{b}}_{k})=\alpha _{k}\overline{b}_{j}\phi (\alpha _{k}\mathbf {p}\mathbf {b}_{j}/(\alpha _{k}\overline{b}_{j}))=\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\). It follows that \(\sum _{k}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {b}}_{k})}=\sum _{j}{\sum _{k\in \mathcal {K}_{j}}{\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})}}=\sum _{j}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\).
An analogous sequence of transformations applies to \(\sum _{j}{\frac{1}{n}\phi (n\mathbf {p}\tilde{\mathbf {a}}_{j})}\), leading to \(\sum _{j}\overline{a}_{j}\phi (\mathbf {p}\mathbf {a}_{j}/\overline{a}_{j})\). This yields \(D_{\mathbf {p},\phi }(\mathbf {B})\ \le \ D_{\mathbf {p},\phi }(\mathbf {A})\) for all \(\mathbf {p}\in \mathbb {R}^{d}\) and for every convex \(\phi\), which is equivalent to \(Z(\tilde{\mathbf {B}})\ \subseteq \ Z(\tilde{\mathbf {A}})\), which in turn is equivalent to \(Z(\mathbf {B})\ \subseteq \ Z(\mathbf {A})\), i.e. to unanimity for all orderings satisfying MixC, ISC, IEC, IPC. \(\square\)
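The key property the proof exploits is that \(D_{\mathbf {p},\phi }(\mathbf {A})=\sum _{j}\overline{a}_{j}\phi (\mathbf {p}\mathbf {a}_{j}/\overline{a}_{j})\) is invariant under split transformations, because a split rescales a column and its weight by the same factor. A numerical sketch with hypothetical data (the direction `p`, the convex `phi` and the matrix are ours):

```python
import numpy as np

def D_p_phi(A, p, phi):
    """D_{p,phi}(A) = sum_j abar_j * phi(p . a_j / abar_j)."""
    col_sums = A.sum(axis=0)
    keep = col_sums > 0                      # empty classes get weight 0
    return sum(s * phi(p @ A[:, j] / s)
               for j, s in zip(np.where(keep)[0], col_sums[keep]))

p = np.array([1.0, -0.5])                    # arbitrary direction p
phi = lambda t: t * t                        # one convex phi

A = np.array([[0.6, 0.3, 0.1],
              [0.1, 0.2, 0.7]])
# Split column 0 of A into two pieces with weights 0.3 and 0.7.
A_split = np.column_stack([0.3 * A[:, 0], 0.7 * A[:, 0], A[:, 1], A[:, 2]])

# The split leaves the index unchanged, for any p and any convex phi.
assert np.isclose(D_p_phi(A, p, phi), D_p_phi(A_split, p, phi))
```

The cancellation \(\alpha _{k}\overline{b}_{j}\phi (\alpha _{k}\mathbf {p}\mathbf {b}_{j}/(\alpha _{k}\overline{b}_{j}))=\alpha _{k}\overline{b}_{j}\phi (\mathbf {p}\mathbf {b}_{j}/\overline{b}_{j})\) in the proof is exactly what the assertion verifies.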
Appendix A.13 Proof of Corollary 2
Proof
By Theorem 1, (6) is equivalent to \(\tilde{\mathbf {B}}=\tilde{\mathbf {A}}\mathbf {X}\) for \(\mathbf {X}\in \mathcal {R}_{n_{A},n_{B}}\), which gives condition (i). Each entry in the first row of \(\tilde{\mathbf {A}}\) is a constant equal to \(1/n_{A}\), and each entry in the first row of \(\tilde{\mathbf {B}}\) equals \(1/n_{B}\); hence \(\frac{1}{n_{A}}\mathbf {1}_{n_{A}}^{\prime }\mathbf {X}=\frac{1}{n_{B}}\mathbf {1}_{n_{B}}^{\prime }\), which requires each column of \(\mathbf {X}\) to sum to \(n_{A}/n_{B}\), so (ii) must also hold. \(\square\)
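The column-sum restriction can be seen directly in a small numerical sketch: if the first rows of \(\tilde{\mathbf {A}}\) and \(\tilde{\mathbf {B}}\) are uniform, a row-stochastic \(\mathbf {X}\) mapping one into the other must have columns summing to \(n_{A}/n_{B}\). The matrix below is a hypothetical example satisfying both conditions:

```python
import numpy as np

n_A, n_B = 4, 2
# Row-stochastic X whose columns each sum to n_A / n_B = 2 (illustrative).
X = np.array([[0.9, 0.1],
              [0.7, 0.3],
              [0.3, 0.7],
              [0.1, 0.9]])

uniform_A = np.full(n_A, 1.0 / n_A)   # first row of A-tilde
uniform_B = uniform_A @ X             # induced first row of B-tilde

assert np.allclose(X.sum(axis=1), 1.0)         # condition (i): row stochastic
assert np.allclose(X.sum(axis=0), n_A / n_B)   # condition (ii): column sums
assert np.allclose(uniform_B, 1.0 / n_B)       # uniformity is preserved
```

Conversely, if the column sums differed from \(n_{A}/n_{B}\), the product \(\frac{1}{n_{A}}\mathbf {1}^{\prime }\mathbf {X}\) could not be the constant vector \(\frac{1}{n_{B}}\mathbf {1}^{\prime }\).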
Appendix A.14 Proof of Corollary 3
Proof
A formal proof draws on the fact that any (inequality-reducing) PD transfer of an income share \(\lambda\) between classes \(j\) and \(k\) can be formalized through a linear transformation of vector \(\mathbf {a}^{\prime }\) towards \(\mathbf {b}^{\prime }\) involving a T-transform matrix \(\mathbf {T}(\lambda ,k,j)\), such that \(\mathbf {b}^{\prime }=\mathbf {a}^{\prime }\cdot \mathbf {T}(\lambda ,k,j)\), with \(\mathbf {T}(\lambda ,k,j):=\lambda \mathbf {I}_{n}+(1-\lambda )\varvec{\Pi }_{j,k}\), where \(\mathbf {I}_{n}\) is the identity matrix, \(\lambda \in [0,0.5]\) and \(\varvec{\Pi }_{j,k}\in \mathcal {P}_{n}\) is a permutation matrix obtained from \(\mathbf {I}_{n}\) by permuting columns \(j\) and \(k\). Given a matrix \(\mathbf {A}\in \mathcal {M}_{d}\) with \(n\) columns, let \(\mathbf {S}(\lambda ,k,j)\in \mathcal {R}_{n_{A},n_{B}}\) be a row-stochastic matrix that splits column \(k\) of \(\mathbf {A}\) and merges a share \((1-\lambda )\) of \(k\) with column \(j\). This row-stochastic matrix writes:
where \(\lambda \in [0,0.5]\) and \(\varvec{\Pi }_{n+1,k}\in \mathcal {P}_{n+1}\) is an \((n+1)\)-dimensional permutation matrix obtained from \(\mathbf {I}_{n+1}\) by permuting columns \(n+1\) and \(k\). Any T-transform involves a proportional movement of population masses between two classes, which amounts to repeating twice a sequence of splits and merges \(\mathbf {S}(\lambda ,k,j)\), so that \(\mathbf {T}(\lambda ,k,j):=\mathbf {S}(\lambda ^{\prime },k,j)\cdot \mathbf {S}(\lambda ^{\prime \prime },j,k)\), where the splitting parameters must satisfy \(\lambda ^{\prime \prime }=1-\lambda\) and \(\lambda ^{\prime }=\frac{1-2\lambda }{1-\lambda }\). \(\square\)
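The T-transform construction itself is easy to verify numerically: \(\mathbf {T}(\lambda ,k,j)=\lambda \mathbf {I}_{n}+(1-\lambda )\varvec{\Pi }_{j,k}\) is doubly stochastic, preserves total mass, and shrinks the gap between the two classes involved. A sketch with an illustrative share vector (the data and the function name are ours):

```python
import numpy as np

def T_transform(lam, n, j, k):
    """T(lam, k, j) = lam*I + (1 - lam)*Pi_{j,k}, lam in [0, 0.5]."""
    Pi = np.eye(n)
    Pi[:, [j, k]] = Pi[:, [k, j]]     # permute columns j and k of I_n
    return lam * np.eye(n) + (1 - lam) * Pi

a = np.array([0.1, 0.6, 0.2, 0.1])    # hypothetical income shares
T = T_transform(0.3, 4, 1, 2)         # PD transfer between classes 1 and 2
b = a @ T

# T is doubly stochastic, so the transfer preserves total mass.
assert np.allclose(T.sum(axis=0), 1.0) and np.allclose(T.sum(axis=1), 1.0)
assert np.isclose(b.sum(), a.sum())
# The gap between the two classes shrinks: inequality-reducing transfer.
assert abs(b[1] - b[2]) < abs(a[1] - a[2])
```

Classes not involved in the transfer (here 0 and 3) are left untouched, since the corresponding rows and columns of \(\mathbf {T}\) coincide with those of the identity.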
Appendix A.15 Proof that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) from example (1)
We first show that \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) for distribution matrices \(\mathbf {A},\mathbf {B}\) in example (1). Note that after splitting equally column 1 of matrix \(\mathbf {B}\) into two columns, one obtains matrix \(\tilde{\mathbf {B}}=(\frac{1}{2}\mathbf {b}_{1},\frac{1}{2}\mathbf {b}_{1},\mathbf {b}_{2},\mathbf {b}_{3})\), so that both matrices \(\tilde{\mathbf {B}}\), \(\mathbf {A}\) share the same distribution for one of the groups (the first row), which is uniform. The zonotopes of these matrices are three-dimensional objects, but the inclusion \(Z(\mathbf {B})=Z(\tilde{\mathbf {B}})\subseteq Z(\mathbf {A})\) can be verified by fixing the dimension related to group 1 and then focusing on inclusion in terms of facets of the zonotope obtained by the intersection of each zonotope with the hyperplanes identified by levels of population of group 1 and parallel to the orthants of groups 2 and 3. Among these facets, only three are relevant: those corresponding to proportions of group 1 equal to \(\frac{1}{4}\), \(\frac{2}{4}\) and \(\frac{3}{4}\). If inclusion of the facets of \(Z(\mathbf {B})\) into those of \(Z(\mathbf {A})\) is verified for each relevant share of group 1, then it is verified for any share of group 1.^{Footnote 24}
The facets of \(Z(\mathbf {B})\) (dark gray) and \(Z(\mathbf {A})\) (light gray) at given proportions of group 1 are represented in Fig. 3. They are taken from figure 1(b), which represents the three-dimensional zonotope of \(\mathbf {A}\). The scaling of the axes eases the visual representation of the projections' coordinates. Zonotope inclusion is granted for each projection (moving northeast implies larger shares of group 1 population), implying \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\).
Next, we show that every class of \(\mathbf {B}\) can be obtained through merge and split operations from the classes of \(\mathbf {A}\) and yet \(\mathbf {B}\not \preccurlyeq ^{R}\mathbf {A}\). Consider the following row-stochastic matrices:
Matrix \(\mathbf {X}^{1}\) is such that column \(\mathbf {x}_{1}^{1}\) displays a merge of classes 1 and 2 of matrix \(\mathbf {A}\) yielding \(\mathbf {b}_{1}\), whereas the remaining classes are merged and split uniformly into two classes. Every column \(\mathbf {x}_{j}^{j}\) is the result of split and merge operations that determine class \(\mathbf {b}_{j}\) of the distribution matrix \(\mathbf {B}\), whereas the remaining classes are merged and then split uniformly. The products \(\mathbf {A}\mathbf {X}^{j}\) give the following matrices \(\mathbf {B}^{j}\):
It is clear that \(\mathbf {B}^{j}\preccurlyeq ^{R}\mathbf {A}\), which implies \(Z(\mathbf {B}^{j})\subseteq Z(\mathbf {A})\) (from Remark 6) for any \(j=1,2,3\). Now consider obtaining classes of matrix \(\mathbf {B}\) as a weighted average of classes of \(\mathbf {B}^j\) using the weights \(w_{j}^{k}\), \(j=1,2,3\) and \(k=1,2,3\), such that \(w_{j}^{j}=1\) \(\forall j\) and \(w_{j}^{k}=0\) when \(k\ne j\). These weights are consistent with condition (4) (and hence with axiom MixC, see the second part of the proof of Theorem 1), but not with those implied by axiom StrongMixC (they are not constant for given \(j\)). Matrix \(\mathbf {B}\) can be obtained from \(\mathbf {B}^{j}\) using \(\mathbf {b}_{k}=\sum _{j}{w_{j}^{k}\mathbf {b}_{k}^{j}}=\mathbf {b}_{k}^{k}\) for any \(k=1,2,3\). This implies \(Z(\mathbf {B})\subseteq Z(\mathbf {A})\) (from Lemma 5). Yet, using the weighting scheme we obtain \(\mathbf {B}=\mathbf {A}\mathbf {X}\) with \(\mathbf {x}_{k}=\sum _{j}{w_{j}^{k}\mathbf {x}_{k}^{j}}=\mathbf {x}_{k}^{k}\) \(\forall k\), but \(\mathbf {X}\not \in \mathcal {R}_{4,3}\): the unique matrix with nonnegative entries that transforms \(\mathbf {A}\) into \(\mathbf {B}\) as in (3) is not row-stochastic, thus \(\mathbf {B}\not \preccurlyeq ^{R}\mathbf {A}\). This counterexample shows that matrix majorization is not consistent with the axiom MixC.
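Whether \(\mathbf {B}\preccurlyeq ^{R}\mathbf {A}\) holds, i.e. whether \(\mathbf {B}=\mathbf {A}\mathbf {X}\) for some row-stochastic \(\mathbf {X}\ge 0\), can be checked as a linear feasibility problem. A sketch using scipy; the matrices below are illustrative placeholders, not those of example (1), which are given in the main text:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_majorized(B, A):
    """Test whether B = A @ X for some row-stochastic X >= 0 (LP feasibility)."""
    d, nA = A.shape
    _, nB = B.shape
    # Unknowns: entries of X, flattened row by row (nA * nB variables).
    eq_rows, eq_rhs = [], []
    for i in range(d):                       # constraints A @ X = B
        for k in range(nB):
            row = np.zeros(nA * nB)
            for j in range(nA):
                row[j * nB + k] = A[i, j]
            eq_rows.append(row); eq_rhs.append(B[i, k])
    for j in range(nA):                      # each row of X sums to 1
        row = np.zeros(nA * nB)
        row[j * nB:(j + 1) * nB] = 1.0
        eq_rows.append(row); eq_rhs.append(1.0)
    res = linprog(np.zeros(nA * nB), A_eq=np.array(eq_rows),
                  b_eq=np.array(eq_rhs), bounds=[(0, None)] * (nA * nB))
    return res.status == 0                   # 0 = feasible, 2 = infeasible

A = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
X = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
assert matrix_majorized(A @ X, A)            # feasible by construction
assert not matrix_majorized(A, A @ X)        # reverse direction fails here
```

Such a feasibility check makes the counterexample concrete: zonotope inclusion can hold while no row-stochastic \(\mathbf {X}\) exists, which is exactly the gap between \(\preccurlyeq ^{R}\) and the dissimilarity partial order.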
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Andreoli, F., Zoli, C. Robust dissimilarity comparisons with categorical outcomes. Soc Choice Welf (2022). https://doi.org/10.1007/s00355-022-01419-1
JEL Classification
 D63
 J71
 J62
 D30