The authors of this paper are to be praised for their well-written survey. They provide a comprehensive account of the measurement of economic inequality in one and many dimensions, focussing on Lorenz-like orderings. In addition, several fresh views on multidimensional inequality are given regarding the consequences of aggregation, the meaning of correlation increasing transfers, and the connection of inequality with the dissimilarity of multivariate distributions.

My remarks supplement their work from a practical viewpoint, namely that of a data analyst who wishes to compare distributions of socio-economic endowments with regard to their inequality. The analyst draws conclusions from data about income, wealth, education and other attributes of the populations under inquiry. But these data are only more or less complete, reliable and accurate. Their empirical distribution has to be seen as an unknown “true” distribution plus possible noise or contamination.

With one attribute, empirical Lorenz (or generalized Lorenz) curves are determined and compared pointwise in order to decide whether one population is more unequal than the other. However, empirical Lorenz curves, say of income, quite often cross and exhibit no clear order. Crossing curves tell us either that the distributions are “really” non-ordered (i.e. there exist Schur-convex functions that rank them in different directions) or that they are ordered but noise masks the ordering.
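The pointwise comparison can be sketched in a few lines of Python. The empirical Lorenz curve is piecewise linear with knots at \(k/n\), so it suffices to compare two curves at the union of their knots. The function names `lorenz_curve` and `compare_lorenz` are illustrative only, and the data are assumed to be non-negative with a positive total.

```python
def lorenz_curve(values):
    """Return t -> L(t), the piecewise linear empirical Lorenz curve."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    cum = [0.0]
    for x in xs:
        cum.append(cum[-1] + x)

    def L(t):
        pos = t * n
        k = min(int(pos), n - 1)          # index of the segment containing t
        return (cum[k] + (pos - k) * xs[k]) / total

    return L


def compare_lorenz(x, y, tol=1e-12):
    """Classify two samples by comparing their Lorenz curves at all knots."""
    Lx, Ly = lorenz_curve(x), lorenz_curve(y)
    knots = {k / len(x) for k in range(len(x) + 1)}
    knots |= {k / len(y) for k in range(len(y) + 1)}
    diffs = [Lx(t) - Ly(t) for t in sorted(knots)]
    x_below = all(d <= tol for d in diffs)    # L_X <= L_Y everywhere
    y_below = all(d >= -tol for d in diffs)   # L_Y <= L_X everywhere
    if x_below and y_below:
        return "equal"
    if x_below:
        return "x_more_unequal"
    if y_below:
        return "y_more_unequal"
    return "cross"


print(compare_lorenz([1, 2, 3, 4], [1, 1, 1, 7]))  # y_more_unequal
print(compare_lorenz([1, 4, 5], [2, 2, 6]))        # cross
```

Because the difference of two piecewise linear curves is linear between consecutive knots, agreement in sign at the knots settles the comparison on the whole unit interval. The tolerance `tol` only guards against rounding and is not a statistical threshold.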

To cope with crossing Lorenz curves, a weakening of the usual Lorenz dominance is often appropriate. Several approaches offer themselves:

  • Shifting to a weaker relation that has a specific meaning, like higher degree dominance.

  • Restricting the order to an “essential part” of the data.

  • Restricting the comparison to one (or several) inequality indices.

  • Building a stochastic model of data generation and performing a statistical test for Lorenz order.

When two empirical Lorenz curves, \(L_X\) and \(L_Y\), cross, this often happens in their tails. If they are ordered in their “middle part”, say

$$\begin{aligned} L_X(t) \le L_Y(t) \quad \text {for all} \;\; t\in [t_0, t_1]\,, \end{aligned}$$

we consider them as quantile-restricted Lorenz ordered or \((t_0, t_1)\)-Lorenz ordered [9]. Of course, \([t_0, t_1]\) must include a large enough interval, covering e.g. the coefficient of minimal majority \(MM(X)=L_X^{-1}(0.5)\). Arguments backing this approach are:

  • Data in the tails are often less reliable and precise, compared to those in the middle.

  • Tail data may be missing due to reluctant answering behavior.

  • Sampling schemes of Official Statistics usually exclude the very poor and the very rich.

  • Recipients of middle incomes appear to be most decisive in elections.

  • The actual size of very rich incomes, beyond some level, is of little public interest; similarly, that of the very poor.
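As an illustration of the \((t_0, t_1)\)-restricted comparison, the following Python sketch checks \(L_X(t) \le L_Y(t)\) only on \([t_0, t_1]\). The helper names and the two small data sets are made up for the example; the data are constructed so that the curves cross in the lower tail but are ordered between the quantiles 0.3 and 0.9.

```python
def lorenz_at(values, t):
    """Value of the piecewise linear empirical Lorenz curve at t in [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    pos = t * n
    k = min(int(pos), n - 1)
    return (sum(xs[:k]) + (pos - k) * xs[k]) / sum(xs)


def restricted_lorenz_leq(x, y, t0, t1, tol=1e-12):
    """True if L_x(t) <= L_y(t) for all t in [t0, t1]."""
    # Both curves are linear between knots, so it is enough to check the
    # interval end points and all knots k/n that fall inside the interval.
    knots = {t0, t1}
    for n in (len(x), len(y)):
        knots |= {k / n for k in range(n + 1) if t0 <= k / n <= t1}
    return all(lorenz_at(x, t) <= lorenz_at(y, t) + tol for t in knots)


# Hypothetical data, both with total 100.
x = [0.5, 3, 4, 5, 8, 9, 10, 12, 14, 34.5]
y = [1, 1, 8, 10, 10, 10, 10, 10, 10, 30]

print(restricted_lorenz_leq(x, y, 0.0, 1.0))  # False: curves cross near t = 0.2
print(restricted_lorenz_leq(x, y, 0.3, 0.9))  # True: ordered in the middle part
```

The choice of \(t_0\) and \(t_1\) is left to the analyst; the sketch does not suggest how to pick them.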

Observe that, if all people in the lower part of the population, up to quantile \(t_0\), have the same income, and all people in the upper part beyond quantile \(t_1\) have the same income, then the \((t_0,t_1)\)-restricted generalized Lorenz order is equivalent to the unrestricted generalized Lorenz order.

Many authors have constructed tests for generalized Lorenz order (or distribution equality) against non-order, \(H_0: GL_X(t) \le GL_Y(t)\) for all t (or \(H_0: GL_X(t) = GL_Y(t)\) for all t) against not \(H_0\), among them [7, 11] and [3]. However, to statistically establish the Lorenz order, reverse tests are needed, which test non-order against generalized Lorenz order; this approach has been pursued by [5] and [4].

Next let us consider multidimensional Lorenz orderings of inequality. Our remarks concern positive price majorization (in the terminology of [1]), while other multivariate extensions of the usual Lorenz order can be treated similarly.

Let X and Y have multivariate empirical distributions in \({\mathbb {R}}^d\) and define: X is less unequal than Y in positive price majorization, if \(p'X\) is less unequal than \(p'Y\) in univariate Lorenz order, for every \(p\in {\mathbb {R}}^d_+\). In other words, for every non-negative price vector \(p\in {\mathbb {R}}^d_+\), the budget \(p'X\) is less unequal than the budget \(p'Y\). Positive price majorization has many names in the literature, among them price Lorenz order and positive directional majorization.
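The definition can be probed directly on a finite set of price vectors. The Python sketch below does this for \(d=2\), with prices \(p=(\cos \varphi , \sin \varphi )\) on an equally spaced grid of angles in \([0, \pi /2]\). This grid, the function names and the toy data are assumptions made for illustration. A finite grid can refute the order, but it cannot prove it for all \(p\in {\mathbb {R}}^2_+\).

```python
import math


def lorenz_at(values, t):
    """Value of the piecewise linear empirical Lorenz curve at t in [0, 1]."""
    xs = sorted(values)
    n = len(xs)
    pos = t * n
    k = min(int(pos), n - 1)
    return (sum(xs[:k]) + (pos - k) * xs[k]) / sum(xs)


def lorenz_less_unequal(a, b, tol=1e-12):
    """True if the Lorenz curve of a lies on or above that of b."""
    knots = {k / len(a) for k in range(len(a) + 1)}
    knots |= {k / len(b) for k in range(len(b) + 1)}
    return all(lorenz_at(a, t) >= lorenz_at(b, t) - tol for t in knots)


def price_majorized_on_grid(X, Y, n_dirs=91):
    """Check 'X less unequal than Y' for budgets p'X, p'Y on a grid of prices.

    X and Y are lists of pairs (two attributes per unit). Returns False as
    soon as one price vector violates the univariate Lorenz order.
    """
    for j in range(n_dirs):
        phi = 0.5 * math.pi * j / (n_dirs - 1)
        p1, p2 = math.cos(phi), math.sin(phi)
        budget_x = [p1 * a + p2 * b for a, b in X]
        budget_y = [p1 * a + p2 * b for a, b in Y]
        if not lorenz_less_unequal(budget_x, budget_y):
            return False
    return True


# Toy data: A is more equal in the first attribute, B in the second,
# so neither is less unequal than the other for all prices.
A = [(2, 1), (2, 5)]
B = [(1, 3), (3, 3)]
print(price_majorized_on_grid(A, B))  # False
print(price_majorized_on_grid(B, A))  # False
```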

With univariate data, it is straightforward to check whether one of two empirical Lorenz curves dominates the other. With multivariate data, this task is comparatively simple if we are able to restrict attention to a few prices p. Otherwise it turns out to be more involved. To check the data for positive price majorization, we may use a characterization of the order by upper regions.

The distribution of X is fully characterized by its zonoid central regions [6],

$$\begin{aligned} D_\alpha (X)= \left\{ \sum _{i=1}^n \theta _i x_i : \sum _{i=1}^n \theta _i =1,\; 0\le \alpha \theta _i \le \frac{1}{n}\right\} , \quad \alpha \in [0,1]. \end{aligned}$$

The regions form nested convex sets, which can be regarded as inter-quantile regions, ranging from

$$\begin{aligned} D_1(X)= \left\{ \frac{1}{n} \sum _{i=1}^n x_i \right\} \quad \text {to} \quad D_0(X) ={conv }(x_1,\dots , x_n). \end{aligned}$$
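For empirical data, the extent of a region \(D_\alpha (X)\) in a direction p, that is \(\max \{p'z : z\in D_\alpha (X)\}\), is the value of a small linear program in the weights \(\theta _i\). For \(\alpha >0\) it is solved greedily: put the maximal admissible weight \(1/(n\alpha )\) on the largest projections \(p'x_i\) until the weights sum to one. The following Python sketch implements this; the function name `zonoid_support` is hypothetical, and the case \(\alpha =0\) is treated as the convex hull, in line with the display above.

```python
def zonoid_support(points, p, alpha):
    """max of p'z over the empirical zonoid region D_alpha of the points."""
    proj = sorted(
        (sum(pj * xj for pj, xj in zip(p, x)) for x in points), reverse=True
    )
    n = len(proj)
    if alpha == 0:
        return proj[0]                 # convex hull: largest projection
    cap = 1.0 / (n * alpha)            # upper bound on each weight theta_i
    remaining, value = 1.0, 0.0
    for v in proj:                     # fill the largest projections first
        w = min(cap, remaining)
        value += w * v
        remaining -= w
        if remaining <= 0:
            break
    return value


square = [(0, 0), (4, 0), (0, 4), (4, 4)]
for alpha in (1.0, 0.75, 0.5, 0.0):
    print(alpha, zonoid_support(square, (1, 0), alpha))
```

In this example the extent in direction \((1,0)\) grows from the mean 2 at \(\alpha =1\) to the maximum 4 at \(\alpha =0\), which reflects the nesting of the regions.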

Positive price majorization is characterized as follows [8]:

Proposition 1

X is less unequal than Y in positive price majorization if and only if

$$\begin{aligned} D_\alpha (\tilde{X})\oplus {\mathbb {R}}^d_+ \subset D_\alpha (\tilde{Y})\oplus {\mathbb {R}}^d_+\quad \text {for all} \;\; 0\le \alpha \le 1. \end{aligned} \tag{1}$$

Here \(\oplus \) denotes the Minkowski sum, \(K\oplus R= \{x+y : x\in K, y\in R\}\). The set

$$\begin{aligned} D_\alpha (\tilde{X})\oplus {\mathbb {R}}^d_+ = \left\{ z : z\ge \sum _{i=1}^n \theta _i \tilde{x}_i \;\; \text {for some} \;\; \theta _1,\dots ,\theta _n \;\; \text {with} \;\; \sum _{i=1}^n \theta _i =1, \; 0\le \alpha \theta _i \le \frac{1}{n}\right\} \end{aligned}$$

will be referred to as a zonoid upper region. Observe that

$$\begin{aligned} D_\alpha (\tilde{X}) \subset D_\alpha (\tilde{Y}) \end{aligned} \tag{2}$$

if and only if every vertex of \(D_\alpha (\tilde{X})\) is a convex combination of the vertices of \(D_\alpha (\tilde{Y}).\)

The set of vertices of a zonoid trimmed region can be exactly calculated by a breadth-first search algorithm [10]. The normal of each facet is identified by exactly d data points, which yield its ridges. As we have to verify (1) and not (2), instead of considering arbitrary normals in \({\mathbb {R}}^d\) we consider only non-negative ones. From the ridges, information about the adjacency of the facets is extracted. The facets are arranged through a tree-based order, by which the whole surface can be traversed efficiently with a minimal number of computations. The algorithm has been programmed in C++ and is available in the R-package WMTregions [2]. The task is simplified by dropping less important attributes (if there are any), which means setting one or more prices, \(p_j\), to 0.

Like the univariate Lorenz order, its multivariate extension can be quantile-restricted. Zonoid regions are multivariate analogues of inter-quantile intervals. “Outer data” are excluded from further analysis by comparing zonoid upper regions only above some minimum level \(\alpha _0\). X is less unequal than Y in the quantile-restricted price Lorenz order if

$$\begin{aligned} D_\alpha (\tilde{X})\oplus {\mathbb {R}}^d_+ \subset D_\alpha (\tilde{Y})\oplus {\mathbb {R}}^d_+ \quad \text {for all} \;\; \alpha _0\le \alpha \le 1. \end{aligned}$$

Finally, zonoid central (as well as upper) regions can be regarded as set-valued statistics. For this, the data are considered in a probabilistic setting. Positive price majorization and its characterization by upper regions can be extended to any random vectors X and Y in \({\mathbb {R}}^d\) that have finite expectations; for details see [8]. We regard the data \((x_1, x_2, \dots , x_n)\) as the realization of a random sample \((X_1, X_2, \dots , X_n)\) from X. Then the empirical region

$$\begin{aligned} D_\alpha (X_1, X_2, \dots , X_n) \end{aligned} \tag{3}$$

is a set-valued statistic that estimates the central region \(D_\alpha (X)\).

Zonoid central regions (3) satisfy a Law of Large Numbers. They are strongly consistent estimators of \(D_\alpha (X)\), viz.

$$\begin{aligned} D_\alpha (X) = H\text {-}\lim _{n\rightarrow \infty } D_\alpha (X_1, X_2, \dots , X_n)\,. \end{aligned}$$

\(H\text {-}\lim \) means limit in the Hausdorff distance \(\delta _H(K,R)= \max _{p\in S^{d-1}} |h_K(p)-h_R(p)|\), where \(h_K(p)=\max _{x\in K} p'x, p\in S^{d-1},\) is the support function of a convex body \(K \subset {\mathbb {R}}^d\). More explicitly,

$$\begin{aligned} P[\lim _{n\rightarrow \infty } \max _{p\in S^{d-1}}|h_{D_\alpha (X_1,\dots , X_n)}(p) - h_{D_\alpha (X)}(p)| = 0] =1. \end{aligned}$$
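The support-function form of the Hausdorff distance lends itself to a simple numerical approximation: replace the maximum over \(S^{d-1}\) by a maximum over finitely many directions. The Python sketch below does this for \(d=2\) and for convex polygons given by their vertices. The names `support` and `hausdorff_2d` and the equally spaced grid of directions are assumptions of the example. The result is a lower bound that approaches \(\delta _H\) as the grid is refined.

```python
import math


def support(vertices, p):
    """Support function h_K(p) of the convex hull K of the given vertices."""
    return max(p[0] * x + p[1] * y for x, y in vertices)


def hausdorff_2d(K, R, n_dirs=360):
    """Approximate delta_H(K, R) by sampling directions on the unit circle."""
    best = 0.0
    for j in range(n_dirs):
        phi = 2.0 * math.pi * j / n_dirs
        p = (math.cos(phi), math.sin(phi))
        best = max(best, abs(support(K, p) - support(R, p)))
    return best


unit_square = [(0, 0), (1, 0), (0, 1), (1, 1)]
big_square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(hausdorff_2d(unit_square, big_square))  # close to sqrt(2)
```

For the two squares the largest gap occurs in direction \((1,1)/\sqrt{2}\), where the corner \((2,2)\) is at distance \(\sqrt{2}\) from the unit square. To track the convergence of empirical regions, the same function could be applied to the vertex sets of \(D_\alpha \) computed from samples of increasing size.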

Moreover, a Central Limit Theorem holds for the empirical regions (3); see [8].

This allows for statistical inference about positive price majorization, specifically about the inclusion of zonoid regions at a given level \(\alpha \). Let \(\alpha \in ]0,1[\). Given two samples \((X_1,\dots , X_n)\) from X and \((Y_1,\dots ,Y_m)\) from Y, we aim to establish the hypothesis

$$\begin{aligned} H_1: D_\alpha (\tilde{X})\oplus {\mathbb {R}}^d_+ \subset D_\alpha (\tilde{Y})\oplus {\mathbb {R}}^d_+\quad \text {against the null} \quad H_0: \text {not } H_1. \end{aligned}$$

\(H_1\) is equivalent to \(h_{D_\alpha (\tilde{X})}(p)\le h_{D_\alpha (\tilde{Y})}(p)\) for all \(p\ge 0\), that is, by \(h_{D_\alpha (\tilde{X})}(p)=\frac{1}{\alpha }E[p'\tilde{X}]\), to

$$\begin{aligned} H_1: E[p'\tilde{X}] \le E[p'\tilde{Y}] \quad \text {for all} \;\; p\ge 0, \end{aligned}$$

which may be tested through a proper bootstrap procedure.