Measuring income inequality based on unequally distributed income

This paper proposes a new framework for measuring income inequality. The framework is based on the unequally distributed (UD) incomes that are obtained by removing the equally distributed parts from incomes. We then derive the normalized norm indexes from the cumulative distribution function and the unscaled Lorenz curve of the UD incomes. The relation between the normalized norm indexes and the popular Gini coefficient and coefficient of variation (CV) shows that the Gini coefficient and CV represent only parts of income inequality. We analyze example income distributions and the Luxembourg Income Study datasets to show that the normalized norm indexes evaluate income inequality appropriately and solve the negative income problem.

whether income inequality is decreasing, whether a redistribution policy is effective at reducing economic inequality, whether a country is more or less unequal than other countries, and so on. The conventional approach is to compute a single index from the income distribution and the corresponding Lorenz curve. The Gini coefficient, defined statistically as the mean absolute difference of all pairs of incomes divided by twice the mean income, was introduced by Gini (1914). The Gini coefficient was further generalized by Donaldson and Weymark (1980) and Yitzhaki (1983). Gastwirth (2014) showed that the Gini coefficient underestimates the rate of increase in inequality and modified it by dividing the mean absolute difference of all pairs of incomes by the median income. Other measures include the coefficient of variation (CV) and the Pietra index. The Pietra index, suggested by Schutz (1951), is the share of total income that would have to be redistributed to achieve perfect equality. The Pietra index is also known as the Hoover index, the Robin Hood index, and the Ricci-Schutz index. Zanardi (1964) developed an index to measure the asymmetry of the Lorenz curve. In recent contributions, Gallegati et al. (2016) and Clementi et al. (2019) propose the Zanardi index as an appropriate measure of income inequality. There is also a class of indexes based on information theory, which includes the Atkinson index, the generalized entropy index, and the Theil index (Atkinson 1970; Shorrocks 1980; Theil 1967). Such indexes measure diversity in incomes. For a general overview of income inequality measurement, refer to Hao and Naiman (2010), Jenkins and Kerm (2011), and Cowell (2011).
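The three classical indexes described above can be computed directly from their definitions. The following is a minimal Python sketch, assuming the mean absolute difference is averaged over all $n^2$ ordered pairs (each income is also paired with itself) and that the CV uses the population standard deviation; other conventions, such as $n(n-1)$ pairs, give slightly different values. The function names and the income vector are hypothetical.

```python
import math

def gini(y):
    """Gini coefficient: mean absolute difference over all n*n ordered pairs,
    divided by twice the mean income."""
    n = len(y)
    mu = sum(y) / n
    mad = sum(abs(a - b) for a in y for b in y) / (n * n)
    return mad / (2 * mu)

def cv(y):
    """Coefficient of variation with the population standard deviation (divisor n)."""
    n = len(y)
    mu = sum(y) / n
    return math.sqrt(sum((a - mu) ** 2 for a in y) / n) / mu

def pietra(y):
    """Pietra index: share of total income that would have to be
    redistributed to reach perfect equality."""
    n = len(y)
    mu = sum(y) / n
    return sum(abs(a - mu) for a in y) / (2 * n * mu)

y = [1, 2, 3, 4, 5]  # hypothetical incomes
print(round(gini(y), 4), round(cv(y), 4), round(pietra(y), 4))  # 0.2667 0.4714 0.2
```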
As mentioned above, most of the currently available income inequality indexes are derived from the income distribution and the corresponding Lorenz curve. This approach suffers from the negative income problem. Negative incomes are frequently observed in reality. Self-employed individuals have negative incomes when their firms and organizations incur losses. Negative incomes are also observed when employees repay an amount of debt that exceeds their earnings. Since most income inequality indexes assume that incomes are nonnegative, negative incomes create problems in inequality measurement. The Atkinson index, the generalized entropy index, and the Theil index cannot be computed. The Gini coefficient can exceed one and therefore requires the normalization proposed by Chen et al. (1982) and Raffinetti et al. (2015). It is well known that the CV and the Pietra index are appropriate only for nonnegative variables on a ratio scale. Since the income shares for negative incomes cannot be interpreted as proportions, all income inequality indexes involving income shares are neither computable nor interpretable. Therefore, various remedial adjustments of negative incomes have been devised. For example, Cowell (2011) and Hao and Naiman (2010) suggest deleting non-positive incomes. OECD (2016) recommends replacing negative incomes with zero. Bellù and Liberati (2006) advise replacing zero incomes with arbitrarily small positive incomes. Raffinetti et al. (2016) study the effects of these remedial adjustments on the Gini coefficient.
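To see the negative income problem concretely, here is a minimal Python sketch with a hypothetical income vector containing one negative value (not data from the paper). With the Gini coefficient computed as the mean absolute difference over all $n^2$ ordered pairs divided by twice the mean, the result exceeds one, and the Theil index, which takes logarithms of relative incomes, cannot be evaluated at all.

```python
import math

def gini(y):
    # Mean absolute difference over all n*n ordered pairs / (2 * mean)
    n = len(y)
    mu = sum(y) / n
    return sum(abs(a - b) for a in y for b in y) / (n * n) / (2 * mu)

def theil(y):
    # Theil T index: (1/n) * sum_i (y_i/mu) * ln(y_i/mu); requires y_i > 0
    n = len(y)
    mu = sum(y) / n
    return sum((a / mu) * math.log(a / mu) for a in y) / n

y = [-4, 1, 2, 3, 3]  # hypothetical incomes with one negative value
print(gini(y))  # 1.28, above the usual upper bound of one

try:
    theil(y)
except ValueError:
    # math.log rejects the negative argument -4/mu
    print("Theil index is undefined for this distribution")
```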
These remedial adjustments indicate that the nonnegative income assumption does not correspond to reality. The conventional approach attempts to resolve the inconsistency between this assumption and reality by adjusting the data, not the assumption. Therefore, we need a realistic approach based on something other than the income distribution.
In a recent work, Park et al. (2018) suggest that income inequality should be measured from unequally distributed (UD) incomes rather than from incomes themselves. UD incomes are obtained from the income distribution by removing the parts of incomes associated with equality. The UD income approach neither requires the nonnegative income assumption nor suffers from the negative income problem. Park et al. (2018) also propose that income inequality should be represented by two indexes reflecting the location and dispersion of UD incomes. However, they do not present a general framework for measuring income inequality based on UD incomes. The aims of the present paper are (i) to propose a framework using the cumulative distribution function (CDF) and the unscaled Lorenz curve of UD incomes; (ii) to show that income inequality has two dimensions, horizontal and vertical, under the proposed framework; (iii) to develop new indexes for the two dimensions; and (iv) to present the relation between the new indexes and the popular Gini coefficient and CV.
This paper is organized as follows. Section 2 briefly reviews UD income. Section 3 adopts the CDF of UD income and represents income inequality as two types of departure from perfect equality. Normalized norms of the departures are proposed as new income inequality indexes. It is also shown that the normalized norm indexes are decomposed into three constituents, the Rawlsian index, the Gini coefficient, and the CV. Section 4 adopts the unscaled Lorenz curve of UD income and develops an income inequality index. It is shown that the index is equivalent to one of the normalized norm indexes derived in Sect. 3. The normalized norm indexes are illustrated using example income distributions and the Luxembourg Income Study (LIS) datasets in Sect. 5. Section 6 presents concluding remarks.

Unequally distributed income
We denote the income distribution of a population of $n$ individuals by $y = (y_1, y_2, \ldots, y_n)$, where $y_i$ is the income of the $i$-th individual. Without loss of generality, we assume that the incomes are ordered, that is, $y_1 \le y_2 \le \cdots \le y_n$. The total income and mean income are denoted by $S_y = \sum_{i=1}^{n} y_i$ and $\mu_y = S_y/n$, respectively. We begin by noting that $y_i = y_1 + (y_i - y_1)$, $i = 1, 2, \ldots, n$. This expression indicates that every income includes $y_1$: an amount $n y_1$ of $S_y$ is equally distributed among the $n$ individuals, and the remaining $S_y - n y_1$ is unequally distributed among them. The unequally distributed portions of the incomes, $x_i = y_i - y_1$, $i = 1, 2, \ldots, n$, are referred to as the UD incomes. The total and mean of the UD incomes are $S_x = S_y - n y_1 = n(\mu_y - y_1)$ and $\mu_x = \mu_y - y_1$.
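The decomposition $y_i = y_1 + (y_i - y_1)$ translates directly into code. The following Python sketch (the function name and incomes are hypothetical) computes the UD incomes and checks $S_x = S_y - n y_1$ and $\mu_x = \mu_y - y_1$; it also confirms that a perfectly equal distribution maps to the origin of the UD income space, whatever its mean.

```python
def ud_incomes(y):
    """Return the UD incomes x_i = y_i - y_1 for incomes sorted in ascending order."""
    ys = sorted(y)          # the text assumes y_1 <= y_2 <= ... <= y_n
    return [v - ys[0] for v in ys]

y = [4, 1, 3, 2, 5]         # hypothetical incomes
n, y1 = len(y), min(y)
x = ud_incomes(y)
S_y, S_x = sum(y), sum(x)

assert S_x == S_y - n * y1                       # S_x = S_y - n*y_1
assert abs(S_x / n - (S_y / n - y1)) < 1e-12     # mu_x = mu_y - y_1
print(x, S_x, S_x / n)                           # [0, 1, 2, 3, 4] 10 2.0

# Any perfectly equal distribution maps to x_pe = (0, ..., 0), whatever its mean.
print(ud_incomes([3, 3, 3]), ud_incomes([7.5, 7.5]))  # [0, 0, 0] [0.0, 0.0]
```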
The income distribution encompasses the UD income distribution and contains information about both equality and inequality. The UD income distribution is obtained by removing the parts of the incomes associated with equality from the income distribution. The amount of income associated with equality is $n y_1$. The amount of income associated with inequality is $S_y - n y_1$, which is distributed as $x$. Since $y_i - y_j = x_i - x_j$ for all $i$ and $j$, every income difference in $y$ is preserved in $x$, so income inequality resides in the UD income distribution. It is therefore reasonable to measure income inequality from the UD income distribution.
As defined in Cowell (2011), inequality is a departure from some idea of equality. In the field of income inequality, equality is represented by a perfectly equal income distribution $y^{pe} = (\mu_y, \ldots, \mu_y)$, usually referred to as perfect equality. We should note that the most significant change from using the UD income approach occurs in the representation of perfect equality. The UD income distribution corresponding to perfect equality is $x^{pe} = (0, 0, \ldots, 0)$, the origin of the UD income space.
The income inequality of an income distribution is usually evaluated by its discrepancy from the corresponding $y^{pe}$. Two income distributions with the same $n$ and different $S_y$ correspond to different $y^{pe}$s; theoretically, there are infinitely many $y^{pe}$s in the income space, and each income distribution is compared to its own $y^{pe}$. Two income distributions with the same $n$ and $S_y$ are compared to the same $y^{pe}$, even though they in general have different values of $n y_1$. However, regardless of $S_y$ and $\mu_y$, perfect equality in the UD income approach is uniquely represented by $x^{pe}$, the origin of the UD income space. The following two sections address how to measure the discrepancy of $x$ from $x^{pe}$.

Normalized norm indexes from CDF
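A general way to describe a distribution is the CDF. The CDFs for $x$ and $x^{pe}$ are

$$F(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x), \qquad F^{pe}(x) = \begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}$$

where $I(\cdot)$ is the indicator function. There are two types of departure of $F$ from $F^{pe}$: the vertical departure and the horizontal departure. The vertical departure is the difference between $F(x)$ and $F^{pe}(x)$. Let $Q(p)$ denote the quantile function of the UD income distribution $x$, which is defined as

$$Q(p) = \inf\{x : F(x) \ge p\}, \qquad 0 < p \le 1,$$

so that $Q(p) = x_i$ for $(i-1)/n < p \le i/n$. The quantile function for $x^{pe}$ is $Q^{pe}(p) = 0$ for $0 < p \le 1$. The horizontal departure is the difference between $Q(p)$ and $Q^{pe}(p)$.

The size of the difference between two functions is usually assessed by norms. Thus, we compute the $\ell_1$ and $\ell_2$ norms of the vertical and horizontal departures. Let us first consider the vertical departure. The $\ell_1$ and $\ell_2$ norms of the vertical departure, denoted by $v_1$ and $v_2$, are defined as

$$v_1 = \int_{-\infty}^{\infty} \left|F^{pe}(x) - F(x)\right| dx, \qquad v_2 = \left(\int_{-\infty}^{\infty} \left(F^{pe}(x) - F(x)\right)^2 dx\right)^{1/2}.$$

Because $x_i \ge 0$, the integrands vanish for $x < 0$ and reduce to $1 - F(x)$ and $(1 - F(x))^2$ for $x \ge 0$. Applying the results of Dorfman (1979) and Yitzhaki (1998), we obtain

$$v_1 = \int_0^{\infty} \left(1 - F(x)\right) dx = \mu_x, \qquad v_2 = \left(\int_0^{\infty} \left(1 - F(x)\right)^2 dx\right)^{1/2} = \sqrt{\mu_x - \Gamma_x/2},$$

where $\Gamma_x$ is the Gini mean difference (GMD) of the UD income. To convert the norms to indexes, we normalize $v_1$ and $v_2$. Since $v_1$ is in monetary units and $v_2$ is in the square root of monetary units, $v_1$ is normalized by $\mu_y$, while $v_2$ is normalized by $\sqrt{\mu_y}$. The normalized vertical norm indexes, denoted by $\bar v_1$ and $\bar v_2$, are obtained as

$$\bar v_1 = \frac{v_1}{\mu_y} = \frac{\mu_x}{\mu_y}, \qquad \bar v_2 = \frac{v_2}{\sqrt{\mu_y}} = \sqrt{\frac{\mu_x}{\mu_y} - \frac{\Gamma_x}{2\mu_y}}.$$

Since $\Gamma_x = \Gamma_y$ and the Gini coefficient is the GMD divided by twice the mean, we have

$$\bar v_1 = \frac{\mu_x}{\mu_y}, \qquad \bar v_2 = \sqrt{\bar v_1 - G_y} = \sqrt{\bar v_1\left(1 - G_x\right)}, \tag{3}$$

where $G_y$ and $G_x$ are the Gini coefficients of income and UD income. The normalized average vertical distance from perfect equality, $\bar v_1$, is equal to the Rawlsian index introduced by Park et al. (2018). Since the Rawlsian index is the ratio of total UD income to total income, $S_x/S_y = \mu_x/\mu_y$, it can be interpreted as the magnitude of income inequality. There is another justification for the Rawlsian index. Since the $y_i$s are ordered, $y_1 \le \mu_y$, with equality if and only if all incomes are equal; the simplest expression for perfect equality is therefore $y_1 = \mu_y$. Hence, the discrepancy $\mu_x = \mu_y - y_1$ and its normalized value $\mu_x/\mu_y$ are sensible measures of the departure from perfect equality. $\bar v_2$ is obtained as a combination of the Rawlsian index and the Gini coefficient.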
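The closed forms $v_1 = \mu_x$, $v_2 = \sqrt{\mu_x - \Gamma_x/2}$, and $\bar v_2 = \sqrt{\bar v_1 - G_y} = \sqrt{\bar v_1(1 - G_x)}$ can be checked numerically. The Python sketch below integrates $1 - F(x)$ exactly over the empirical UD-income CDF, which is a step function constant between consecutive sorted UD incomes. The income vector and helper names are hypothetical; the GMD is averaged over all $n^2$ ordered pairs, the convention under which these identities hold exactly for a finite sample.

```python
import math

def ud_incomes(y):
    ys = sorted(y)
    return [v - ys[0] for v in ys]

def gmd(v):
    """Gini mean difference over all n*n ordered pairs."""
    n = len(v)
    return sum(abs(a - b) for a in v for b in v) / (n * n)

def vertical_norms(y):
    """Unnormalized L1 and L2 norms of F_pe(x) - F(x) = 1 - F(x) for x >= 0.
    On [x_i, x_{i+1}) the empirical CDF equals i/n (1-based i), so the
    integrals reduce to finite sums."""
    x = ud_incomes(y)
    n = len(x)
    v1 = v2_sq = 0.0
    for i in range(n - 1):
        width = x[i + 1] - x[i]
        gap = 1 - (i + 1) / n          # 1 - F(t) on this interval
        v1 += width * gap
        v2_sq += width * gap ** 2
    return v1, math.sqrt(v2_sq)

y = [1, 1, 2, 4, 7]                    # hypothetical incomes
x = ud_incomes(y)
n = len(y)
mu_y, mu_x = sum(y) / n, sum(x) / n
v1, v2 = vertical_norms(y)

assert abs(v1 - mu_x) < 1e-12                              # v_1 = mu_x
assert abs(v2 - math.sqrt(mu_x - gmd(x) / 2)) < 1e-12      # v_2 = sqrt(mu_x - Gamma_x/2)

v1_bar, v2_bar = v1 / mu_y, v2 / math.sqrt(mu_y)           # normalizations
G_y, G_x = gmd(y) / (2 * mu_y), gmd(x) / (2 * mu_x)
assert abs(v2_bar - math.sqrt(v1_bar - G_y)) < 1e-12       # first form
assert abs(v2_bar - math.sqrt(v1_bar * (1 - G_x))) < 1e-12 # second form
print(round(v1_bar, 4), round(v2_bar, 4))                  # 0.6667 0.5164
```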
It should be noted that the Gini coefficient is associated with the vertical departure and does not by itself describe income inequality. Equation (3) contradicts the conventional notion that the greater income inequality is, the greater the Gini coefficient is. If a progressive transfer is not directed to the poorest, the mean UD income and the Rawlsian index are preserved while the Gini coefficient is reduced. Equation (3) then states that such progressive transfers, which preserve the mean UD income, do not decrease the vertical departure: with $\bar v_1$ fixed, a smaller $G_y$ yields a larger $\bar v_2$. Income inequality can instead be reduced through progressive transfers that reduce the mean UD income.
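Next we consider the horizontal departure. As shown in Fig. 1, the horizontal departure is $Q(p) - Q^{pe}(p) = x_i$ for $(i-1)/n < p \le i/n$. The $\ell_1$ and $\ell_2$ norms of the horizontal departure, denoted by $h_1$ and $h_2$, are obtained as

$$h_1 = \int_0^1 \left|Q(p) - Q^{pe}(p)\right| dp = \frac{1}{n}\sum_{i=1}^{n} x_i = \mu_x, \qquad h_2 = \left(\int_0^1 \left(Q(p) - Q^{pe}(p)\right)^2 dp\right)^{1/2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} = \sqrt{V_x + \mu_x^2},$$

where $V_x$ is the variance of UD income (computed with divisor $n$). Both $h_1$ and $h_2$ are in monetary units, so both are normalized by $\mu_y$. Since $V_x = V_y$, the normalized horizontal norm indexes are

$$\bar h_1 = \frac{h_1}{\mu_y} = \frac{\mu_x}{\mu_y}, \qquad \bar h_2 = \frac{h_2}{\mu_y} = \sqrt{\bar h_1^{\,2} + CV_y^2} = \bar h_1\sqrt{1 + CV_x^2}, \tag{5}$$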
where $CV_y$ and $CV_x$ are the CVs of income and UD income. It is easily verified that $\bar h_2$ is equal to the normalized Euclidean distance of $x$ from perfect equality in the UD income space, $\bar h_2 = \lVert x - x^{pe} \rVert_2 / (\sqrt{n}\,\mu_y)$. Since $\bar v_1 = \bar h_1 = \mu_x/\mu_y$, we henceforth denote both simply by $\ell_1$ and call it the Rawlsian index.
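A matching Python sketch for the horizontal departure: because $Q(p) = x_i$ on $((i-1)/n, i/n]$, each UD income contributes a strip of width $1/n$, so $h_1$ and $h_2$ are simple averages. The code checks $h_2 = \sqrt{V_x + \mu_x^2}$, $\bar h_2 = \sqrt{\ell_1^2 + CV_y^2} = \ell_1\sqrt{1 + CV_x^2}$, and the Euclidean-distance interpretation of $\bar h_2$. As before, the incomes and function names are hypothetical, and variances use divisor $n$.

```python
import math

def ud_incomes(y):
    ys = sorted(y)
    return [v - ys[0] for v in ys]

def horizontal_norms(y):
    """L1 and L2 norms of Q(p) - Q_pe(p) = Q(p) over 0 < p <= 1."""
    x = ud_incomes(y)
    n = len(x)
    h1 = sum(x) / n                          # x_i >= 0, so |x_i| = x_i
    h2 = math.sqrt(sum(v * v for v in x) / n)
    return h1, h2

y = [1, 1, 2, 4, 7]                          # hypothetical incomes
n = len(y)
x = ud_incomes(y)
mu_y, mu_x = sum(y) / n, sum(x) / n
var = sum((v - mu_x) ** 2 for v in x) / n    # V_x (equal to V_y)
h1, h2 = horizontal_norms(y)

l1 = h1 / mu_y                               # Rawlsian index
h2_bar = h2 / mu_y
cv_y, cv_x = math.sqrt(var) / mu_y, math.sqrt(var) / mu_x
assert abs(h2 - math.sqrt(var + mu_x ** 2)) < 1e-12
assert abs(h2_bar - math.sqrt(l1 ** 2 + cv_y ** 2)) < 1e-12
assert abs(h2_bar - l1 * math.sqrt(1 + cv_x ** 2)) < 1e-12
# Normalized Euclidean distance from the origin x_pe = (0, ..., 0)
assert abs(h2_bar - math.dist(x, [0] * n) / (math.sqrt(n) * mu_y)) < 1e-12
print(round(l1, 3), round(h2_bar, 3))        # 0.667 1.011
```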
We should note that the CV is associated with the horizontal departure. The Gini coefficient and CV are associated with different types of departure. Therefore, they are not substitutes for one another. Progressive transfers preserving the mean UD income reduce the horizontal departure but increase the vertical departure. Such progressive transfers affect the vertical and horizontal departures in opposite directions.
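The opposite effects of a progressive transfer that preserves the mean UD income can be illustrated with the closed forms $\bar v_2 = \sqrt{\ell_1 - G_y}$ and $\bar h_2 = \sqrt{\ell_1^2 + CV_y^2}$. In the hypothetical Python example below, one unit moves from the richest individual to a middle-income individual; the lowest income, and hence $\ell_1$, is unchanged. $G_y$ and $\bar h_2$ fall while $\bar v_2$ rises.

```python
import math

def indexes(y):
    """Return (G_y, l1, v2_bar, h2_bar) using v2_bar = sqrt(l1 - G_y) and
    h2_bar = sqrt(l1**2 + CV_y**2). Assumes a positive mean income; the GMD is
    averaged over all n*n ordered pairs and the variance uses divisor n."""
    n = len(y)
    mu_y = sum(y) / n
    l1 = (mu_y - min(y)) / mu_y                            # Rawlsian index mu_x / mu_y
    gmd = sum(abs(a - b) for a in y for b in y) / (n * n)  # Gamma_y (= Gamma_x)
    var = sum((a - mu_y) ** 2 for a in y) / n              # V_y (= V_x)
    g_y = gmd / (2 * mu_y)
    cv_y = math.sqrt(var) / mu_y
    return g_y, l1, math.sqrt(l1 - g_y), math.sqrt(l1 ** 2 + cv_y ** 2)

before = [1, 1, 2, 4, 7]   # hypothetical incomes
after = [1, 1, 3, 4, 6]    # one unit moved from the richest to the third individual

g_b, r_b, v_b, h_b = indexes(before)
g_a, r_a, v_a, h_a = indexes(after)
assert abs(r_a - r_b) < 1e-12   # lowest income unchanged, so l1 is preserved
assert g_a < g_b                # Gini coefficient falls
assert h_a < h_b                # horizontal departure falls
assert v_a > v_b                # vertical departure rises
print([round(t, 3) for t in (g_b, v_b, h_b)])  # [0.4, 0.516, 1.011]
print([round(t, 3) for t in (g_a, v_a, h_a)])  # [0.347, 0.566, 0.919]
```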
Most conventional income inequality indexes, such as $G_y$ and $CV_y$, describe the dispersion of the income distribution. The Rawlsian index measures the location of the UD income distribution, while $G_x$ and $CV_x$ measure its dispersion. $\bar v_2$ and $\bar h_2$ are expressed as combinations of the Rawlsian index and a dispersion measure.
The normalized norm indexes $\bar v_2$ and $\bar h_2$ suggest the following:
(i) Income inequality has multiple dimensions. The normalized norm indexes $\bar v_2$ and $\bar h_2$ consider two dimensions of income inequality.
(ii) The normalized norm indexes $\bar v_2$ and $\bar h_2$ involve both the location and the dispersion of the UD income distribution. The location of the UD income distribution, described by the Rawlsian index $\ell_1$, is the primary constituent of income inequality.
(iii) The dispersion of the income distribution should be distinguished from income inequality. The popular Gini coefficient and CV are associated with this dispersion. According to Equations (3) and (5), the Gini coefficient and CV represent only parts of income inequality.
(iv) Progressive transfers preserving the mean UD income reduce the dispersion of the UD income distribution. Such a reduction in dispersion decreases the horizontal departure but increases the vertical departure. An effective way to reduce both types of departure is to decrease the Rawlsian index, which can be done through progressive transfers to the poorest.
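Point (iv) can be checked on a hypothetical distribution. In the Python sketch below, two units taken from the richest individual are split between the two individuals who share the lowest income; because both are raised, the minimum income $y_1$ rises and the Rawlsian index falls. If only one of several tied poorest individuals were raised, $y_1$ and $\ell_1$ would not change. The sketch assumes $\bar v_2 = \sqrt{\ell_1 - G_y}$ and $\bar h_2 = \sqrt{\ell_1^2 + CV_y^2}$, with divisor conventions as in the earlier formulas; the helper name is hypothetical.

```python
import math

def indexes(y):
    """Return (l1, v2_bar, h2_bar); assumes a positive mean income."""
    n = len(y)
    mu_y = sum(y) / n
    l1 = (mu_y - min(y)) / mu_y                            # Rawlsian index
    gmd = sum(abs(a - b) for a in y for b in y) / (n * n)  # GMD over n*n pairs
    var = sum((a - mu_y) ** 2 for a in y) / n              # variance, divisor n
    g_y, cv_y = gmd / (2 * mu_y), math.sqrt(var) / mu_y
    return l1, math.sqrt(l1 - g_y), math.sqrt(l1 ** 2 + cv_y ** 2)

before = [1, 1, 2, 4, 7]   # hypothetical incomes
after = [2, 2, 2, 4, 5]    # one unit to each of the two poorest, taken from the richest

r_b, v_b, h_b = indexes(before)
r_a, v_a, h_a = indexes(after)
assert r_a < r_b and v_a < v_b and h_a < h_b   # all three fall together
print([round(t, 3) for t in (r_b, v_b, h_b)])  # [0.667, 0.516, 1.011]
print([round(t, 3) for t in (r_a, v_a, h_a)])  # [0.333, 0.346, 0.537]
```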

A normalized norm index from the unscaled Lorenz curve
One of the most popular graphical representations of an income distribution is the Lorenz curve. Income inequality is usually measured by the area enclosed by the Lorenz curves for the income distribution and perfect equality; twice this area equals the Gini coefficient. This section attempts to derive an income inequality index from the Lorenz curve for the UD income distribution. Similar to the Lorenz curve for income, the Lorenz curve for UD income is defined as

$$L(p) = \frac{1}{\mu_x}\int_0^p Q(t)\,dt, \qquad 0 \le p \le 1.$$

However, the Lorenz curve for UD income is not applicable, because the Lorenz curve for $x^{pe}$ is not defined: $\mu_x = 0$ for $x^{pe}$. Thus, we use $\int_0^p Q(t)\,dt$, which is referred to as the unscaled Lorenz curve and denoted by $L^u(p)$. Figure 2 depicts the unscaled Lorenz curves $L^u(p)$ and $L^u_{pe}(p)$ for $x$ and $x^{pe}$. Note that $L^u_{pe}(p)$ lies on the horizontal axis. The departure of $L^u(p)$ from $L^u_{pe}(p)$ can be assessed by area B, which is the $\ell_1$ norm of the departure. The dotted diagonal line, which joins $(0, 0)$ and $(1, \mu_x)$, is not an unscaled Lorenz curve and does not represent perfect equality. It is a reference line for computing area A enclosed by $L^u(p)$ and the dotted line. Since the area under the dotted line is $\mu_x/2$ and area A is $\mu_x G_x/2$, area B is obtained as

$$B = \frac{\mu_x}{2} - \frac{\mu_x G_x}{2} = \frac{\mu_x\left(1 - G_x\right)}{2}.$$

Normalizing B by $\mu_y$, we have

$$\frac{B}{\mu_y} = \frac{\ell_1\left(1 - G_x\right)}{2} = \frac{\bar v_2^{\,2}}{2}.$$

The normalized $\ell_1$ norm of the difference between $L^u(p)$ and $L^u_{pe}(p)$ is therefore equivalent to $\bar v_2$ obtained in the previous section. That is, the CDF and the unscaled Lorenz curve for UD income result in equivalent indexes.
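Because $Q$ is a step function, $L^u(p)$ is piecewise linear with $L^u(i/n) = \frac{1}{n}\sum_{j \le i} x_j$, so area B can be computed exactly with the trapezoidal rule. The Python sketch below (hypothetical incomes and helper names) checks $B = \mu_x(1 - G_x)/2$, area A $= \mu_x G_x/2$, and $B/\mu_y = (\ell_1 - G_y)/2$, which is half of $\bar v_2^{\,2}$.

```python
def ud_incomes(y):
    ys = sorted(y)
    return [v - ys[0] for v in ys]

def unscaled_lorenz_area(x):
    """Area B under L^u(p) = int_0^p Q(t) dt for sorted UD incomes x.
    L^u is linear between the points p = i/n, so the trapezoidal rule is exact."""
    n = len(x)
    points = [0.0]
    for v in x:
        points.append(points[-1] + v / n)   # L^u(i/n)
    return sum((points[i] + points[i + 1]) / (2 * n) for i in range(n))

y = [1, 1, 2, 4, 7]                          # hypothetical incomes
x = ud_incomes(y)
n = len(x)
mu_y, mu_x = sum(y) / n, sum(x) / n
gmd = sum(abs(a - b) for a in x for b in x) / (n * n)
G_x = gmd / (2 * mu_x)

B = unscaled_lorenz_area(x)
A = mu_x / 2 - B                             # area between the diagonal and L^u
assert abs(B - mu_x * (1 - G_x) / 2) < 1e-12
assert abs(A - mu_x * G_x / 2) < 1e-12
v2_bar_sq = mu_x / mu_y - gmd / (2 * mu_y)   # l1 - G_y
assert abs(B / mu_y - v2_bar_sq / 2) < 1e-12
print(round(B, 4), round(B / mu_y, 4))       # 0.4 0.1333
```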

Application of normalized norm indexes
In this section, we apply the normalized norm indexes to example income distributions and real income datasets. First, consider the eight example income distributions in Table 1. The first seven income distributions, $y^1, y^2, \ldots, y^7$, have the same $n = 5$, $S_y = 15$, and $\mu_y = 3$. Each of them can be obtained from the others by a series of transfers, so we can investigate the effect of transfers on income inequality. There are two pairs of income distributions that $G_y$ and $CV_y$ fail to distinguish with respect to income inequality but the normalized norm indexes succeed in distinguishing. In order to examine the negative income problem, we also include an income distribution with a negative income.
$G_y$ and $CV_y$ assess the income inequality of $y^1$ and $y^2$ as being the same. However, there is a difference between $y^1$ and $y^2$: 5 of $S_y$ is evenly distributed among the individuals in $y^1$, while none of $S_y$ is evenly distributed among the individuals in $y^2$. This difference is revealed by the Rawlsian index: $y^1$ has a smaller $\ell_1$ than $y^2$. Although $y^2$ has smaller $G_x$ and $CV_x$ than $y^1$, $y^1$ has smaller $\bar v_2$ and $\bar h_2$ than $y^2$ because it has a smaller $\ell_1$. Unlike $G_y$ and $CV_y$, the normalized norm indexes indicate that $y^1$ is more equal than $y^2$. A similar result is obtained from a comparison of $y^3$ and $y^4$.

We can consider a series of progressive transfers preserving the mean UD income that transform $y^1$ into $y^4$. According to $G_y$ and $CV_y$, $y^4$ is more equal than $y^1$. While $y^1$ and $y^4$ have the same Rawlsian index, $y^4$ has smaller $G_x$ and $CV_x$ than $y^1$. However, according to $\bar v_2$ and $\bar h_2$, $y^4$ is more unequal vertically but less unequal horizontally than $y^1$. A progressive transfer preserving the mean UD income decreases horizontal inequality but increases vertical inequality; that is, it does not reduce income inequality in both dimensions. We can observe this phenomenon when we compare $y^4$ and $y^5$. $y^6$ is obtained from $y^5$ by a progressive transfer reducing the mean UD income, i.e., a transfer reducing the Rawlsian index. Such progressive transfers decrease both horizontal and vertical income inequality and consequently decrease income inequality.

Let us consider $y^7$, in which a negative income value is observed. $G_y$ and $CV_y$ for $y^7$ in Table 1 are computed without any manipulation of the negative income value. If the negative income value is removed from $y^7$, $y^7$ will exhibit perfect equality. If the negative income value is replaced with zero, $y^7$ will become $y^8$. Replacement with zero primarily reduces UD income and the Rawlsian index. Consequently, income inequality is underestimated, as the values of $\ell_1$, $\bar v_2$, and $\bar h_2$ for $y^7$ and $y^8$ show. According to $G_y$, $y^7$ is more unequal than $y^1$ and $y^2$. However, the normalized norm indexes indicate that $y^7$ is the most unequal income distribution among the income distributions in Table 1.
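The effect of replacing a negative income with zero can be reproduced on a hypothetical distribution (not one of those in Table 1). In the Python sketch below, the lowest income is negative, so $\ell_1 = 1 - y_1/\mu_y$ exceeds one (assuming a positive mean income); replacing it with zero lowers $\ell_1$, $\bar v_2$, and $\bar h_2$ alike, which is the underestimation described above. The helper name and the formulas $\bar v_2 = \sqrt{\ell_1 - G_y}$ and $\bar h_2 = \sqrt{\ell_1^2 + CV_y^2}$ follow the normalized norm indexes, with GMD over $n^2$ pairs and variance with divisor $n$.

```python
import math

def indexes(y):
    """Return (l1, v2_bar, h2_bar); assumes mean income > 0.
    No adjustment is made for negative incomes."""
    n = len(y)
    mu_y = sum(y) / n
    l1 = (mu_y - min(y)) / mu_y
    gmd = sum(abs(a - b) for a in y for b in y) / (n * n)
    var = sum((a - mu_y) ** 2 for a in y) / n
    g_y, cv_y = gmd / (2 * mu_y), math.sqrt(var) / mu_y
    return l1, math.sqrt(l1 - g_y), math.sqrt(l1 ** 2 + cv_y ** 2)

raw = [-2, 1, 3, 5, 8]                      # hypothetical incomes, one negative
zeroed = [max(v, 0) for v in raw]           # negative income replaced with zero

r_raw, v_raw, h_raw = indexes(raw)
r_zero, v_zero, h_zero = indexes(zeroed)
assert r_raw > 1                            # negative income pushes l1 above one
assert r_zero < r_raw and v_zero < v_raw and h_zero < h_raw
print([round(t, 3) for t in (r_raw, v_raw, h_raw)])     # [1.667, 1.013, 2.017]
print([round(t, 3) for t in (r_zero, v_zero, h_zero)])  # [1.0, 0.728, 1.309]
```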
Next, we apply the normalized norm indexes to the latest LIS datasets for household disposable income after the year 2010 (LIS 2019). Because the survey years in the datasets differ across countries, this analysis is not intended to be a thorough cross-national comparison; rather, it tests the applicability of the normalized norm indexes to cross-national comparison. The normalized norm indexes and the popular Gini coefficients and CVs for forty-two countries were computed and are presented in Table 2. The countries are arranged in increasing order of $G_y$, and the numbers in parentheses are the ranks for each index. The values of $G_y$ in Table 2 differ slightly from the Gini coefficients published by the LIS Data Center, because the LIS Data Center computes the Gini coefficient after removing negative and zero income values.
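How much the sample definition matters can be seen on a small hypothetical example (not LIS data). The Python sketch below computes $G_y$ on all incomes and again after dropping negative and zero values, mimicking the practice described above; the Gini coefficient is taken as the mean absolute difference over all $n^2$ ordered pairs divided by twice the mean.

```python
def gini(y):
    """Gini coefficient: mean absolute difference over all n*n ordered pairs / (2 * mean)."""
    n = len(y)
    mu = sum(y) / n
    return sum(abs(a - b) for a in y for b in y) / (n * n) / (2 * mu)

incomes = [-2, 0, 1, 3, 5, 8]                   # hypothetical household incomes
positive_only = [v for v in incomes if v > 0]   # drop negative and zero incomes

g_all, g_pos = gini(incomes), gini(positive_only)
print(round(g_all, 3), round(g_pos, 3))         # 0.744 0.338
```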
We first note that negative incomes are frequently observed: twenty-seven countries have negative incomes. When there are negative incomes, $y_1 < 0$ and the Rawlsian index is greater than one (given a positive mean income). The normalized norm indexes provide considerably different information on income inequality than $G_y$ and $CV_y$. The Gini coefficient indicates that Denmark and Norway are among the five most equal countries. However, all the normalized norm indexes indicate that Denmark and Norway are among the five most unequal countries. According to $G_y$ and $CV_y$, Paraguay and South Africa are among the six most unequal countries. However, Paraguay and South Africa are the two most equal countries from the perspective of $\bar h_2$.

Figure 3 shows that a smaller $\bar v_2$ can be accompanied by a larger $\bar h_2$, especially when both $\bar v_2$ and $\bar h_2$ are small. The countries with the eleven largest Rawlsian indexes are Norway, Peru, Poland, Denmark, the UK, India, Serbia, Mexico, Germany, Israel, and Luxembourg. These countries correspond to the countries with the eleven largest $\bar v_2$ and $\bar h_2$, and their ranks on the Rawlsian index, $\bar v_2$, and $\bar h_2$ are the same. The countries with the five smallest Rawlsian indexes are the Czech Republic, Hungary, Slovenia, Brazil, and Russia. Their ranks for $\bar v_2$ and $\bar h_2$ are low but somewhat different from one another. This indicates that the Rawlsian index dominates $\bar v_2$ and $\bar h_2$ when the Rawlsian index is large. The values of $\bar v_2$ and $\bar h_2$ are plotted against $\ell_1$ in Figs. 4 and 5, which show a strong linear tendency between the Rawlsian index and the normalized norm indexes.

For a given income distribution, a progressive transfer maintaining the Rawlsian index decreases $\bar h_2$ but increases $\bar v_2$. Although the income distributions in Table 2 are different, we can observe a similar negative correlation between $\bar h_2$ and $\bar v_2$ when the Rawlsian index is fixed. Figure 6 is a plot of $\bar v_2$ against $\bar h_2$ for seven countries with $\ell_1 = 1$; it shows that $\bar h_2$ and $\bar v_2$ are negatively correlated.

Conclusions
Measurement is the foundation of quantitative research, and the proper measurement of income inequality is important for studying economic inequality. In this paper, we proposed a new framework for measuring income inequality. The framework is based on the UD income distribution, which is obtained from the income distribution by removing the equally distributed parts of incomes. We then derived the normalized norm indexes from the CDF and the unscaled Lorenz curve for the UD income distribution. The normalized norm indexes represent both the location and the dispersion of the UD income distribution, while most conventional income inequality indexes reflect only the dispersion of the income distribution. By analyzing example income distributions and the LIS datasets, we showed that the normalized norm indexes are appropriate measures of income inequality and solve the negative income problem. However, it should be noted that each of the proposed indexes represents only a one-dimensional view of income inequality: the indexes separately describe the horizontal and vertical dimensions of the two-dimensional UD income distribution. Therefore, the horizontal and vertical normalized norm indexes are not alternatives to one another, and both should be taken into consideration to comprehensively understand income inequality. For the sake of simplicity without sacrificing comprehensiveness, it would be desirable to devise a new index that reflects a two-dimensional view of income inequality. Our future research will be directed toward the development of such an index.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.