Estimation of Concentration Measures and Their Standard Errors for Income Distributions in Poland
DOI: 10.1007/s1129401293614
Abstract
Measures of concentration (inequality) are often used in the analysis of income and wage size distributions. Among them, the Gini and Zenga coefficients are of greatest importance. It is well known that income inequality in Poland increased significantly in the period of transformation from a centrally planned economy to a market economy. High income inequality can be a source of serious problems, such as increasing poverty, social stratification, and polarization. Therefore, it seems especially important to present reliable estimates of income inequality measures for a population of households in Poland in different divisions. In this paper, some estimation methods for Gini and Zenga concentration measures are presented together with their application to the analysis of income distributions in Poland by socioeconomic groups. The calculations were based on individual data from the Polish Household Budget Survey conducted by the Central Statistical Office. The standard errors of Gini and Zenga coefficients were estimated by means of the bootstrap and the parametric approach based on the Dagum model.
Keywords
Income distribution, Income inequality, Variance estimation
JEL
C10, J30
Introduction
Measures of inequality are widely used to study income, welfare, and poverty issues. They can also be helpful to analyze the efficiency of a tax policy or to measure the level of social stratification and polarization. They are most frequently applied to dynamic comparisons (comparing inequality across time). The Gini concentration coefficient based on the Lorenz curve is the most widely used measure of income inequality. The Zenga point concentration measure, based on the Zenga curve, has recently received some attention in the literature.
The true values of income inequality coefficients are usually unknown and they can only be estimated on the basis of sample data coming from household budget surveys. Estimators of concentration coefficients are usually nonlinear, thus their standard errors cannot be obtained easily. The methods of variance estimation that can solve this problem include: various replication techniques, Taylor expansion, and parametric procedures based on income distribution models.
The main objective of the paper is to use survey data to analyze income inequality in Poland by socioeconomic groups by means of selected concentration measures and their decomposition. This approach can further be used to assess relative economic affluence of one subpopulation with respect to another and to estimate stratification indices. To complete the analysis, some variance estimation techniques that can be used to estimate the standard errors of Gini and Zenga inequality measures should also be presented and applied.
Estimation of Income Concentration Measures
\( L(p) \): the Lorenz function,
\( p = F(y) \): the cumulative distribution function of income,
\( y_{(i)} \): household incomes in nondescending order,
\( w_i \): the survey weight of the i-th economic unit, and
\( \sum\limits_{j=1}^{i} w_j \): the rank of the i-th economic unit in the n-element sample.
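As an illustration of how survey weights and ordered incomes enter a Gini estimate, the sketch below computes the index from the weighted empirical Lorenz curve using the trapezoid rule. It is a minimal sketch, not necessarily the estimator (2) used in the paper; the function name `weighted_gini` and its arguments are hypothetical.

```python
def weighted_gini(incomes, weights=None):
    """Gini index from the weighted empirical Lorenz curve (trapezoid rule).

    Assumes non-negative incomes with a positive weighted total.
    """
    if weights is None:
        weights = [1.0] * len(incomes)
    pairs = sorted(zip(incomes, weights))        # incomes in nondescending order
    total_w = sum(w for _, w in pairs)
    total_income = sum(y * w for y, w in pairs)
    cum_income = 0.0
    area = 0.0                                   # area under the Lorenz curve
    for y, w in pairs:
        prev_l = cum_income / total_income       # L(p) before this unit
        cum_income += y * w
        next_l = cum_income / total_income       # L(p) after this unit
        area += (w / total_w) * (prev_l + next_l) / 2.0
    return 1.0 - 2.0 * area                      # G = 1 - 2 * integral of L(p)


# A unit with weight 3 counts like three identical households.
g_weighted = weighted_gini([1.0, 3.0], [3.0, 1.0])
g_expanded = weighted_gini([1.0, 1.0, 1.0, 3.0])
```

In this sketch a weighted observation is treated exactly like the corresponding number of replicated households, which is why the two calls above agree.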
The component G _{ w } is the contribution of within-groups inequality to the Gini index and G _{ b } is the contribution of net between-groups inequality, while G _{ t } denotes the contribution of the overlapping of populations, also called transvariation. The terms p _{ j } and s _{ j } denote the population and income shares of the j-th subpopulation, respectively. The term D _{ jh }, called either the economic distance ratio or REA, plays a crucial role in the decomposition (3), and can be regarded as a measure of the relative economic affluence of the j-th subpopulation with respect to the h-th subpopulation.
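The three components can be sketched in code as follows. This is a minimal, unweighted illustration that assumes the standard Dagum-type decomposition, in which the Gini ratio between groups j and h is the mean absolute income difference divided by the sum of the group means, and D _{ jh } is the absolute difference of the group means divided by that mean absolute difference. The function names are hypothetical, and survey weights are ignored for brevity.

```python
from itertools import combinations


def mean_abs_diff(a, b):
    """Mean absolute income difference over all cross pairs of two groups."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def dagum_decomposition(groups):
    """Return (G_w, G_b, G_t, D) for a list of income lists (unweighted sketch)."""
    n = sum(len(g) for g in groups)
    total = sum(sum(g) for g in groups)
    p = [len(g) / n for g in groups]             # population shares p_j
    s = [sum(g) / total for g in groups]         # income shares s_j
    mu = [sum(g) / len(g) for g in groups]       # group means

    # Within-groups term: sum over j of G_jj * p_j * s_j
    g_w = sum(mean_abs_diff(g, g) / (2.0 * mu[j]) * p[j] * s[j]
              for j, g in enumerate(groups))

    g_b = g_t = 0.0
    distance = {}
    for j, h in combinations(range(len(groups)), 2):
        delta = mean_abs_diff(groups[j], groups[h])
        g_jh = delta / (mu[j] + mu[h])           # between-groups Gini ratio
        d_jh = abs(mu[j] - mu[h]) / delta if delta > 0 else 0.0
        gross = g_jh * (p[j] * s[h] + p[h] * s[j])
        g_b += gross * d_jh                      # net between-groups inequality
        g_t += gross * (1.0 - d_jh)              # transvariation (overlap)
        distance[(j, h)] = d_jh
    return g_w, g_b, g_t, distance


groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]
g_w, g_b, g_t, distance = dagum_decomposition(groups)
```

Under these assumptions the three components add up to the Gini index of the pooled sample, and D _{ jh } equals 1 when the two groups do not overlap at all.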
The area below the Z _{ p } curve, representing the concentration area, is equal to 1 in the case of perfect concentration and takes the value 0 when all incomes are equal. Unlike the Lorenz curve, the Zenga curve is not forced to follow a particular pattern, so it can take various shapes depending on the underlying income distribution model.
\( y_{i:n} \): the i-th order statistic in the n-element sample based on weighted data, and
\( \bar{y} \): the sample arithmetic mean.
The estimator (5) has been proven to be consistent and asymptotically normally distributed.
Estimation of Standard Errors

The precision of an estimator T _{ n } is usually discussed in terms of its sampling variance D ^{2}(T _{ n }) or its standard error, which is simply the square root of the variance. In many cases, the exact value of the sampling variance is unknown because it depends on unknown population quantities. After survey data have been obtained, however, an estimate of the variance \( {\widehat{D}^2}(\widehat{\theta }) \) can be calculated. For most income concentration measures (Gini and Zenga indices among them), explicit variance estimators are theoretically complicated: it is hard to derive general mathematical formulas for nonlinear statistics, especially when the sampling design is complex.

Approximate variance estimation methods that can be used in such cases include:
Taylor linearization technique,
Random groups method,
Balanced Half Samples (BHS), also called Balanced Repeated Replication (BRR),
Jackknife technique,
Bootstrapping,
Parametric approach based on maximum likelihood theory,
Generalized Variance Function (GVF), first applied in the Current Population Survey (CPS) in 1947.
In the context of inequality measures, Taylor linearization, the jackknife, and the bootstrap are the variance estimation methods most often applied (see: Verma and Betti 2005; Davidson 2009; Kordos and Zięba 2010).
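The bootstrap idea can be sketched in a few lines. The example below is a simplified illustration that resamples households with equal probability and replacement; it ignores survey weights and the stratified two-stage design, so it is not the exact procedure applied to the HBS data. The names `gini` and `bootstrap_se` and the parameters `n_boot` and `seed` are hypothetical.

```python
import random
import statistics


def gini(incomes):
    """Unweighted Gini index via the sorted-rank formula."""
    y = sorted(incomes)
    n = len(y)
    ranked = sum((i + 1) * v for i, v in enumerate(y))
    return 2.0 * ranked / (n * sum(y)) - (n + 1.0) / n


def bootstrap_se(data, statistic, n_boot=500, seed=0):
    """Standard deviation of the statistic over bootstrap resamples."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


incomes = [float(v) for v in range(1, 51)]       # toy data, not survey data
se_gini = bootstrap_se(incomes, gini, n_boot=200, seed=1)
cv_gini = se_gini / gini(incomes)                # coefficient of variation
```

The same `bootstrap_se` function accepts any statistic of the sample, so a Zenga index estimator could be passed in place of `gini`.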
g′(θ): the first derivative of a function g(θ),
V(Y _{ i }): the variance of a random variable Y _{ i }, and
cov(Y _{ i }, Y _{ j }): the covariance between variables Y _{ i } and Y _{ j }.
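The quantities above combine in the usual first-order (delta method) approximation, in which the variance of g is the sum over i and j of the products of the partial derivatives of g with cov(Y _{ i }, Y _{ j }). The sketch below assumes this standard form; the function name `linearized_variance` and the numbers in the ratio example are hypothetical.

```python
def linearized_variance(gradient, cov):
    """First-order Taylor approximation of Var(g(Y)).

    gradient: partial derivatives of g evaluated at the means of Y
    cov:      covariance matrix of Y, with variances V(Y_i) on the diagonal
    """
    k = len(gradient)
    return sum(gradient[i] * gradient[j] * cov[i][j]
               for i in range(k) for j in range(k))


# Example: ratio g(Y1, Y2) = Y1 / Y2 evaluated at the means (10, 20).
mu1, mu2 = 10.0, 20.0
grad = [1.0 / mu2, -mu1 / mu2 ** 2]              # partial derivatives of Y1 / Y2
cov = [[4.0, 1.0],
       [1.0, 9.0]]
var_ratio = linearized_variance(grad, cov)
```

For an estimator such as the Gini index the derivatives are more involved, but the structure of the calculation is the same.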
T _{(l)}: the value of T based only on the data that remain after omitting the l-th group,
T _{(Q)}: the jackknife estimator of \( \theta \), defined as the simple arithmetic mean of the pseudovalues, and
L: the number of jackknife samples.
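A delete-one-group jackknife using these symbols can be sketched as follows. The pseudovalue form L·T − (L − 1)·T _{(l)} and the variance formula are the standard textbook ones and are assumed here rather than taken from the paper; the function name `jackknife` is hypothetical.

```python
import statistics


def jackknife(groups, statistic):
    """Delete-one-group jackknife: returns (T_Q, variance estimate)."""
    L = len(groups)                              # number of jackknife samples
    full = [x for g in groups for x in g]
    t_full = statistic(full)                     # T from the complete sample
    pseudo = []
    for l in range(L):
        reduced = [x for k, g in enumerate(groups) if k != l for x in g]
        t_l = statistic(reduced)                 # T_(l): l-th group omitted
        pseudo.append(L * t_full - (L - 1) * t_l)
    t_q = sum(pseudo) / L                        # T_(Q): mean of pseudovalues
    var = sum((v - t_q) ** 2 for v in pseudo) / (L * (L - 1))
    return t_q, var


# With single-observation groups and the mean as the statistic, the
# pseudovalues are the observations themselves.
groups = [[2.0], [4.0], [4.0], [6.0], [9.0]]
t_q, var = jackknife(groups, statistics.fmean)
```

For the sample mean the jackknife reproduces the familiar s²/n variance, which makes it a convenient sanity check before applying the function to a nonlinear statistic such as the Gini index.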

The parametric approach can be applied when:
an inequality measure of interest can be expressed as a function g(θ) of the parameters θ of an income distribution model given by a density function f(y, θ),
the density function fits the data well, and
the ML (maximum likelihood) estimates T _{ n } of the parameters θ can be obtained.
Application
The calculations were based on data from the Polish Household Budget Survey (HBS) for the years 2006 and 2008. In 2006 the randomly selected sample covered 37,508 households, i.e., approximately 0.3 % of the total number of households, while in 2008 the total sample size was 37,584. The samples were selected by two-stage stratified sampling with unequal inclusion probabilities for primary sampling units. In order to maintain the relation between the structure of the surveyed population and the socio-demographic structure of the total population, data obtained from the HBS were weighted with the structure of households by number of persons and class of locality coming from the Population and Housing Census 2002. The basic analysis presented in the paper was conducted after dividing the overall sample by socioeconomic group, constructed according to the exclusive or main source of maintenance.
First, according to formulas (2), (3), and (5), the estimates of the Gini and Zenga inequality measures were calculated and the Gini index was decomposed into between-groups and within-groups inequality. Then, the estimates of their standard errors were obtained using two variance estimation methods: bootstrapping and the parametric approach. The estimation of Gini and Zenga coefficients for the entire population was also carried out. As a theoretical distribution model, the Dagum type I function was used (see: Dagum 1977).
Table 1 Estimated values of Gini and Zenga inequality measures by socioeconomic group and bootstrap estimates of their standard errors (first row: 2006, second row: 2008)

| Socioeconomic group | Year | Gini index \( \widehat{G} \) | Standard error of \( \widehat{G} \) | CV of \( \widehat{G} \) | Zenga index \( \widehat{Z} \) | Standard error of \( \widehat{Z} \) | CV of \( \widehat{Z} \) |
|---|---|---|---|---|---|---|---|
| 1. Employees | 2006 | 0.29 | 0.0043 | 0.0150 | 0.25 | 0.0073 | 0.0293 |
| | 2008 | 0.29 | 0.0065 | 0.0222 | 0.26 | 0.0130 | 0.0504 |
| 2. Farmers | 2006 | 0.40 | 0.0145 | 0.0359 | 0.45 | 0.0266 | 0.0595 |
| | 2008 | 0.43 | 0.0181 | 0.0423 | 0.49 | 0.0333 | 0.0674 |
| 3. Self-employed | 2006 | 0.36 | 0.0198 | 0.0551 | 0.38 | 0.0356 | 0.0938 |
| | 2008 | 0.32 | 0.0132 | 0.0412 | 0.31 | 0.0224 | 0.0729 |
| 4. Retirees and pensioners | 2006 | 0.29 | 0.0038 | 0.0132 | 0.24 | 0.0062 | 0.0255 |
| | 2008 | 0.30 | 0.0046 | 0.0154 | 0.25 | 0.0084 | 0.0327 |
| 5. Non-earned sources | 2006 | 0.36 | 0.0335 | 0.0928 | 0.38 | 0.0615 | 0.1595 |
| | 2008 | 0.36 | 0.0185 | 0.0508 | 0.38 | 0.0324 | 0.0847 |
| Total | 2006 | 0.34 | 0.0042 | 0.0124 | 0.33 | 0.0079 | 0.0240 |
| | 2008 | 0.35 | 0.0045 | 0.0132 | 0.34 | 0.0093 | 0.0275 |
Table 2 Income inequality decomposition by subpopulations in 2008 (socioeconomic groups)

| No. | Component | Value (share of total) |
|---|---|---|
| 1. | Between-group inequality G _{ b } | 0.1479 (43 %) |
| 2. | Within-group inequality G _{ w } | 0.1132 (32 %) |
| | Contribution of employees | 0.0854 (24.0 %) |
| | Contribution of farmers | 0.0014 (0.5 %) |
| | Contribution of self-employed | 0.0021 (0.6 %) |
| | Contribution of pensioners and retirees | 0.0240 (7.0 %) |
| | Contribution of non-earned sources | 0.0003 (0.0 %) |
| 3. | Transvariation G _{ t } | 0.0829 (24 %) |
| 4. | Total income inequality G | 0.3440 (100 %) |
Table 3 Average family income and economic distance ratios for socioeconomic groups in Poland in 2006

| No. j | Socioeconomic group | Mean income [PLN] | D _{ jh }, h = 1 | h = 2 | h = 3 | h = 4 | h = 5 |
|---|---|---|---|---|---|---|---|
| 1 | Employees | 2,944 | 0.00 | 0.34 | 0.42 | 0.65 | 0.78 |
| 2 | Farmers | 3,644 | 0.34 | 0.00 | 0.35 | 0.75 | 0.83 |
| 3 | Self-employed | 3,955 | 0.42 | 0.35 | 0.00 | 0.82 | 0.88 |
| 4 | Pensioners, retirees | 1,907 | 0.65 | 0.75 | 0.82 | 0.00 | 0.32 |
| 5 | Non-earned sources | 1,585 | 0.78 | 0.83 | 0.88 | 0.32 | 0.00 |
The net between-groups component G _{ b } contributes 43 % of the total Gini coefficient. The highest value of the economic distance ratio was observed between the non-earned sources and self-employed groups (D = 0.88), meaning that the economic situation of the self-employed is 88 % better than that of households living on non-earned sources (see: Table 3). The transvariation component G _{ t }, which describes the overlapping of the subpopulations, accounts for 24 % of the total income inequality in Poland.
Table 4 Parametric estimates of the Gini and Zenga inequality measures and their standard errors based on the Dagum model parameters

| Socioeconomic group | Year | λ | β | δ | Goodness-of-fit | Gini index | CV [%] | Zenga index | CV [%] |
|---|---|---|---|---|---|---|---|---|---|
| 1. Employees | 2006 | 27.3830 | 0.9572 | 3.4436 | 0.9753 | 0.2934 | 1.4 | 0.2594 | 2.5 |
| | 2008 | 63.5020 | 0.9445 | 3.4498 | 0.9704 | 0.2939 | 1.4 | 0.2601 | 2.5 |
| 2. Farmers | 2006 | 21.2122 | 0.7441 | 2.5230 | 0.9441 | 0.4231 | 4.3 | 0.4872 | 6.8 |
| | 2008 | 359.5840 | 0.3681 | 3.5045 | 0.9543 | 0.3922 | 3.8 | 0.4320 | 4.9 |
| 3. Self-employed | 2006 | 54.1232 | 0.8122 | 3.2129 | 0.9524 | 0.3275 | 3.8 | 0.3159 | 6.7 |
| | 2008 | 165.7337 | 0.7905 | 3.4738 | 0.9527 | 0.3058 | 3.6 | 0.2796 | 6.4 |
| 4. Pensioners and retirees | 2006 | 4.6359 | 1.0756 | 3.2315 | 0.9402 | 0.3045 | 1.6 | 0.2776 | 3.0 |
| | 2008 | 5.3830 | 1.1699 | 3.0939 | 0.9240 | 0.3127 | 1.7 | 0.2916 | 3.1 |
| 5. Non-earned sources | 2006 | 6.8157 | 0.5471 | 3.5911 | 0.9547 | 0.3322 | 4.0 | 0.3256 | 5.3 |
| | 2008 | 7.1906 | 0.6218 | 3.0583 | 0.9665 | 0.3697 | 5.3 | 0.3907 | 8.3 |
| Total | 2006 | 11.7510 | 0.9056 | 3.4928 | 0.9685 | 0.3407 | 1.0 | 0.3387 | 1.7 |
| | 2008 | 27.6980 | 0.7937 | 3.0316 | 0.9634 | 0.3524 | 1.6 | 0.3461 | 1.7 |
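The parametric Gini values can be related to the estimated parameters through a closed-form expression. The sketch below assumes the usual Dagum type I parametrization, F(y) = (1 + λ y^{−δ})^{−β}, for which the Gini index depends only on the shape parameters β and δ and not on the scale parameter λ. The function name `dagum_gini` is hypothetical, and the Zenga index is not covered.

```python
import math


def dagum_gini(beta, delta):
    """Gini index of a Dagum type I distribution with shape parameters beta, delta.

    Assumes F(y) = (1 + lam * y**(-delta))**(-beta); the scale parameter lam
    does not affect the Gini index. Requires delta > 1 (finite mean).
    """
    if beta <= 0 or delta <= 1:
        raise ValueError("need beta > 0 and delta > 1")
    ratio = (math.gamma(beta) * math.gamma(2.0 * beta + 1.0 / delta)
             / (math.gamma(2.0 * beta) * math.gamma(beta + 1.0 / delta)))
    return ratio - 1.0


# Shape parameters reported for employees in 2006.
g_employees_2006 = dagum_gini(beta=0.9572, delta=3.4436)
```

With the employees' 2006 parameters this expression gives a value of approximately 0.29, in line with the Gini index reported for that group. For β = 1 the formula reduces to 1/δ, the known Gini index of the log-logistic (Fisk) distribution.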
Concluding Remarks
The paper considered the problem of efficient estimation of inequality indices on the basis of random samples, including the measurement of inequality within and between subpopulations. Reliable estimates of inequality indices are usually available only on the national level, whereas in this paper, the detailed results for socioeconomic groups were presented. They can be helpful to identify the sources of income inequality and poverty in Poland.
The results of the calculations presented in the paper reveal that the level of income inequality in Poland is high, as compared with many other European countries, especially for some socioeconomic groups. The main component of income inequality in Poland, when measured by the Gini index, is economic disparity between socioeconomic groups. The high value of the overlapping component suggests that the socioeconomic groups are not separated perfectly, so they cannot be regarded as strata.
In general, inequality estimation was more efficient when the Gini index was applied, which resulted in smaller estimation errors. On the other hand, the synthetic Zenga measure seemed more sensitive to slight changes in income inequality within the groups of households. Thus, both inequality coefficients, accompanied by measures of their precision, can be regarded as useful tools in income distribution analysis.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.