# Estimation of Concentration Measures and Their Standard Errors for Income Distributions in Poland

DOI: 10.1007/s11294-012-9361-4

- Cite this article as:
- Jędrzejczak, A. Int Adv Econ Res (2012) 18: 287. doi:10.1007/s11294-012-9361-4

## Abstract

Measures of concentration (inequality) are often used in the analysis of income and wage size distributions. Among them, the Gini and Zenga coefficients are of greatest importance. It is well known that income inequality in Poland increased significantly in the period of transformation from a centrally planned economy to a market economy. High income inequality can be a source of serious problems, such as increasing poverty, social stratification, and polarization. Therefore, it seems especially important to present reliable estimates of income inequality measures for a population of households in Poland in different divisions. In this paper, some estimation methods for Gini and Zenga concentration measures are presented together with their application to the analysis of income distributions in Poland by socio-economic groups. The basis for the calculations was individual data coming from the Polish Household Budget Survey conducted by the Central Statistical Office. The standard errors of Gini and Zenga coefficients were estimated by means of the bootstrap and the parametric approach based on the Dagum model.

### Keywords

Income distribution, Income inequality, Variance estimation

### JEL

C10, J30

## Introduction

Measures of inequality are widely used to study income, welfare, and poverty issues. They can also be helpful to analyze the efficiency of a tax policy or to measure the level of social stratification and polarization. They are most frequently applied to dynamic comparisons (comparing inequality across time). The Gini concentration coefficient based on the Lorenz curve is the most widely used measure of income inequality. The Zenga point concentration measure, based on the Zenga curve, has recently received some attention in the literature.

The true values of income inequality coefficients are usually unknown and they can only be estimated on the basis of sample data coming from household budget surveys. Estimators of concentration coefficients are usually nonlinear, thus their standard errors cannot be obtained easily. The methods of variance estimation that can solve this problem include: various replication techniques, Taylor expansion, and parametric procedures based on income distribution models.

The main objective of the paper is to use survey data to analyze income inequality in Poland by socio-economic groups by means of selected concentration measures and their decomposition. This approach can further be used to assess relative economic affluence of one subpopulation with respect to another and to estimate stratification indices. To complete the analysis, some variance estimation techniques that can be used to estimate the standard errors of Gini and Zenga inequality measures are also presented and applied.

## Estimation of Income Concentration Measures

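The Gini index is defined on the basis of the Lorenz curve as twice the area between the curve and the line of equal shares:

\( G = 2\int_0^1 \left[ p - L(p) \right] dp \)  (1)

where:

- *L*(*p*) is the Lorenz function,
- *p* = *F*(*y*) is the cumulative distribution function of income.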

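For weighted survey data, the Gini index can be estimated by:

\( \widehat{G} = \frac{2\sum\limits_{i=1}^n \left( w_i y_{(i)} \sum\limits_{j=1}^i w_j \right) - \sum\limits_{i=1}^n w_i^2 y_{(i)}}{\left( \sum\limits_{i=1}^n w_i \right) \sum\limits_{i=1}^n w_i y_{(i)}} - 1 \)  (2)

where:

- *y*_{(i)} are household incomes in a non-descending order,
- *w*_{i} is the survey weight for the *i*-th economic unit, and
- \( \sum\limits_{{j = 1}}^i {{w_j}} \) is the rank of the *i*-th economic unit in the *n*-element sample.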
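A weighted Gini estimator of this kind can be sketched in a few lines of Python. The sketch assumes the commonly used weighted form (2 Σ *w*_{i} *y*_{(i)} Σ_{j≤i} *w*_{j} − Σ *w*_{i}² *y*_{(i)}) / (Σ *w*_{i} · Σ *w*_{i} *y*_{(i)}) − 1; the function name `weighted_gini` and the toy inputs are illustrative and are not taken from the survey.

```python
def weighted_gini(incomes, weights):
    """Weighted Gini estimate from incomes and survey weights (illustrative sketch)."""
    # Sort units by income (non-descending), keeping each weight with its income.
    pairs = sorted(zip(incomes, weights))
    total_weight = sum(w for _, w in pairs)
    total_income = sum(w * y for y, w in pairs)

    cum_weight = 0.0   # running sum of weights: the weighted rank of the current unit
    rank_term = 0.0    # sum of w_i * y_(i) * (w_1 + ... + w_i)
    square_term = 0.0  # sum of w_i^2 * y_(i)
    for y, w in pairs:
        cum_weight += w
        rank_term += w * y * cum_weight
        square_term += w * w * y

    return (2.0 * rank_term - square_term) / (total_weight * total_income) - 1.0


# Toy check: a unit with weight 2 behaves like two identical units with weight 1.
print(weighted_gini([1, 3], [2, 1]), weighted_gini([1, 1, 3], [1, 1, 1]))
```

With all weights equal to one, the expression reduces to the familiar unweighted sample formula, which gives 0 for equal incomes and (*n* − 1)/*n* when a single unit holds all income.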

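The Gini index for a population of size *n* divided into *k* subpopulations can be decomposed as follows (Dagum 1997):

\( G = G_w + G_b + G_t = \sum\limits_{j=1}^k G_{jj} p_j s_j + \sum\limits_{j=2}^k \sum\limits_{h=1}^{j-1} G_{jh} (p_j s_h + p_h s_j) D_{jh} + \sum\limits_{j=2}^k \sum\limits_{h=1}^{j-1} G_{jh} (p_j s_h + p_h s_j)(1 - D_{jh}) \)  (3)

where *G*_{jj} is the Gini index within the *j*-th subpopulation and *G*_{jh} is the Gini index between the *j*-th and *h*-th subpopulations.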

The component *G*_{w} is the contribution of within-groups inequality to the Gini index and *G*_{b} is the contribution of net between-groups inequality, while *G*_{t} denotes the contribution of the overlapping of populations, also called transvariation. The terms *p*_{j} and *s*_{j} denote the population and income shares of the *j*-th subpopulation, respectively. The term *D*_{jh}, called either the economic distance ratio or REA, plays a crucial role in the decomposition (3) and can be regarded as the measure of relative economic affluence of the *j*-th subpopulation with respect to the *h*-th subpopulation.
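The three components can be illustrated with a small unweighted Python sketch. It assumes the standard form of the Dagum (1997) decomposition, in which the between-groups Gini ratio is the mean absolute income difference divided by the sum of the two group means and the economic distance ratio equals the absolute difference of group means divided by that mean absolute difference. Survey weights are ignored, incomes are assumed positive, and all function names are hypothetical.

```python
from itertools import combinations


def mean_abs_diff(a, b):
    """Average absolute income difference over all pairs drawn from a and b."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


def dagum_decomposition(groups):
    """Return (G_w, G_b, G_t, D), where D[(j, h)] is the economic distance ratio."""
    n = sum(len(g) for g in groups)
    total_income = sum(sum(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    p = [len(g) / n for g in groups]             # population shares p_j
    s = [sum(g) / total_income for g in groups]  # income shares s_j

    # Within-groups part: G_jj * p_j * s_j, with G_jj = delta_jj / (2 * mean_j).
    g_w = sum(mean_abs_diff(g, g) / (2 * means[j]) * p[j] * s[j]
              for j, g in enumerate(groups))

    g_b = g_t = 0.0
    distance = {}
    for j, h in combinations(range(len(groups)), 2):
        delta = mean_abs_diff(groups[j], groups[h])
        g_jh = delta / (means[j] + means[h])  # between-groups Gini ratio
        # Gross affluence minus transvariation equals the difference of means,
        # and their sum equals delta, so the ratio simplifies to:
        d_jh = abs(means[j] - means[h]) / delta if delta > 0 else 0.0
        weight = p[j] * s[h] + p[h] * s[j]
        g_b += g_jh * weight * d_jh           # net between-groups contribution
        g_t += g_jh * weight * (1.0 - d_jh)   # transvariation (overlap)
        distance[(j, h)] = d_jh
    return g_w, g_b, g_t, distance


# Toy data: three small groups with partly overlapping incomes.
g_w, g_b, g_t, dist = dagum_decomposition([[1, 2, 3], [4, 6], [2, 10, 12]])
print(round(g_w + g_b + g_t, 4), dist)
```

In this sketch the three components add up to the Gini index of the pooled data, and groups whose income ranges do not overlap yield a distance ratio of 1 and a zero transvariation term.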

The Zenga synthetic inequality index *Z* can be expressed as the area below the Zenga curve *Z*_{p}, which is based on the relation between income and population quantiles (see: Fig. 2):

\( Z = \int_0^1 Z_p \, dp \)

The area below the *Z*_{p} curve, representing the concentration area, is equal to 1 in the case of perfect concentration and takes the value 0 when all incomes are equal. Unlike the Lorenz curve, the Zenga curve does not exhibit forced behavior, so it can take various shapes depending on the underlying income distribution model.

In the estimator of the Zenga index, *y*_{i:n} denotes the *i*-th order statistic in the *n*-element sample based on weighted data, and \( \bar{y} \) denotes the sample arithmetic mean.

The estimator (5) has been proven to be consistent and asymptotically normally distributed.
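The idea of a point concentration measure averaged over *p* can be illustrated with a simple unweighted Python sketch. It is not the estimator (5) itself. It assumes the usual definition of the Zenga point measure, *Z*_{p} = 1 − *y*_{p} / *y*\*_{p}, where *y*_{p} is the population quantile and *y*\*_{p} is the income quantile, that is, the smallest income whose cumulative income share reaches *p*. Incomes are assumed non-negative with a positive total, and the function name is hypothetical.

```python
def zenga_index(incomes):
    """Discrete sketch of Z as the average of Z_p = 1 - y_p / y*_p over p = i/n."""
    y = sorted(incomes)
    n = len(y)

    # Cumulative income up to each ordered unit (numerator of the income share).
    cum_income = []
    running = 0.0
    for value in y:
        running += value
        cum_income.append(running)
    total = cum_income[-1]

    z_sum = 0.0
    j = 0  # position of the income quantile y*_p; it never moves backwards
    for i in range(1, n + 1):
        # Income quantile: smallest y_(j) whose cumulative income share is >= i/n.
        while cum_income[j] * n < i * total:
            j += 1
        z_sum += 1.0 - y[i - 1] / y[j]
    return z_sum / n


# Toy comparison: the more concentrated sample gets the larger value.
print(zenga_index([1, 2, 2, 2, 3]), zenga_index([1, 1, 1, 1, 6]))
```

Because the income quantile is never smaller than the population quantile of the same order, every term lies between 0 and 1, and the sketch returns 0 when all incomes are equal.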

## Estimation of Standard Errors

The precision of an estimator *T*_{n} is usually discussed in terms of its sampling variance *D*^{2}(*T*_{n}) or its standard error, which is simply the square root of the variance. In many cases, the exact value of the sampling variance is unknown, because it depends on unknown population quantities. After survey data have been obtained, however, an estimate of the variance \( {\widehat{D}^2}(\widehat{\theta }) \) can be calculated.

For most income concentration measures (Gini and Zenga indices among them), explicit variance estimators are theoretically complicated—it is hard to derive general mathematical formulas for nonlinear statistics, especially when the sampling design is complex.

The variance estimation methods that can be applied in such cases include:

- Taylor linearization technique,
- Random groups method,
- Balanced Half Samples (BHS), also called Balanced Repeated Replication (BRR),
- Jackknife technique,
- Bootstrapping,
- Parametric approach based on maximum likelihood theory,
- Generalized Variance Function (GVF), first applied in the Current Population Survey (CPS) in 1947.

In the context of inequality measures, Taylor linearization, the jackknife, and the bootstrap are the methods of variance estimation most often applied (see: Verma and Betti 2005; Davidson 2009; Kordos and Zięba 2010).

The Taylor linearization technique replaces a nonlinear estimator *T*_{n} by a pseudoestimator *g*(*Y*), which is a linear function of sample observations. It is based on the first-order Taylor expansion around a parameter *θ*, neglecting the remainder term. The variance of *g*(*Y*) then serves as the approximation of the true estimator variance; it is expressed in terms of:

- *g*′(*θ*), the first derivative of the function *g*(*θ*),
- *V*(*Y*_{i}), the variance of the random variable *Y*_{i}, and
- cov(*Y*_{i}, *Y*_{j}), the covariance between the variables *Y*_{i} and *Y*_{j}.

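In the jackknife technique, the sample is divided into *L* dependent groups of equal size. Next, for each group, the estimator *T*_{l} (called pseudovalue) is calculated based on the data that remain after omitting the *l*-th group:

\( T_l = L\,T - (L - 1)\,T_{(l)} \)

The jackknife variance estimator is then given by:

\( \widehat{D}_J^2(T) = \frac{1}{L(L - 1)} \sum\limits_{l=1}^{L} \left( T_l - T_{(Q)} \right)^2 \)

where:

- *T*_{(l)} is the value of *T* based only on the data that remain after omitting the *l*-th group,
- *T*_{(Q)} is the jackknife estimator of \( \theta \), defined as the simple arithmetic mean of pseudovalues, and
- *L* is the number of jackknife samples.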
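A delete-a-group jackknife along these lines might be sketched as follows. The sketch assumes the standard pseudovalue form *L*·*T* − (*L* − 1)·*T*_{(l)} and forms the groups by simple random assignment, so it ignores the strata and primary sampling units of a complex survey design. The function name and arguments are hypothetical.

```python
import random


def grouped_jackknife_se(data, statistic, n_groups, seed=0):
    """Delete-a-group jackknife: returns (jackknife estimate, standard error)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Deal the units into n_groups groups of (nearly) equal size.
    groups = [shuffled[g::n_groups] for g in range(n_groups)]

    full_estimate = statistic(shuffled)
    pseudovalues = []
    for omitted in range(n_groups):
        kept = [x for g, grp in enumerate(groups) if g != omitted for x in grp]
        t_without = statistic(kept)  # estimate after omitting one group
        pseudovalues.append(n_groups * full_estimate - (n_groups - 1) * t_without)

    t_q = sum(pseudovalues) / n_groups  # jackknife estimator: mean of pseudovalues
    variance = sum((t - t_q) ** 2 for t in pseudovalues) / (n_groups * (n_groups - 1))
    return t_q, variance ** 0.5


# Toy usage with the sample mean as the statistic and one unit per group.
print(grouped_jackknife_se([1, 2, 3, 4, 5, 6], lambda xs: sum(xs) / len(xs), 6))
```

For the sample mean with one unit per group, the pseudovalues coincide with the observations, so the result equals the usual standard error of the mean. This gives a convenient sanity check before applying the function to an inequality index.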

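The bootstrap method proceeds as follows. After drawing *N* independent resamples (called bootstrap samples) by a design identical to the one by which the sample was drawn from the population, we calculate estimators \( T_k^{ * } \), *k* = 1, …, *N*. The bootstrap variance estimator is defined as:

\( \widehat{D}_B^2(T_n) = \frac{1}{N - 1} \sum\limits_{k=1}^{N} \left( T_k^{*} - \bar{T}^{*} \right)^2 \)

where \( \bar{T}^{*} \) is the arithmetic mean of the bootstrap estimates \( T_k^{*} \).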
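The resampling loop can be sketched in Python as below. As a simplifying assumption, the sketch resamples units with replacement by simple random sampling and does not reproduce a two-stage stratified design. It also uses the divisor *N* − 1 in the variance, which is one common convention. The function name and defaults are hypothetical.

```python
import random


def bootstrap_se(data, statistic, n_resamples=1000, seed=0):
    """Bootstrap standard error of statistic(data) under simple resampling."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n units with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))

    mean_estimate = sum(estimates) / n_resamples
    variance = sum((t - mean_estimate) ** 2 for t in estimates) / (n_resamples - 1)
    return variance ** 0.5


# Toy usage: standard error of the mean of the integers 1..100.
sample = list(range(1, 101))
print(bootstrap_se(sample, lambda xs: sum(xs) / len(xs), n_resamples=2000))
```

Any statistic that accepts a list can be passed in, for example a function computing an inequality index, provided that the survey weights are resampled together with the incomes.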

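When the distribution of income *Y* can be approximated by a theoretical distribution model, the method of variance estimation based on maximum likelihood theory can also be used. Let us assume that:

- an inequality measure of interest can be expressed as a function *g*(**θ**) of the parameters **θ** of an income distribution model given by a density function *f*(*y*, **θ**),
- the density function is well fitted to data, and
- the ML (maximum likelihood) estimates **T**_{n} of the parameters **θ** can be obtained.

Then the asymptotic variance of the estimator of *g*(**θ**) takes the form:

\( D^2\left( g(\mathbf{T}_n) \right) = \left( \frac{\partial g}{\partial \boldsymbol{\theta}} \right)^{T} \mathbf{I}_{\theta}^{-1} \left( \frac{\partial g}{\partial \boldsymbol{\theta}} \right) \)

where **I**_{θ} denotes the Fisher information matrix.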
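As an illustration of the parametric approach, the sketch below expresses the Gini index as a function of the Dagum type-I shape parameters and propagates a parameter covariance matrix through a numerical gradient (the delta method). It assumes the parametrization *F*(*y*) = (1 + *λy*^{−δ})^{−β}, for which the Gini index is Γ(*β*)Γ(2*β* + 1/*δ*) / [Γ(2*β*)Γ(*β* + 1/*δ*)] − 1. The covariance matrix in the usage line is purely hypothetical; in practice it would come from the inverse Fisher information of the fitted model.

```python
import math


def dagum_gini(beta, delta):
    """Gini index of the Dagum type-I model (it does not depend on the scale lambda)."""
    return (math.gamma(beta) * math.gamma(2 * beta + 1 / delta)
            / (math.gamma(2 * beta) * math.gamma(beta + 1 / delta))) - 1.0


def delta_method_se(g, theta, cov, step=1e-6):
    """Standard error of g(theta) given the covariance matrix of the estimates."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += step
        down[i] -= step
        # Central finite difference approximates the partial derivative of g.
        grad.append((g(*up) - g(*down)) / (2 * step))
    # Quadratic form grad' * cov * grad.
    variance = sum(grad[i] * cov[i][j] * grad[j]
                   for i in range(len(theta)) for j in range(len(theta)))
    return math.sqrt(variance)


# Shape parameters reported for employees in 2006; the covariance is hypothetical.
print(dagum_gini(0.9572, 3.4436))
print(delta_method_se(dagum_gini, [0.9572, 3.4436], [[1e-4, 0.0], [0.0, 1e-3]]))
```

With *β* = 0.9572 and *δ* = 3.4436, the assumed formula comes close to the Gini value tabulated for employees in 2006. Agreement for other rows is not guaranteed under this assumed parametrization.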

## Application

The results of the calculations were obtained on the basis of the data coming from the Polish Household Budget Survey (HBS) for the years 2006 and 2008. In 2006 the randomly selected sample covered 37,508 households, i.e., approximately 0.3 % of the total number of households, while in 2008 the total sample size was 37,584. The samples were selected by two-stage stratified sampling with unequal inclusion probabilities for primary sampling units. In order to maintain the relation between the structure of the surveyed population and the socio-demographic structure of the total population, data obtained from the HBS were weighted with the structure of households by number of persons and class of locality coming from the Population and Housing Census 2002. The basic analysis presented in the paper was conducted after dividing the overall sample by socio-economic group, constructed according to the exclusive or main source of maintenance.

First, according to the formulas (2), (3), and (5), the estimates of Gini and Zenga inequality measures were calculated and the Gini index was decomposed into between-groups and within-groups inequality. Then, the estimates of their standard errors were obtained using two variance estimation methods: bootstrapping and the parametric approach. The estimation of Gini and Zenga coefficients for the entire population was also carried out. As a theoretical distribution model, the Dagum type-I function was used (see: Dagum 1977).

Table 1 presents the estimates obtained for the socio-economic groups, with bootstrap standard errors based on *N* = 5000 resamples. It can be easily noticed that the values of Zenga indices for socio-economic groups in Poland vary from 0.25 to 0.49, while the Gini coefficients take values from 0.29 to 0.43. Thus, the Zenga coefficient seems to be more sensitive to differences between family incomes than the Gini index. The standard errors are significantly higher for the Zenga coefficient, being usually 3–6 % of the estimated values, whereas the relative dispersion of the Gini index is usually 1–5 %. Additionally, Figs. 3 and 4 show that, despite a relatively small number of repetitions, the distributions of both inequality statistics can be approximated by normal density curves.

Table 1 Estimated values of Gini and Zenga inequality measures by socio-economic group and bootstrap estimates of their standard errors

Socio-economic group | Year | Gini index \( \widehat{G} \) | Standard error of \( \widehat{G} \) | Coeff. of variation CV of \( \widehat{G} \) | Zenga index \( \widehat{Z} \) | Standard error of \( \widehat{Z} \) | Coeff. of variation CV of \( \widehat{Z} \) |
---|---|---|---|---|---|---|---|
1. Employees | 2006 | 0.29 | 0.0043 | 0.0150 | 0.25 | 0.0073 | 0.0293 |
1. Employees | 2008 | 0.29 | 0.0065 | 0.0222 | 0.26 | 0.0130 | 0.0504 |
2. Farmers | 2006 | 0.40 | 0.0145 | 0.0359 | 0.45 | 0.0266 | 0.0595 |
2. Farmers | 2008 | 0.43 | 0.0181 | 0.0423 | 0.49 | 0.0333 | 0.0674 |
3. Self-employed | 2006 | 0.36 | 0.0198 | 0.0551 | 0.38 | 0.0356 | 0.0938 |
3. Self-employed | 2008 | 0.32 | 0.0132 | 0.0412 | 0.31 | 0.0224 | 0.0729 |
4. Retirees and pensioners | 2006 | 0.29 | 0.0038 | 0.0132 | 0.24 | 0.0062 | 0.0255 |
4. Retirees and pensioners | 2008 | 0.30 | 0.0046 | 0.0154 | 0.25 | 0.0084 | 0.0327 |
5. Non-earned sources | 2006 | 0.36 | 0.0335 | 0.0928 | 0.38 | 0.0615 | 0.1595 |
5. Non-earned sources | 2008 | 0.36 | 0.0185 | 0.0508 | 0.38 | 0.0324 | 0.0847 |
Total | 2006 | 0.34 | 0.0042 | 0.0124 | 0.33 | 0.0079 | 0.0240 |
Total | 2008 | 0.35 | 0.0045 | 0.0132 | 0.34 | 0.0093 | 0.0275 |

In 2008, the within-groups inequality (*G*_{w}) accounted for 32 % of the overall inequality in Poland. The within-group component reflects the inner polarization of the groups, which shows up as remarkable differentials in average income between managers and blue-collar workers within the group of employees, between entrepreneurs and the others within the group of self-employed, or between retirees and pensioners within the fourth group. Table 2 can also be helpful to answer the question to what extent particular groups contribute to the overall inequality. Because of very small income and population shares, the income disparities among the self-employed account for only 0.6 % of the total inequality, while the contribution of farmers is even smaller, at 0.5 %. The group with the highest share (24 %) in the overall Gini index is the group of employees.

Table 2 Income inequality decomposition by subpopulations (socio-economic groups) in 2008

Component | Value (share of total inequality) |
---|---|
1. Between-group inequality | 0.1479 (43 %) |
2. Within-group inequality | 0.1132 (32 %) |
Contribution of employees | 0.0854 (24.0 %) |
Contribution of farmers | 0.0014 (0.5 %) |
Contribution of self-employed | 0.0021 (0.6 %) |
Contribution of retirees and pensioners | 0.0240 (7.0 %) |
Contribution of non-earned sources | 0.0003 (0.0 %) |
3. Transvariation | 0.0829 (24 %) |
4. Total income inequality | 0.3440 (100 %) |

Table 3 Average family income and economic distance ratios *D*_{jh} for socio-economic groups in Poland in 2006

No. *j* | Socio-economic group | Mean income [PLN] | *D*_{j1} | *D*_{j2} | *D*_{j3} | *D*_{j4} | *D*_{j5} |
---|---|---|---|---|---|---|---|
1 | Employees | 2,944 | 0.00 | 0.34 | 0.42 | 0.65 | 0.78 |
2 | Farmers | 3,644 | 0.34 | 0.00 | 0.35 | 0.75 | 0.83 |
3 | Self-employed | 3,955 | 0.42 | 0.35 | 0.00 | 0.82 | 0.88 |
4 | Pensioners, retirees | 1,907 | 0.65 | 0.75 | 0.82 | 0.00 | 0.32 |
5 | Non-earned sources | 1,585 | 0.78 | 0.83 | 0.88 | 0.32 | 0.00 |

The net between-groups component *G*_{b} contributes 43 % of the total Gini coefficient. The highest value of the economic distance ratio was observed between the non-earned sources group and the self-employed (*D* = 0.88), which means that the economic situation of the self-employed is 88 % better than that of the non-earned sources group (see: Table 3). The transvariation component *G*_{t}, describing the overlapping of the subpopulations, accounts for the remaining 24 % of the total income inequality in Poland.

Table 4 Parametric estimates of the Gini and Zenga inequality measures and their standard errors based on the Dagum model parameters

Socio-economic group | Year | Parameter λ | Parameter β | Parameter δ | Goodness-of-fit | Gini index | CV of Gini index [%] | Zenga index | CV of Zenga index [%] |
---|---|---|---|---|---|---|---|---|---|
1. Employees | 2006 | 27.3830 | 0.9572 | 3.4436 | 0.9753 | 0.2934 | 1.4 | 0.2594 | 2.5 |
1. Employees | 2008 | 63.5020 | 0.9445 | 3.4498 | 0.9704 | 0.2939 | 1.4 | 0.2601 | 2.5 |
2. Farmers | 2006 | 21.2122 | 0.7441 | 2.5230 | 0.9441 | 0.4231 | 4.3 | 0.4872 | 6.8 |
2. Farmers | 2008 | 359.5840 | 0.3681 | 3.5045 | 0.9543 | 0.3922 | 3.8 | 0.4320 | 4.9 |
3. Self-employed | 2006 | 54.1232 | 0.8122 | 3.2129 | 0.9524 | 0.3275 | 3.8 | 0.3159 | 6.7 |
3. Self-employed | 2008 | 165.7337 | 0.7905 | 3.4738 | 0.9527 | 0.3058 | 3.6 | 0.2796 | 6.4 |
4. Pensioners and retirees | 2006 | 4.6359 | 1.0756 | 3.2315 | 0.9402 | 0.3045 | 1.6 | 0.2776 | 3.0 |
4. Pensioners and retirees | 2008 | 5.3830 | 1.1699 | 3.0939 | 0.9240 | 0.3127 | 1.7 | 0.2916 | 3.1 |
5. Non-earned sources | 2006 | 6.8157 | 0.5471 | 3.5911 | 0.9547 | 0.3322 | 4.0 | 0.3256 | 5.3 |
5. Non-earned sources | 2008 | 7.1906 | 0.6218 | 3.0583 | 0.9665 | 0.3697 | 5.3 | 0.3907 | 8.3 |
Total | 2006 | 11.7510 | 0.9056 | 3.4928 | 0.9685 | 0.3407 | 1.0 | 0.3387 | 1.7 |
Total | 2008 | 27.6980 | 0.7937 | 3.0316 | 0.9634 | 0.3524 | 1.6 | 0.3461 | 1.7 |

## Concluding Remarks

The paper considered the problem of efficient estimation of inequality indices on the basis of random samples, including the measurement of inequality within and between subpopulations. Reliable estimates of inequality indices are usually available only on the national level, whereas in this paper, the detailed results for socio-economic groups were presented. They can be helpful to identify the sources of income inequality and poverty in Poland.

The results of the calculations presented in the paper reveal that the level of income inequality in Poland is high, as compared with many other European countries, especially for some socio-economic groups. The main component of income inequality in Poland, when measured by the Gini index, is economic disparity between socio-economic groups. The high value of the overlapping component suggests that the socio-economic groups are not separated perfectly, so they cannot be regarded as strata.

In general, the inequality estimation was more efficient when the Gini index was applied, which resulted in smaller standard errors of the estimates. On the other hand, the synthetic Zenga measure seemed more sensitive to slight changes of income inequality within the groups of households. Thus, it is clear that both inequality coefficients, accompanied by the measures of their precision, can be regarded as useful tools in income distribution analysis.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.