Analyzing the Gender Gap in Poland and Italy, and by Regions

High income inequality, accompanied by substantial regional differentiation, is still a great challenge for social policymakers in many European countries. One of the important elements of this phenomenon is the inequality between the income distributions of men and women. Using data from the European Union Statistics on Income and Living Conditions, the distributions of income for Italy and Poland were compared, and the gender gap in these countries was assessed. No single metric can capture the full range of experiences, so a set of selected tools was adopted. The Dagum model was fitted to each distribution, summary measures, like the Gini and Zenga inequality indices, were evaluated, and the Zenga curve was employed to detect changes at each income quantile. Afterward, empirical distributions were compared through a relative approach, providing an analytic picture of the gender gap for both countries. The analysis moved beyond the typical focus on average or median earnings differences, towards a comparison of the full distribution of women’s earnings relative to men’s. The analysis was performed in the different macroregions of the two countries, with a discussion of the results. The study revealed that income inequality in Poland and Italy varies across gender and regions. In Italy, the highest inequality was observed in the poorest region, i.e. the islands. On the contrary, in Poland, the highest inequality occurred in the richest region, the central one. The relative distribution method was a powerful tool for studying the gender gap.


Introduction
Income disparity is still growing in the Organisation for Economic Cooperation and Development (OECD) countries and has reached its highest level in the past half-century. Several studies were conducted on the issue (e.g., OECD 2011, 2015). The trend of rising inequality has become a priority for policymakers and calls for the analysis of various aspects of income inequality, including measurement and decomposition by regional areas, by income sources and, more recently, across genders. For social and economic policies, it seems interesting to compare income inequality across the European Union (EU) countries and regions.
The focus of the present paper is on income distributions across Poland and Italy because these countries represent different economic backgrounds. Poland still suffers the effects of the transition from a centrally planned economy to a market-based economy, whereas Italy is a well-established market economy. Moreover, according to the Tárki European Social Report (TÁRKI, 2009), a study on intolerance to income inequality across countries confirmed a markedly lower level of acceptance of inequality in the post-socialist bloc than in the other European countries.
Nowadays, there is an open debate about the observed discrepancy between the income distributions of men and women. Women make up almost half of the workforce, yet, on average, they continue to have lower incomes than men. Gender equality is one of the fundamental values of the EU. The EU is dedicated not only to defending this right but also to promoting gender equality within the member states and across the world. This was the core aim of the European Commission's Strategic engagement for gender equality 2016-2019 report (European Commission, 2016). Naturally, gender can affect people's working lives in different ways. Therefore, even if it is illegal for employers in EU countries to pay men and women different amounts for doing the same job, there are many other reasons why, on average, substantial income differences between men and women can be observed. Differences in pay are caused by many concurring factors, and an interesting in-depth analysis can be found in Blau and Kahn (2003). Occupational segregation is perhaps the main reason: men are more prevalent in higher-paid industries, while women are mostly in lower-paid industries. Differences in remuneration across industry sectors influence the gender pay gap (Blau and Kahn, 2000). There is also vertical segregation, as few women work in senior and better-paying positions. Moreover, in many countries, there is still ineffective equal-pay legislation regulating women's overall paid working hours. The distribution of the workforce across working-hour bands is generally more even for women than for men, due largely to the higher incidence of part-time work among women. Thus, a greater proportion of women work fewer hours. Finally, some barriers to entry into the labor market are related to the education level and single parenting rate (Leythienne and Ronkowski, 2018). Gender equality also means equal economic independence for women and men. It refers to equality in decision-making and, in a wider setting, requires equal dignity, integrity and the ending of gender-based violence.
In this paper, an analysis of data from household budget surveys carried out in Poland and Italy in 2015 is presented, with special attention to the income gender gap within each country and for macroregions. The calculations were based on microdata from the European Union Statistics on Income and Living Conditions (EU-SILC) (Eurostat, 2015).
The original contribution of the present research is the variety of chosen metrics, complemented by descriptive and inferential tools. In more detail, after estimating the Dagum model for each distribution, summary measures, like the Gini and Zenga inequality indices, were evaluated and the Zenga curve was adopted to detect changes at each income quantile. Afterward, the empirical distributions were compared by a relative approach, providing an analytic picture of the gender gap for both countries. The analysis moves beyond the typical focus on average or median earnings differences, toward a comparison of the full distribution of women's earnings relative to men's. Finally, the analysis was carried out in the different macroregions of the two countries and the results were discussed.
Results of EU-SILC show that the popular Gini inequality indices for net household incomes in the two countries were rather similar, equal to 0.34 and 0.35, respectively. Nevertheless, the comparative studies conducted by Jedrzejczak (2015) and Zenga and Jedrzejczak (2020) revealed substantial differences in inequality patterns, which were much more pronounced for the Italian macroregions than for the Polish ones. In particular, a relatively strong negative correlation was observed between gross domestic product (GDP) per capita and income inequality measured by the Gini index in the Italian regions, while in Poland this correlation was slightly positive. As a result, in Italy, the highest inequality levels occurred in the poorest regions, while in Poland, the opposite was observed.

Methodology and Notation
First, a brief review of the Dagum model for fitting the economic data is provided. Then, the inequality curves and indices are presented, together with computational details. Finally, the notion of relative distribution is defined, providing a powerful tool for the comparison between genders.

Density Estimation
The Dagum distribution was chosen to model the income data. The choice was based on its well-known economic foundations and on a large body of empirical evidence supporting this model as an excellent candidate for fitting observed income distributions (Kleiber and Kotz, 2003). Recall that F belongs to the Dagum family if its density function is given by

$$f(x; a, b, p) = \frac{a p\, x^{ap-1}}{b^{ap}\left[1 + (x/b)^{a}\right]^{p+1}}, \qquad x > 0,$$

for some a, b, p > 0. Notice that the first moment of a Dagum distribution is finite if and only if a > 1. Therefore, the inequality measures considered in this paper (and introduced in the following subsection) are defined under this condition. The parameters a and p are shape parameters, while b is a scale parameter. This model allows for various degrees of positive skewness and leptokurtosis. Moreover, it is flexible enough to be unimodal, as is typical of income distributions, or zeromodal, as is typical of wealth distributions. The shape parameters are related to inequality, Lorenz dominance and first-order stochastic dominance. For example, let $F_1$ and $F_2$ be two Dagum distributions with parameters $(a_1, b_1, p_1)$ and $(a_2, b_2, p_2)$, respectively. Then the necessary and sufficient conditions for Lorenz dominance (i.e. non-intersecting Lorenz curves) are $a_1 p_1 \le a_2 p_2$ and $a_1 \le a_2$. For more details on this distribution, in the framework of economic size distributions, see Kleiber and Kotz (2003, chap. 6.3) and references therein. Finally, the cumulative distribution function (CDF) of the Dagum distribution is given by

$$F(x; a, b, p) = \left[1 + (x/b)^{-a}\right]^{-p}, \qquad x > 0.$$

Given an independent and identically distributed sample $x_1, x_2, \ldots, x_n$, the likelihood equations for the Dagum family are given by

$$\begin{aligned}
\frac{n}{a} + p \sum_{i=1}^{n} \log\frac{x_i}{b} - (p+1) \sum_{i=1}^{n} \frac{\log(x_i/b)}{1 + (b/x_i)^{a}} &= 0,\\
n p - (p+1) \sum_{i=1}^{n} \frac{1}{1 + (b/x_i)^{a}} &= 0,\\
\frac{n}{p} + a \sum_{i=1}^{n} \log\frac{x_i}{b} - \sum_{i=1}^{n} \log\left[1 + (x_i/b)^{a}\right] &= 0.
\end{aligned} \tag{1}$$

However, no explicit solution to this system is known (e.g., Kleiber 2008). An effective package for Dagum estimation is available on the CRAN R package repository (CRAN, 2018). It solved the maximum likelihood (ML) optimization numerically, with a very good model fit.
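As an illustration of the formulas above, the following Python sketch evaluates the Dagum density, CDF, quantile function and log-likelihood. It is not the R package used for the estimates in this paper; the function names and the parameter values in the usage note are hypothetical, chosen only for the example.

```python
import math


def dagum_pdf(x, a, b, p):
    """Dagum density at x > 0, with shape parameters a, p and scale b."""
    t = (x / b) ** a
    return a * p * x ** (a * p - 1) / (b ** (a * p) * (1 + t) ** (p + 1))


def dagum_cdf(x, a, b, p):
    """Dagum CDF: F(x) = [1 + (x/b)^(-a)]^(-p)."""
    return (1 + (x / b) ** (-a)) ** (-p)


def dagum_quantile(u, a, b, p):
    """Inverse of the CDF for 0 < u < 1, obtained by solving F(x) = u."""
    return b * (u ** (-1 / p) - 1) ** (-1 / a)


def dagum_loglik(sample, a, b, p):
    """Log-likelihood of an i.i.d. sample; ML maximizes this over (a, b, p)."""
    return sum(math.log(dagum_pdf(x, a, b, p)) for x in sample)
```

For instance, with the hypothetical values a = 3, b = 20000 and p = 0.8, `dagum_quantile(0.5, 3.0, 20000.0, 0.8)` returns the median of the fitted model. Since the likelihood equations have no closed-form solution, `dagum_loglik` would be passed to a numerical optimizer in practice.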

Inequality Curves and Indices
Let Z ≥ 0 be a random variable representing gross or net incomes as well as taxes. Let $F(z)$ be its cumulative distribution function (CDF), $\mu = E(Z)$ the mean value, and $F^{-1}(p) = \inf\{z : F(z) \ge p\}$ its quantile function for 0 < p < 1. The Lorenz curve $\{(p, L_F(p)) \mid p \in [0, 1]\}$ plots the cumulative share of Z, say $L_F(p)$, versus the cumulative share of the population, p. In the ideal case of perfect equality (that is, a society in which all people have the same income), the share of incomes equals the share of the population, so that $L_F(p) = p$ for all 0 < p < 1. In this case, the Lorenz curve is the diagonal line from (0, 0) to (1, 1). On the other hand, the lower the share of income $L_F(p)$ held by the share of income earners p, the higher the inequality. In the ideal case of perfect inequality (that is, a society in which all people but one have an income of nil), the share of incomes equals zero for 0 ≤ p < 1, i.e. $L_F(p) = 0$, and $L_F(1) = 1$. In this setting, it is very natural to express the degree of inequality through the deviation of the actual Lorenz curve from the diagonal line. The Gini index (Gini 1914) is twice the area between the equality line and the Lorenz curve:

$$G_F = 2 \int_0^1 \left(p - L_F(p)\right) dp. \tag{2}$$

The Gini index can be rewritten as

$$G_F = \int_0^1 \left(1 - \frac{\mu^{-}(p)}{\mu}\right) 2p \, dp, \tag{3}$$

where $\mu^{-}(p) = \mu L_F(p)/p$ is the mean income of the poorest p × 100% of the population. Three issues arise when using the Gini index. First, the weight 2p gives the lowest importance to the more critical comparisons. Second, the same weight gives more emphasis to the less informative comparisons. Third, the groups to which $\mu$ and $\mu^{-}(p)$ refer overlap. These considerations gave rise to many attempts to modify the Gini index (see Greselin (2014) for a review).
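The Lorenz curve and the Gini index can be illustrated on a small sample with the Python sketch below. It uses a standard order-statistics formula, which equals twice the area between the diagonal and the polygonal Lorenz curve through the sample points; it is given as an illustration and is not necessarily the estimator used for the results in this paper. The function names and the toy incomes are hypothetical.

```python
def lorenz_points(incomes):
    """Points (i/n, L_n(i/n)) of the empirical Lorenz curve."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [], 0.0
    for i, x in enumerate(xs, start=1):
        running += x  # cumulative income of the i poorest units
        points.append((i / n, running / total))
    return points


def gini_index(incomes):
    """Sample Gini index: sum_i (2i - n - 1) x_(i) / (n^2 * mean)."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * sum(xs))  # n^2 * mean equals n * sum
```

With the toy sample [1, 2, 3, 4], the poorest half of the units holds 30% of total income and the Gini index is 0.25; a sample of identical incomes gives a Gini index of zero.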
Observing the noticeable increase in disparities between less fortunate and more fortunate individuals, Zenga (2007) introduced a new inequality curve, $Z_F(p)$, obtained by contrasting the average income of the poorest p × 100% of earners, that is $\mu^{-}(p) = \mu L_F(p)/p$, with the amount that is held, on average, by the richest earners, i.e. the remaining (1 − p) × 100% of the population, that is $\mu^{+}(p) = \mu (1 - L_F(p))/(1 - p)$:

$$Z_F(p) = 1 - \frac{\mu^{-}(p)}{\mu^{+}(p)} = 1 - \frac{L_F(p)}{p} \cdot \frac{1 - p}{1 - L_F(p)}, \qquad 0 < p < 1. \tag{4}$$

The methodology proposed by Zenga keeps in mind that the notions of poor and rich are relative to each other and summarizes, in a single measure, the amount of inequality in the population. A measure of economic inequality can be defined by calculating the area beneath the Zenga curve:

$$Z_F = \int_0^1 Z_F(p) \, dp. \tag{5}$$

It is worth recalling that the Zenga index follows the axiomatic approach. It obeys the Pigou-Dalton transfer principle (Dalton 1920; Pigou 1920), it is scale invariant, and it has the desired properties of anonymity and decomposability.
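The construction of the Zenga curve can be sketched in Python as follows: at each point p = i/n, the mean of the i poorest incomes is contrasted with the mean of the n − i richest ones. The index is then approximated by averaging the curve over this grid, which is a simple discretization assumed here for illustration and not necessarily the estimator used in this paper. Function names and data are hypothetical.

```python
def zenga_curve(incomes):
    """Points (i/n, Z_n(i/n)) for i = 1, ..., n - 1 (requires n >= 2)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    curve, lower_sum = [], 0.0
    for i in range(1, n):
        lower_sum += xs[i - 1]
        lower_mean = lower_sum / i                 # mean of the i poorest
        upper_mean = (total - lower_sum) / (n - i)  # mean of the n - i richest
        curve.append((i / n, 1 - lower_mean / upper_mean))
    return curve


def zenga_index_approx(incomes):
    """Grid average of the curve, a crude approximation of its area."""
    curve = zenga_curve(incomes)
    return sum(z for _, z in curve) / len(curve)
```

For the toy sample [1, 2, 3, 4], the curve takes the values 2/3, 4/7 and 1/2 at p = 0.25, 0.5 and 0.75. Unlike the Lorenz curve, it is not forced to 0 or 1 at the endpoints, and it is identically zero when all incomes are equal.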
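The next aim is to compare the two indices:

$$G_F = \int_0^1 \left(1 - \frac{\mu^{-}(p)}{\mu}\right) 2p \, dp, \qquad Z_F = \int_0^1 \left(1 - \frac{\mu^{-}(p)}{\mu^{+}(p)}\right) dp, \tag{6}$$

where $\mu^{-}(p)$ and $\mu^{+}(p)$ denote the mean incomes of the poorest p × 100% and of the richest (1 − p) × 100% of the population, respectively. Both indices are relative measures of inequality, but the latter, compared to the former, has the following interesting features. First, the Zenga index gives the same weight to each comparison along the entire distribution. Second, the groups to which $\mu^{-}(p)$ and $\mu^{+}(p)$ refer in the definition of $Z_F$ are exhaustive and disjoint. Finally, the curve $Z_F(p)$ neither has forced values at the endpoints nor is constrained to be non-decreasing and convex on the interval [0, 1], as is the case for the Lorenz curve $L_F(p)$.

Turning back to the expressions of the inequality measures in the Dagum model, the Lorenz curve and the Gini index have the following form (Dagum 1977), where the population share is written as u because p denotes a Dagum shape parameter:

$$L_F(u) = B\left(u^{1/p};\; p + \tfrac{1}{a};\; 1 - \tfrac{1}{a}\right), \qquad 0 \le u \le 1, \tag{7}$$

and

$$G_F = \frac{\Gamma(p)\, \Gamma(2p + 1/a)}{\Gamma(2p)\, \Gamma(p + 1/a)} - 1. \tag{8}$$

In Equation (7), $B(t; \alpha; \beta)$ indicates the CDF of the beta distribution with parameters $\alpha$ and $\beta$ evaluated at t, while $\Gamma(x)$ indicates the Gamma function in Equation (8). The substitution of the Lorenz curve in the formula of Zenga's index yields

$$Z_F = 1 - \int_0^1 \frac{B\left(u^{1/p}; p + \tfrac{1}{a}; 1 - \tfrac{1}{a}\right)}{u} \cdot \frac{1 - u}{1 - B\left(u^{1/p}; p + \tfrac{1}{a}; 1 - \tfrac{1}{a}\right)} \, du. \tag{9}$$

As noted at the beginning of this section, the Lorenz curve and the two inequality measures are defined if and only if a > 1. Finally, bearing in mind that the aim is to employ such measures on survey data, estimators for the aforementioned curves and indices are needed. From the random sample $(X_1, \ldots, X_n)$, the empirical Lorenz curve is given by the points $\{(i/n, L_n(i/n))\}$:

$$L_n\left(\frac{i}{n}\right) = \frac{1}{n \bar{X}} \sum_{j=1}^{i} X_{(j)}, \qquad i = 1, \ldots, n, \tag{10}$$

where $\bar{X}$ denotes the sample mean and $X_{(j)}$ denotes the j-th order statistic. $L_n(p)$ is the share of the total amount of income owned by the least fortunate p × 100% of the sample. The Gini index, evaluated from the sample, is

$$\hat{G}_n = 2 \int_0^1 \left(p - L_n(p)\right) dp. \tag{11}$$

The empirical Zenga measure (Greselin and Pasquazzi, 2009) can be obtained by replacing the population CDF, F, by its empirical counterpart, $F_n$:

$$\hat{Z}_n = \int_0^1 \left(1 - \frac{L_n(p)}{p} \cdot \frac{1 - p}{1 - L_n(p)}\right) dp. \tag{12}$$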
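The closed-form Gini index of the Dagum model depends only on the shape parameters a and p, and can be evaluated with the Gamma function from the Python standard library. The sketch below is an illustration with a hypothetical function name; it works on the log scale only to avoid overflow for large arguments.

```python
import math


def dagum_gini(a, p):
    """Gini index of a Dagum(a, b, p) distribution; requires a > 1."""
    if a <= 1:
        raise ValueError("the Gini index of the Dagum model requires a > 1")
    # log of Gamma(p) * Gamma(2p + 1/a) / (Gamma(2p) * Gamma(p + 1/a))
    log_ratio = (math.lgamma(p) + math.lgamma(2 * p + 1 / a)
                 - math.lgamma(2 * p) - math.lgamma(p + 1 / a))
    return math.exp(log_ratio) - 1
```

A quick check: for p = 1 the expression reduces to 1/a, so `dagum_gini(2.0, 1.0)` is 0.5 up to rounding. For a fixed p, a larger a gives a lower Gini index, in line with the Lorenz dominance conditions recalled for the Dagum family.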

Relative Distributions
Let $Y_0$ be a random variable representing a measurement for a population (e.g., income, wealth, hourly wages etc.). In the following, the population that generated $Y_0$ is denoted as the reference population (in the present case, the male income distribution can be set as the reference). Denote the CDF of $Y_0$ by $F_0(y)$. Suppose we also observe a measurement Y from a different population. The population that generated Y is denoted as the comparison population. It is assumed that Y has CDF $F(y)$. Typically, Y is the measurement for a separate group (in the present case, the female income distribution), but in other cases, it can also be the same group at a later period. The objective is to study the differences between the comparison distribution and the reference distribution. Unless explicitly mentioned, both F and $F_0$ are assumed to be absolutely continuous with common support. The relative distribution of Y to $Y_0$ is defined (Handcock and Morris, 2006) as the distribution of the random variable

$$R = F_0(Y).$$

It is worth remarking that R is obtained from Y by transforming the latter by the CDF of $Y_0$, namely $F_0$. This has also been called the grade transformation (Ćwik and Mielniczuk, 1989). If the distributions of Y and $Y_0$ are identical, then the random variable R is uniformly distributed on the unit interval. The deviation from the uniform pattern can be interpreted as the gap between the distributions. While this transformation is not widely used or understood in the social sciences, it is a very useful one because R measures the relative rank of Y compared to $Y_0$.
As a random variable, R has both a CDF and a probability density function. In particular, R has CDF G, such that $G(p) = F(F_0^{-1}(p))$ for 0 ≤ p ≤ 1. The relative CDF, G(p), can be interpreted as the proportion of the comparison group whose attribute lies below the p-th quantile of the reference group. Note that even though the relative CDF is explicitly scaled in terms of quantiles, the implicit unit of comparison is the value of the attribute on the original measurement scale, with $y_p = F_0^{-1}(p)$ representing the cut point.
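The relative CDF and the grade transformation can be computed from two samples with empirical CDFs and quantiles, as in the following Python sketch. This is a minimal illustration under stated assumptions, not the software used for the figures in this paper; the function names and the toy data are hypothetical.

```python
import math
from bisect import bisect_right


def ecdf(sorted_sample, y):
    """Empirical CDF: share of observations less than or equal to y."""
    return bisect_right(sorted_sample, y) / len(sorted_sample)


def quantile(sorted_sample, p):
    """Empirical quantile inf{z : F_n(z) >= p} for 0 < p <= 1."""
    n = len(sorted_sample)
    k = max(1, math.ceil(n * p - 1e-12))  # small tolerance against rounding
    return sorted_sample[k - 1]


def relative_cdf(comparison, reference, p):
    """G(p) = F(F0^{-1}(p)): share of the comparison group at or below
    the p-th quantile of the reference group."""
    return ecdf(sorted(comparison), quantile(sorted(reference), p))


def relative_data(comparison, reference):
    """Grade transformation R = F0(Y) applied to each comparison value."""
    ref = sorted(reference)
    return [ecdf(ref, y) for y in comparison]
```

With the reference sample 10, 20, ..., 100 and the comparison sample 5, 15, 15, 25, 35, 45, 55, 65, 75, 85, the value of `relative_cdf` at p = 0.5 is 0.6: 60% of the comparison group lies at or below the reference median. When the two samples coincide, the same call returns 0.5, the uniform benchmark.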

Results
This section begins with a brief description of the European Union Statistics on Income and Living Conditions (EU-SILC) data (Eurostat, 2015). Results from the gender-gap analysis follow.

EU-SILC 2015 Data
The EU-SILC is an instrument aimed at collecting timely and comparable cross-sectional and longitudinal multidimensional microdata on income, poverty, social exclusion and living conditions. This instrument is anchored in the European Statistical System (ESS). The variable of interest is gross personal income, measured yearly, which consists of: gross employee income (PY010G), cash benefits from self-employment (PY050G), gross individual pension plans (PY080G), gross unemployment benefits (PY090G), old-age benefits (PY100G), survivor benefits (PY110G), disability benefits (PY130G), and education-related allowances (PY140G).
Gross employee income includes the following items: wages and salaries paid in cash for time worked or work done in main and any secondary or casual job(s), remuneration for time not worked (e.g. holiday payments), enhanced rates of pay for overtime, fees paid to directors of incorporated enterprises, piece-rate payments, payments for fostering children, commissions, tips and gratuities, supplementary payments (e.g. thirteenth-month payment), profit sharing and bonuses paid in cash, additional payments based on productivity, allowances paid for working in remote locations (regarded as part of the conditions of the job), and allowances for transportation to or from work (Eurostat, 2015).
Tables 1 and 2 contain descriptive statistics of EU-SILC 2015 data, for Poland and Italy, respectively. Acronyms, such as PL1 or IT1, followed by their names, correspond to the name of a region, as given by the Nomenclature of Territorial Units for Statistics (NUTS) 1, i.e., the statistical classification of economic units in the European Community. From the same standards, "PL" stands for Poland and "IT" stands for Italy. Source of Tables 1 and 2: own calculations using data from EU-SILC (Eurostat, 2015).

Gender-gap Analysis for Poland and Italy, at National and Regional Level
The analysis of the data began by computing useful summary measures, such as the median and the 98% quantile of income, for the regions and at the country level, with the population partitioned by gender. Afterward, the parameters of the Dagum model were estimated on each data subsample, and the Gini and Zenga indices were calculated using Equations (11) and (12). These results are reported in Tables 3 and 4 for Poland and Italy, respectively.
From Tables 3 and 4, it is worth noting that the differences between median incomes were higher in Italy, where males were more affluent than females by 47% on average (in Poland by 34%). In contrast, the biggest discrepancy between income-inequality levels was observed in Poland, where the Gini index was 9% higher for men (for Italy the difference was smaller: 1.8%). For Italian men, the Zenga index was Z = 0.747. This means that, on average, the mean income of the poorest p × 100% of men was 74.7% lower than the mean income of the richest (1 − p) × 100% of men in the sample, i.e. it amounted to 100 − 74.7 = 25.3% of the latter. Similarly, for Italian women, the Zenga index had the value Z = 0.744, so that the mean income of the poorest fraction was, on average, equal to 25.6% of the mean income of the richest fraction of women in the sample.
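The two kinds of figures quoted above can be reproduced with a few lines of Python. The sketch assumes that the median gap is expressed relative to the female median, which is an assumption about the definition rather than a statement of the method behind Tables 3 and 4. The function names are hypothetical.

```python
from statistics import median


def relative_median_gap(male_incomes, female_incomes):
    """Excess of the male median over the female median, as a share
    of the female median (assumed definition of the gap)."""
    med_m = median(male_incomes)
    med_f = median(female_incomes)
    return (med_m - med_f) / med_f


def poorest_to_richest_ratio(zenga_index):
    """Average ratio of the lower mean to the upper mean implied by Z,
    since Z is the average of 1 - (lower mean / upper mean)."""
    return 1 - zenga_index
```

For the Zenga values reported for Italy, `poorest_to_richest_ratio(0.747)` and `poorest_to_richest_ratio(0.744)` give about 0.253 and 0.256, i.e. the 25.3% and 25.6% ratios discussed in the text.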
Interesting findings were also obtained in terms of regional differences, both within and between countries. In Poland, the differences between the regions were much smaller than in Italy, when comparing average income levels. In contrast, regional income inequalities were more variable in Poland. Moreover, in Italy, the highest inequality occurred in the poorest region, the islands, while in Poland the highest inequality occurred in the richest region, the central one. To complete the information, Table 5 shows the results of a one-sided significance test comparing the mean incomes of men and women, for Poland and Italy. In Tables 3, 4 and 5, acronyms such as PL1 or IT1, followed by their names, correspond to the name of a region, as given by NUTS, where "PL" and "IT" stand for Poland and Italy, respectively. Source of Tables 3, 4 and 5: own calculations using data from EU-SILC (Eurostat, 2015).

Fig. 1 Dagum densities estimated on male and female income data for Poland (left) and Italy (right). Source: Own calculations using data from EU-SILC (Eurostat, 2015)

Figure 1 presents the estimated Dagum densities for male and female groups in Poland and Italy. The juxtaposed income distributions (for males and females) had a similar shape in both countries. However, the distributions of men were more dispersed and shifted to the right, when compared to the distributions of women.
The density curves, as well as the summary measures presented in Tables 3 and 4, were inadequate to properly address all the open questions about gender gaps. The comparison was more informative when employing a detailed analysis based on inequality curves. The study contrasted the usefulness of the Lorenz and the Zenga curves in depicting the overall income inequality in Poland and Italy, reported in Figure 1 in the Online Supplemental Appendix and here in Figure 2, respectively.
The Lorenz and Zenga curves provided an in-depth comparison of income inequality among and between men and women. At first glance, it can be seen that the Lorenz curves (Online Supplemental Appendix Figure 1) were close to each other and it was rather difficult to determine the level and direction of the discrepancy between gender groups. This was particularly difficult for Italy, where the Lorenz curves intersect. Since the Zenga curve refers to non-overlapping and opposite groups in the population, it was more sensitive to inequality changes at each point of the income range. In Poland, the Zenga curve for males lies visibly above the one for females, showing higher income inequality for the former group throughout the entire income distribution. In Italy, the Zenga curves revealed a much smaller discrepancy between gender groups. Additionally, the curves intersect, so the relationship between inequality levels changes. Men were more unequal within higher-income groups, while for small- and intermediate-income levels, their distribution was more homogeneous compared to women. As an example, the highlighted point in the Zenga curve in Figure 2 says that, for Italian women, the poorest 72% had a mean income equal to 33% of the mean income of the richest 28%.
The question arises whether, for the sake of a profound gender-gap analysis, it is enough to consider inequality curves along with only a few selected income quantiles (median and 98th percentile). The phenomenon of the gender gap depends not only on the differences between the median values for men and women but also on all the deviations observed over the entire income range, i.e., for each quantile of the income distributions. The relative distribution method is a relevant tool to complete the analysis. It contrasts the gender groups all along their income distributions. Figure 3 shows the gender gap results for Poland and Italy, respectively, obtained via the relative distribution approach. The curve of the relative income distribution provides rich and detailed information. Each point on the curve has a precise interpretation. For instance, in the right panel of Figure 3, at the third decile of the male earnings distribution, that is p = 0.3, the relative CDF equals G(0.3) = 0.54. This means that approximately 54% of women earn less than the third decile of male income. One of the peculiarities of the relative graphs is that the distance between Euro values on the right-hand scale is measured in units of persons rather than in Euro. Therefore, the distance between 0 and 10,000 Euro is larger than the distance between 40,000 and 50,000 Euro, because a larger fraction of people have an income falling in the former range than in the latter. In other words, the information captured by the density curves in Figure 1 is conveyed here, along with a direct assessment of the gender gap.
A comparative analysis of the relative distributions for Poland and Italy, by regions, showed significant differences between the countries. For a more in-depth analysis, the complete results are available in Figures 2, 3, 4, 5, 6 and 7 in the Online Supplemental Appendix. Some considerations arose and are worth noting here, in the main paper. First, the gender gap in Italy was quite diverse by region, while, in Poland, it was more stable. An essential observation was that in Italy, the largest income gap was observed in the northeast, while in Poland the largest income gap occurred in the southern region. Both regions have the maximum mean income within the corresponding countries. Related results were reported in the left panels of Figures 5 and 4, respectively. The relative distribution of the female income with respect to the male income in Figure 4 (left panel) showed that, in the southern Polish region, the median income for men, say 7,285 euro, was equal to the third quartile for women. In the central Polish region (Figure 4, right panel), 45 percent of the women had an income lower than the men's third decile. In both countries, in contrast to some expectations, the regions with the largest inequalities had the smallest gender gaps. In Poland, it was the central region, and in Italy, this occurred in the islands (right panels of Figures 4 and 5, respectively). In both regions, substantial differences between the means were combined with very high income inequality among men.

Figs. 4 and 5 Relative distributions of female to male income for the regions showing the maximum (left panels) and the minimum (right panels) gender gap, for Poland and Italy, respectively. Source: Own calculations using data from EU-SILC (Eurostat, 2015)
In Figure 5 (left panel), with respect to the northeastern Italian region, the third decile of male income, around 20,000 euro, corresponds to the 58th percentile of women's income. In the Italian islands, Figure 5 (right panel) shows that the same amount of 20,000 euro corresponds to the 53rd and 68th percentiles in male and female income distributions, respectively. Obviously, the remarks reported here refer to some specific points of the relative distribution, to provide an instance of the much richer insight and interpretation available from the whole curves.

Conclusions
Income inequality in Italy and Poland varies across gender and regions. In Italy, the highest inequality occurs in the poorest region, the islands. On the contrary, in Poland, the highest inequality is attained in the richest region, the central one. Income inequality in Poland is substantially higher among men, as is the mean income. This finding holds for the whole country, as well as for the regions. In Italy, the behaviour of the Zenga curves reveals a much smaller discrepancy between gender groups, when compared to Poland. Intersecting inequality curves for Italian data confirm that men are more unequal within higher-income groups, while for small- and intermediate-income levels, their distribution is more homogeneous compared to women.
In Italy, the poorest regions are those with the lowest gender gap and vice versa. In contrast, in Poland, the gender gap is highest in the southern region and lowest in the central (richest) region. In Poland, the gender gap is more stable across regions. Rather unexpectedly, in both countries, the regions with the largest inequalities have the smallest income gaps. In Poland, this contrast occurs in the central region, and in Italy, in the islands. In both regions, substantial differences between the means are reduced by very high income inequality among men.
The relative distribution methodology was employed as a powerful tool for studying the gender gap in both countries, and by regions. The gender gap can also be measured by standard inequality decomposition and the relative economic affluence.
The present paper offers a descriptive approach. Policy recommendations can only be made based on an analysis of the causes of the gender gap. This might include an analysis of the gender gap controlling for years of education, job tenure, marital status, age and number of children. This topic will be the object of future work.