Assessment of goodness of fit of income distribution in France and Germany based on the Zenga distribution

The aim of this paper is to fit the Zenga distribution to equivalent disposable income from the last two waves of the European Quality of Life Surveys for Germany and France (both for the whole society and for selected socio-economic groups) and to assess the goodness of fit to the empirical data. The Zenga distribution has not yet been used to describe the income distribution in these countries. The fitted distributions were assessed against the empirical data using two measures: the Wasserstein-Kantorovich measure and the standardized Wasserstein-Kantorovich measure. The results lead to the conclusion that the Zenga distribution can fit income distributions for both small and large income values. It was also shown that the Zenga distribution fits the data well even with small and very small samples. The article uses a new measure to assess the fit of the distribution to empirical data, based on the Wasserstein-Kantorovich measure, which assesses the distance between the empirical and theoretical cumulative distribution functions. The modification consists in standardizing the Wasserstein-Kantorovich measure by dividing the area between the cumulative distribution functions by the area of a rectangle whose length is the maximum income and whose width is the maximum value of the cumulative distribution function. The proposed measure is not sensitive to extreme values, which are often found in the analysis of income distributions, and can be applied even to very small samples.


Introduction
Research on income and its distribution occupies an important place in economic theory, and its results are a valuable source of information both for scientists and for public institutions as a basis for shaping social policy. Income distribution is also an important element of research on the living conditions of society and the general condition of the economy.
The income distribution can be described in several ways. The first consists in providing a synthetic index characterizing a selected feature of the distribution, mainly inequality, for example the Gini coefficient (Gini 1912, 1914, 1921), the Lorenz curve (Lorenz 1905), the Pietra inequality index (Pietra 1915) or the Zenga inequality index (Zenga 2007). The comparison of the Gini and Zenga indices, their application to the analysis of incomes in various countries, and the development of empirical estimation methodologies for light- and heavy-tailed distributions are the subject of the work of Gresselin et al. (2010, 2013, 2014).
More complete information on the distribution of income can be obtained using an empirical distribution, for example a histogram or quantiles. In this case, however, it is not always possible to determine all the characteristics of the distribution, e.g. the mode, without grouping the data. On the other hand, using only grouped data may cause problems with closing the extreme class intervals, which is necessary to determine many other distribution characteristics. These problems can be avoided by using a theoretical distribution as a model of the income distribution.
The first theoretical function describing the distribution of income was proposed by Pareto in 1897 in Cours d'Economie Politique, where it was formulated as the Pareto law (Pareto 1964). A detailed description of the Pareto distribution and of the research based on it was given by Arnold (1983). Another very popular theoretical model of income distribution is the log-normal (LN) distribution. It was popularized by Aitchison and Brown (1957). In the literature, the LN distribution is available in two versions, two-parameter and three-parameter, but the three-parameter distribution differs from the two-parameter one only in location, and not in shape or variance.
Another group of 12 theoretical distributions was proposed by Burr. Some of the Burr distributions are used in the literature under other names, e.g. the Burr type III distribution is sometimes called the Dagum distribution (Dagum 1985), and the Burr type XII distribution is known as the Singh-Maddala distribution (cf. Singh and Maddala 1976). Burr distributions of type III and XII are special cases of the generalized beta distribution (McDonald and Xu 1995; Ulman 2011).
Other theoretical models of wage and income distributions include the gamma distribution, used by Salem and Mount (1974) to describe the US income distribution in 1960-69, the family of functions proposed by Champernowne (1953) to describe the pre-tax income distribution, and the Gram-Charlier type A distribution used by Rutherford (1955).
In this paper, the new three-parameter model proposed by Zenga (2010) is applied. This model is a beta mixture defined for non-negative variables and is suited to describing income, financial and actuarial distributions. The Zenga model has three parameters: μ is a scale parameter equal to the expected value, and α and θ are shape parameters on which inequality depends. This means that the distribution controls location and inequality separately, so restrictions on the expected value and on the inequality measure can be imposed separately (Arcagni and Porro 2013). The parameters of the Zenga distribution can be estimated through D'Addario's invariants method (Zenga et al. 2010a; Arcagni 2011).
The Zenga model has been used to describe the income distributions for Italy, Switzerland, the United States and the United Kingdom (Zenga et al. 2010b), Poland (Jędrzejczak and Trzcińska 2018), and the Czech Republic (Trzcińska 2022). The parameters for Germany and France have not been estimated.
The aim of this paper is to fit the Zenga distribution to equivalent disposable income from the last two waves of the European Quality of Life Surveys for Germany and France (both for the whole society and for selected socio-economic groups) and to assess the goodness of fit to the empirical data. For this purpose, a new measure of fit based on the distance between the empirical and theoretical distributions was applied.

Mathematical and statistical properties of Zenga distribution
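The three-parameter model was introduced by Zenga (2010, 2010a, 2012). The density function f(x: μ; α; θ) of the Zenga distribution is obtained as a mixture of Polisicchio's (2008) truncated Pareto densities

$$f(x:\mu;k)=\begin{cases}\dfrac{\sqrt{\mu}}{2}\,k^{0.5}(1-k)^{-1}x^{-1.5}, & \mu k\le x\le \mu/k,\\[4pt] 0, & \text{otherwise},\end{cases}$$

for a fixed μ > 0 and all values of k in the interval (0; 1). The density on the parameter k is the beta density, which has the following form:

$$g(k:\alpha;\theta)=\frac{k^{\alpha-1}(1-k)^{\theta-1}}{B(\alpha;\theta)},\qquad 0<k<1,\ \alpha>0,\ \theta>0,$$

where B(α; θ) is the beta function.
The model is characterized by the probability density function f(x: μ; α; θ) (μ > 0; α > 0; θ > 0) for non-negative variables:

$$f(x:\mu;\alpha;\theta)=\begin{cases}\dfrac{1}{2\mu B(\alpha;\theta)}\left(\dfrac{x}{\mu}\right)^{-1.5}\displaystyle\int_0^{x/\mu}k^{\alpha+0.5-1}(1-k)^{\theta-2}\,dk, & 0<x<\mu,\\[10pt] \dfrac{1}{2\mu B(\alpha;\theta)}\left(\dfrac{\mu}{x}\right)^{1.5}\displaystyle\int_0^{\mu/x}k^{\alpha+0.5-1}(1-k)^{\theta-2}\,dk, & x>\mu.\end{cases}$$

Graphs of the density function of the Zenga distribution for different values of the shape parameters (left and right panels) are presented in Fig. 1.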
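As a numerical illustration, the density can be evaluated directly from its mixture representation: a beta(α; θ) weight on k applied to the truncated Pareto density (√μ/2)·k^0.5·(1−k)^(−1)·x^(−1.5) on the interval [μk, μ/k]. The Python sketch below is not the authors' implementation (the paper reports using Mathematica); the function name `zenga_pdf` and the parameter values are hypothetical and serve only to check that the density integrates to one and has mean μ.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as beta_fn


def zenga_pdf(x, mu, alpha, theta):
    """Zenga density at a scalar x, from the beta mixture of truncated Pareto densities."""
    if x <= 0:
        return 0.0
    # Only components with mu*k <= x <= mu/k contribute, i.e. k <= min(x/mu, mu/x).
    upper = min(x / mu, mu / x)
    integral, _ = quad(
        lambda k: k ** (alpha - 0.5) * (1.0 - k) ** (theta - 2.0), 0.0, upper
    )
    return (x / mu) ** -1.5 * integral / (2.0 * mu * beta_fn(alpha, theta))


# Hypothetical parameter values (NOT estimates from the paper).
mu, alpha, theta = 1500.0, 2.0, 3.0


def scaled_pdf(z):
    """Density of Z = X / mu; integrating in z keeps the quadrature well scaled."""
    return mu * zenga_pdf(mu * z, mu, alpha, theta)


# Split the integration at z = 1 (x = mu), where the two branches of the density meet.
total_mass = quad(scaled_pdf, 0.0, 1.0)[0] + quad(scaled_pdf, 1.0, np.inf)[0]
mean_value = mu * (
    quad(lambda z: z * scaled_pdf(z), 0.0, 1.0)[0]
    + quad(lambda z: z * scaled_pdf(z), 1.0, np.inf)[0]
)

print(f"total mass = {total_mass:.6f}, mean = {mean_value:.2f}")
```

For these illustrative values the total mass and the mean should agree with 1 and μ up to quadrature error. Values of θ at or below 1 make the inner integrand singular near k = 1 and would likely need more careful numerical treatment than this sketch provides.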
The cumulative distribution function F(x: μ; α; θ) of the model can be expressed in terms of the incomplete beta function.
It is interesting to note that the parameter α governs the behavior of the density function as x tends to 0, while the value of the parameter θ regulates the finiteness of the density in the neighborhood of μ. The parameter α is an inverse inequality indicator and θ is a direct inequality indicator. In particular, the larger the value of α, the less unequal the distribution (Porro 2015). The expected value E(X) is always equal to the parameter μ.
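The property E(X) = μ can be checked by simulation using the mixture structure of the model. The sketch below draws k from a beta(α; θ) distribution and then draws X from the truncated Pareto component on [μk, μ/k] by inverting its cumulative distribution function, which gives x = μk / (1 − u(1 − k))² for u uniform on (0, 1). This inversion formula is derived here from the truncated Pareto density and is not quoted from the paper; the function name `sample_zenga`, the seed and the parameter values are hypothetical.

```python
import numpy as np


def sample_zenga(n, mu, alpha, theta, rng):
    """Draw n values from the Zenga model via its beta mixture representation."""
    k = rng.beta(alpha, theta, size=n)   # mixing parameter of each component
    u = rng.uniform(size=n)              # uniform draws for the inverse-CDF step
    # Inverse CDF of the truncated Pareto component on [mu*k, mu/k].
    return mu * k / (1.0 - u * (1.0 - k)) ** 2


# Hypothetical parameter values (NOT estimates from the paper);
# alpha > 1 keeps the variance finite, so the sample mean is stable.
rng = np.random.default_rng(12345)
mu, alpha, theta = 1500.0, 3.0, 2.0
draws = sample_zenga(200_000, mu, alpha, theta, rng)
sample_mean = draws.mean()

print(f"sample mean = {sample_mean:.1f} (mu = {mu})")
```

Because every component of the mixture has mean μ, the sample mean should stay close to μ whatever values of α and θ are chosen, which is the separation of location and inequality described in the text.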

Data and methods
In this paper all calculations are based on data from the European Quality of Life Surveys (EQLS). The monthly household income data were recalculated into net equivalent income per household member, expressed in euro. The purpose of the European Quality of Life Surveys is to measure both objective and subjective indicators of the standard of living of citizens and their households. The Zenga model was fitted for the socio-economic groups in Germany and France for two periods: 2007 and 2016.
To investigate the goodness of fit of the theoretical distribution to the empirical one, Zenga proposed the Mortara index A_1, the quadratic K. Pearson index A_2 and the modified quadratic index A'_2 (Zenga et al. 2012). These indices are defined in terms of n_j and n̂_j, which are respectively the empirical and the estimated frequencies of the jth interval.
However, none of these measures is suitable for small data sets because they require data aggregation. To assess the degree of fit of the theoretical distribution to the empirical one, the Wasserstein-Kantorovich distance was applied. This measure has a long history and continues to attract interest from diverse fields in statistics, machine learning, and computer science, in particular image analysis (Santambrogio 2015; Peyre and Cuturi 2019; Panaretos and Zemel 2020).
The Wasserstein-Kantorovich distance between the empirical F_p and theoretical F_q cumulative distribution functions is computed as:

$$d_W(F_p,F_q)=\int_{-\infty}^{+\infty}\left|F_p(x)-F_q(x)\right|\,dx.$$

It should be noted that the maximum value of income affects the distance between the cumulative distribution functions: the higher the maximum income, the greater the distance. To overcome this dependency and compare the goodness of fit of the Zenga distribution between the studied countries as well as across socio-economic groups, the Wasserstein-Kantorovich measure was normalized. The area between the theoretical and empirical cumulative distribution functions was divided by the area of a rectangle whose length is the highest income in a given data set, denoted here by x_max, and whose width is the maximum value of the cumulative distribution function, which equals 1:

$$\text{normalized } d_W=\frac{d_W(F_p,F_q)}{x_{\max}\cdot 1}.$$

This standardizes the distance between the cumulative distribution functions to the interval [0, 1). A value of 0 means full agreement (the graphs of the empirical and theoretical cumulative distribution functions overlap). This is a rather theoretical situation, in which the area between the empirical and theoretical cumulative distribution functions is equal to 0. A value close to 1 means a complete mismatch of the cumulative distribution functions. The presented measures of distribution similarity have a clear interpretation: the lower the value of d_W and of the normalized d_W, the higher the consistency of the compared distributions.
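The two measures described above can be sketched numerically as the area between the empirical and theoretical cumulative distribution functions, followed by division by the rectangle area x_max · 1. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: the integral is truncated to [0, x_max] and approximated with the trapezoidal rule on a regular grid, and the function name `wk_distance`, the grid size and the toy data are hypothetical.

```python
import numpy as np


def wk_distance(sample, cdf, grid_size=20001):
    """Return (d_W, normalized d_W) between a sample's ECDF and a theoretical CDF.

    Assumption: the area is integrated over [0, x_max] only, where x_max is
    the largest observation, so any theoretical tail beyond x_max is ignored.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    x_max = x[-1]
    grid = np.linspace(0.0, x_max, grid_size)
    # Empirical CDF: share of observations less than or equal to each grid point.
    f_emp = np.searchsorted(x, grid, side="right") / x.size
    gap = np.abs(f_emp - cdf(grid))
    # Trapezoidal rule for the area between the two CDFs.
    d_w = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))
    # Rectangle area: length x_max times width 1 (the maximum of a CDF).
    return d_w, d_w / (x_max * 1.0)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy theoretical model."""
    return np.clip(x, 0.0, 1.0)


# Toy data (NOT from EQLS): two observations compared with the uniform model.
d_w, d_w_norm = wk_distance([0.25, 0.75], uniform_cdf)
print(f"d_W = {d_w:.5f}, normalized d_W = {d_w_norm:.5f}")
```

For this toy input the area can be computed by hand: 0.03125 on [0, 0.25] plus 0.0625 on [0.25, 0.75], which gives d_W = 0.09375 and a normalized value of 0.125. In an application, `uniform_cdf` would be replaced by the fitted theoretical cumulative distribution function.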
The parameter estimates were obtained by means of D'Addario's invariants method, as described in D'Addario (1934, 1939) and Zenga (2010). Numerical optimization was carried out in Mathematica.

Results of applying Zenga distribution for equivalized income in France and Germany
In this section, the results of applying the Zenga distribution to equivalent income in France and Germany are discussed. Table 1 presents descriptive characteristics of the data set.
The data sets range from 994 observations for France in 2016 to 1437 observations for Germany. Each of the distributions is strongly right-skewed. The greatest asymmetry is observed for France in 2007; it is caused by an extremely high maximum observation, which equals 46,700 €. Both the measures of dispersion and the shape measures (asymmetry, kurtosis) are higher for France than for Germany.
Table 2 presents estimation results and measures of goodness of fit for income distributions in Germany and France.
In terms of the Wasserstein-Kantorovich distance, the best fit was recorded for Germany in 2007. The fit assessed with this measure is better the larger the sample size. As expected, a significantly higher value of the Wasserstein-Kantorovich measure was observed for France in 2007. In this data set, the maximum value of income was almost 2 times higher than for France in 2016 and almost 7 times

Results of applying Zenga distribution for equivalized income in socio-economics groups in France and Germany
Table 3 shows the descriptive characteristics of the socio-economic groups of France and Germany. In both countries, the sizes of the studied groups vary: the least numerous groups are students and those unable to work, and the most numerous are the employed and the retired. In the distinguished subgroups, a high or very high level of dispersion and asymmetry can be observed. In extreme cases (employed, France 2007) the variation coefficient equals 1.237.882 and the asymmetry coefficient equals 24.41. In other cases,
A graphical presentation of the distributions for the studied countries in selected socio-economic groups is shown in Fig. 11.
For the employed, a change in the shape of the distribution (a marked change in its height) was observed in both countries in 2016 compared to 2007. In the case of Germany, a shift to the right (towards higher income values) was also observed. Similar behavior of the distributions was noticed for the retired. The results of the approximation of the empirical income distributions in Germany and France for socio-economic groups in the two periods by means of the Zenga model, together with the goodness of fit measures d_W and normalized d_W, are presented in Table 4.
For socio-economic groups in France in 2007, the best fit was obtained for the employed. The worst fit, on the other hand, was recorded for students. It is also the smallest group, with only 24 observations. The analysis of the distance between the cumulative distribution functions in the subsequent data sets confirms the observed relationship: the best fit is obtained for the considerably more numerous groups (employed, retired) and the worst for the least numerous group in a given data subset. It should be noted, however, that the Zenga distribution fits well even for small and very small samples. In 4 out of 15 cases where the group size was less than 100 observations, and in 4 out of 8 cases where the group size was less than 50, the distance between the cumulative distribution functions is less than 0.05. The greatest distance between the cumulative distribution functions was observed for those unable to work, a group of fewer than 20 observations, yet this measure was 0.070985, which is less than 0.1.

Conclusions
In the article, the Zenga distribution was applied to describe the distribution of equivalent income for France and Germany for the last two waves of the EQLS research (2007 and 2016). Parameters were estimated for the entire country as well as for individual socio-economic groups. The fitted distributions were assessed against the empirical data using two measures: the Wasserstein-Kantorovich measure and the standardized Wasserstein-Kantorovich measure.
The analysis of the obtained results shows that the Zenga distribution can fit income distributions for both small and large income values. Similar results were obtained using the A_1, A_2 and A'_2 measures (Zenga et al. 2010a; Arcagni and Porro 2013). It was shown that the Zenga distribution fits the data well even with small and very small samples.
In the article, the goodness of fit measures proposed by Zenga were abandoned because they require the data to be aggregated in the goodness of fit assessment procedure. It was considered that such an operation on samples of fewer than 100 observations may strongly influence the obtained results. For this reason, it was decided to use the Wasserstein-Kantorovich measure, which assesses the distance between the empirical and theoretical cumulative distribution functions. This measure has not yet been used in the literature to assess the goodness of fit of the Zenga distribution. However, this measure is also not free from disadvantages, because it is sensitive to outliers, which are often found in the analysis of distributions with a Paretian right tail. For this reason, it was decided to modify the Wasserstein-Kantorovich distance by standardizing it. This was done by dividing the area between the cumulative distribution functions by the area of a rectangle whose length is the maximum income and whose width is the maximum value of the cumulative distribution function. The applied measure allowed for the assessment of the distribution fit even in very small samples.

Table 4 Estimation results for income distributions by socio-economic group in Germany and France based on the data coming from EQLS

Fig. 3 Density function of Zenga distribution fitted to empirical distribution for France 2016

Fig. 11 The density distribution functions of the Zenga model for employees (on the left) and retired (on the right)

Table 1
Descriptive characteristics of equivalized income in Germany and France based on the data com-

Table 2
Estimation results for income distributions in Germany and France based on the data coming from EQLS

Table 3
Descriptive characteristics of equivalized income in socio-economic groups in France 2016 based on the data coming from EQLS