Abstract
The quintile share ratio of disposable income is the primary inequality indicator of the European Union. As an inequality indicator, it must be sensitive to extremely large observations. Consequently, outliers have a strong impact on the bias and the variance of the classical quintile share ratio estimator, which may distort the interpretation of income inequality. A class of estimators that are robust against outliers is introduced. They have a bounded influence function, they may reduce the bias incurred by the robustification, and they reduce variability. Inference for these robust estimators is developed on the basis of an asymptotic framework that respects the design-based, non-parametric approach. A large simulation study with close-to-reality universes derived from the Statistics of Living Conditions Surveys of the EU is used to study the performance of the proposed estimators.
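To illustrate the sensitivity described above, the following minimal sketch computes a weighted quintile share ratio as the weighted income total above the 80% quantile divided by the weighted income total at or below the 20% quantile. The quantile rule, the function names and the toy data are assumptions made for illustration only; the exact conventions used in official statistics or by the authors may differ.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight share reaches p (illustrative rule)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running / total >= p:
            return value
    return pairs[-1][0]


def quintile_share_ratio(incomes, weights=None):
    """Weighted income total of the top 20% divided by that of the bottom 20%."""
    if weights is None:
        weights = [1.0] * len(incomes)  # equal weights if none are supplied
    q20 = weighted_quantile(incomes, weights, 0.2)
    q80 = weighted_quantile(incomes, weights, 0.8)
    bottom = sum(w * y for y, w in zip(incomes, weights) if y <= q20)
    top = sum(w * y for y, w in zip(incomes, weights) if y > q80)
    return top / bottom


clean = list(range(1, 11))           # toy incomes 1, ..., 10
contaminated = clean[:-1] + [1000]   # largest income replaced by an outlier
print(quintile_share_ratio(clean), quintile_share_ratio(contaminated))
```

With equal weights the clean toy data give 19/3 (about 6.33), whereas the single outlier raises the ratio to 1009/3 (about 336.33), which is the kind of instability the robust estimators are meant to control.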
References
Alfons, A., Filzmoser, P., Hulliger, B., Kolb, J.P., Kraft, S., Münnich, R., Templ, M.: Synthetic data generation of SILC data. Research Project Report WP6 - D6.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011a)
Alfons, A., Kraft, S., Templ, M., Filzmoser, P.: Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat. Methods Appl. 20(3), 383–407 (2011b). doi:10.1007/s10260-011-0163-2
Atkinson, T., Cantillon, B., Marlier, E., Nolan, B.: Social indicators: the EU and social inclusion. Oxford University Press, Oxford (2002)
Beaumont, J.F., Rivest, L.P.: Dealing with outliers in survey data. In: Pfeffermann, D., Rao, C. (eds.) Sample surveys: theory, methods and inference, Handbook of Statistics, vol. 29A, chap 11. Elsevier, Amsterdam, pp. 247–280 (2009)
Binder, D.A., Patak, Z.: Use of estimating functions for estimation from complex surveys. J. Am. Stat. Assoc. 89(427), 1035–1043 (1994)
Bowley, A.L.: Elements of statistics. Charles Scribner’s Sons, New York (1920)
Bruch, C., Münnich, R., Zins, S.: Variance estimation for complex surveys. Tech. rep., AMELI deliverable D3.1, http://ameli.surveystatistics.net/ (2011)
Chambers, R.L.: Outlier robust finite population estimation. J. Am. Stat. Assoc. 81(396), 1063–1069 (1986)
Cowell, F.A., Flachaire, E.: Income distribution and inequality measurement: The problem of extreme values. J. Econom. 141, 1044–1072 (2007)
Cowell, F.A., Victoria-Feser, M.P.: Robustness properties of inequality measures. Econometrica 64(1), 77–101 (1996)
Cowell, F.A., Victoria-Feser, M.P.: Welfare rankings in the presence of contaminated data. Econometrica 70(3), 1221–1233 (2002)
Cowell, F.A., Victoria-Feser, M.P.: Distribution-free inference for welfare indices under complete and incomplete information. J. Econ. Inequal. 1, 191–219 (2003)
Cowell, F.A., Victoria-Feser, M.P.: Distributional dominance with trimmed data. J. Bus. Econ. Stat. 24(3), 291–300 (2006)
David, H.A., Nagaraja, H.N.: Order Statistics, 3rd edn. Wiley, Hoboken (2003)
Deaton, A.: The analysis of household surveys: a microeconomic approach to development policy, 3rd edn. World Bank Publications, The Johns Hopkins University Press, Baltimore (2000)
Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87(418), 376–382 (1992)
European Commission: Laeken indicators. Detailed calculation methodology. Tech. rep., EUROSTAT working group statistics on income, poverty and social exclusion, Luxembourg. DOC. E2/IPSE/2003 (2003)
Fuller, W.A.: Simple estimators for the mean of skewed populations. Statistica Sinica 1, 137–158 (1991)
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust statistics: the approach based on influence functions. Wiley, New York (1986)
Huber, P.J.: Robust statistics. Wiley, New York (1981)
Hulliger, B.: Outlier robust Horvitz-Thompson estimators. Surv. Methodol. 21(1), 79–87 (1995)
Hulliger, B., Münnich, R.: Variance estimation for complex surveys in the presence of outliers. In: ASA Proceedings of the Section on Survey Research Methods. American Statistical Association (2006)
Hulliger, B., Schoch, T.: Robustification of the quintile share ratio. In: Proceedings of the NTTS Conference—New Techniques and Technologies for Statistics, Eurostat, Brussels (2009)
Hulliger, B., Alfons, A., Filzmoser, P., Meraner, A., Schoch, T., Templ, M.: Robust methodology for Laeken indicators. Tech. rep., Research Project Report WP4 D4.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011)
Krewski, D., Rao, J.: Inference from stratified samples: properties of the linearization, jackknife, and balanced repeated replication method. Ann. Stat. 9(5), 1010–1019 (1981)
Langel, M., Tillé, Y.: Statistical inference for the quintile share ratio. J. Stat. Plan. Infer. 141, 2976–2985 (2011)
Moreno-Rebollo, J., Muñoz-Reyes, A., Muñoz-Pichardo, J.: Miscellanea: influence diagnostic in survey sampling: conditional bias. Biometrika 86(4), 923–928 (1999). doi:10.1093/biomet/86.4.923
Moreno-Rebollo, J.L., Muñoz-Reyes, A., Jiménez-Gamero, M.D., Muñoz-Pichardo, J.: Influence diagnostic in survey sampling: estimating the conditional bias. Metrika 55(3), 209–214 (2002). doi:10.1007/s001840100142
Nygård, F., Sandström, A.: Income inequality measures based on sample surveys. J. Econom. 42, 81–95 (1989)
Osier, G.: Variance estimation for complex indicators of poverty and inequality using linearization techniques. Surv. Res. Method. 3, 167–195 (2009)
Pfeffermann, D.: The role of sampling weights when modelling survey data. Int. Stat. Rev. 61(2), 317–337 (1993)
Rao, J.N.K., Wu, C.F.J.: Inference from stratified samples: second-order analysis of three methods for nonlinear statistics. J. Am. Stat. Assoc. 80(391), 620–630 (1985)
Särndal, C.E., Swensson, B., Wretman, J.: Model assisted survey sampling, 2nd edn. Springer, New York (1992)
Serfling, R.J.: Approximation theorems of mathematical statistics. Wiley, New York (1980)
Shao, J.: L-statistics in complex survey problems. Ann. Stat. 22(2), 946–967 (1994)
Smith, T.: Influential observations in survey sampling. J. Appl. Stat. 14(2), 143–152 (1987)
Stigler, S.M.: The asymptotic distribution of the trimmed mean. Ann. Stat. 1(3), 472–477 (1973)
Victoria-Feser, M.P., Ronchetti, E.M.: Robust methods for personal-income distribution models. Can. J. Stat. 22(2), 247–258 (1994)
Wolter, K.M.: Introduction to variance estimation, 2nd edn. Springer, New York (2007)
Zheng, B.: Testing Lorenz curves with non-simple random samples. Econometrica 70(3), 1235–1243 (2002)
Acknowledgments
This work was carried out under the project AMELI (“Advanced Methodology for European Laeken Indicators”). The AMELI project was funded from the European Commission’s 7th Framework Programme. EC-Project Reference: 217322, Research area: SSH-2007-6.2-01. Visit: http://www.ameli.surveystatistics.net.
Appendix: Proofs
Proof
(Lemma 1) Let \(F \in \mathcal F \), where \(\mathcal F \) is the set of cumulative distribution functions, and denote by \(F_{\varepsilon }(y)\) the mixture distribution \(F_{\varepsilon }(y)=(1-\varepsilon )F(y) + \varepsilon G(y)\), where \(G(y)=1\!\!1\{ y \ge z\}\) is an elementary (degenerate) cdf. For \(\beta \in \mathbb Q _1\) such that \(\beta \) is not a limit point of \(F^{-1}\), the influence function is \(IF(z, QSM(\cdot ;\beta ), F) = \partial / \partial \varepsilon \left[ \beta ^{-1}\int ^{\xi _{\beta }(F_{\varepsilon })} y \mathrm d F_{\varepsilon }(y)\right] \), evaluated at \(\varepsilon = 0\). Differentiating with respect to \(\varepsilon \) (by means of the Leibniz integral rule) and letting \(\varepsilon \downarrow 0\) yields \(IF(z, QSM(\cdot ;\beta ), F) = \beta ^{-1} \int ^{\xi _{\beta }(F)} y \mathrm d G(y) - \beta ^{-1} \int ^{\xi _{\beta }(F)} y \mathrm d F(y) + \beta ^{-1}\xi _{\beta }(F)f(\xi _{\beta }(F)) \cdot \left[ \mathrm d \xi _{\beta }(F_{\varepsilon })/\mathrm d \varepsilon \right] _{\varepsilon = 0}\), where \(f\) denotes the density of \(F\). Note that \([\mathrm d \xi _{\beta }(F_{\varepsilon })/ \mathrm d \varepsilon ]_{\varepsilon = 0}\) is the influence function of the \(\beta \)th quantile functional (see, e.g., Huber 1981, pp. 56–57), which is given by \(IF(z, \xi _{\beta }(\cdot ), F) = [\beta - 1\!\!1\{\xi _{\beta }(F) \ge z \}][f(\xi _{\beta }(F))]^{-1}\). On assembling all terms, \(f\) cancels and we obtain the influence function, which completes the proof. \(\square \)
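As a numerical sanity check on this derivation, the sketch below compares a finite-difference derivative of the contaminated functional with the assembled expression \(IF(z, QSM(\cdot ;\beta ), F) = \beta ^{-1}\left[ (z - \xi _{\beta }(F))\, 1\!\!1\{z \le \xi _{\beta }(F)\} + \beta \xi _{\beta }(F) - \int ^{\xi _{\beta }(F)} y \mathrm d F(y)\right] \), which follows from the terms above because \(\int ^{\xi _{\beta }(F)} y \mathrm d G(y) = z\, 1\!\!1\{z \le \xi _{\beta }(F)\}\). The choice \(F = \mathrm{Exp}(1)\), the value \(\beta = 0.2\), the contamination points and all function names are assumptions made only for illustration.

```python
import math


def quantile_exp(p):
    """Quantile function of the Exp(1) distribution."""
    return -math.log(1.0 - p)


def partial_mean_exp(x):
    """Integral of y * exp(-y) over [0, x] for the Exp(1) distribution."""
    return 1.0 - (1.0 + x) * math.exp(-x)


def qsm_contaminated(beta, z, eps):
    """QSM of F_eps = (1 - eps) * Exp(1) + eps * (point mass at z).

    Assumes eps is small and z differs from the beta-quantile of F.
    """
    xi = quantile_exp(beta)
    # A point mass below the quantile uses up eps of the probability beta.
    p = (beta - eps) / (1.0 - eps) if z < xi else beta / (1.0 - eps)
    xi_eps = quantile_exp(p)
    point_mass = z if z <= xi_eps else 0.0
    return ((1.0 - eps) * partial_mean_exp(xi_eps) + eps * point_mass) / beta


def influence_qsm(beta, z):
    """Assembled influence function of QSM at F = Exp(1)."""
    xi = quantile_exp(beta)
    below = (z - xi) if z <= xi else 0.0
    return (below + beta * xi - partial_mean_exp(xi)) / beta


beta, eps = 0.2, 1e-6
for z in (0.1, 5.0, 50.0):
    finite_diff = (qsm_contaminated(beta, z, eps) - qsm_contaminated(beta, z, 0.0)) / eps
    print(z, finite_diff, influence_qsm(beta, z))
```

For contamination points above the quantile the two columns agree and no longer depend on \(z\), so this lower-tail functional is unaffected by how extreme a large observation is.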
Proof
(Lemma 3) Suppose \(\underline{\mathcal{Y }}=0\), and let \(\beta _t \in \mathbb Q _1\) be associated with \(Q(F;\beta _t)\), where, for all \(t=1,\ldots ,p\), \(0<\beta _t<1\) and \(\beta _t\) is not a limit point of \(F^{-1}\). The functional \(Q(F;\beta _t)\) admits a first-order von Mises expansion at \(F\) around \(G\), which is given by \(Q(G;\beta _t)=Q(F;\beta _t) + \int IF(y, Q(\cdot ;\beta _t),F) \mathrm d (G-F)(y) + R(G,F)\), with \(IF\) according to Lemma 1. For ease of notation, write \(z_{hijk}=IF(y_{hijk},Q(\cdot ;\beta _t),F)\) and \(Z_{hijk}=IF(Y_{hijk},Q(\cdot ;\beta _t),F)\) (adopting the convention that capital letters denote random variables). Under the assumption \(0<\beta _t<1\) and for \(n\) sufficiently large, there exist constants \(c_t\) such that \(\inf _L F(c_t)>\beta _t\) for all \(t\) (where \(L \rightarrow \infty \) according to the asymptotic framework, and using the fact that \(\hat{F}_L(c_t) - F_L(c_t) \rightarrow _p 0\)); hence \(\{z_{hijk}\}\) is bounded. Moreover, under the regularity conditions on the sampling design, i.e., Assumptions (A1) and (A2), Liapounov’s condition holds and we obtain for the weighted average
and \(\mathbb{E }\overline{z} =1/N \sum _{h=1}^{L} \sum _{i=1}^{n_h} \sum _{j=1}^{n_{hi}} \sum _{k=1}^{N_{hij}} Z_{hijk}=0\) (cf. Shao 1994, Theorem 1). Thus, by Krewski and Rao (1981, Theorem 3.1), \(\overline{z}/\sigma (Q(\cdot ;\beta _t),F) \rightarrow _d N(0,1)\) (since \(\mathbb E \overline{z}=0\)). For \(n\) sufficiently large, we may write \(\hat{Q}(\hat{F};\beta _t)=Q(F;\beta _t) + \overline{z} + R(\hat{F},F)\). Finally, by Shao (1994, Theorem 1), \(\sqrt{n}R(\hat{F},F)\rightarrow _p 0\), and thus \([\hat{Q}(\hat{F};\beta _t)-Q(F;\beta _t)]/\sigma (Q(\cdot ;\beta _t),F) \rightarrow _d N(0,1)\).
In particular, using the result of Lemma 1, the asymptotic covariance of \(\sqrt{n}\hat{Q}(\hat{F};\beta _i)\) and \(\sqrt{n}\hat{Q}(\hat{F};\beta _j)\) is given by
Given \(\beta _i \le \beta _j\) and that \(1\!\!1\{x \le \xi (F;\beta _j)\}=1\) whenever \(1\!\!1\{x \le \xi (F;\beta _i)\}=1\) the right-hand side of (29) becomes
On simplifying (30), we obtain (in close relation to the “cumulative income functional” in Cowell and Victoria-Feser (2003, Appendix A.1))
where \(\omega _{\beta _i, \beta _j}\) is a short-hand notation for \(\omega _{Q(F;\beta _i), Q(F;\beta _j)}\), and \(S(\beta _i,F):=\int ^{\xi (F;\beta _i)}y^2\mathrm d F(y)\).
Thus, for each \(Q(F;\beta _i)\) with \(0 < \beta _i < 1\) (and if \(\beta _i\) is not a limit point of \(F^{-1}\)), \(i=1,\ldots ,p\), we have \(\sqrt{n} (\hat{Q}(\hat{F};\beta _i) - Q(F;\beta _i)) \rightarrow _d N(0, \omega _{\beta _i, \beta _i})\). By the Cramér-Wold device (see, e.g., Serfling 1980, p. 18), the vector \((\hat{Q}(\hat{F};\beta _1), \ldots , \hat{Q}(\hat{F};\beta _p) )^T\) can be shown to have a \(p\)-variate limiting normal distribution with covariance matrix \(\varvec{\Omega }\) whose \(i,j\)th element is equal to \(\omega _{\beta _i,\beta _j}\) for \(i \le j\). This completes the proof. \(\square \)
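To make the covariance matrix \(\varvec{\Omega }\) more concrete, the following sketch evaluates \(\int IF(y, \cdot ;\beta _i)\, IF(y, \cdot ;\beta _j) \mathrm d F(y)\) numerically. It is only an illustration under strong assumptions: i.i.d. sampling from \(F = \mathrm{Exp}(1)\) instead of the complex design treated above, the standard influence-function form of the asymptotic covariance, and the influence function in the form derived in the proof of Lemma 1. The function names and the grid size are hypothetical.

```python
import math


def influence(beta, y):
    """Influence function in the form of Lemma 1, evaluated at F = Exp(1)."""
    xi = -math.log(1.0 - beta)                        # beta-quantile of Exp(1)
    partial_mean = 1.0 - (1.0 + xi) * math.exp(-xi)   # integral of y dF up to xi
    below = (y - xi) if y <= xi else 0.0
    return (below + beta * xi - partial_mean) / beta


def asymptotic_cov(betas, cells=20000):
    """Approximate the matrix of integrals of IF_i * IF_j with respect to F.

    Uses the midpoint rule on the probability scale u = F(y), y = -log(1 - u).
    """
    p = len(betas)
    omega = [[0.0] * p for _ in range(p)]
    for k in range(cells):
        y = -math.log(1.0 - (k + 0.5) / cells)
        vals = [influence(b, y) for b in betas]
        for i in range(p):
            for j in range(p):
                omega[i][j] += vals[i] * vals[j] / cells
    return omega


omega = asymptotic_cov([0.2, 0.8])
for row in omega:
    print(row)
```

The resulting matrix is symmetric, so only the elements with \(i \le j\) need to be stored, in line with the statement of the result above.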
Cite this article
Hulliger, B., Schoch, T. Robust, distribution-free inference for income share ratios under complex sampling. AStA Adv Stat Anal 98, 63–85 (2014). https://doi.org/10.1007/s10182-013-0215-z
DOI: https://doi.org/10.1007/s10182-013-0215-z