
Robust, distribution-free inference for income share ratios under complex sampling

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

The quintile share ratio of disposable income is the primary inequality indicator of the European Union. As an inequality indicator, it must be sensitive to extremely large observations. Consequently, outliers have a strong impact on the bias and the variance of the classical quintile share ratio estimator, which may mislead the interpretation of income inequality. A class of estimators that are robust against outliers is introduced. They have a bounded influence function, they may reduce the bias incurred by the robustification, and they reduce variability. Inference for these robust estimators is developed within an asymptotic framework that respects the design-based, non-parametric approach. A large simulation study with close-to-reality universes derived from the Statistics of Living Conditions Surveys of the EU is used to study the performance of the proposed estimators.


References

  • Alfons, A., Filzmoser, P., Hulliger, B., Kolb, J.P., Kraft, S., Münnich, R., Templ, M.: Synthetic data generation of SILC data. Research Project Report WP6 - D6.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011a)

  • Alfons, A., Kraft, S., Templ, M., Filzmoser, P.: Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat. Method Appl. 20(3), 383–407. doi:10.1007/s10260-011-0163-2 (2011b)

  • Atkinson, T., Cantillon, B., Marlier, E., Nolan, B.: Social indicators: the EU and social inclusion. Oxford University Press, Oxford (2002)


  • Beaumont, J.F., Rivest, L.P.: Dealing with outliers in survey data. In: Pfeffermann, D., Rao, C. (eds.) Sample surveys: theory, methods and inference, Handbook of Statistics, vol. 29A, chap 11. Elsevier, Amsterdam, pp. 247–280 (2009)

  • Binder, D.A., Patak, Z.: Use of estimating functions for estimation from complex surveys. J. Am. Stat. Assoc. 89(427), 1035–1043 (1994)


  • Bowley, A.L.: Elements of statistics. Charles Scribner’s Sons, New York (1920)


  • Bruch, C., Münnich, R., Zins, S.: Variance estimation for complex surveys. Tech. rep., AMELI deliverable D3.1, http://ameli.surveystatistics.net/ (2011)

  • Chambers, R.L.: Outlier robust finite population estimation. J. Am. Stat. Assoc. 81(396), 1063–1069 (1986)


  • Cowell, F.A., Flachaire, E.: Income distribution and inequality measurement: The problem of extreme values. J. Econom. 141, 1044–1072 (2007)


  • Cowell, F.A., Victoria-Feser, M.P.: Robustness properties of inequality measures. Econometrica 64(1), 77–101 (1996)


  • Cowell, F.A., Victoria-Feser, M.P.: Welfare rankings in the presence of contaminated data. Econometrica 70(3), 1221–1233 (2002)


  • Cowell, F.A., Victoria-Feser, M.P.: Distribution-free inference for welfare indices under complete and incomplete information. J. Econ. Inequal. 1, 191–219 (2003)


  • Cowell, F.A., Victoria-Feser, M.P.: Distributional dominance with trimmed data. J. Bus. Econ. Stat. 24(3), 291–300 (2006)

  • David, H.A., Nagaraja, H.N.: Order Statistics, 3rd edn. Wiley, Hoboken (2003)

  • Deaton, A.: The analysis of household surveys: a microeconomic approach to development policy, 3rd edn. World Bank Publications, The Johns Hopkins University Press, Baltimore (2000)


  • Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87(418), 376–382 (1992)


  • European Commission: Laeken indicators. Detailed calculation methodology. Tech. rep., EUROSTAT working group statistics on income, poverty and social exclusion, Luxembourg. DOC. E2/IPSE/2003 (2003)

  • Fuller, W.A.: Simple estimators for the mean of skewed populations. Statistica Sinica 1, 137–158 (1991)


  • Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust statistics: the approach based on influence functions. Wiley, New York (1986)


  • Huber, P.J.: Robust statistics. Wiley, New York (1981)


  • Hulliger, B.: Outlier robust Horvitz-Thompson estimators. Surv. Methodol. 21(1), 79–87 (1995)


  • Hulliger, B., Münnich, R.: Variance estimation for complex surveys in the presence of outliers. In: ASA Proceedings of the Section on Survey Research Methods. American Statistical Association (2006)

  • Hulliger, B., Schoch, T.: Robustification of the quintile share ratio. In: Proceedings of the NTTS Conference—New Techniques and Technologies for Statistics, Eurostat, Brussels (2009)

  • Hulliger, B., Alfons, A., Filzmoser, P., Meraner, A., Schoch, T., Templ, M.: Robust methodology for Laeken indicators. Tech. rep., Research Project Report WP4 D4.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011)

  • Krewski, D., Rao, J.: Inference from stratified samples: properties of the linearization, jackknife, and balanced repeated replication method. Ann. Stat. 9(5), 1010–1019 (1981)


  • Langel, M., Tillé, Y.: Statistical inference for the quintile share ratio. J. Stat. Plan. Infer. 141, 2976–2985 (2011)

  • Moreno-Rebollo, J., Muñoz-Reyes, A., Muñoz-Pichardo, J.: Miscellanea: influence diagnostic in survey sampling: conditional bias. Biometrika 86(4), 923–928 (1999). doi:10.1093/biomet/86.4.923


  • Moreno-Rebollo, J.L., Muñoz-Reyes, A., Jiménez-Gamero, M.D., Muñoz-Pichardo, J.: Influence diagnostic in survey sampling: Estimating the conditional bias. Metrika 55(3), 209–214. doi:10.1007/s001840100142 (2002)

  • Nygård, F., Sandström, A.: Income inequality measures based on sample surveys. J. Econom. 42, 81–95 (1989)

  • Osier, G.: Variance estimation for complex indicators of poverty and inequality using linearization techniques. Surv. Res. Method. 3, 167–195 (2009)

  • Pfeffermann, D.: The role of sampling weights when modelling survey data. Int. Stat. Rev. 61(2), 317–337 (1993)


  • Rao, J.N.K., Wu, C.F.J.: Inference from stratified samples: second-order analysis of three methods for nonlinear statistics. J. Am. Stat. Assoc. 80(391), 620–630 (1985)

  • Särndal, C.E., Swensson, B., Wretman, J.: Model assisted survey sampling, 2nd edn. Springer, New York (1992)


  • Serfling, R.J.: Approximation theorems of mathematical statistics. Wiley, New York (1980)


  • Shao, J.: L-statistics in complex survey problems. Ann. Stat. 22(2), 946–967 (1994)


  • Smith, T.: Influential observations in survey sampling. J. Appl. Stat. 14(2), 143–152 (1987)


  • Stigler, S.M.: The asymptotic distribution of the trimmed mean. Ann. Stat. 1(3), 472–477 (1973)


  • Victoria-Feser, M.P., Ronchetti, E.M.: Robust methods for personal-income distribution models. Can. J. Stat. 22(2), 247–258 (1994)


  • Wolter, K.M.: Introduction to variance estimation, 2nd edn. Springer, New York (2007)


  • Zheng, B.: Testing Lorenz curves with non-simple random samples. Econometrica 70(3), 1235–1243 (2002)


Acknowledgments

This work was carried out under the project AMELI (“Advanced Methodology for European Laeken Indicators”). The AMELI project was funded from the European Commission’s 7th Framework Programme. EC-Project Reference: 217322, Research area: SSH-2007-6.2-01. Visit: http://www.ameli.surveystatistics.net.

Author information


Corresponding author

Correspondence to Beat Hulliger.

Appendix: Proofs


Proof

(Lemma 1) Let \(F \in \mathcal F \), where \(\mathcal F \) is the set of cumulative distribution functions with density \(f\), and denote by \(F_{\varepsilon }\) the mixture distribution \(F_{\varepsilon }(y)=(1-\varepsilon )F(y) + \varepsilon G(y)\), where \(G(y)=1\!\!1\{ y \ge z\}\) is the degenerate cdf that puts unit mass at \(z\). For \(\beta \in \mathbb Q _1\) such that \(\beta \) is not a limit point of \(F^{-1}\), the influence function is \(IF(z, QSM(\cdot ;\beta ), F) = \partial / \partial \varepsilon \left[ \beta ^{-1}\int ^{\xi _{\beta }(F_{\varepsilon })} y \mathrm d F_{\varepsilon }(y)\right] \) evaluated at \(\varepsilon = 0\). Differentiating w.r.t. \(\varepsilon \) (by means of the Leibniz integral rule) and letting \(\varepsilon \downarrow 0\) yields \(IF(z, QSM(\cdot ;\beta ), F) = \beta ^{-1} \int ^{\xi _{\beta }(F)} y \mathrm d G(y) - \beta ^{-1} \int ^{\xi _{\beta }(F)} y \mathrm d F(y) + \beta ^{-1}\xi _{\beta }(F)f(\xi _{\beta }(F)) \cdot \left[ \mathrm d \xi _{\beta }(F_{\varepsilon })/\mathrm d \varepsilon \right] _{\varepsilon = 0}\), where the first integral equals \(z 1\!\!1\{\xi _{\beta }(F) \ge z \}\) because \(G\) has all its mass at \(z\). Note that \([\mathrm d \xi _{\beta }(F_{\varepsilon })/ \mathrm d \varepsilon ]_{\varepsilon = 0}\) is the influence function of the \(\beta \)th quantile functional (see e.g., Huber 1981, pp. 56–57), which is given by \(IF(z, \xi _{\beta }(\cdot ), F) = [\beta - 1\!\!1\{\xi _{\beta }(F) \ge z \}][f(\xi _{\beta }(F))]^{-1}\). Assembling all terms, \(f\) cancels and we obtain the influence function, which completes the proof. \(\square \)
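The derivative in this proof can be checked numerically on a case where everything is available in closed form. The sketch below assumes \(F\) is the Uniform(0, 1) distribution, so that \(\xi _{\beta }(F)=\beta \) and \(QSM(F;\beta )=\beta /2\), and it uses the assembled form \(\beta ^{-1}(z-\xi _{\beta })1\!\!1\{z \le \xi _{\beta }\} + \xi _{\beta } - QSM(F;\beta )\), which is what the three terms above appear to reduce to. The function names `if_qsm` and `qsm_uniform_contaminated` are hypothetical and not taken from the paper.

```python
def if_qsm(z, xi, q, beta):
    """Influence function as assembled from the three terms in the proof of Lemma 1."""
    indicator = 1.0 if z <= xi else 0.0
    return (z - xi) * indicator / beta + xi - q


def qsm_uniform_contaminated(eps, z, beta):
    """QSM(F_eps; beta) for F = Uniform(0, 1) mixed with a point mass eps at z.

    Assumption: eps is small and z differs from beta, so the point mass stays
    strictly on one side of the beta-quantile of F_eps.
    """
    if z < beta:
        # The point mass lies below the quantile and enters the integral.
        xi_eps = (beta - eps) / (1.0 - eps)
        integral = (1.0 - eps) * xi_eps**2 / 2.0 + eps * z
    else:
        # The point mass lies above the quantile and only shifts the quantile.
        xi_eps = beta / (1.0 - eps)
        integral = (1.0 - eps) * xi_eps**2 / 2.0
    return integral / beta


beta = 0.2
xi, q = beta, beta / 2.0  # quantile and QSM of Uniform(0, 1)
eps = 1e-6
for z in (0.05, 0.5, 0.9):
    base = qsm_uniform_contaminated(0.0, z, beta)
    finite_diff = (qsm_uniform_contaminated(eps, z, beta) - base) / eps
    print(f"z = {z}: finite difference = {finite_diff:.5f}, formula = {if_qsm(z, xi, q, beta):.5f}")
```

For every \(z\) above the \(\beta \)th quantile the influence is the constant \(\xi _{\beta } - QSM(F;\beta )\), which illustrates why the functional is bounded from above in \(z\).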

Proof

(Lemma 3) Suppose \(\underline{\mathcal{Y }}=0\), and let \(\beta _t \in \mathbb Q _1\), \(t=1,\ldots ,p\), be associated with \(Q(F;\beta _t)\), where \(0<\beta _t<1\) and \(\beta _t\) is not a limit point of \(F^{-1}\) for all \(t\). The functional \(Q(F;\beta _t)\) admits a first-order von Mises expansion at \(F\) around \(G\), which is given by \(Q(G;\beta _t)=Q(F;\beta _t) + \int IF(y, Q(\cdot ;\beta _t),F) \mathrm d (G-F)(y) + R(G,F)\), with \(IF\) according to Lemma 1. For ease of notation, write \(z_{hijk}=IF(y_{hijk},Q(\cdot ;\beta _t),F)\) and \(Z_{hijk}=IF(Y_{hijk},Q(\cdot ;\beta _t),F)\) (adopting the convention that capital letters denote random variables). Under the assumption \(0<\beta _t<1\) and for \(n\) sufficiently large, there exist constants \(c_t\) such that \(\inf _L F(c_t)>\beta _t\) for all \(t\) (where \(L \rightarrow \infty \) according to the asymptotic framework, and using the fact that \(\hat{F}_L(c_t) - F_L(c_t) \rightarrow _p 0\)); hence \(\{z_{hijk}\}\) is bounded. Moreover, under the regularity conditions on the sampling design, i.e., Assumptions (A1) and (A2), Liapounov’s condition holds and we obtain for the weighted average

$$\begin{aligned} \int IF(y, Q(\cdot ;\beta ), F) \mathrm d \hat{F}(y) = 1/N \sum _{h=1}^{L} \sum _{i=1}^{n_h} \sum _{j=1}^{n_{hi}} \sum _{k=1}^{N_{hij}} w_{hijk} z_{hijk}=\overline{z}, \end{aligned}$$
(28)

and \(\mathbb{E }\overline{z} =1/N \sum _{h=1}^{L} \sum _{i=1}^{n_h} \sum _{j=1}^{n_{hi}} \sum _{k=1}^{N_{hij}} Z_{hijk}=0\) (cf. Shao 1994, Theorem 1). Thus, by Krewski and Rao (1981, Theorem 3.1), \(\overline{z}/\sigma (Q(\cdot ;\beta _t),F) \rightarrow _d N(0,1)\) (since \(\mathbb E \overline{z}=0\)). For \(n\) sufficiently large, we may write \(\hat{Q}(\hat{F};\beta _t)=Q(F;\beta _t) + \overline{z} + R(\hat{F},F)\). Finally, by Shao (1994, Theorem 1), \(\sqrt{n}R(\hat{F},F)\rightarrow _p0\), and thus \([\hat{Q}(\hat{F};\beta _t)-Q(F;\beta _t)]/\sigma (Q(\cdot ;\beta _t),F) \rightarrow _d N(0,1)\).
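The linearization argument above can be illustrated with a small plug-in computation. The following is a minimal sketch under several assumptions that are not taken from the paper: the four-level index \(hijk\) is collapsed to strata and primary sampling units (PSUs), the weights are treated as fixed, the influence function is used in the form \(\beta ^{-1}(y-\xi _{\beta })1\!\!1\{y \le \xi _{\beta }\} + \xi _{\beta } - Q(F;\beta )\) read off from the proof of Lemma 1, and the variance of \(\overline{z}\) is estimated with the standard with-replacement (ultimate cluster) formula. The data are simulated, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def weighted_qsm(y, w, beta):
    """Weighted estimates of the beta-quantile xi and of Q = beta^{-1} * int^{xi} y dF."""
    n_hat = sum(w)
    cum_weight = 0.0
    cum_income = 0.0
    for value, weight in sorted(zip(y, w)):
        cum_weight += weight
        cum_income += weight * value
        if cum_weight >= beta * n_hat:
            # Sketch: no interpolation when the cumulated weight overshoots beta * n_hat.
            return value, cum_income / (beta * n_hat)
    raise ValueError("beta must lie strictly between 0 and 1")


def linearized_values(y, xi, q, beta):
    """Plug-in version of z = IF(y, Q(.; beta), F)."""
    return [((v - xi) / beta if v <= xi else 0.0) + xi - q for v in y]


def ultimate_cluster_variance(z, w, stratum, psu):
    """With-replacement PSU variance estimator of z_bar (an assumed estimator)."""
    n_hat = sum(w)
    totals = defaultdict(lambda: defaultdict(float))
    for z_k, w_k, h, i in zip(z, w, stratum, psu):
        totals[h][i] += w_k * z_k / n_hat
    variance = 0.0
    for psu_totals in totals.values():
        t = list(psu_totals.values())
        n_h = len(t)  # sampled PSUs in the stratum; must be at least 2
        mean_t = sum(t) / n_h
        variance += n_h / (n_h - 1) * sum((t_i - mean_t) ** 2 for t_i in t)
    return variance


# Hypothetical toy sample: 4 strata, 10 PSUs per stratum, 25 units per PSU.
rng = random.Random(1)
y, w, stratum, psu = [], [], [], []
for h in range(4):
    weight_h = 50.0 * (h + 1)  # constant weight within a stratum, for illustration only
    for i in range(10):
        for _ in range(25):
            y.append(rng.lognormvariate(0.0, 0.8))
            w.append(weight_h)
            stratum.append(h)
            psu.append(i)

beta = 0.2
xi_hat, q_hat = weighted_qsm(y, w, beta)
z_hat = linearized_values(y, xi_hat, q_hat, beta)
z_bar = sum(w_k * z_k for w_k, z_k in zip(w, z_hat)) / sum(w)
std_err = math.sqrt(ultimate_cluster_variance(z_hat, w, stratum, psu))
print(f"Q_hat = {q_hat:.4f}, z_bar = {z_bar:.6f}, linearization s.e. = {std_err:.4f}")
```

The weighted mean of the plug-in values is zero up to the discreteness of the weighted quantile, mirroring \(\mathbb E \overline{z}=0\), while the between-PSU spread of the weighted totals supplies the standard error used for the normal approximation.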

In particular, the asymptotic covariance of \(\sqrt{n}\hat{Q}(\hat{F};\beta _i)\) and \(\sqrt{n}\hat{Q}(\hat{F};\beta _j)\), using the result of Lemma 1, is given by

$$\begin{aligned} \omega _{\beta _i,\beta _j}=\int IF(z, Q(\cdot ;\beta _i),F)IF(z, Q(\cdot ;\beta _j),F) \mathrm d F(z) \end{aligned}$$
(29)

Given \(\beta _i \le \beta _j\), and since \(1\!\!1\{x \le \xi (F;\beta _j)\}=1\) whenever \(1\!\!1\{x \le \xi (F;\beta _i)\}=1\), the right-hand side of (29) becomes

$$\begin{aligned}&\left[ \xi _{\beta _i} - Q(F;\beta _i) \right] \left[ \xi _{\beta _j} - Q(F;\beta _j) \right] + \int \limits ^{\xi _{\beta _j}} \left[ \xi _{\beta _i} - Q(F;\beta _i)\right] \frac{1}{\beta _j} \left[ x - \xi _{\beta _j} \right] \mathrm d F(x) \nonumber \\&\quad + \int \limits ^{\xi _{\beta _i}} \left[ \frac{1}{\beta _j} \left( x - \xi _{\beta _j} \right) + \xi _{\beta _j} - Q(F;\beta _j)\right] \frac{1}{\beta _i} \left[ x - \xi _{\beta _i} \right] \mathrm d F(x) \end{aligned}$$
(30)

On simplifying (30), we obtain (in close relation to the “cumulative income functional” in Cowell and Victoria-Feser (2003, Appendix A.1))

$$\begin{aligned} \omega _{\beta _i, \beta _j}&= \frac{1}{\beta _i\beta _j}S(\beta _{i},F) + \xi _{\beta _{i}} Q(F;\beta _{j}) + \xi _{\beta _{j}} Q(F;\beta _{i}) - \frac{1}{\beta _{j}} Q(F;\beta _{i})\left( \xi _{\beta _{j}} + \xi _{\beta _{i}}\right) \nonumber \\&\quad - Q(F;\beta _{i}) Q(F;\beta _{j}) + \xi _{\beta _{i}} \xi _{\beta _{j}} \left( \frac{1}{\beta _{j}} - 1\right) ,\qquad \text{ for } i \le j, \end{aligned}$$
(31)

where \(\omega _{\beta _i, \beta _j}\) is a short-hand notation for \(\omega _{Q(F;\beta _i), Q(F;\beta _j)}\), and \(S(\beta _i,F):=\int ^{\xi (F;\beta _i)}y^2\mathrm d F(y)\).
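As a sanity check on the simplification from (29) to (31), both sides can be evaluated for a distribution with known quantities. The sketch below assumes \(F\) is Uniform(0, 1), for which \(\xi _{\beta }=\beta \), \(Q(F;\beta )=\beta /2\) and \(S(\beta ,F)=\beta ^3/3\), and it again uses the influence function in the form read off from the proof of Lemma 1. The midpoint rule and the function names are illustrative choices, not part of the paper.

```python
def if_q(x, beta, xi, q):
    """Influence function of Q(.; beta) in the form assembled in the proof of Lemma 1."""
    return ((x - xi) / beta if x <= xi else 0.0) + xi - q


def omega_closed_form(beta_i, beta_j, xi_i, xi_j, q_i, q_j, s_i):
    """Right-hand side of (31); requires beta_i <= beta_j."""
    return (
        s_i / (beta_i * beta_j)
        + xi_i * q_j
        + xi_j * q_i
        - q_i * (xi_j + xi_i) / beta_j
        - q_i * q_j
        + xi_i * xi_j * (1.0 / beta_j - 1.0)
    )


def omega_numeric(beta_i, beta_j, xi_i, xi_j, q_i, q_j, n_grid=100_000):
    """Midpoint-rule approximation of (29) for F = Uniform(0, 1), where dF(x) = dx."""
    h = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        x = (k + 0.5) * h
        total += if_q(x, beta_i, xi_i, q_i) * if_q(x, beta_j, xi_j, q_j)
    return total * h


# Uniform(0, 1): xi_beta = beta, Q(F; beta) = beta / 2, S(beta, F) = beta**3 / 3.
beta_i, beta_j = 0.2, 0.8
closed = omega_closed_form(
    beta_i, beta_j, beta_i, beta_j, beta_i / 2, beta_j / 2, beta_i**3 / 3
)
numeric = omega_numeric(beta_i, beta_j, beta_i, beta_j, beta_i / 2, beta_j / 2)
print(f"closed form (31): {closed:.6f}, numerical integral (29): {numeric:.6f}")
```

For this choice both values agree at about 0.0517 (exactly 31/600 for the closed form). Setting \(\beta _i=\beta _j\) gives the asymptotic variance of a single \(\hat{Q}(\hat{F};\beta _i)\) in the same way.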

Thus, for each \(Q(F;\beta _i)\) with \(0 < \beta _i < 1\) (and provided \(\beta _i\) is not a limit point of \(F^{-1}\)), \(i=1,\ldots ,p\), we have \(\sqrt{n} (\hat{Q}(\hat{F};\beta _i) - Q(F;\beta _i)) \rightarrow _d N(0, \omega _{\beta _i, \beta _i})\). The vector \((\hat{Q}(\hat{F};\beta _1), \ldots , \hat{Q}(\hat{F};\beta _p) )^T\) can be shown (by the Cramér–Wold device; see e.g., Serfling 1980, p. 18) to have a \(p\)-variate limiting normal distribution with covariance matrix \(\varvec{\Omega }\) whose \(i,j\)th element is equal to \(\omega _{\beta _i,\beta _j}\) for \(i \le j\), with the remaining elements determined by symmetry. This completes the proof. \(\square \)


About this article

Cite this article

Hulliger, B., Schoch, T. Robust, distribution-free inference for income share ratios under complex sampling. AStA Adv Stat Anal 98, 63–85 (2014). https://doi.org/10.1007/s10182-013-0215-z


  • DOI: https://doi.org/10.1007/s10182-013-0215-z
