Robust, distribution-free inference for income share ratios under complex sampling

Hulliger, Beat; Schoch, Tobias

doi:10.1007/s10182-013-0215-z

Original Paper
Published: 23 May 2013

Volume 98, pages 63–85, (2014)
AStA Advances in Statistical Analysis

Abstract

The quintile share ratio of disposable income is the primary inequality indicator of the European Union. As an inequality indicator, it must be sensitive to extremely large observations. Consequently, outliers have a strong impact on the bias and the variance of the classical quintile share ratio estimator, which may mislead the interpretation of income inequality. A class of estimators that are robust against outliers is introduced. They have a bounded influence function, they may reduce the bias incurred by the robustification, and they reduce variability. Based on an asymptotic framework that respects the design-based, non-parametric approach, inference for these robust estimators is developed. A large simulation study with close-to-reality universes derived from the Statistics of Living Conditions Surveys of the EU is used to study the performance of the proposed estimators.

References

Alfons, A., Filzmoser, P., Hulliger, B., Kolb, J.P., Kraft, S., Münnich, R., Templ, M.: Synthetic data generation of SILC data. Research Project Report WP6 - D6.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011a)
Alfons, A., Kraft, S., Templ, M., Filzmoser, P.: Simulation of close-to-reality population data for household surveys with application to EU-SILC. Stat. Method Appl. 20(3), 383–407. doi:10.1007/s10260-011-0163-2 (2011b)
Atkinson, T., Cantillon, B., Marlier, E., Nolan, B.: Social indicators: the EU and social inclusion. Oxford University Press, Oxford (2002)
Beaumont, J.F., Rivest, L.P.: Dealing with outliers in survey data. In: Pfeffermann, D., Rao, C. (eds.) Sample surveys: theory, methods and inference, Handbook of Statistics, vol. 29A, chap 11. Elsevier, Amsterdam, pp. 247–280 (2009)
Binder, D.A., Patak, Z.: Use of estimating functions for estimation from complex surveys. J. Am. Stat. Assoc. 89(427), 1035–1043 (1994)
Bowley, A.L.: Elements of statistics. Charles Scribner’s Sons, New York (1920)
Bruch, C., Münnich, R., Zins, S.: Variance estimation for complex surveys. Tech. rep., AMELI deliverable D3.1, http://ameli.surveystatistics.net/ (2011)
Chambers, R.L.: Outlier robust finite population estimation. J. Am. Stat. Assoc. 81(396), 1063–1069 (1986)
Cowell, F.A., Flachaire, E.: Income distribution and inequality measurement: The problem of extreme values. J. Econom. 141, 1044–1072 (2007)
Cowell, F.A., Victoria-Feser, M.P.: Robustness properties of inequality measures. Econometrica 64(1), 77–101 (1996)
Cowell, F.A., Victoria-Feser, M.P.: Welfare rankings in the presence of contaminated data. Econometrica 70(3), 1221–1233 (2002)
Cowell, F.A., Victoria-Feser, M.P.: Distribution-free inference for welfare indices under complete and incomplete information. J. Econ. Inequal. 1, 191–219 (2003)
Cowell, F.A., Victoria-Feser, M.P.: Distributional dominance with trimmed data. J. Bus. Econ. Stat 24(3), 291–300 (2006)
David, H.A., Nagaraja, H.N.: Order Statistics, 3rd edn. Wiley, Hoboken (2003)
Deaton, A.: The analysis of household surveys: a microeconomic approach to development policy, 3rd edn. World Bank Publications, The Johns Hopkins University Press, Baltimore (2000)
Deville, J.C., Särndal, C.E.: Calibration estimators in survey sampling. J. Am. Stat. Assoc. 87(418), 376–382 (1992)
European Commission: Laeken indicators. Detailed calculation methodology. Tech. rep., EUROSTAT working group statistics on income, poverty and social exclusion, Luxembourg. DOC. E2/IPSE/2003 (2003)
Fuller, W.A.: Simple estimators for the mean of skewed populations. Statistica Sinica 1, 137–158 (1991)
Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., Stahel, W.A.: Robust statistics: the approach based on influence functions. Wiley, New York (1986)
Huber, P.J.: Robust statistics. Wiley, New York (1981)
Hulliger, B.: Outlier robust Horvitz-Thompson estimators. Surv. Methodol. 21(1), 79–87 (1995)
Hulliger, B., Münnich, R.: Variance estimation for complex surveys in the presence of outliers. In: ASA Proceedings of the Section on Survey Research Methods. American Statistical Association (2006)
Hulliger, B., Schoch, T.: Robustification of the quintile share ratio. In: Proceedings of the NTTS Conference—New Techniques and Technologies for Statistics, Eurostat, Brussels (2009)
Hulliger, B., Alfons, A., Filzmoser, P., Meraner, A., Schoch, T., Templ, M.: Robust methodology for laeken indicators. Tech. rep., Research Project Report WP4 D4.2, FP7-SSH-2007-217322 AMELI. http://ameli.surveystatistics.net (2011)
Krewski, D., Rao, J.: Inference from stratified samples: properties of the linearization, jackknife, and balanced repeated replication method. Ann. Stat. 9(5), 1010–1019 (1981)
Langel, M., Tillé, Y.: Statistical inference for the quintile share ratio. J. Stat. Plan. Infer. 141, 2976–2985 (2011)
Moreno-Rebollo, J., Muñoz-Reyes, A., Muñoz-Pichardo, J.: Miscellanea: influence diagnostic in survey sampling: conditional bias. Biometrika 86(4), 923–928 (1999). doi:10.1093/biomet/86.4.923
Moreno-Rebollo, J.L., Muñoz-Reyes, A., Jiménez-Gamero, M.D., Muñoz-Pichardo, J.: Influence diagnostic in survey sampling: Estimating the conditional bias. Metrika 55(3), 209–214. doi:10.1007/s001840100142 (2002)
Nygård, F., Sandström, A.: Income inequality measures based on sample surveys. J. Econom. 42, 81–95 (1989)
Osier, G.: Variance estimation for complex indicators of poverty and inequality using linearization techniques. Surv. Res. Method. 3, 167–195 (2009)
Pfeffermann, D.: The role of sampling weights when modelling survey data. Int. Stat. Rev. 61(2), 317–337 (1993)
Rao, J.N.K., Wu, C.F.J.: Inference from stratified samples: second-order analysis of three methods for nonlinear statistics. J. Amer. Stat. Assoc. 80(391), 620–630 (1985)
Särndal, C.E., Swensson, B., Wretman, J.: Model assisted survey sampling, 2nd edn. Springer, New York (1992)
Serfling, R.J.: Approximation theorems of mathematical statistics. Wiley, New York (1980)
Shao, J.: L-statistics in complex survey problems. Ann. Stat. 22(2), 946–967 (1994)
Smith, T.: Influential observations in survey sampling. J. Appl. Stat. 14(2), 143–152 (1987)
Stigler, S.M.: The asymptotic distribution of the trimmed mean. Ann. Stat. 1(3), 472–477 (1973)
Victoria-Feser, M.P., Ronchetti, E.M.: Robust methods for personal-income distribution models. Can. J. Stat. 22(2), 247–258 (1994)
Wolter, K.M.: Introduction to variance estimation, 2nd edn. Springer, New York (2007)
Zheng, B.: Testing Lorenz curves with non-simple random samples. Econometrica 70(3), 1235–1243 (2002)

Acknowledgments

This work was carried out under the project AMELI (“Advanced Methodology for European Laeken Indicators”). The AMELI project was funded by the European Commission’s 7th Framework Programme. EC-Project Reference: 217322, Research area: SSH-2007-6.2-01. See http://www.ameli.surveystatistics.net.

Author information

Authors and Affiliations

School of Business, University of Northwestern Switzerland (FHNW), Riggenbachstrasse 16, 4600, Olten, Switzerland
Beat Hulliger & Tobias Schoch

Corresponding author

Correspondence to Beat Hulliger.

Appendix: Proofs

Proof

(Lemma 1) Let $F \in \mathcal{F}$, where $\mathcal{F}$ is the set of cumulative distribution functions, and denote by $F_{\varepsilon}(y)$ the mixture distribution $F_{\varepsilon}(y)=(1-\varepsilon)F(y) + \varepsilon G(y)$, where $G(y)=1\!\!1\{y \ge z\}$ is an elementary (degenerate) cdf. For $\beta \in \mathbb{Q}_1$ such that $\beta$ is not a limit point of $F^{-1}$, the influence function is given by $IF(z, QSM(\cdot;\beta), F) = \partial / \partial \varepsilon \left[ \beta^{-1}\int^{\xi_{\beta}(F_{\varepsilon})} y \,\mathrm d F_{\varepsilon}(y)\right]$ evaluated at $\varepsilon = 0$. Differentiating with respect to $\varepsilon$ (by means of the Leibniz integration rule) and letting $\varepsilon \downarrow 0$ yields $IF(z, QSM(\cdot;\beta), F) = \beta^{-1} \int^{\xi_{\beta}(F)} y \,\mathrm d G(y) - \beta^{-1} \int^{\xi_{\beta}(F)} y \,\mathrm d F(y) + \beta^{-1}\xi_{\beta}(F)f(\xi_{\beta}(F)) \cdot \left[ \mathrm d \xi_{\beta}(F_{\varepsilon})/\mathrm d \varepsilon \right]_{\varepsilon = 0}$, where $f$ denotes the density of $F$. Note that $[\mathrm d \xi_{\beta}(F_{\varepsilon})/ \mathrm d \varepsilon]_{\varepsilon = 0}$ is the influence function of the $\beta$th quantile functional (see, e.g., Huber 1981, pp. 56–57), which is given by $IF(z, \xi_{\beta}(\cdot), F) = [\beta - 1\!\!1\{\xi_{\beta}(F) \ge z\}][f(\xi_{\beta}(F))]^{-1}$. Assembling all terms, $f$ cancels out and we obtain the influence function stated in the lemma, which completes the proof. $\square$
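
The assembly step can be written out explicitly. The following is a sketch that uses only the quantities appearing in the proof, with the shorthand $\xi_{\beta}=\xi_{\beta}(F)$ (introduced here for brevity), the identity $\int^{\xi_{\beta}} y \,\mathrm d G(y) = z\,1\!\!1\{z \le \xi_{\beta}\}$ for the point mass $G$ at $z$, and $QSM(F;\beta)=\beta^{-1}\int^{\xi_{\beta}} y \,\mathrm d F(y)$ as implied by the first display of the proof:

$$\begin{aligned} IF(z, QSM(\cdot;\beta), F)&= \frac{1}{\beta}\, z\, 1\!\!1\{z \le \xi_{\beta}\} - QSM(F;\beta) + \frac{1}{\beta}\, \xi_{\beta} \left[ \beta - 1\!\!1\{z \le \xi_{\beta}\} \right] \\&= \frac{1}{\beta} \left( z - \xi_{\beta} \right) 1\!\!1\{z \le \xi_{\beta}\} + \xi_{\beta} - QSM(F;\beta). \end{aligned}$$

As a consistency check, integrating the last line with respect to $F$ gives $\beta^{-1}\left[\beta\, QSM(F;\beta) - \beta\, \xi_{\beta}\right] + \xi_{\beta} - QSM(F;\beta) = 0$, as expected of an influence function. The expression is also bounded in $z$ on the upper tail, where it equals the constant $\xi_{\beta} - QSM(F;\beta)$.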

Proof

(Lemma 3) Suppose $\underline{\mathcal{Y}}=0$, and let $\beta_t \in \mathbb{Q}_1$, $t=1,\ldots,p$, be associated with $Q(F;\beta_t)$, where $0<\beta_t<1$ and $\beta_t$ is not a limit point of $F^{-1}$ for all $t$. The functional $Q(F;\beta_t)$ admits a first-order von Mises expansion at $F$ around $G$, which is given by $Q(G;\beta_t)=Q(F;\beta_t) + \int IF(y, Q(\cdot;\beta_t),F) \,\mathrm d (G-F)(y) + R(G,F)$, with $IF$ according to Lemma 1. For ease of notation, write $z_{hijk}=IF(y_{hijk},Q(\cdot;\beta_t),F)$ and $Z_{hijk}=IF(Y_{hijk},Q(\cdot;\beta_t),F)$ (adopting the convention that capital letters denote random variables). Because $0<\beta_t<1$, for $n$ sufficiently large there exist constants $c_t$ such that $\inf_L F(c_t)>\beta_t$ for all $t$ (where $L \rightarrow \infty$ according to the asymptotic framework, and using the fact that $\hat{F}_L(c_t) - F_L(c_t) \rightarrow_p 0$); hence $\{z_{hijk}\}$ is bounded. Moreover, under the regularity conditions on the sampling design, i.e., Assumptions (A1) and (A2), Liapounov’s condition holds, and we obtain for the weighted average

$$\begin{aligned} \int IF(y, Q(\cdot;\beta_t), F) \,\mathrm d \hat{F}(y) = \frac{1}{N} \sum_{h=1}^{L} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} \sum_{k=1}^{N_{hij}} w_{hijk} z_{hijk}=\overline{z}, \end{aligned} \tag{28}$$

and $\mathbb{E}\,\overline{z} =1/N \sum_{h=1}^{L} \sum_{i=1}^{n_h} \sum_{j=1}^{n_{hi}} \sum_{k=1}^{N_{hij}} Z_{hijk}=0$ (cf. Shao 1994, Theorem 1). Thus, by Krewski and Rao (1981, Theorem 3.1), $\overline{z}/\sigma(Q(\cdot;\beta_t),F) \rightarrow_d N(0,1)$ (since $\mathbb{E}\,\overline{z}=0$). For $n$ sufficiently large, we may write $\hat{Q}(\hat{F};\beta_t)=Q(F;\beta_t) + \overline{z} + R(\hat{F},F)$. Finally, by Shao (1994, Theorem 1), $\sqrt{n}R(\hat{F},F)\rightarrow_p 0$, and thus $[\hat{Q}(\hat{F};\beta_t)-Q(F;\beta_t)]/\sigma(Q(\cdot;\beta_t),F) \rightarrow_d N(0,1)$.

In particular, the asymptotic covariance of $\sqrt{n}\hat{Q}(\hat{F};\beta_i)$ and $\sqrt{n}\hat{Q}(\hat{F};\beta_j)$, using the result of Lemma 1, is given by

$$\begin{aligned} \omega_{\beta_i,\beta_j}=\int IF(z, Q(\cdot;\beta_i),F)\,IF(z, Q(\cdot;\beta_j),F) \,\mathrm d F(z). \end{aligned} \tag{29}$$

Given $\beta_i \le \beta_j$, and since $1\!\!1\{x \le \xi(F;\beta_j)\}=1$ whenever $1\!\!1\{x \le \xi(F;\beta_i)\}=1$, the right-hand side of (29) becomes

$$\begin{aligned}&\left[ \xi_{\beta_i} - Q(F;\beta_i) \right] \left[ \xi_{\beta_j} - Q(F;\beta_j) \right] + \int\limits^{\xi_{\beta_j}} \left[ \xi_{\beta_i} - Q(F;\beta_i)\right] \frac{1}{\beta_j} \left[ x - \xi_{\beta_j} \right] \mathrm d F(x) \\&\quad + \int\limits^{\xi_{\beta_i}} \left[ \frac{1}{\beta_j} \left( x - \xi_{\beta_j} \right) + \xi_{\beta_j} - Q(F;\beta_j)\right] \frac{1}{\beta_i} \left[ x - \xi_{\beta_i} \right] \mathrm d F(x). \end{aligned} \tag{30}$$

On simplifying (30), we obtain (in close relation to the “cumulative income functional” in Cowell and Victoria-Feser (2003, Appendix A.1))

$$\begin{aligned} \omega_{\beta_i, \beta_j}&= \frac{1}{\beta_i\beta_j}S(\beta_i,F) + \xi_{\beta_i} Q(F;\beta_j) + \xi_{\beta_j} Q(F;\beta_i) - \frac{1}{\beta_j} Q(F;\beta_i)\left( \xi_{\beta_j} + \xi_{\beta_i}\right) \\&\quad - Q(F;\beta_i) Q(F;\beta_j) + \xi_{\beta_i} \xi_{\beta_j} \left( \frac{1}{\beta_j} - 1\right),\qquad \text{for } i \le j, \end{aligned} \tag{31}$$

where $\omega _{\beta _i, \beta _j}$ is a short-hand notation for $\omega _{Q(F;\beta _i), Q(F;\beta _j)}$, and $S(\beta _i,F):=\int ^{\xi (F;\beta _i)}y^2\mathrm d F(y)$.
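
The simplification from (29) to (31) can be checked numerically. The sketch below is illustrative only and rests on two assumptions that are not part of the paper: $F$ is taken to be the standard exponential distribution (chosen because $\xi_{\beta}$, $Q(F;\beta)$ and $S(\beta,F)$ then have closed forms), and the influence function is taken in the form $\beta^{-1}(z-\xi_{\beta})1\!\!1\{z \le \xi_{\beta}\} + \xi_{\beta} - Q(F;\beta)$, as read off from the factors in (30). All function names are hypothetical.

```python
import numpy as np

# Assumption: F = Exp(1), used only because every quantity has a closed form.

def xi(beta):
    """beta-quantile xi_beta of Exp(1)."""
    return -np.log1p(-beta)

def Q(beta):
    """Q(F; beta) = (1/beta) * int_0^{xi_beta} y dF(y) for F = Exp(1)."""
    return (1.0 - (1.0 - beta) * (1.0 + xi(beta))) / beta

def S(beta):
    """S(beta, F) = int_0^{xi_beta} y^2 dF(y) for F = Exp(1)."""
    x = xi(beta)
    return 2.0 - (1.0 - beta) * (x * x + 2.0 * x + 2.0)

def influence(z, beta):
    """Influence function in the form read off from the factors in (30)."""
    return (z - xi(beta)) * (z <= xi(beta)) / beta + xi(beta) - Q(beta)

def omega_numeric(beta_i, beta_j, upper=60.0, n=600_000):
    """Midpoint-rule approximation of the integral (29) on [0, upper]."""
    h = upper / n
    z = (np.arange(n) + 0.5) * h
    integrand = influence(z, beta_i) * influence(z, beta_j) * np.exp(-z)
    return float(np.sum(integrand) * h)

def omega_closed(beta_i, beta_j):
    """Closed form (31); requires beta_i <= beta_j."""
    xi_i, xi_j = xi(beta_i), xi(beta_j)
    q_i, q_j = Q(beta_i), Q(beta_j)
    return (S(beta_i) / (beta_i * beta_j) + xi_i * q_j + xi_j * q_i
            - q_i * (xi_j + xi_i) / beta_j - q_i * q_j
            + xi_i * xi_j * (1.0 / beta_j - 1.0))

# Compare both evaluations for a few pairs with beta_i <= beta_j.
for b_i, b_j in [(0.2, 0.2), (0.2, 0.8), (0.5, 0.8)]:
    print(b_i, b_j, omega_numeric(b_i, b_j), omega_closed(b_i, b_j))
```

The two columns should agree up to the quadrature error of the midpoint rule. The truncation point `upper` and grid size `n` are arbitrary choices for this illustration.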

Thus, for each $Q(F;\beta_i)$ with $0 < \beta_i < 1$ (and if $\beta_i$ is not a limit point of $F^{-1}$), $i=1,\ldots,p$, we have $\sqrt{n} (\hat{Q}(\hat{F};\beta_i) - Q(F;\beta_i)) \rightarrow_d N(0, \omega_{\beta_i, \beta_i})$. The vector $(\hat{Q}(\hat{F};\beta_1), \ldots, \hat{Q}(\hat{F};\beta_p))^T$ can be shown (by the Cramér–Wold device; see, e.g., Serfling (1980, p. 18)) to have a $p$-variate limiting normal distribution with covariance matrix $\varvec{\Omega}$ whose $(i,j)$th element is equal to $\omega_{\beta_i,\beta_j}$ for $i \le j$. This completes the proof. $\square$
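
The limiting variance $\omega_{\beta_i,\beta_i}$ can be illustrated by a small Monte Carlo experiment. The sketch below is a deliberate simplification and not the setting of the paper: it assumes i.i.d. draws from the standard exponential distribution with unit weights instead of a complex sampling design, and it uses the plug-in estimator $\hat{Q} = (n\beta)^{-1}\sum$ of the $n\beta$ smallest observations. The function names, sample size, and number of replications are hypothetical choices.

```python
import numpy as np

def omega_exp(beta):
    """omega_{beta,beta} from (31) with i = j, evaluated for F = Exp(1)."""
    xi = -np.log1p(-beta)                                 # beta-quantile
    q = (1.0 - (1.0 - beta) * (1.0 + xi)) / beta          # Q(F; beta)
    s = 2.0 - (1.0 - beta) * (xi * xi + 2.0 * xi + 2.0)   # S(beta, F)
    return (s / beta**2 + 2.0 * xi * q - 2.0 * xi * q / beta
            - q * q + xi * xi * (1.0 / beta - 1.0))

def simulate_q_hat(beta, n, reps, seed=0):
    """Plug-in estimates of Q(F; beta) from `reps` i.i.d. Exp(1) samples.

    Assumption: simple random sampling with unit weights, so the estimator
    reduces to the sum of the k = n*beta smallest values divided by n*beta.
    """
    rng = np.random.default_rng(seed)
    k = int(round(n * beta))
    y = rng.exponential(size=(reps, n))
    # After partitioning, the first k entries of each row are its k smallest.
    lowest = np.partition(y, k - 1, axis=1)[:, :k]
    return lowest.sum(axis=1) / (n * beta)

beta, n, reps = 0.2, 1000, 4000
q_hat = simulate_q_hat(beta, n, reps)
print("Monte Carlo n * Var(Q_hat):", n * q_hat.var(ddof=1))
print("omega from (31):           ", omega_exp(beta))
```

With these settings the scaled Monte Carlo variance should be close to the value from (31), up to simulation noise and a finite-sample error that vanishes as $n$ grows. Under a genuinely complex design the weights $w_{hijk}$ and the design-based variance would have to replace this simplification.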

About this article

Cite this article

Hulliger, B., Schoch, T. Robust, distribution-free inference for income share ratios under complex sampling. AStA Adv Stat Anal 98, 63–85 (2014). https://doi.org/10.1007/s10182-013-0215-z

Received: 19 July 2012
Accepted: 07 May 2013
Published: 23 May 2013
Issue Date: January 2014
DOI: https://doi.org/10.1007/s10182-013-0215-z
