Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions

German title: Die Schätzung von Quantilen: Herausforderungen und Probleme im Kontext von Einkommens- und Vermögensverteilungen

  • Original article

AStA Wirtschafts- und Sozialstatistisches Archiv

Abstract

Means, quantiles and extreme values are common statistics for describing distributions. However, estimating sample quantiles with the default settings of different software programs does not necessarily give identical results, because the programs use different quantile definitions. Most practitioners are not aware of this and use the programs' quantile functions interchangeably. This work therefore compares the default definitions in the software programs SPSS, R, SAS software and Stata, together with additional quantile definitions suggested in the literature. The work focuses especially on how the quantile estimators perform when describing the distributions of income and wealth. Furthermore, it discusses the options these software programs offer for incorporating sampling weights into quantile estimation and for producing variance estimates.
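The following minimal Python sketch illustrates the central point of the abstract. Hyndman and Fan (1996) write the continuous definitions of their catalogue (types 4 to 9) as \(Q_p=(1-g)X_{(j)}+gX_{(j+1)}\) with \(j=\lfloor np+m\rfloor\) and \(g=np+m-j\). Only the offset \(m\) changes between types. The function name `hf_continuous_quantile` and the toy sample are illustrative assumptions. They are not code from the article or from any of the compared programs.

```python
import math

# Offsets m(p) for the continuous definitions of Hyndman and Fan (1996), types 4-9.
M_BY_TYPE = {
    4: lambda p: 0.0,
    5: lambda p: 0.5,
    6: lambda p: p,
    7: lambda p: 1.0 - p,             # default of R's quantile()
    8: lambda p: (p + 1.0) / 3.0,
    9: lambda p: p / 4.0 + 3.0 / 8.0,
}


def hf_continuous_quantile(x, p, m):
    """Q_p = (1 - g) X_(j) + g X_(j+1) with j = floor(np + m), g = np + m - j.

    Order statistics X_(1) <= ... <= X_(n) are 1-indexed as in the text;
    j < 1 falls back to X_(1) and j >= n to X_(n).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    h = n * p + m
    j = math.floor(h)
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    g = h - j
    # 1-indexed X_(j) is xs[j - 1]; X_(j + 1) is xs[j]
    return (1.0 - g) * xs[j - 1] + g * xs[j]


if __name__ == "__main__":
    sample = list(range(1, 11))       # toy sample 1, 2, ..., 10
    for t, m in M_BY_TYPE.items():
        print(f"type {t}: Q_0.25 = {hf_continuous_quantile(sample, 0.25, m(0.25)):.4f}")
```

Run as a script, it prints six different first quartiles for the same ten observations: 2.5, 3.0, 2.75, 3.25, about 2.917 and 2.9375. In a small sample, the choice of definition alone can therefore move the estimate noticeably.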

German abstract (Zusammenfassung)

Means, quantiles and extreme values are common statistics used to describe distributions. However, quantiles computed with different software are not necessarily identical. This is because the quantile definitions of different software programs are in part inconsistent. Many users are unaware of these differing definitions and use the software functions interchangeably. This work therefore compares different quantile definitions in the software programs SPSS, R, SAS software and Stata. In addition, it considers quantile definitions recommended by previous comparisons in the literature. The work pays particular attention to how well the different quantile definitions describe income and wealth distributions. Furthermore, it discusses ways of taking survey weights into account in quantile estimation, as well as options for variance estimation, in the software programs named above.
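Both abstracts mention sampling weights. The sketch below shows one common approach, assuming that each weight counts the population units an observation represents. It inverts the weighted empirical distribution function, and with equal weights it reduces to definition A1 in Appendix A. It is not necessarily what SPSS, R, SAS or Stata do, since programs differ in whether and how they interpolate between weighted observations. The function name `weighted_quantile` and the income figures are hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative share of the total weight reaches p.

    Inverts the weighted empirical CDF; observations with zero weight are
    ignored. With equal positive weights this coincides with definition A1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("at least one weight must be positive")
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= p * total:
            return value
    return pairs[-1][0]               # guards against rounding shortfall at p = 1


if __name__ == "__main__":
    incomes = [1200, 1800, 2500, 4000, 9000]   # hypothetical monthly incomes
    weights = [40, 30, 15, 10, 5]              # hypothetical design weights
    print(weighted_quantile(incomes, weights, 0.5))   # weighted median
    print(weighted_quantile(incomes, [1] * 5, 0.5))   # weights ignored
```

With these toy weights the weighted median is 1800. Ignoring the weights gives 2500.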

Figs. 1–8

References

  • Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25

  • Babu G (1986) A note on bootstrapping the variance of sample quantile. Ann Inst Stat Math 38(3):439–443

  • Bell WR, Basel WW, Maples JJ (2016) An overview of the U.S. Census Bureau’s small area income and poverty estimates program. In: Pratesi M (ed) Analysis of poverty data by small area estimation. John Wiley & Sons, Hoboken, pp 379–403

  • Beste J, Grabka MM, Goebel J (2018) Armut in Deutschland. AStA Wirtsch Sozialstat Arch 12(1):27–62

  • Bhat CR (1994) Imputing a continuous income variable from grouped and missing income observations. Econ Lett 46(4):311–319

  • Blom G (1958) Statistical estimates and transformed beta-variables. John Wiley & Sons, Hoboken

  • Bundesinstitut für Bau‑, Stadt-, und Raumforschung (2017) Indikatoren und Karten zur Raum- und Stadtentwicklung. Datenlizenz Deutschland – Namensnennung – Version 2.0. http://www.inkar.de/. Accessed 12 Apr 2018

  • Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Stat Math 63(1):157–179

  • Cheung K, Lee S (2005) Variance estimation for sample quantiles using the m out of n bootstrap. Ann Inst Stat Math 57(2):279–290

  • Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton

  • Datta GS, Lahiri P, Maiti T (2002) Empirical Bayes estimation of median income of four-person families by state using time series and cross-sectional data. J Stat Plan Inference 102(1):83–97

  • David H, Nagaraja H (2003) Order statistics. John Wiley & Sons, Hoboken

  • Deutsche Bundesbank (2016) Vermögen und Finanzen privater Haushalte in Deutschland: Ergebnisse der Vermögensbefragung 2014. Monatsbericht, Deutsche Bundesbank

  • Dielmann T, Lowry C, Pfaffenberger R (1994) A comparison of quantile estimators. Commun Stat Simul Comput 23(2):355–371

  • Edgeworth FY (1886) XLVI. Problems in probabilities. Lond Edinb Dublin Philos Mag J Sci 22(137):371–384

  • Eubank RL (2004) Quantiles. In: Kotz S, Read CB, Balakrishnan N, Vidakovic B, Johnson NL (eds) Encyclopedia of statistical sciences. John Wiley & Sons, Hoboken

  • eurostat (2013) Statistik der Europäischen Union über Einkommen und Lebensbedingungen (EU-SILC). https://ec.europa.eu/eurostat/de/web/microdata/european-union-statistics-on-income-and-living-conditions. Accessed 18 Sept 2018

  • eurostat (2018a) Distribution of income by quantiles – EU-SILC survey. http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=ilc_di01&lang=en. Accessed 12 Apr 2018

  • eurostat (2018b) Smarter, greener, more inclusive? Indicators to support the Europe 2020 strategy. Publications Office of the European Union, Luxembourg

  • Fan J, Tang M, Tian M (2014) Kernel quantile estimator with ICI adaptive bandwidth selection technique. Acta Math Sin Engl Ser 30(4):710–722

  • Forschungsdaten- und Servicezentrum (FDSZ) der Deutschen Bundesbank (2014) Panel on Household Finances (PHF) https://doi.org/10.12757/Bbk.PHF.02.02.01 (Plus one additional attribute (district code))

  • Galton F (1889) Natural inheritance. Macmillan, New York

  • Genton MG, Ma Y, Parzen E (2006) Discussion of “Sur une limitation très générale de la dispersion de la médiane” by M. Fréchet. J Soc Fr Statistique (2009) 147(2):51–60

  • Geraci M (2016) Qtools: a collection of models and tools for quantile inference. R J 8(2):117–138

  • Graf M, Nedyalkova D (2014) Modeling of income and indicators of poverty and social exclusion using the generalized beta distribution of the second kind. Rev Income Wealth 60(4):821–842

  • Gumbel EJ (1939) La probabilité des hypothèses. C R Acad Sci 209:645–647

  • Harrell FE, Davis C (1982) A new distribution-free quantile estimator. Biometrika 69(3):635–640

  • Harrell FE Jr, Dupont C et al (2018) Hmisc: Harrell miscellaneous. R package version 4.1-1. https://CRAN.R-project.org/package=Hmisc. Accessed 20 Nov 2017

  • Hazen A (1914) Storage to be provided in impounding reservoirs for municipal water supply. Trans Am Soc Civ Eng 77:1539–1641

  • Hosking J (1990) L‑moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Series B Stat Methodol 52(1):105–124

  • Hyndman R, Fan Y (1996) Sample quantiles in statistical packages. Am Stat 50(4):361–365

  • IBM (2013) IBM SPSS statistics for Windows, version 25.0

  • Johnson NL, Kotz S (1970) Continuous univariate distributions. Houghton Mifflin Harcourt, Boston

  • Juritz JM, Juritz JWF, Stephens M (1983) On the accuracy of simulated percentage points. J Am Stat Assoc 78(382):441–444

  • Kleiber C, Kotz S (2003) Statistical size distributions in economics and actuarial sciences. John Wiley & Sons, Hoboken

  • Knerr P, Aust F, Chudziak N, Gilberg R, Kleudgen M (2015) Methodenbericht – Private Haushalte und ihre Finanzen (PHF) 2. Erhebungswelle – Anonymisierte Fassung –. Methodenbericht, infas Institut für angewandte Sozialwissenschaft GmbH

  • Kolenikov S (2017) epctile – estimation and inference for percentiles. http://staskolenikov.net/stata. Accessed 20 Feb 2017

  • Kreutzmann AK, Pannier S, Rojas-Perilla N, Schmid T, Templ M, Tzavidis N (2019) The R package emdi for estimating and mapping regionally disaggregated indicators. J Stat Softw.

  • Langford E (2006) Quartiles in elementary statistics. J Stat Educ 50(4):361–365

  • Lavallée P, Beaumont JF (2015) Why we should put some weight on weights. Survey methods: insights from the field, pp 1–18

  • Lohr SL (2010) Sampling: design and analysis. Cengage Learning, Boston

  • Longford N (2011) Small-sample estimators of the quantiles of the normal, log-normal and Pareto distributions. J Stat Comput Simul 82(9):1383–1395

  • Lumley T (2004) Analysis of complex survey samples. J Stat Softw 9(8):1–19

  • Ma Y, Genton MG, Parzen E (2011) Asymptotic properties of sample quantiles of discrete distributions. Ann Inst Stat Math 63(2):227–243

  • Majumder KL, Bhattacharjee GP (1973) Algorithm AS63: the incomplete beta integral. J R Stat Soc Ser C Appl Stat 22(3):409–411

  • Makkonen L, Pajari M (2014) Defining sample quantiles by the true rank probability. J Probab Stat. https://doi.org/10.1155/2014/326579

  • Marchetti S, Giusti C, Pratesi M (2016) The use of Twitter data to improve small area estimates of households’ share of food consumption expenditure in Italy. AStA Wirtsch Sozialstat Arch 10(2-3):79–93

  • Marchetti S, Beręsewicz M, Salvati N, Szymkowiak M, Wawrowski Ł (2018) The use of a three-level M‑quantile model to map poverty at local administrative unit 1 in Poland. J R Stat Soc Ser A 181(4):1–28

  • McDonald J (1984) Some generalized functions for the size distribution of income. Econometrica 52(3):647–663

  • McDonald J, Bordley R (1996) Something new, something old: parametric models for the size distribution of income. J Income Distrib 6(1):91–103

  • Muenchen RA (2017) The popularity of data science software. http://r4stats.com/articles/popularity/. Accessed 27 Feb 2018

  • Münnich R, Burgard JP, Vogt M (2013) Small Area-Statistik: Methoden und Anwendungen. AStA Wirtsch Sozialstat Arch 6(3-4):149–191

  • Okolewski A, Rychlik T (2001) Sharp distribution-free bounds on the bias in estimating quantiles via order statistics. Stat Probab Lett 52(2):207–213

  • Parrish R (1990) Comparison of quantile estimators in normal sampling. Biometrics 46(1):247–257

  • Parzen E (1979) Nonparametric statistical data modeling. J Am Stat Assoc 74(365):105–121

  • Phien H (1990) A note on the computation of the incomplete beta function. Adv Eng Softw 12(1):39–44

  • R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (https://www.R-project.org/)

  • Rust KF, Rao JNK (1996) Variance estimation for complex surveys using replication techniques. Stat Methods Med Res 5(3):283–310

  • SAS Institute Inc (2018) Version 9.4 of the SAS system

  • Schmid T, Bruckschen F, Salvati N, Zbiranski T (2017) Constructing sociodemographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J R Stat Soc Ser A 180(4):1163–1190

  • Schoonjans F, De Bacquer D, Schmid P (2011) Estimation of population percentiles. Epidemiology 22(5):750–751

  • Sfakianakis M, Verginis D (2008) A new family of nonparametric quantile estimators. Commun Stat Simul Comput 37(2):337–345

  • Shao J (1988) A note on bootstrap variance estimation. Technical report, Purdue University

  • Shao J, Wu C (1989) A general theory for jackknife variance estimation. Ann Stat 17(3):1176–1197

  • Shao J, Wu C (1992) Asymptotic properties of the balanced repeated replication method for sample quantiles. Ann Stat 20(3):1571–1593

  • Sheather S, Marron J (1990) Kernel quantile estimators. J Am Stat Assoc 85(410):410–416

  • StataCorp (2015) Stata statistical software: release 15. StataCorp LLC, College Station

  • Steinhauer HW, Aßmann C, Zinn S, Goßmann S, Rässler S (2015) Sampling and weighting cohort samples in institutional contexts. AStA Wirtsch Sozialstat Arch 9(2):131–157

  • Tzavidis N, Zhang LC, Luna A, Schmid T, Rojas-Perilla N (2018) From start to finish: a framework for the production of small area official statistics. J R Stat Soc Ser A 181(4):927–979

  • Vélez JI, Correa JC (2014) Should we think of a different median estimator? Comun Estad 7(1):11–17

  • Walker AM (1968) A note on the asymptotic distribution of sample quantiles. J R Stat Soc Series B Stat Methodol 30(3):570–575

  • Wei L, Wang D, Hutson A (2015) An investigation of quantile function estimators relative to quantile confidence interval coverage. Commun Stat Theory Methods 44(10):2107–2135

  • Weibull W (1939) The phenomenon of rupture in solids. Ing Vetensk Akad Handl 17(153):1–55

  • Wolter K (2007) Introduction to variance estimation. Springer, New York

  • Yang S (1985) A smooth nonparametric estimator of a quantile function. J Am Stat Assoc 80(392):1004–1011

  • Yoshizawa C, Sen P, Davis E (1985) Asymptotic equivalence of the Harrell-Davis median estimator and the sample median. Commun Stat Theory Methods 14(9):2129–2136

Acknowledgements

I gratefully acknowledge support from the German Research Foundation within the project QUESSAMI (281573942) and from the MIUR-DAAD Joint Mobility Program (57265468). This work uses data from the Deutsche Bundesbank Panel on Household Finances. The results published and the related observations and analysis may not correspond to results or analysis of the data producers. I thank the editors and the referees for their constructive comments that helped to improve the paper.

Author information

Correspondence to Ann-Kristin Kreutzmann.

Appendix A

For completeness, this appendix gives the formulas of the quantile estimators that are listed in Table 2 but not discussed in the main text. It also summarizes the six desirable properties of a sample quantile defined by Hyndman and Fan (1996).

1.1 A1  Inverse of the empirical cumulative distribution function

Dielmann et al. (1994) state that this quantile estimator is neither mean- nor median-unbiased. For a further discussion of its properties, we refer to Juritz et al. (1983).

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p=0;\\ X_{(i)}\quad&\text{if}\quad 0<p\leq 1\quad\text{and}\quad g=0;\\ X_{(i+1)}\quad&\text{if}\quad 0<p\leq 1\quad\text{and}\quad g\neq 0,\end{cases}\end{aligned}$$

where \(i=\lfloor np\rfloor\) and \(g=np-i\).
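The following is a direct Python transcription of A1. It assumes 1-indexed order statistics \(X_{(1)}\le\dots\le X_{(n)}\) as in the formula, shifted to Python's 0-based lists. The name `quantile_a1` is hypothetical. Because \(g=0\) exactly when \(np\) is an integer, A1 returns \(X_{(\lceil np\rceil)}\) for every \(p>0\).

```python
import math


def quantile_a1(x, p):
    """A1: inverse of the empirical CDF.

    Returns X_(1) for p = 0, X_(i) if np is an integer and X_(i+1) otherwise,
    with i = floor(np) and 1-indexed order statistics X_(1) <= ... <= X_(n).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    if p == 0:
        return xs[0]                  # Q_0 = X_(1)
    i = math.floor(n * p)
    g = n * p - i
    # 1-indexed X_(i) is xs[i - 1]; X_(i + 1) is xs[i]
    return xs[i - 1] if g == 0 else xs[i]


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]     # toy data, n = 8
    for p in (0.25, 0.3, 0.5, 1.0):
        print(p, quantile_a1(sample, p))
```

Since \(np\) is computed in floating point, for values of \(p\) that are not exact binary fractions \(np\) may miss an integer by a rounding error. A more defensive version would compare \(g\) with a small tolerance rather than with zero.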

1.2 A2  Observation closest to \(np\)

This definition depends crucially on how \(np\) is rounded. R and SAS round \(np\) to the nearest integer and resolve ties (\(np=k+0.5\)) towards the even integer, as in the definition below. SPSS uses simple rounding instead, so its version differs from the one below.

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p\leq\frac{0.5}{n};\\ X_{(i)}\quad&\text{if}\quad\frac{0.5}{n}<p\leq 1,\quad g=0\quad\text{and}\quad i\text{ is even};\\ X_{(i+1)}\quad&\text{if}\quad\frac{0.5}{n}<p\leq 1\quad\text{and}\quad(g\neq 0\quad\text{or}\quad i\text{ is odd}),\end{cases}\end{aligned}$$

where \(i=\lfloor np-0.5\rfloor\) and \(g=np-0.5-i\).
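A small Python sketch makes the rounding rule explicit. Python's built-in `round()` rounds halves to the even integer, which matches the tie rule described for R and SAS. The `half_up` option is a hypothetical stand-in for the simple rounding attributed to SPSS; reading "simple rounding" as round-half-up is an assumption here. `quantile_a2` and its `rounding` parameter are illustrative names.

```python
import math


def quantile_a2(x, p, rounding="half_even"):
    """A2: the order statistic whose index is closest to np.

    rounding="half_even": ties (np = k + 0.5) go to the even index, as described
    for R and SAS. rounding="half_up": hypothetical stand-in for "simple rounding".
    Indices below 1 (p <= 0.5/n) fall back to X_(1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    h = n * p
    if rounding == "half_even":
        k = round(h)                  # built-in round() rounds halves to even
    elif rounding == "half_up":
        k = math.floor(h + 0.5)
    else:
        raise ValueError("rounding must be 'half_even' or 'half_up'")
    k = min(max(k, 1), n)
    return xs[k - 1]                  # 1-indexed X_(k)


if __name__ == "__main__":
    sample = [12, 15, 17, 20, 22, 25, 30, 41, 58, 90]    # toy data, n = 10
    print(quantile_a2(sample, 0.25))                      # np = 2.5 -> X_(2)
    print(quantile_a2(sample, 0.25, rounding="half_up"))  # np = 2.5 -> X_(3)
```

For these ten toy observations, \(np=2.5\) at \(p=0.25\). The two rounding rules therefore return \(X_{(2)}=15\) and \(X_{(3)}=17\), respectively.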

1.3 A3  Linear interpolation of the empirical distribution function

This definition is proposed by Parzen (1979).

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p<\frac{1}{n};\\ (1-\gamma)X_{(i)}+\gamma X_{(i+1)}\quad&\text{if}\quad\frac{1}{n}\leq p<1;\\ X_{(n)}\quad&\text{if}\quad p=1,\end{cases}\end{aligned}$$

where \(i=\lfloor np_{k}\rfloor\), \(p_{k}=\frac{np}{n}\), \(\gamma=np_{k}-i\).
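Since \(p_k=np/n=p\), the index in A3 is simply \(i=\lfloor np\rfloor\). The three cases then correspond to \(i<1\), \(1\le i<n\) and \(i=n\). The minimal Python sketch below follows that reading; `quantile_a3` is a hypothetical name. Testing \(i\) rather than \(p<1/n\) avoids an indexing error when \(n\cdot(1/n)\) rounds to slightly below 1 in floating point.

```python
import math


def quantile_a3(x, p):
    """A3: linear interpolation of the empirical CDF (Parzen 1979).

    With p_k = p, i = floor(np) and gamma = np - i:
    X_(1) if i < 1, X_(n) if i >= n, else (1 - gamma) X_(i) + gamma X_(i+1).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    h = n * p
    i = math.floor(h)
    if i < 1:                         # p < 1/n
        return xs[0]
    if i >= n:                        # p = 1
        return xs[-1]
    gamma = h - i
    return (1.0 - gamma) * xs[i - 1] + gamma * xs[i]


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]     # toy data, n = 8
    for p in (0.1, 0.3, 0.5, 0.9, 1.0):
        print(p, quantile_a3(sample, p))
```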

1.4 A4  Approximation to \(F(E(X_{k}))\) for the normal distribution

This definition is especially preferable when the underlying distribution is normal (Blom 1958). Thus, it is often used for normal quantile-quantile plots.

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p<\frac{5/8}{n+1/4};\\ (1-\gamma)X_{(i)}+\gamma X_{(i+1)}\quad&\text{if}\quad\frac{5/8}{n+1/4}\leq p<\frac{n-3/8}{n+1/4};\\ X_{(n)}\quad&\text{if}\quad p\geq\frac{n-3/8}{n+1/4},\end{cases}\end{aligned}$$

where \(i=\lfloor np_{k}+\frac{p}{4}+\frac{3}{8}\rfloor\), \(p_{k}=\frac{\left(np+\frac{p}{4}+\frac{3}{8}\right)-\frac{3}{8}}{n+\frac{1}{4}}\), \(\gamma=np_{k}+\frac{p}{4}+\frac{3}{8}-i\).
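Substituting the definition of \(p_k\) gives \(p_k=\bigl(np+\tfrac{p}{4}\bigr)/\bigl(n+\tfrac14\bigr)=p\), so A4 interpolates at position \(h=np+\tfrac{p}{4}+\tfrac38\). The boundary cases \(p<\frac{5/8}{n+1/4}\) and \(p\ge\frac{n-3/8}{n+1/4}\) are exactly \(h<1\) and \(h\ge n\). Here is a Python sketch under that reading, with `quantile_a4` as a hypothetical name:

```python
import math


def quantile_a4(x, p):
    """A4: Blom-type definition, interpolating at h = np + p/4 + 3/8.

    h < 1 corresponds to p < (5/8)/(n + 1/4) and h >= n to
    p >= (n - 3/8)/(n + 1/4), the two boundary cases of the formula.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(x)
    n = len(xs)
    h = n * p + p / 4 + 3 / 8
    i = math.floor(h)
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    gamma = h - i
    return (1.0 - gamma) * xs[i - 1] + gamma * xs[i]


if __name__ == "__main__":
    sample = [3, 1, 4, 1, 5, 9, 2, 6]     # toy data, n = 8
    print(quantile_a4(sample, 0.5))       # average of X_(4) = 3 and X_(5) = 4
```

At \(p=0.5\) the position is \(h=n/2+0.5\). A4 therefore reproduces the usual sample median for any \(n\): the middle observation for odd \(n\), and the average of the two middle observations for even \(n\).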

1.5 A5  Six desirable properties of a sample quantile

Table 6 Replication of Table 1 in Hyndman and Fan (1996), which lists their six desirable properties of a sample quantile. For more information about the properties, see Hyndman and Fan (1996).

About this article

Cite this article

Kreutzmann, AK. Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions. AStA Wirtsch Sozialstat Arch 12, 245–270 (2018). https://doi.org/10.1007/s11943-018-0234-z

  • DOI: https://doi.org/10.1007/s11943-018-0234-z
