Abstract
Means, quantiles and extreme values are common statistics for describing distributions. However, estimating sample quantiles with the default definitions of different software programs can lead to different results, because the programs use different quantile definitions. Since most practitioners are not aware of this and use the definitions interchangeably, this work compares the default definitions in the software programs SPSS, R, SAS™ software, and Stata, as well as additional quantile definitions suggested in the literature. The work focuses in particular on how the quantile estimators perform in describing the distribution of income and wealth. Furthermore, it discusses the possibilities of considering sampling weights in quantile estimation and methods for producing variance estimates with the above-mentioned software.
Summary
Means, quantiles and extreme values are common statistics used to describe distributions. However, quantiles computed with different software are not necessarily identical. This is because the quantile definitions of different software programs are partly inconsistent. Since many users are unaware of these differing definitions and use the functions of the different programs interchangeably, this work compares different quantile definitions in the software programs SPSS, R, SAS™ software and Stata. In addition, it considers quantile definitions recommended in previous comparisons in the literature. The work pays particular attention to the performance of the different quantile definitions in describing income and wealth distributions. Furthermore, options for taking survey weights into account in quantile estimation, as well as for variance estimation, in the software programs mentioned are discussed.
References
Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25
Babu G (1986) A note on bootstrapping the variance of sample quantile. Ann Inst Stat Math 38(3):439–443
Bell WR, Basel WW, Maples JJ (2016) An overview of the U.S. Census Bureau’s small area income and poverty estimates program. In: Pratesi M (ed) Analysis of poverty data by small area estimation. John Wiley & Sons, Hoboken, pp 379–403
Beste J, Grabka MM, Goebel J (2018) Armut in Deutschland. AStA Wirtsch Sozialstat Arch 12(1):27–62
Bhat CR (1994) Imputing a continuous income variable from grouped and missing income observations. Econ Lett 46(4):311–319
Blom G (1958) Statistical estimates and transformed beta-variables. John Wiley & Sons, Hoboken
Bundesinstitut für Bau-, Stadt- und Raumforschung (2017) Indikatoren und Karten zur Raum- und Stadtentwicklung. Datenlizenz Deutschland – Namensnennung – Version 2.0. http://www.inkar.de/. Accessed 12 Apr 2018
Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Stat Math 63(1):157–179
Cheung K, Lee S (2005) Variance estimation for sample quantiles using the m out of n bootstrap. Ann Inst Stat Math 57(2):279–290
Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Datta GS, Lahiri P, Maiti T (2002) Empirical Bayes estimation of median income of four-person families by state using time series and cross-sectional data. J Stat Plan Inference 102(1):83–97
David H, Nagaraja H (2003) Order statistics. John Wiley & Sons, Hoboken
Deutsche Bundesbank (2016) Vermögen und Finanzen privater Haushalte in Deutschland: Ergebnisse der Vermögensbefragung 2014. Monatsbericht, Deutsche Bundesbank
Dielmann T, Lowry C, Pfaffenberger R (1994) A comparison of quantile estimators. Commun Stat Simul Comput 23(2):355–371
Edgeworth FY (1886) XLVI. Problems in probabilities. Lond Edinb Dublin Philos Mag J Sci 22(137):371–384
Eubank RL (2004) Quantiles. In: Kotz S, Read CB, Balakrishnan N, Vidakovic B, Johnson NL (eds) Encyclopedia of statistical sciences. John Wiley & Sons, Hoboken
eurostat (2013) Statistik der Europäischen Union über Einkommen und Lebensbedingungen (EU-SILC). https://ec.europa.eu/eurostat/de/web/microdata/european-union-statistics-on-income-and-living-conditions. Accessed 18 Sept 2018
eurostat (2018a) Distribution of income by quantiles – EU-SILC survey. http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=ilc_di01&lang=en. Accessed 12 Apr 2018
eurostat (2018b) Smarter, greener, more inclusive? Indicators to support the Europe 2020 strategy. Publications Office of the European Union, Luxembourg
Fan J, Tang M, Tian M (2014) Kernel quantile estimator with ICI adaptive bandwidth selection technique. Acta Math Sin Engl Ser 30(4):710–722
Forschungsdaten- und Servicezentrum (FDSZ) der Deutschen Bundesbank (2014) Panel on Household Finances (PHF) https://doi.org/10.12757/Bbk.PHF.02.02.01 (Plus one additional attribute (district code))
Galton F (1889) Natural inheritance. Macmillan, New York
Genton MG, Ma Y, Parzen E (2006) Discussion of “Sur une limitation très générale de la dispersion de la médiane” by M. Fréchet. J Soc Fr Statistique (2009) 147(2):51–60
Geraci M (2016) Qtools: a collection of models and tools for quantile inference. R J 8(2):117–138
Graf M, Nedyalkova D (2014) Modeling of income and indicators of poverty and social exclusion using the generalized beta distribution of the second kind. Rev Income Wealth 60(4):821–842
Gumbel EJ (1939) La probabilité des hypothèses. C R Acad Sci 209:645–647
Harrell FE, Davis C (1982) A new distribution-free quantile estimator. Biometrika 69(3):635–640
Harrell FE Jr, Dupont C et al (2018) Hmisc: Harrell miscellaneous. R package version 4.1-1. https://CRAN.R-project.org/package=Hmisc. Accessed 20 Nov 2017
Hazen A (1914) Storage to be provided in impounding reservoirs for municipal water supply. Trans Am Soc Civ Eng 77:1539–1641
Hosking J (1990) L-moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Ser B Stat Methodol 52(1):105–124
Hyndman R, Fan Y (1996) Sample quantiles in statistical packages. Am Stat 50(4):361–365
IBM (2013) IBM SPSS statistics for Windows, version 25.0
Johnson NL, Kotz S (1970) Continuous univariate distributions. Houghton Mifflin Harcourt, Boston
Juritz JM, Juritz JWF, Stephens M (1983) On the accuracy of simulated percentage points. J Am Stat Assoc 78(382):441–444
Kleiber C, Kotz S (2003) Statistical size distributions in economics and actuarial sciences. John Wiley & Sons, Hoboken
Knerr P, Aust F, Chudziak N, Gilberg R, Kleudgen M (2015) Methodenbericht – Private Haushalte und ihre Finanzen (PHF) 2. Erhebungswelle – Anonymisierte Fassung –. Methodenbericht, infas Institut für angewandte Sozialwissenschaft GmbH
Kolenikov S (2017) epctile – estimation and inference for percentiles. http://staskolenikov.net/stata. Accessed 20 Feb 2017
Kreutzmann AK, Pannier S, Rojas-Perilla N, Schmid T, Templ M, Tzavidis N (2019) The R package emdi for estimating and mapping regionally disaggregated indicators. J Stat Softw
Langford E (2006) Quartiles in elementary statistics. J Stat Educ 50(4):361–365
Lavallée P, Beaumont JF (2015) Why we should put some weight on weights. Survey methods: insights from the field, pp 1–18
Lohr SL (2010) Sampling: design and analysis. Cengage Learning, Boston
Longford N (2011) Small-sample estimators of the quantiles of the normal, log-normal and Pareto distributions. J Stat Comput Simul 82(9):1383–1395
Lumley T (2004) Analysis of complex survey samples. J Stat Softw 9(8):1–19
Ma Y, Genton MG, Parzen E (2011) Asymptotic properties of sample quantiles of discrete distributions. Ann Inst Stat Math 63(2):227–243
Majumder KL, Bhattacharjee GP (1973) Algorithm AS63: the incomplete beta integral. J R Stat Soc Ser C Appl Stat 22(3):409–411
Makkonen L, Pajari M (2014) Defining sample quantiles by the true rank probability. J Probab Stat. https://doi.org/10.1155/2014/326579
Marchetti S, Giusti C, Pratesi M (2016) The use of Twitter data to improve small area estimates of households’ share of food consumption expenditure in Italy. AStA Wirtsch Sozialstat Arch 10(2-3):79–93
Marchetti S, Beręsewicz M, Salvati N, Szymkowiak M, Wawrowski Ł (2018) The use of a three-level M‑quantile model to map poverty at local administrative unit 1 in Poland. J R Stat Soc Ser A 181(4):1–28
McDonald J (1984) Some generalized functions for the size distribution of income. Econometrica 52(3):647–663
McDonald J, Bordley R (1996) Something new, something old: parametric models for the size distribution of income. J Income Distrib 6(1):91–103
Muenchen RA (2017) The popularity of data science software. http://r4stats.com/articles/popularity/. Accessed 27 Feb 2018
Münnich R, Burgard JP, Vogt M (2013) Small Area-Statistik: Methoden und Anwendungen. AStA Wirtsch Sozialstat Arch 6(3-4):149–191
Okolewski A, Rychlik T (2001) Sharp distribution-free bounds on the bias in estimating quantiles via order statistics. Stat Probab Lett 52(2):207–213
Parrish R (1990) Comparison of quantile estimators in normal sampling. Biometrics 46(1):247–257
Parzen E (1979) Nonparametric statistical data modeling. J Am Stat Assoc 74(365):105–121
Phien H (1990) A note on the computation of the incomplete beta function. Adv Eng Softw 12(1):39–44
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (https://www.R-project.org/)
Rust KF, Rao JNK (1996) Variance estimation for complex surveys using replication techniques. Stat Methods Med Res 5(3):283–310
SAS Institute Inc (2018) Version 9.4 of the SAS system
Schmid T, Bruckschen F, Salvati N, Zbiranski T (2017) Constructing sociodemographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J R Stat Soc Ser A 180(4):1163–1190
Schoonjans F, De Bacquer D, Schmid P (2011) Estimation of population percentiles. Epidemiology 22(5):750–751
Sfakianakis M, Verginis D (2008) A new family of nonparametric quantile estimators. Commun Stat Simul Comput 37(2):337–345
Shao J (1988) A note on bootstrap variance estimation. Technical report, Purdue University
Shao J, Wu C (1989) A general theory for jackknife variance estimation. Ann Stat 17(3):1176–1197
Shao J, Wu C (1992) Asymptotic properties of the balanced repeated replication method for sample quantiles. Ann Stat 20(3):1571–1593
Sheather S, Marron J (1990) Kernel quantile estimators. J Am Stat Assoc 85(410):410–416
StataCorp (2015) Stata statistical software: release 15. StataCorp LLC, College Station
Steinhauer HW, Aßmann C, Zinn S, Goßmann S, Rässler S (2015) Sampling and weighting cohort samples in institutional contexts. AStA Wirtsch Sozialstat Arch 9(2):131–157
Tzavidis N, Zhang LC, Luna A, Schmid T, Rojas-Perilla N (2018) From start to finish: a framework for the production of small area official statistics. J R Stat Soc Ser A 181(4):927–979
Vélez JI, Correa JC (2014) Should we think of a different median estimator? Comun Estad 7(1):11–17
Walker AM (1968) A note on the asymptotic distribution of sample quantiles. J R Stat Soc Ser B Stat Methodol 30(3):570–575
Wei L, Wang D, Hutson A (2015) An investigation of quantile function estimators relative to quantile confidence interval coverage. Commun Stat Theory Methods 44(10):2107–2135
Weibull W (1939) The phenomenon of rupture in solids. Ing Vetensk Akad Handl 17(153):1–55
Wolter K (2007) Introduction to variance estimation. Springer, New York
Yang S (1985) A smooth nonparametric estimator of a quantile function. J Am Stat Assoc 80(392):1004–1011
Yoshizawa C, Sen P, Davis E (1985) Asymptotic equivalence of the Harrell-Davis median estimator and the sample median. Commun Stat Theory Methods 14(9):2129–2136
Acknowledgements
I gratefully acknowledge support by the German Research Foundation within the project QUESSAMI (281573942) and by the MIUR-DAAD Joint Mobility Program (57265468). This work uses data from the Deutsche Bundesbank Panel on Household Finances. The results published and the related observations and analysis may not correspond to results or analysis of the data producers. I thank the editors and the referees for their constructive comments that helped to improve the paper.
Appendix A
For the sake of completeness, the expressions of quantile estimators that are introduced in Table 2 but not mentioned in the text are shown in this Appendix. Furthermore, the six properties that are used by Hyndman and Fan (1996) are summarized.
1.1 A1 Inverse of the empirical cumulative distribution function
Dielmann et al. (1994) state that this quantile estimator is neither mean- nor median-unbiased. For a further discussion of its properties, we refer to Juritz et al. (1983).
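\[\hat{Q}(p)=\begin{cases}X_{(i)}, & g=0,\\ X_{(i+1)}, & g>0,\end{cases}\]
where \(X_{(i)}\) denotes the \(i\)-th order statistic of the sample, \(i=\lfloor np\rfloor\) and \(g=np-i\).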
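To make the definition concrete, the following Python function is a minimal sketch of the inverse of the empirical CDF under stated assumptions. The order statistics come from sorting the sample. Ranks outside \(1,\dots,n\), which the formula leaves undefined for \(p\) near 0, are clamped to the smallest or largest observation. \(p\) is converted to an exact `Fraction`, because a floating-point product \(np\) can land just above or below an integer and flip the test \(g=0\). The function name, the helper `_exact` and the income values are hypothetical.

```python
import math
from fractions import Fraction


def _exact(p):
    """Turn p (e.g. 0.3) into an exact Fraction via its decimal string."""
    return Fraction(str(p))


def quantile_inverse_ecdf(data, p):
    """Sample quantile as the inverse of the empirical CDF (A1)."""
    p = _exact(p)
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)              # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    i = math.floor(n * p)         # i = floor(np)
    g = n * p - i                 # g = np - i
    k = i if g == 0 else i + 1    # X_(i) if g = 0, X_(i+1) if g > 0
    k = min(max(k, 1), n)         # assumption: clamp the rank to 1..n
    return x[k - 1]               # 1-based rank -> 0-based list index


# Hypothetical incomes (e.g. in thousand euros), n = 10.
incomes = [12, 15, 18, 20, 22, 25, 30, 35, 50, 120]
print(quantile_inverse_ecdf(incomes, 0.25))  # np = 2.5 -> X_(3) = 18
print(quantile_inverse_ecdf(incomes, 0.5))   # np = 5   -> X_(5) = 22
```

This definition is type 1 in Hyndman and Fan (1996), i.e. `type = 1` in R's `quantile()` and `method="inverted_cdf"` in `numpy.quantile` (NumPy 1.22 or later). The boundary handling of those implementations may differ from the clamping assumed here.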
1.2 A2 Observation closest to \(np\)
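This definition depends crucially on how ties are rounded, i.e. how \(np\) is rounded when it lies exactly halfway between two integers. In R and SAS such ties are rounded to the nearest even integer, whereas SPSS uses simple rounding (rounding halves up), so its definition differs from the one below.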
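\[\hat{Q}(p)=\begin{cases}X_{(i)}, & g<0 \text{ or } (g=0 \text{ and } i \text{ even}),\\ X_{(i+1)}, & \text{otherwise},\end{cases}\]
where \(i=\lfloor np\rfloor\) and \(g=np-0.5-i\).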
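The effect of the rounding rule can be illustrated with a small Python sketch of the observation closest to \(np\), using \(i=\lfloor np\rfloor\) and \(g=np-0.5-i\) as above. The keyword `ties` is a hypothetical switch. `"even"` sends a tie (\(g=0\)) to the even rank, matching the round-half-to-even behaviour described for R and SAS. `"up"` rounds a tie upwards to mimic the simple rounding attributed to SPSS. It is an illustration only and does not reproduce SPSS's own algorithm. Clamping the rank to \(1,\dots,n\) for very small \(p\) and the exact `Fraction` arithmetic are further assumptions. The data are hypothetical.

```python
import math
from fractions import Fraction


def _exact(p):
    """Turn p (e.g. 0.25) into an exact Fraction via its decimal string."""
    return Fraction(str(p))


def quantile_closest(data, p, ties="even"):
    """Observation closest to np (A2), with a selectable tie rule.

    ties="even": a tie goes to the even rank (round half to even).
    ties="up":   hypothetical variant that rounds a tie upwards.
    """
    if ties not in ("even", "up"):
        raise ValueError("ties must be 'even' or 'up'")
    p = _exact(p)
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    i = math.floor(n * p)                  # i = floor(np)
    g = n * p - Fraction(1, 2) - i         # g = np - 0.5 - i, in [-0.5, 0.5)
    if g < 0:
        k = i                              # np is closer to rank i
    elif g > 0:
        k = i + 1                          # np is closer to rank i + 1
    elif ties == "even":
        k = i if i % 2 == 0 else i + 1     # tie: take the even rank
    else:
        k = i + 1                          # tie: round half up
    k = min(max(k, 1), n)                  # assumption: clamp the rank to 1..n
    return x[k - 1]


incomes = [12, 15, 18, 20, 22, 25, 30, 35, 50, 120]  # hypothetical, n = 10
print(quantile_closest(incomes, 0.25))              # tie at np = 2.5 -> 15
print(quantile_closest(incomes, 0.25, ties="up"))   # tie at np = 2.5 -> 18
```

With \(n=10\) and \(p=0.25\), \(np=2.5\) lies exactly halfway between ranks 2 and 3, so the two rounding rules return different observations. In R this definition is `quantile(x, p, type = 3)`.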
1.3 A3 Linear interpolation of the empirical distribution function
This definition is proposed by Parzen (1979).
\[\hat{Q}(p)=(1-\gamma)X_{(i)}+\gamma X_{(i+1)},\]
where \(i=\lfloor np_{k}\rfloor\), \(p_{k}=\frac{np}{n}\), \(\gamma=np_{k}-i\).
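A minimal Python sketch of the linearly interpolated empirical distribution function follows. It assumes that \(X_{(0)}\) and \(X_{(n+1)}\), which the formula leaves undefined at the boundaries, are replaced by \(X_{(1)}\) and \(X_{(n)}\). It uses exact `Fraction` arithmetic for \(np\). Since \(p_k=np/n=p\), the code works with \(np\) directly. The function name, the helper `_exact` and the data are hypothetical.

```python
import math
from fractions import Fraction


def _exact(p):
    """Turn p (e.g. 0.25) into an exact Fraction via its decimal string."""
    return Fraction(str(p))


def quantile_interpolated_ecdf(data, p):
    """Linear interpolation of the empirical distribution function (A3)."""
    p = _exact(p)
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)                   # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    h = n * p                          # n * p_k, since p_k = np/n = p
    i = math.floor(h)                  # i = floor(n p_k), 0 <= i <= n
    gamma = h - i                      # gamma = n p_k - i
    lower = x[max(i, 1) - 1]           # X_(i); X_(0) replaced by X_(1)
    upper = x[min(i + 1, n) - 1]       # X_(i+1); X_(n+1) replaced by X_(n)
    return float((1 - gamma) * lower + gamma * upper)


incomes = [12, 15, 18, 20, 22, 25, 30, 35, 50, 120]  # hypothetical, n = 10
print(quantile_interpolated_ecdf(incomes, 0.25))  # 16.5
print(quantile_interpolated_ecdf(incomes, 0.95))  # 85.0
```

This estimator is type 4 in Hyndman and Fan (1996), available as `type = 4` in R's `quantile()` and, in NumPy 1.22 or later, as `numpy.quantile(..., method="interpolated_inverted_cdf")`.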
1.4 A4 Approximation to \(F(E(X_{k}))\) for the normal distribution
This definition is especially preferable when the underlying distribution is normal (Blom 1958). Thus, it is often used for normal quantile-quantile plots.
\[\hat{Q}(p)=(1-\gamma)X_{(i)}+\gamma X_{(i+1)},\]
where \(i=\lfloor np_{k}+\frac{p}{4}+\frac{3}{8}\rfloor\), \(p_{k}=\frac{\left(np+\frac{p}{4}+\frac{3}{8}\right)-\frac{3}{8}}{n+\frac{1}{4}}\), \(\gamma=np_{k}+\frac{p}{4}+\frac{3}{8}-i\).
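This definition differs from the linear interpolation of A3 only in the offset \(p/4+3/8\) added to \(np_k\). For the \(p_k\) given above, \(p_k=p\), so \(np_k\) is simply \(np\). The Python sketch below computes the estimator with exact `Fraction` arithmetic. As an assumption for \(p\) close to 0 or 1, it replaces out-of-range order statistics by \(X_{(1)}\) or \(X_{(n)}\). The function name, the helper `_exact` and the data are hypothetical.

```python
import math
from fractions import Fraction


def _exact(p):
    """Turn p (e.g. 0.25) into an exact Fraction via its decimal string."""
    return Fraction(str(p))


def quantile_blom(data, p):
    """Blom-type approximation to F(E(X_(k))) for normal data (A4)."""
    p = _exact(p)
    if not 0 <= p <= 1:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)                      # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    h = n * p + p / 4 + Fraction(3, 8)    # n p_k + p/4 + 3/8, with n p_k = np
    i = math.floor(h)                     # 0 <= i <= n
    gamma = h - i
    lower = x[max(i, 1) - 1]              # X_(i); X_(0) replaced by X_(1)
    upper = x[min(i + 1, n) - 1]          # X_(i+1); X_(n+1) replaced by X_(n)
    return float((1 - gamma) * lower + gamma * upper)


incomes = [12, 15, 18, 20, 22, 25, 30, 35, 50, 120]  # hypothetical, n = 10
print(quantile_blom(incomes, 0.5))   # 23.5
print(quantile_blom(incomes, 0.25))  # 17.8125
```

For an even sample size, the median under this definition is the average of the two middle observations. The definition corresponds to type 9 of Hyndman and Fan (1996), i.e. `type = 9` in R's `quantile()` and `method="normal_unbiased"` in `numpy.quantile` (NumPy 1.22 or later).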
1.5 A5 Six desirable properties for sample quantiles
Cite this article
Kreutzmann, AK. Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions. AStA Wirtsch Sozialstat Arch 12, 245–270 (2018). https://doi.org/10.1007/s11943-018-0234-z