Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions

Kreutzmann, Ann-Kristin

doi:10.1007/s11943-018-0234-z

Die Schätzung von Quantilen: Herausforderungen und Probleme im Kontext von Einkommens- und Vermögensverteilungen

Original publication
Published: 23 November 2018

AStA Wirtschafts- und Sozialstatistisches Archiv, Volume 12, pages 245–270 (2018)

Ann-Kristin Kreutzmann¹

Abstract

Means, quantiles and extreme values are common statistics for describing distributions. However, estimating sample quantiles with the default settings of different software programs yields different results, because the programs use different quantile definitions. Since most practitioners are not aware of this and use the definitions interchangeably, this work compares the default definitions in SPSS, R, SAS and Stata with additional quantile definitions suggested in the literature. The work focuses in particular on how the quantile estimators perform when describing the distribution of income and wealth. Furthermore, it discusses the options for incorporating sampling weights into quantile estimation and the methods for producing variance estimates in the above-mentioned software.

Zusammenfassung (German abstract, translated)

Means, quantiles and extreme values are common statistics used to describe distributions. However, quantile results computed with different software are not necessarily identical. This is because the quantile definitions of different software programs are in part not uniform. Since many users are unaware of these differing definitions and use the software functions interchangeably, this work compares different quantile definitions in the software programs SPSS, R, SAS and Stata. In addition, quantile definitions recommended in previous comparisons in the literature are considered. The work focuses in particular on the quality of the different quantile definitions for describing income and wealth distributions. Furthermore, options for taking survey weights into account in quantile estimation, and for variance estimation in the software programs mentioned, are discussed.

References

Alfons A, Templ M (2013) Estimation of social exclusion indicators from complex surveys: the R package laeken. J Stat Softw 54(15):1–25
Babu G (1986) A note on bootstrapping the variance of sample quantile. Ann Inst Stat Math 38(3):439–443
Bell WR, Basel WW, Maples JJ (2016) An overview of the U.S. Census Bureau’s small area income and poverty estimates program. In: Pratesi M (ed) Analysis of poverty data by small area estimation. John Wiley & Sons, Hoboken, pp 379–403
Beste J, Grabka MM, Goebel J (2018) Armut in Deutschland. AStA Wirtsch Sozialstat Arch 12(1):27–62
Bhat CR (1994) Imputing a continuous income variable from grouped and missing income observations. Econ Lett 46(4):311–319
Blom G (1958) Statistical estimates and transformed beta-variables. John Wiley & Sons, Hoboken
Bundesinstitut für Bau‑, Stadt-, und Raumforschung (2017) Indikatoren und Karten zur Raum- und Stadtentwicklung. Datenlizenz Deutschland – Namensnennung – Version 2.0. http://www.inkar.de/. Accessed 12 Apr 2018
Chatterjee A (2011) Asymptotic properties of sample quantiles from a finite population. Ann Inst Stat Math 63(1):157–179
Cheung K, Lee S (2005) Variance estimation for sample quantiles using the m out of n bootstrap. Ann Inst Stat Math 57(2):279–290
Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Datta GS, Lahiri P, Maiti T (2002) Empirical Bayes estimation of median income of four-person families by state using time series and cross-sectional data. J Stat Plan Inference 102(1):83–97
David H, Nagaraja H (2003) Order statistics. John Wiley & Sons, Hoboken
Deutsche Bundesbank (2016) Vermögen und Finanzen privater Haushalte in Deutschland: Ergebnisse der Vermögensbefragung 2014. Monatsbericht, Deutsche Bundesbank
Dielmann T, Lowry C, Pfaffenberger R (1994) A comparison of quantile estimators. Commun Stat Simul Comput 23(2):355–371
Edgeworth FY (1886) XLVI. Problems in probabilities. Lond Edinb Dublin Philos Mag J Sci 22(137):371–384
Eubank RL (2004) Quantiles. In: Kotz S, Read CB, Balakrishnan N, Vidakovic B, Johnson NL (eds) Encyclopedia of statistical sciences. John Wiley & Sons, Hoboken
eurostat (2013) Statistik der Europäischen Union über Einkommen und Lebensbedingungen (EU-SILC). https://ec.europa.eu/eurostat/de/web/microdata/european-union-statistics-on-income-and-living-conditions. Accessed 18 Sept 2018
eurostat (2018a) Distribution of income by quantiles – EU-SILC survey. http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=ilc_di01&lang=en. Accessed 12 Apr 2018
eurostat (2018b) Smarter, greener, more inclusive? Indicators to support the Europe 2020 strategy. Publications Office of the European Union, Luxembourg
Fan J, Tang M, Tian M (2014) Kernel quantile estimator with ICI adaptive bandwidth selection technique. Acta Math Sin Engl Ser 30(4):710–722
Forschungsdaten- und Servicezentrum (FDSZ) der Deutschen Bundesbank (2014) Panel on Household Finances (PHF) https://doi.org/10.12757/Bbk.PHF.02.02.01 (Plus one additional attribute (district code))
Galton F (1889) Natural inheritance. Macmillan, New York
Genton MG, Ma Y, Parzen E (2006) Discussion of “Sur une limitation très générale de la dispersion de la médiane” by M. Fréchet. J Soc Fr Statistique (2009) 147(2):51–60
Geraci M (2016) Qtools: a collection of models and tools for quantile inference. R J 8(2):117–138
Graf M, Nedyalkova D (2014) Modeling of income and indicators of poverty and social exclusion using the generalized beta distribution of the second kind. Rev Income Wealth 60(4):821–842
Gumbel EJ (1939) La probabilité des hypothèses. C R Acad Sci 209:645–647
Harrell FE, Davis C (1982) A new distribution-free quantile estimator. Biometrika 69(3):635–640
Harrell FE Jr, Dupont C et al (2018) Hmisc: Harrell miscellaneous. R package version 4.1-1. https://CRAN.R-project.org/package=Hmisc. Accessed 20 Nov 2017
Hazen A (1914) Storage to be provided in impounding reservoirs for municipal water supply. Trans Am Soc Civ Eng 77:1539–1641
Hosking J (1990) L‑moments: analysis and estimation of distributions using linear combinations of order statistics. J R Stat Soc Series B Stat Methodol 52(1):105–124
Hyndman R, Fan Y (1996) Sample quantiles in statistical packages. Am Stat 50(4):361–365
IBM (2013) IBM SPSS statistics for Windows, version 25.0
Johnson NL, Kotz S (1970) Continuous univariate distributions. Houghton Mifflin Harcourt, Boston
Juritz JM, Juritz JWF, Stephens M (1983) On the accuracy of simulated percentage points. J Am Stat Assoc 78(382):441–444
Kleiber C, Kotz S (2003) Statistical size distributions in economics and actuarial sciences. John Wiley & Sons, Hoboken
Knerr P, Aust F, Chudziak N, Gilberg R, Kleudgen M (2015) Methodenbericht – Private Haushalte und ihre Finanzen (PHF) 2. Erhebungswelle – Anonymisierte Fassung –. Methodenbericht, infas Institut für angewandte Sozialwissenschaft GmbH
Kolenikov S (2017) epctile – estimation and inference for percentiles. http://staskolenikov.net/stata. Accessed 20 Feb 2017
Kreutzmann AK, Pannier S, Rojas-Perilla N, Schmid T, Templ M, Tzavidis N (2019) The R package emdi for estimating and mapping regionally disaggregated indicators. J Stat Softw.
Langford E (2006) Quartiles in elementary statistics. J Stat Educ 14(3)
Lavallée P, Beaumont JF (2015) Why we should put some weight on weights. Survey methods: insights from the field, pp 1–18
Lohr SL (2010) Sampling: design and analysis. Cengage Learning, Boston
Longford N (2011) Small-sample estimators of the quantiles of the normal, log-normal and Pareto distributions. J Stat Comput Simul 82(9):1383–1395
Lumley T (2004) Analysis of complex survey samples. J Stat Softw 9(8):1–19
Ma Y, Genton MG, Parzen E (2011) Asymptotic properties of sample quantiles of discrete distributions. Ann Inst Stat Math 63(2):227–243
Majumder KL, Bhattacharjee GP (1973) Algorithm AS63: the incomplete beta integral. J R Stat Soc Ser C Appl Stat 22(3):409–411
Makkonen L, Pajari M (2014) Defining sample quantiles by the true rank probability. J Probab Stat. https://doi.org/10.1155/2014/326579
Marchetti S, Giusti C, Pratesi M (2016) The use of Twitter data to improve small area estimates of households’ share of food consumption expenditure in Italy. AStA Wirtsch Sozialstat Arch 10(2-3):79–93
Marchetti S, Beręsewicz M, Salvati N, Szymkowiak M, Wawrowski Ł (2018) The use of a three-level M‑quantile model to map poverty at local administrative unit 1 in Poland. J R Stat Soc Ser A 181(4):1–28
McDonald J (1984) Some generalized functions for the size distribution of income. Econometrica 52(3):647–663
McDonald J, Bordley R (1996) Something new, something old: parametric models for the size distribution of income. J Income Distrib 6(1):91–103
Muenchen RA (2017) The popularity of data science software. http://r4stats.com/articles/popularity/. Accessed 27 Feb 2018
Münnich R, Burgard JP, Vogt M (2013) Small Area-Statistik: Methoden und Anwendungen. AStA Wirtsch Sozialstat Arch 6(3-4):149–191
Okolewski A, Rychlik T (2001) Sharp distribution-free bounds on the bias in estimating quantiles via order statistics. Stat Probab Lett 52(2):207–213
Parrish R (1990) Comparison of quantile estimators in normal sampling. Biometrics 46(1):247–257
Parzen E (1979) Nonparametric statistical data modeling. J Am Stat Assoc 74(365):105–121
Phien H (1990) A note on the computation of the incomplete beta function. Adv Eng Softw 12(1):39–44
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (https://www.R-project.org/)
Rust KF, Rao JNK (1996) Variance estimation for complex surveys using replication techniques. Stat Methods Med Res 5(3):283–310
SAS Institute Inc (2018) Version 9.4 of the SAS system
Schmid T, Bruckschen F, Salvati N, Zbiranski T (2017) Constructing sociodemographic indicators for national statistical institutes using mobile phone data: estimating literacy rates in Senegal. J R Stat Soc Ser A 180(4):1163–1190
Schoonjans F, De Bacquer D, Schmid P (2011) Estimation of population percentiles. Epidemiology 22(5):750–751
Sfakianakis M, Verginis D (2008) A new family of nonparametric quantile estimators. Commun Stat Simul Comput 37(2):337–345
Shao J (1988) A note on bootstrap variance estimation. Technical report, Purdue University
Shao J, Wu C (1989) A general theory for jackknife variance estimation. Ann Stat 17(3):1176–1197
Shao J, Wu C (1992) Asymptotic properties of the balanced repeated replication method for sample quantiles. Ann Stat 20(3):1571–1593
Sheather S, Marron J (1990) Kernel quantile estimators. J Am Stat Assoc 85(410):410–416
StataCorp (2015) Stata statistical software: release 15. StataCorp LLC, College Station
Steinhauer HW, Aßmann C, Zinn S, Goßmann S, Rässler S (2015) Sampling and weighting cohort samples in institutional contexts. AStA Wirtsch Sozialstat Arch 9(2):131–157
Tzavidis N, Zhang LC, Luna A, Schmid T, Rojas-Perilla N (2018) From start to finish: a framework for the production of small area official statistics. J R Stat Soc Ser A 181(4):927–979
Vélez JI, Correa JC (2014) Should we think of a different median estimator? Comun Estad 7(1):11–17
Walker AM (1968) A note on the asymptotic distribution of sample quantiles. J R Stat Soc Series B Stat Methodol 30(3):570–575
Wei L, Wang D, Hutson A (2015) An investigation of quantile function estimators relative to quantile confidence interval coverage. Commun Stat Theory Methods 44(10):2107–2135
Weibull W (1939) The phenomenon of rupture in solids. Ing Vetensk Akad Handl 17(153):1–55
Wolter K (2007) Introduction to variance estimation. Springer, New York
Yang S (1985) A smooth nonparametric estimator of a quantile function. J Am Stat Assoc 80(392):1004–1011
Yoshizawa C, Sen P, Davis E (1985) Asymptotic equivalence of the Harrell-Davis median estimator and the sample median. Commun Stat Theory Methods 14(9):2129–2136

Acknowledgements

I gratefully acknowledge support by the German Research Foundation within the project QUESSAMI (281573942) and by the MIUR-DAAD Joint Mobility Program (57265468). This work uses data from the Deutsche Bundesbank Panel on Household Finances. The results published and the related observations and analysis may not correspond to results or analysis of the data producers. I thank the editors and the referees for their constructive comments that helped to improve the paper.

Author information

Authors and Affiliations

Institute for Statistics and Econometrics, Freie Universität Berlin, 14195, Berlin, Germany
Ann-Kristin Kreutzmann

Corresponding author

Correspondence to Ann-Kristin Kreutzmann.

Electronic Supplementary Material

Code examples and outputs based on synthetic data in the different software programs

Synthetic data and the code used to obtain the results in the aforementioned PDF-file

Tables with results from the simulation studies

Appendix A

For the sake of completeness, the expressions of quantile estimators that are introduced in Table 2 but not mentioned in the text are shown in this Appendix. Furthermore, the six properties that are used by Hyndman and Fan (1996) are summarized.

A1 Inverse of the empirical cumulative distribution function

Dielmann et al. (1994) state that this quantile estimator is neither mean nor median unbiased. For a further discussion of its properties we refer to Juritz et al. (1983).

$$\begin{aligned}Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p=0;\\ X_{(i)}\quad&\text{if}\quad 0<p\leq 1\quad\text{and}\quad g=0;\\ X_{(i+1)}\quad&\text{if}\quad 0<p\leq 1\quad\text{and}\quad g\neq 0,\end{cases}\end{aligned}$$

where $i=\lfloor np\rfloor$ and $g=np-i$.
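As a minimal sketch of definition A1 (not the code from the supplementary material), the following Python function returns the order statistic selected by the inverse of the empirical distribution function. The function name `quantile_inverse_ecdf` and the example incomes are hypothetical, and no tolerance for floating-point error in $np$ is applied.

```python
import math


def quantile_inverse_ecdf(sample, p):
    """Sample quantile via the inverse empirical distribution function (A1).

    Hypothetical helper for illustration; `sample` is any non-empty
    sequence of numbers and `p` a probability in [0, 1].
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(sample)          # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if p == 0.0:
        return x[0]             # X_(1)
    i = math.floor(n * p)       # i = floor(np)
    g = n * p - i               # g = np - i
    # Python lists are 0-based: X_(i) is x[i - 1] and X_(i+1) is x[i].
    return x[i - 1] if g == 0 else x[i]


incomes = [900, 1200, 1500, 8000]   # illustrative values only
median_a1 = quantile_inverse_ecdf(incomes, 0.5)
upper_a1 = quantile_inverse_ecdf(incomes, 0.6)
```

For these four illustrative incomes, `median_a1` is 1200 (since $np=2$ and $g=0$) and `upper_a1` is 1500 (since $g\neq 0$); the estimator never interpolates between observations.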

A2 Observation closest to $np$

This definition depends crucially on the rounding rule. R and SAS round half-way cases to the nearest even integer, as in the definition below, whereas SPSS uses simple rounding and therefore differs from it.

$$\begin{aligned}Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p\leq\frac{0.5}{n};\\ X_{(i)}\quad&\text{if}\quad\frac{0.5}{n}<p\leq 1,\quad i\text{ is even and}\quad g=0;\\ X_{(i+1)}\quad&\text{if}\quad\frac{0.5}{n}<p\leq 1\quad\text{otherwise},\end{cases}\end{aligned}$$

where $i=\lfloor np-0.5\rfloor$ and $g=np-0.5-i$.
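The rounding rule can be made concrete with a short Python sketch. It assumes the parametrization of definition 3 in Hyndman and Fan (1996), that is $i=\lfloor np-0.5\rfloor$ and $g=np-0.5-i$, with ties ($g=0$) resolved towards the even order statistic; the function name `quantile_closest_observation` and the example incomes are hypothetical.

```python
import math


def quantile_closest_observation(sample, p):
    """Sample quantile as the observation closest to np (A2).

    Assumption: ties are broken towards the even order statistic,
    following definition 3 of Hyndman and Fan (1996).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(sample)          # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if p <= 0.5 / n:
        return x[0]             # X_(1)
    pos = n * p - 0.5
    i = math.floor(pos)         # i = floor(np - 0.5)
    g = pos - i                 # g = np - 0.5 - i
    if g == 0 and i % 2 == 0:
        return x[i - 1]         # tie and i even: X_(i)
    return x[i]                 # otherwise: X_(i+1)


incomes = [900, 1200, 1500, 8000]   # illustrative values only
tie_even = quantile_closest_observation(incomes, 0.625)  # np = 2.5
tie_odd = quantile_closest_observation(incomes, 0.875)   # np = 3.5
```

With $n=4$, the tie at $np=2.5$ gives $i=2$ (even), so `tie_even` is $X_{(2)}=1200$, while the tie at $np=3.5$ gives $i=3$ (odd), so `tie_odd` is $X_{(4)}=8000$. Simple rounding would instead select $X_{(3)}=1500$ in the first case.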

A3 Linear interpolation of the empirical distribution function

This definition is proposed by Parzen (1979).

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p<\frac{1}{n};\\ (1-\gamma)X_{(i)}+\gamma X_{(i+1)}\quad&\text{if}\quad\frac{1}{n}\leq p<1;\\ X_{(n)}\quad&\text{if}\quad p=1,\end{cases}\end{aligned}$$

where $i=\lfloor np_{k}\rfloor$, $p_{k}=\frac{np}{n}$ and $\gamma=np_{k}-i$. Since $p_{k}=p$, this simplifies to $i=\lfloor np\rfloor$ and $\gamma=np-i$.
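A minimal Python sketch of definition A3 follows, using the fact that $p_{k}=\frac{np}{n}=p$, so that $i=\lfloor np\rfloor$ and $\gamma=np-i$. The function name `quantile_interpolated_ecdf` and the example incomes are hypothetical; the guard for $i\geq n$ is an added safeguard against floating-point rounding, not part of the definition.

```python
import math


def quantile_interpolated_ecdf(sample, p):
    """Sample quantile by linear interpolation of the ECDF (A3)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(sample)          # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    if p < 1.0 / n:
        return x[0]             # X_(1)
    if p == 1.0:
        return x[-1]            # X_(n)
    pos = n * p
    i = math.floor(pos)         # i = floor(np)
    gamma = pos - i             # gamma = np - i
    if i >= n:                  # safeguard against rounding of n * p
        return x[-1]
    # 0-based indexing: X_(i) is x[i - 1] and X_(i+1) is x[i].
    return (1 - gamma) * x[i - 1] + gamma * x[i]


incomes = [900, 1200, 1500, 8000]   # illustrative values only
q_a3 = quantile_interpolated_ecdf(incomes, 0.625)
```

Here $np=2.5$, so $i=2$ and $\gamma=0.5$, and `q_a3` is the midpoint of $X_{(2)}=1200$ and $X_{(3)}=1500$, namely 1350.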

A4 Approximation to $F(E(X_{k}))$ for the normal distribution

This definition is especially preferable when the underlying distribution is normal (Blom 1958). Thus, it is often used for normal quantile-quantile plots.

$$\begin{aligned}\displaystyle Q_{p}=\begin{cases}X_{(1)}\quad&\text{if}\quad p<\frac{5/8}{n+1/4};\\ (1-\gamma)X_{(i)}+\gamma X_{(i+1)}\quad&\text{if}\quad\frac{5/8}{n+1/4}\leq p<\frac{n-3/8}{n+1/4};\\ X_{(n)}\quad&\text{if}\quad p\geq\frac{n-3/8}{n+1/4},\end{cases}\end{aligned}$$

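where $i=\lfloor np_{k}+\frac{p}{4}+\frac{3}{8}\rfloor$, $p_{k}=\frac{\left(np+\frac{p}{4}+\frac{3}{8}\right)-\frac{3}{8}}{n+\frac{1}{4}}$ and $\gamma=np_{k}+\frac{p}{4}+\frac{3}{8}-i$. Since $p_{k}=\frac{p\left(n+\frac{1}{4}\right)}{n+\frac{1}{4}}=p$, this simplifies to $i=\lfloor np+\frac{p}{4}+\frac{3}{8}\rfloor$ and $\gamma=np+\frac{p}{4}+\frac{3}{8}-i$.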
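Definition A4 can be sketched in Python in the same way. The sketch uses $p_{k}=p$, so the interpolation position is $np+\frac{p}{4}+\frac{3}{8}$; the boundary conditions on $p$ are equivalent to this position being below 1 or at least $n$. The function name `quantile_blom` and the example incomes are hypothetical.

```python
import math


def quantile_blom(sample, p):
    """Sample quantile with Blom-type interpolation (A4)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(sample)              # order statistics X_(1) <= ... <= X_(n)
    n = len(x)
    pos = n * p + p / 4 + 3 / 8     # np + p/4 + 3/8
    if pos < 1:                     # p < (5/8) / (n + 1/4)
        return x[0]                 # X_(1)
    if pos >= n:                    # p >= (n - 3/8) / (n + 1/4)
        return x[-1]                # X_(n)
    i = math.floor(pos)
    gamma = pos - i
    # 0-based indexing: X_(i) is x[i - 1] and X_(i+1) is x[i].
    return (1 - gamma) * x[i - 1] + gamma * x[i]


incomes = [900, 1200, 1500, 8000]   # illustrative values only
median_a4 = quantile_blom(incomes, 0.5)
lower_a4 = quantile_blom(incomes, 0.25)
```

For $p=0.5$ the position is $2.5$, so `median_a4` is 1350; for $p=0.25$ the position is $1.4375$, so `lower_a4` is $0.5625\cdot 900+0.4375\cdot 1200=1031.25$.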

A5 Six desirable properties for sample quantile

Table 6 Replication of Table 1 in Hyndman and Fan (1996), showing their definition of six desirable properties for a sample quantile. For more information about the properties, see Hyndman and Fan (1996).

About this article

Cite this article

Kreutzmann, AK. Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions. AStA Wirtsch Sozialstat Arch 12, 245–270 (2018). https://doi.org/10.1007/s11943-018-0234-z

Received: 26 June 2018
Accepted: 12 November 2018
Published: 23 November 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s11943-018-0234-z
