Estimation of sample quantiles: challenges and issues in the context of income and wealth distributions

  • Ann-Kristin Kreutzmann
Original publication

Abstract

Means, quantiles and extreme values are common statistics for describing distributions. However, sample quantiles estimated with the default settings of different software programs can differ, because the programs use different quantile definitions. Since most practitioners are not aware of this and use the definitions interchangeably, this work compares the default definitions in SPSS, R, SAS software, and Stata with additional quantile definitions suggested in the literature. It focuses in particular on how the quantile estimators perform when describing income and wealth distributions. Furthermore, it discusses options for incorporating sampling weights in quantile estimation and methods for producing variance estimates in the above-mentioned software.
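
To illustrate how the choice of definition alone can change a sample quantile, the following minimal Python sketch implements three definitions from the classification of Hyndman and Fan (1996, "Sample quantiles in statistical packages"): types 2, 6 and 7. Type 7 is the default of R's `quantile` function; types 2 and 6 are included only as further examples. The function name `sample_quantile`, the argument `hf_type`, and the ten income values are assumptions made up for this illustration; they are not taken from the paper or its supplementary material.

```python
import math


def sample_quantile(data, p, hf_type=7):
    """Sample quantile under Hyndman-Fan type 2, 6 or 7 (illustrative sketch)."""
    x = sorted(data)
    n = len(x)
    if n == 0 or not 0 <= p <= 1:
        raise ValueError("need non-empty data and 0 <= p <= 1")

    if hf_type == 2:
        # Inverse empirical CDF; average the two neighbours when n*p is an integer.
        h = n * p
        j = math.floor(h)
        if h == j:
            lower = x[max(j - 1, 0)]
            upper = x[min(j, n - 1)]
            return (lower + upper) / 2
        return x[min(j, n - 1)]

    if hf_type == 6:
        h = (n + 1) * p        # 1-based position, plotting position k / (n + 1)
    elif hf_type == 7:
        h = (n - 1) * p + 1    # 1-based position, plotting position (k - 1) / (n - 1)
    else:
        raise ValueError("only types 2, 6 and 7 are sketched here")

    # Linear interpolation between neighbouring order statistics.
    h = min(max(h, 1), n)      # clamp to the observed range
    j = math.floor(h)
    g = h - j
    return x[j - 1] + g * (x[min(j, n - 1)] - x[j - 1])


# Hypothetical right-skewed incomes (in thousands), made up for illustration.
incomes = [12, 15, 18, 21, 25, 30, 38, 50, 80, 200]

for p in (0.25, 0.5, 0.75):
    results = {t: sample_quantile(incomes, p, hf_type=t) for t in (2, 6, 7)}
    print(p, results)
```

For these made-up data the three definitions agree on the median (27.5) but give 50, 57.5 and 47.0 for the upper quartile, which shows why differences matter most in the sparse upper tail of a skewed distribution.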

Keywords

Quantile definitions · Software comparison · Weighted quantile estimator · Weighted Harrell-Davis estimator

The estimation of quantiles: challenges and problems in the context of income and wealth distributions

Summary

Means, quantiles and extreme values are common statistics used to describe distributions. However, quantile results computed with different software are not necessarily identical. This is because the quantile definitions of different software programs are in part inconsistent. Since many users are unaware of these differing definitions and use the software functions interchangeably, this work compares different quantile definitions in the software programs SPSS, R, SAS software and Stata. In addition, it considers quantile definitions recommended in earlier comparisons in the literature. The work pays particular attention to the quality of the different quantile definitions for describing income and wealth distributions. It also discusses options for taking survey weights into account in quantile estimation, and for variance estimation, in the software programs mentioned.

Keywords

Quantile definitions · Comparison of software · Weighted quantile estimators · Weighted Harrell-Davis estimator

Notes

Acknowledgements

I gratefully acknowledge support by the German Research Foundation within the project QUESSAMI (281573942) and by the MIUR-DAAD Joint Mobility Program (57265468). This work uses data from the Deutsche Bundesbank Panel on Household Finances. The results published and the related observations and analysis may not correspond to results or analysis of the data producers. I thank the editors and the referees for their constructive comments that helped to improve the paper.

Supplementary material

11943_2018_234_MOESM1_ESM.pdf (162 kb)
Code examples and outputs based on synthetic data in the different software programs
11943_2018_234_MOESM2_ESM.zip (2 kb)
Synthetic data and the code used to obtain the results in the aforementioned PDF-file
11943_2018_234_MOESM3_ESM.pdf (75 kb)
Tables with results from the simulation studies

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Institute for Statistics and Econometrics, Freie Universität Berlin, Berlin, Germany