Abstract
We propose a method to estimate the sample skewness from given summary statistics and give explicit formulas for the most common scenarios. We show that the method yields a nearly unbiased estimator of the non-parametric skewness measure. We evaluate its performance empirically on real-life data sets of COVID-19 vaccination status. We also demonstrate how the method can be applied to detect skewness in the underlying distribution.
Funding
The research of the first author was funded by the Natural Sciences and Engineering Research Council of Canada, RGPIN-2020-06733. The funding agency had no input in the study design, in the analysis and interpretation of data, in the writing of the report, or in the decision to submit the article for publication.
Author information
Contributions
Narayanaswamy Balakrishnan: conceptualization, formal analysis, investigation, methodology, funding acquisition, writing—original draft, writing—review and editing. Jan Rychtář: formal analysis, investigation, methodology, software, visualization, writing—original draft, writing—review and editing. Dewey Taylor: formal analysis, investigation, methodology, software, visualization, writing—original draft, writing—review and editing.
About this article
Cite this article
Balakrishnan, N., Rychtář, J. & Taylor, D. Estimating Sample Skewness from Sample Data Summaries and Associated Evaluation of Normality. Math. Meth. Stat. 32, 260–273 (2023). https://doi.org/10.3103/S106653072304004X
DOI: https://doi.org/10.3103/S106653072304004X