Abstract
Statistical presentation of data is key to understanding patterns and drawing inferences about biomedical phenomena. In this article, we provide an overview of basic statistical considerations for data analysis. Assessing whether the tested parameters are normally distributed is important for deciding between parametric and non-parametric analyses. The nature of the variables (continuous or discrete) also determines the analysis strategy. Normally distributed data can be presented as means with standard deviations (SD), whereas non-parametric measures such as medians (with range or interquartile range) should be used for non-normal distributions. While the SD measures the dispersion of the data, the standard error is used to estimate the 95% confidence interval, i.e. the range likely to contain the true population mean. Univariable analyses should report effect sizes as well as test a priori hypotheses (i.e. null hypothesis significance testing), and should be followed by suitably adjusted multivariable analyses such as linear or logistic regression. Linear correlation statistics help assess whether two variables change together. Concordance rather than correlation should be used to compare outcome measures of disease states. Prior sample size calculation to ensure adequate study power is recommended for studies that have analogues in the literature reporting SDs. Statistical considerations for systematic reviews should include appropriate use of meta-analysis, assessment of heterogeneity, assessment of publication bias when more than ten studies are included, and quality assessment of studies. Since statistical errors are responsible for a significant proportion of retractions, appropriate statistical analysis is mandatory during study planning and data analysis.
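The distinction drawn above between the SD (dispersion of the data) and the standard error (precision of the estimated mean) can be illustrated with a minimal Python sketch. The function name `describe`, the sample values, and the use of a normal-approximation multiplier for the 95% confidence interval are assumptions made for illustration; they are not taken from the article, and small samples would usually call for a t quantile instead.

```python
import math
import statistics


def describe(values, confidence=0.95):
    """Summarise a sample with parametric and non-parametric measures."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    # Assumption: normal-approximation multiplier (about 1.96 for 95%).
    # This suits larger samples; small samples need a t quantile.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    # Quartile cut points; the first and third bound the interquartile range.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci": (mean - z * se, mean + z * se),
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


# Hypothetical values for illustration only (not data from any study).
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
summary = describe(sample)
print(summary)
```

For approximately normal data, the mean with SD would be reported; for skewed data, the median with interquartile range from the same summary is the more suitable choice. The confidence interval narrows as the sample size grows, whereas the SD does not.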
Funding
No funding was received for this study.
Author information
Contributions
The conception and design of the study—DPM and AYG; acquisition of data, analysis and interpretation of data—DPM, OZ, and AYG. Drafting the article—DPM; Revising it critically for important intellectual content—OZ and AYG. Final approval of the version to be submitted—DPM, OZ, and AYG. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved—DPM, OZ, and AYG.
Ethics declarations
Disclosure
The authors have no potential conflict of interest to disclose.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Misra, D.P., Zimba, O. & Gasparyan, A.Y. Statistical data presentation: a primer for rheumatology researchers. Rheumatol Int 41, 43–55 (2021). https://doi.org/10.1007/s00296-020-04740-z
DOI: https://doi.org/10.1007/s00296-020-04740-z