Clinical and Translational Oncology, Volume 20, Issue 8, pp 954–965

Top ten errors of statistical analysis in observational studies for cancer research

  • A. Carmona-Bayonas
  • P. Jimenez-Fonseca
  • A. Fernández-Somoano
  • F. Álvarez-Manceñido
  • E. Castañón
  • A. Custodio
  • F. A. de la Peña
  • R. M. Payo
  • L. P. Valiente
Review Article


Observational studies using registry data make it possible to compile quality information and can surpass clinical trials in some contexts. However, data heterogeneity, analytical complexity, and the diversity of aspects to be taken into account when interpreting results make mistakes easy and call for mastery of statistical methodology. Some questionable research practices, including poor analytical data management, are responsible for the low reproducibility of some results; yet the literature offers little information on the specific statistical pitfalls of cancer studies. This article exposes ten common problematic situations in the analysis of cancer registries and proposes how to avoid or solve them: convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values, and data dredging.


Cancer research, Error, Observational studies, Pitfalls, Registry, Statistical analysis



Priscilla Chase Duran is acknowledged for editing the manuscript.

Compliance with ethical standards

Conflict of interest

None to declare. This is an academic study. No financial support has been received from external sources.

Ethical statement

The study has been performed in accordance with the ethical standards of the Declaration of Helsinki and its later amendments.

Informed consent

Not required.



Copyright information

© Federación de Sociedades Españolas de Oncología (FESEO) 2017

Authors and Affiliations

  1. Department of Hematology and Medical Oncology, Hospital Universitario Morales Meseguer, UMU, IMIB, Murcia, Spain
  2. Department of Medical Oncology, Hospital Universitario Central de Asturias, Oviedo, Spain
  3. IUOPA-Area of Preventive Medicine and Public Health; Department of Medicine, University of Oviedo, Oviedo, Spain
  4. CIBER of Epidemiology and Public Health, CIBERESP, Instituto de Salud Carlos III, Madrid, Spain
  5. Department of Hospital Pharmacy, Hospital Universitario Central de Asturias, Oviedo, Spain
  6. Department of Medical Oncology, Clínica Universidad de Navarra, Pamplona, Spain
  7. Department of Medical Oncology, Hospital Universitario La Paz, CIBERONC CB16/12/00398, Madrid, Spain
  8. Faculty of Medicine and Health Sciences, University of Oviedo, Oviedo, Spain
  9. Department of Statistical Analysis and Big Data, Catholic University of Murcia (UCAM), Murcia, Spain
