European Journal of Epidemiology, Volume 32, Issue 1, pp 21–29

Statistical inference in abstracts of major medical and epidemiology journals 1975–2014: a systematic review

  • Andreas Stang
  • Markus Deckert
  • Charles Poole
  • Kenneth J. Rothman


Since its introduction in the twentieth century, null hypothesis significance testing (NHST), a hybrid of significance testing (ST) advocated by Fisher and null hypothesis testing (NHT) developed by Neyman and Pearson, has become widely adopted but has also been a source of debate. The principal alternative to such testing is estimation with point estimates and confidence intervals (CI). Our aim was to estimate time trends in NHST, ST, NHT and CI reporting in abstracts of major medical and epidemiological journals. We reviewed 89,533 abstracts in five major medical journals and seven major epidemiological journals, 1975–2014, and estimated time trends in the proportions of abstracts containing statistical inference. In those abstracts, we estimated time trends in the proportions relying on NHST and its major variants, ST and NHT, and in the proportions reporting CIs without explicit use of NHST (CI-only approach). The CI-only approach rose monotonically during the study period in the abstracts of all journals. In Epidemiology abstracts, as a result of the journal’s editorial policy, the CI-only approach has always been the most common approach. In the other 11 journals, the NHST approach started out more common, but by 2014, this disparity had narrowed, disappeared or reversed in 9 of them. The exceptions were JAMA, New England Journal of Medicine, and Lancet abstracts, where the predominance of the NHST approach prevailed over time. In 2014, the CI-only approach was as popular as the NHST approach in the abstracts of 4 of the epidemiology journals: the American Journal of Epidemiology (48%), the Annals of Epidemiology (55%), Epidemiology (79%) and the International Journal of Epidemiology (52%). The reporting of CIs without explicitly interpreting them as statistical tests is becoming more common in abstracts, particularly in epidemiology journals. Although NHST is becoming less popular in abstracts of most epidemiology journals studied and some widely read medical journals, it is still very common in the abstracts of other widely read medical journals, especially in the hybrid form of ST and NHT in which p values are reported numerically along with declarations of the presence or absence of statistical significance.
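As an illustration of the kind of quantity tracked here, the proportion of abstracts in a given year that use a particular reporting approach can be paired with an interval estimate. The following is a minimal sketch, not the authors' code: the Wilson score interval is an assumed choice (the abstract does not say which interval method was used), and the function name `wilson_interval` and all counts are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a single proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts (NOT data from the study):
# year -> (abstracts using a CI-only approach, abstracts with any statistical inference)
hypothetical_counts: Dict[int, Tuple[int, int]] = {
    1985: (10, 120),
    2000: (45, 150),
    2014: (90, 170),
}

for year, (ci_only, with_inference) in sorted(hypothetical_counts.items()):
    lower, upper = wilson_interval(ci_only, with_inference)
    proportion = ci_only / with_inference
    print(f"{year}: {proportion:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Reporting each yearly proportion with its interval, rather than with a test against some null value, mirrors the CI-only style of reporting described in the abstract.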


Keywords: Statistics · Confidence intervals · Statistics and numerical data



The authors thank Sander Greenland, DrPH, for valuable comments on an early draft.


Andreas Stang receives a grant from the German Federal Ministry of Education and Research (BMBF), Grant Number 01ER1305.

Compliance with ethical standards

Conflict of interest

None of the authors declares a conflict of interest.

Authors’ contributions

AS, MD, CP, and KJR were involved in the study design. AS and MD performed the statistical analyses. AS wrote the first draft of the report. All authors contributed to the final version.

Supplementary material

Supplementary material 1: 10654_2016_211_MOESM1_ESM.docx (DOCX, 51 kb)



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Andreas Stang (1, 2)
  • Markus Deckert (1)
  • Charles Poole (3)
  • Kenneth J. Rothman (4)

  1. Center of Clinical Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, Germany
  2. Department of Epidemiology, Boston University School of Public Health, Boston, USA
  3. Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, USA
  4. RTI Health Solutions, Durham, USA
