Lang JM, Rothman KJ, Cann CI. That confounded P-value. Epidemiology. 1998;9:7–8.
Trafimow D, Marks M. Editorial. Basic Appl Soc Psychol. 2015;37:1–2.
Ashworth A. Veto on the use of null hypothesis testing and p intervals: right or wrong? Taylor & Francis Editor Resources online. 2015. http://editorresources.taylorandfrancisgroup.com/veto-on-the-use-of-null-hypothesis-testing-and-p-intervals-right-or-wrong/. Accessed 27 Feb 2016.
Flanagan O. Journal’s ban on null hypothesis significance testing: reactions from the statistical arena. 2015. Stats Life online, https://www.statslife.org.uk/opinion/2114-journal-s-ban-on-null-hypothesis-significance-testing-reactions-from-the-statistical-arena. Accessed 27 Feb 2016.
Altman DG, Machin D, Bryant TN, Gardner MJ, eds. Statistics with confidence. 2nd ed. London: BMJ Books; 2000.
Atkins L, Jarrett D. The significance of “significance tests”. In: Irvine J, Miles I, Evans J, editors. Demystifying social statistics. London: Pluto Press; 1979.
Cox DR. The role of significance tests (with discussion). Scand J Stat. 1977;4:49–70.
Cox DR. Statistical significance tests. Br J Clin Pharmacol. 1982;14:325–31.
Cox DR, Hinkley DV. Theoretical statistics. New York: Chapman and Hall; 1974.
Freedman DA, Pisani R, Purves R. Statistics. 4th ed. New York: Norton; 2007.
Gigerenzer G, Swijtink Z, Porter T, Daston L, Beatty J, Kruger L. The empire of chance: how probability changed science and everyday life. New York: Cambridge University Press; 1990.
Harlow LL, Mulaik SA, Steiger JH. What if there were no significance tests? New York: Psychology Press; 1997.
Hogben L. Statistical theory. London: Allen and Unwin; 1957.
Kaye DH, Freedman DA. Reference guide on statistics. In: Reference manual on scientific evidence, 3rd ed. Washington, DC: Federal Judicial Center; 2011. p. 211–302.
Morrison DE, Henkel RE, editors. The significance test controversy. Chicago: Aldine; 1970.
Oakes M. Statistical inference: a commentary for the social and behavioural sciences. Chichester: Wiley; 1986.
Pratt JW. Bayesian interpretation of standard inference statements. J Roy Stat Soc B. 1965;27:169–203.
Rothman KJ, Greenland S, Lash TL. Modern epidemiology. 3rd ed. Philadelphia: Lippincott-Wolters-Kluwer; 2008.
Ware JH, Mosteller F, Ingelfinger JA. p-Values. In: Bailar JC, Hoaglin DC, editors. Medical uses of statistics. 3rd ed. Hoboken, NJ: Wiley; 2009. Ch. 8, p. 175–94.
Ziliak ST, McCloskey DN. The cult of statistical significance: how the standard error costs us jobs, justice and lives. Ann Arbor: U Michigan Press; 2008.
Altman DG, Bland JM. Absence of evidence is not evidence of absence. Br Med J. 1995;311:485.
Anscombe FJ. The summarizing of clinical experiments by significance levels. Stat Med. 1990;9:703–8.
Bakan D. The test of significance in psychological research. Psychol Bull. 1966;66:423–37.
Bandt CL, Boen JR. A prevalent misconception about sample size, statistical significance, and clinical importance. J Periodontol. 1972;43:181–3.
Berkson J. Tests of significance considered as evidence. J Am Stat Assoc. 1942;37:325–35.
Bland JM, Altman DG. Best (but oft forgotten) practices: testing for treatment effects in randomized trials by separate analyses of changes from baseline in each group is a misleading approach. Am J Clin Nutr. 2015;102:991–4.
Chia KS. “Significant-itis”—an obsession with the P-value. Scand J Work Environ Health. 1997;23:152–4.
Cohen J. The earth is round (p < 0.05). Am Psychol. 1994;49:997–1003.
Evans SJW, Mills P, Dawson J. The end of the P-value? Br Heart J. 1988;60:177–80.
Fidler F, Loftus GR. Why figures with error bars should replace p values: some conceptual arguments and empirical demonstrations. J Psychol. 2009;217:27–37.
Gardner MJ, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J. 1986;292:746–50.
Gelman A. P-values and statistical practice. Epidemiology. 2013;24:69–72.
Gelman A, Loken E. The statistical crisis in science: Data-dependent analysis—a “garden of forking paths”—explains why many statistically significant comparisons don’t hold up. Am Sci. 2014;102:460–465. Erratum at http://andrewgelman.com/2014/10/14/didnt-say-part-2/. Accessed 27 Feb 2016.
Gelman A, Stern HS. The difference between “significant” and “not significant” is not itself statistically significant. Am Stat. 2006;60:328–31.
Gigerenzer G. Mindless statistics. J Socio Econ. 2004;33:587–606.
Gigerenzer G, Marewski JN. Surrogate science: the idol of a universal method for scientific inference. J Manag. 2015;41:421–40.
Goodman SN. A comment on replication, p-values and evidence. Stat Med. 1992;11:875–9.
Goodman SN. P-values, hypothesis tests and likelihood: implications for epidemiology of a neglected historical debate. Am J Epidemiol. 1993;137:485–96.
Goodman SN. Towards evidence-based medical statistics, I: the P-value fallacy. Ann Intern Med. 1999;130:995–1004.
Goodman SN. A dirty dozen: twelve P-value misconceptions. Semin Hematol. 2008;45:135–40.
Greenland S. Null misinterpretation in statistical testing and its impact on health risk assessment. Prev Med. 2011;53:225–8.
Greenland S. Nonsignificance plus high power does not imply support for the null over the alternative. Ann Epidemiol. 2012;22:364–8.
Greenland S. Transparency and disclosure, neutrality and balance: shared values or just shared words? J Epidemiol Community Health. 2012;66:967–70.
Greenland S, Poole C. Problems in common interpretations of statistics in scientific articles, expert reports, and testimony. Jurimetrics. 2011;51:113–29.
Greenland S, Poole C. Living with P-values: resurrecting a Bayesian perspective on frequentist statistics. Epidemiology. 2013;24:62–8.
Greenland S, Poole C. Living with statistics in observational research. Epidemiology. 2013;24:73–8.
Grieve AP. How to test hypotheses if you must. Pharm Stat. 2015;14:139–50.
Hoekstra R, Finch S, Kiers HAL, Johnson A. Probability as certainty: dichotomous thinking and the misuse of p-values. Psychon Bull Rev. 2006;13:1033–7.
Hurlbert SH, Lombardi CM. Final collapse of the Neyman–Pearson decision theoretic framework and rise of the neoFisherian. Ann Zool Fenn. 2009;46:311–49.
Kaye DH. Is proof of statistical significance relevant? Wash Law Rev. 1986;61:1333–66.
Lambdin C. Significance tests as sorcery: science is empirical—significance tests are not. Theory Psychol. 2012;22(1):67–90.
Langman MJS. Towards estimation and confidence intervals. BMJ. 1986;292:716.
Lecoutre M-P, Poitevineau J, Lecoutre B. Even statisticians are not immune to misinterpretations of null hypothesis tests. Int J Psychol. 2003;38:37–45.
Lew MJ. Bad statistical practice in pharmacology (and other basic biomedical disciplines): you probably don’t know P. Br J Pharmacol. 2012;166:1559–67.
Loftus GR. Psychology will be a much better science when we change the way we analyze data. Curr Dir Psychol Sci. 1996;5:161–71.
Matthews JNS, Altman DG. Interaction 2: Compare effect sizes not P values. Br Med J. 1996;313:808.
Pocock SJ, Ware JH. Translating statistical findings into plain English. Lancet. 2009;373:1926–8.
Pocock SJ, Hughes MD, Lee RJ. Statistical problems in the reporting of clinical trials. N Engl J Med. 1987;317:426–32.
Poole C. Beyond the confidence interval. Am J Public Health. 1987;77:195–9.
Poole C. Confidence intervals exclude nothing. Am J Public Health. 1987;77:492–3.
Poole C. Low P-values or narrow confidence intervals: which are more durable? Epidemiology. 2001;12:291–4.
Rosnow RL, Rosenthal R. Statistical procedures and the justification of knowledge in psychological science. Am Psychol. 1989;44:1276–84.
Rothman KJ. A show of confidence. N Engl J Med. 1978;299:1362–3.
Rothman KJ. Significance questing. Ann Intern Med. 1986;105:445–7.
Rozeboom WM. The fallacy of the null-hypothesis significance test. Psychol Bull. 1960;57:416–28.
Salsburg DS. The religion of statistics as practiced in medical journals. Am Stat. 1985;39:220–3.
Schmidt FL. Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychol Methods. 1996;1:115–29.
Schmidt FL, Hunter JE. Methods of meta-analysis: correcting error and bias in research findings. 3rd ed. Thousand Oaks: Sage; 2014.
Sterne JAC, Davey Smith G. Sifting the evidence—what’s wrong with significance tests? Br Med J. 2001;322:226–31.
Thompson WD. Statistical criteria in the interpretation of epidemiologic data. Am J Public Health. 1987;77:191–4.
Thompson B. The “significance” crisis in psychology and education. J Socio Econ. 2004;33:607–13.
Wagenmakers E-J. A practical solution to the pervasive problem of p values. Psychon Bull Rev. 2007;14:779–804.
Walker AM. Reporting the results of epidemiologic studies. Am J Public Health. 1986;76:556–8.
Wood J, Freemantle N, King M, Nazareth I. Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ. 2014;348:g2215. doi:10.1136/bmj.g2215.
Stigler SM. The history of statistics. Cambridge, MA: Belknap Press; 1986.
Neyman J. Outline of a theory of statistical estimation based on the classical theory of probability. Philos Trans R Soc Lond A. 1937;236:333–80.
Edwards W, Lindman H, Savage LJ. Bayesian statistical inference for psychological research. Psychol Rev. 1963;70:193–242.
Berger JO, Sellke TM. Testing a point null hypothesis: the irreconcilability of P-values and evidence. J Am Stat Assoc. 1987;82:112–39.
Edwards AWF. Likelihood. 2nd ed. Baltimore: Johns Hopkins University Press; 1992.
Goodman SN, Royall R. Evidence and scientific research. Am J Public Health. 1988;78:1568–74.
Royall R. Statistical evidence. New York: Chapman and Hall; 1997.
Sellke TM, Bayarri MJ, Berger JO. Calibration of p values for testing precise null hypotheses. Am Stat. 2001;55:62–71.
Goodman SN. Introduction to Bayesian methods I: measuring the strength of evidence. Clin Trials. 2005;2:282–90.
Lehmann EL. Testing statistical hypotheses. 2nd ed. New York: Wiley; 1986.
Senn SJ. Two cheers for P-values. J Epidemiol Biostat. 2001;6(2):193–204.
Senn SJ. Letter to the Editor re: Goodman 1992. Stat Med. 2002;21:2437–44.
Mayo DG, Cox DR. Frequentist statistics as a theory of inductive inference. In: Rojo J, editor. Optimality: the second Erich L. Lehmann symposium. Lecture notes-monograph series, vol. 49. Institute of Mathematical Statistics (IMS); 2006. p. 77–97.
Murtaugh PA. In defense of P-values (with discussion). Ecology. 2014;95(3):611–53.
Hedges LV, Olkin I. Vote-counting methods in research synthesis. Psychol Bull. 1980;88:359–69.
Chalmers TC, Lau J. Changes in clinical trials mandated by the advent of meta-analysis. Stat Med. 1996;15:1263–8.
Maheshwari S, Sarraj A, Kramer J, El-Serag HB. Oral contraception and the risk of hepatocellular carcinoma. J Hepatol. 2007;47:506–13.
Cox DR. The planning of experiments. New York: Wiley; 1958. p. 161.
Smith AH, Bates M. Confidence limit analyses should replace power calculations in the interpretation of epidemiologic studies. Epidemiology. 1992;3:449–52.
Goodman SN. Letter to the editor re Smith and Bates. Epidemiology. 1994;5:266–8.
Goodman SN, Berlin J. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121:200–6.
Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat. 2001;55:19–24.
Senn SJ. Power is indeed irrelevant in interpreting completed studies. BMJ. 2002;325:1304.
Lash TL, Fox MP, Maclehose RF, Maldonado G, McCandless LC, Greenland S. Good practices for quantitative bias analysis. Int J Epidemiol. 2014;43:1969–85.
Dwan K, Gamble C, Williamson PR, Kirkham JJ, Reporting Bias Group. Systematic review of the empirical evidence of study publication bias and outcome reporting bias—an updated review. PLoS One. 2013;8:e66844.
Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, Forbes A. Bias due to selective inclusion and reporting of outcomes and analyses in systematic reviews of randomised trials of healthcare interventions. Cochrane Database Syst Rev. 2014;10:MR000035.
You B, Gan HK, Pond G, Chen EX. Consistency in the analysis and reporting of primary end points in oncology randomized controlled trials from registration to publication: a systematic review. J Clin Oncol. 2012;30:210–6.
Button K, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, Robinson ESJ, Munafò MR. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14:365–76.
Eyding D, Lelgemann M, Grouven U, Härter M, Kromp M, Kaiser T, Kerekes MF, Gerken M, Wieseler B. Reboxetine for acute treatment of major depression: systematic review and meta-analysis of published and unpublished placebo and selective serotonin reuptake inhibitor controlled trials. BMJ. 2010;341:c4737.
Land CE. Estimating cancer risks from low doses of ionizing radiation. Science. 1980;209:1197–203.
Land CE. Statistical limitations in relation to sample size. Environ Health Perspect. 1981;42:15–21.
Greenland S. Dealing with uncertainty about investigator bias: disclosure is informative. J Epidemiol Community Health. 2009;63:593–8.
Xu L, Freeman G, Cowling BJ, Schooling CM. Testosterone therapy and cardiovascular events among men: a systematic review and meta-analysis of placebo-controlled randomized trials. BMC Med. 2013;11:108.
Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference: part I. Biometrika. 1928;20A:175–240.
Pearson ES. Statistical concepts in their relation to reality. J R Stat Soc B. 1955;17:204–7.
Fisher RA. Statistical methods and scientific inference. Edinburgh: Oliver and Boyd; 1956.
Hill AB. The environment and disease: association or causation? Proc R Soc Med. 1965;58:295–300.
Casella G, Berger RL. Reconciling Bayesian and frequentist evidence in the one-sided testing problem. J Am Stat Assoc. 1987;82:106–11.
Casella G, Berger RL. Comment. Stat Sci. 1987;2:344–7.
Yates F. The influence of statistical methods for research workers on the development of the science of statistics. J Am Stat Assoc. 1951;46:19–34.
Cumming G. Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. London: Routledge; 2011.
Morey RD, Hoekstra R, Rouder JN, Lee MD, Wagenmakers E-J. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev (in press).
Rosenthal R, Rubin DB. The counternull value of an effect size: a new statistic. Psychol Sci. 1994;5:329–34.
Mayo DG, Spanos A. Severe testing as a basic concept in a Neyman–Pearson philosophy of induction. Br J Philos Sci. 2006;57:323–57.
Whitehead A. Meta-analysis of controlled clinical trials. New York: Wiley; 2002.
Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. New York: Wiley; 2009.
Chen D-G, Peace KE. Applied meta-analysis with R. New York: Chapman & Hall/CRC; 2013.
Cooper H, Hedges LV, Valentine JC. The handbook of research synthesis and meta-analysis. Thousand Oaks: Sage; 2009.
Greenland S, O’Rourke K. Meta-analysis. In: Rothman KJ, Greenland S, Lash TL, editors. Modern epidemiology. 3rd ed. Philadelphia: Lippincott-Wolters-Kluwer; 2008. Ch. 33, p. 682–5.
Petitti DB. Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine. 2nd ed. New York: Oxford U Press; 2000.
Sterne JAC. Meta-analysis: an updated collection from the Stata journal. College Station, TX: Stata Press; 2009.
Weinberg CR. It’s time to rehabilitate the P-value. Epidemiology. 2001;12:288–90.