Statistical Testing

  • Jan Van den Broeck
  • Jonathan R. Brestoff


Statistical testing is used to explore hypotheses about the possible existence of effects (differences, statistical relations). One chooses a statistical test mainly on the basis of which type of variable, or which distributional characteristic of a variable, is to be compared or related. Each statistical test has its own type of test statistic that captures the amount of effect or difference observed in the sample data. The problem with effects observed in samples is that they are influenced by sampling variation (chance) and may not accurately represent real population effects. P-values are therefore attached to the observed values of a test statistic in an attempt to gain better insight into whether an observed effect is real. A P-value is the probability of finding the observed value of the test statistic, or a value more extreme than it, when the null hypothesis (that there is no effect or difference) is in fact true. As such, P-values are sometimes, but not always, a good basis for accepting or rejecting a null hypothesis. After discussing the uses of statistical testing in epidemiology and the different types of hypotheses to test, we discuss the interpretation of P-values and conclude with a brief overview of commonly used statistical tests.
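
The definition of a P-value given above can be made concrete with a small sketch. The example below is illustrative only and is not taken from the chapter: it assumes a hypothetical study in which 8 of 10 patients improve, a null hypothesis that improvement is a 50/50 chance event, and a one-sided exact binomial test. The function name `binomial_p_value` and the data are assumptions introduced for illustration.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_prob: float = 0.5) -> float:
    """One-sided exact P-value: P(X >= successes) when the null hypothesis is true.

    X is the number of successes in `trials` independent trials, each with
    success probability `null_prob` under the null hypothesis.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data (an assumption, not from the chapter): 8 of 10 patients improve.
p = binomial_p_value(successes=8, trials=10)
print(f"one-sided P-value = {p:.4f}")  # prints "one-sided P-value = 0.0547"
```

Here the test statistic is the number of improved patients, and the P-value sums the null probabilities of the observed count (8) and of every more extreme count (9 and 10). The result says how surprising the data would be if the null hypothesis were true. It is not the probability that the null hypothesis is true.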


Keywords: Null Hypothesis; Alternative Hypothesis; Prior Probability; Null Hypothesis Test; Null Case



Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Centre for International Health, Faculty of Medicine and Dentistry, University of Bergen, Bergen, Norway
  2. Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
