# Statistical Testing

• Jan Van den Broeck
• Jonathan R. Brestoff

## Abstract

Statistical testing is used for exploring hypotheses about the possible existence of effects (differences, statistical relations). One chooses a statistical test mainly on the basis of which type of variable, or which distributional characteristic of a variable, is to be compared or related. Each statistical test has its own type of test statistic that captures the amount of effect/difference observed in the sample data. The problem with observed effects in samples is that they are influenced by sampling variation (chance) and may not accurately represent real population effects. P-values are therefore attached to the observed values of a test statistic in an attempt to acquire better insight into whether an observed effect is real. The P-value is the probability of finding the observed value of the test statistic, or a value more extreme than it, when the null hypothesis (that the effect or difference is absent) is in fact true. As such, P-values are sometimes, but not always, a good basis for accepting or rejecting a null hypothesis. After discussing the uses of statistical testing in epidemiology and the different types of hypotheses to test, we discuss the interpretation of P-values and conclude with a brief overview of commonly used statistical tests.
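The definition above — the P-value as the probability of a test statistic at least as extreme as the one observed, computed under the null hypothesis — can be made concrete with a minimal sketch. The permutation test below is one illustrative instance, not one of the specific tests the chapter surveys; the function name and groups are hypothetical. Under the null hypothesis of no group difference, the labels are exchangeable, so shuffling them many times generates the null distribution of the statistic directly.

```python
import random
import statistics

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Test statistic: absolute difference between the two sample means.
    Under the null hypothesis (no effect), group labels are exchangeable,
    so we repeatedly shuffle the pooled data and count how often the
    reshuffled statistic is at least as extreme as the observed one.
    (Illustrative sketch only; function name is hypothetical.)
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if stat >= observed:
            count += 1
    # Fraction of null-hypothesis worlds at least as extreme as observed:
    return count / n_perm

# Clearly separated groups yield a small P-value; identical groups
# yield a large one (here the observed difference is 0, so every
# permutation is "at least as extreme" and the P-value is 1.0).
p_small = permutation_p_value([5.1, 4.9, 5.3, 5.2, 5.0],
                              [4.1, 3.9, 4.2, 4.0, 4.3])
p_large = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Note that the returned value estimates a probability under the null hypothesis only; as the abstract stresses, a small P-value by itself is not always a sufficient basis for rejecting that hypothesis.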

## Keywords

Null hypothesis, Alternative hypothesis, Prior probability, Null hypothesis test, Null case
