Clinical Research General

Clinical Orthopaedics and Related Research®, Volume 468, Issue 3, pp 885-892

P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers

  • David Jean Biau (corresponding author), Département de Biostatistique et Informatique Médicale, INSERM–UMR-S 717, AP-HP, Université Paris 7, Hôpital Saint Louis
  • Brigitte M. Jolles, Hôpital Orthopédique, Département de l’Appareil Locomoteur, Centre Hospitalier Universitaire Vaudois, Université de Lausanne
  • Raphaël Porcher, Département de Biostatistique et Informatique Médicale, INSERM–UMR-S 717, AP-HP, Université Paris 7, Hôpital Saint Louis


In the 1920s, Ronald Fisher developed the theory behind the p value and Jerzy Neyman and Egon Pearson developed the theory of hypothesis testing. These distinct theories have provided researchers with important quantitative tools to confirm or refute their hypotheses. The p value is the probability of obtaining an effect equal to or more extreme than the one observed, presuming the null hypothesis of no effect is true; it gives researchers a measure of the strength of evidence against the null hypothesis. As commonly used, investigators select a threshold p value below which they will reject the null hypothesis. The theory of hypothesis testing allows researchers to reject a null hypothesis in favor of an alternative hypothesis of some effect. As commonly used, investigators choose Type I error (rejecting the null hypothesis when it is true) and Type II error (accepting the null hypothesis when it is false) levels and determine a critical region. If the test statistic falls into that critical region, the null hypothesis is rejected in favor of the alternative hypothesis. Despite their similarities, the p value and the theory of hypothesis testing are different theories that are often misunderstood and confused, leading researchers to improper conclusions. Perhaps the most common misconception is to consider the p value as the probability that the null hypothesis is true rather than the probability of obtaining the observed difference, or one more extreme, assuming the null hypothesis is true. Another concern is the risk that a substantial proportion of statistically significant results are falsely significant. Researchers should have a minimum understanding of these two theories so that they are better able to plan, conduct, interpret, and report scientific experiments.
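
The distinction drawn above can be illustrated with a minimal Python sketch. It is not taken from the article and rests on stated assumptions: a two-sided z test with a standard normal test statistic, and hypothetical function names (`two_sided_p_value`, `critical_value`, `reject_null`, `false_positive_share`) chosen for illustration only. The first function follows the Fisher view (a p value as a measure of evidence), the next two follow the Neyman-Pearson view (a critical region fixed in advance by the Type I error level), and the last estimates how many significant results could be falsely significant under an assumed share of true effects among the hypotheses tested.

```python
from statistics import NormalDist

# Assumption: the test statistic is standard normal under the null hypothesis.
STD_NORMAL = NormalDist()


def two_sided_p_value(z: float) -> float:
    """Fisher: probability, under the null, of a statistic at least as extreme as z."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def critical_value(alpha: float) -> float:
    """Neyman-Pearson: boundary of the two-sided critical region for Type I error alpha."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def reject_null(z: float, alpha: float = 0.05) -> bool:
    """Reject the null hypothesis when the statistic falls in the critical region."""
    return abs(z) > critical_value(alpha)


def false_positive_share(prior_true_effect: float, alpha: float, power: float) -> float:
    """Expected share of significant results for which the null is actually true.

    prior_true_effect is an assumed proportion of tested hypotheses with a real
    effect; power is 1 minus the Type II error level.
    """
    false_positives = alpha * (1.0 - prior_true_effect)
    true_positives = power * prior_true_effect
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    z = 2.5  # hypothetical observed statistic
    print(f"p value: {two_sided_p_value(z):.4f}")
    print(f"critical value at alpha = 0.05: {critical_value(0.05):.3f}")
    print(f"reject null: {reject_null(z)}")
    print(f"falsely significant share: {false_positive_share(0.1, 0.05, 0.8):.2f}")
```

Note that the p value returned by the first function is not the probability that the null hypothesis is true. The last function shows why: with the assumed inputs (10% of tested hypotheses true, a Type I error level of 0.05, power of 0.8), a sizeable share of significant results would be false positives even though every individual test used the 0.05 threshold.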