On the correct interpretation of p values and the importance of random variables

Abstract

The p value is the probability, under the null hypothesis, of obtaining an experimental result that is at least as extreme as the one that we have actually obtained. That probability plays a crucial role in frequentist statistical inference. But if we take the word ‘extreme’ to mean ‘improbable’, then we can show that this type of inference can be very problematic. In this paper, I argue that it is a mistake to make such an interpretation. Under minimal assumptions about the alternative hypothesis, I explain why ‘extreme’ means ‘outside the most precise predicted range of experimental outcomes for a given upper-bound probability of error’. In doing so, I rebut recent formulations of recurrent criticisms of the frequentist approach in statistics and underscore the importance of random variables.

Notes

  1. There are two main schools of thought in frequentist testing: the Fisherian and the Neyman–Pearson. The decision rule presented here is better suited to a Neyman–Pearson framework. According to the latter, the rejection of H0 implies the acceptance of an alternative hypothesis (H1). The Neyman–Pearson approach accordingly aims to minimise the probability of rejecting H0 when H0 is true (the type-I error) and the probability of accepting H0 when H1 is true (the type-II error). Fisher, on the other hand, was against a formal treatment of the type-II error. He also criticised the ‘accept/reject’ procedure and preferred to interpret the p value as providing degrees of evidence against H0. I will alert the reader when these differences matter.

  2. The null hypothesis is the default hypothesis. It is the one that we accept unless the evidence suggests that we should reject it.

  3. What I mean by ‘entrenched’ is that they are recurrent and appear in high-profile publications.

  4. Elliott Sober coined the expression ‘probabilistic modus tollens’. I shall also explain why he claims that it is invalid.

  5. Ian Hacking traces the origin of that fallacy back to John Arbuthnot (1710) (Hacking 1965, p. 75).

  6. Sober actually discusses an experiment involving a coin. But the point is essentially the same.

  7. Wagenmakers’ article also provides references to other scientific work in which we can find the same argument.

  8. The significance level of a test (\(\alpha \) for short) is the threshold that determines whether a p value is low enough to reject H0.

  9. A critical region is a set of extreme outcomes such that we would reject H0 if our test statistic belonged to it. If every possible outcome is as extreme as any other, then the critical region includes (or excludes) all of them, which is unreasonable.

  10. When we are dealing with discrete variables, we talk about a probability mass function; when we are working with continuous variables, we talk about a probability density function.

  11. That definition is more precise since there might be more than one variable involved in a statistical test.

  12. I would like to point out that the puzzle is not very convincing. The only difference between the two distributions in Fig. 1 should be a difference of parameters, and it is not obvious what kind of parameter would produce both distributions when we change its value.

  13. Here I make 50 rolls instead of ten because this validates the following chi-square test and makes every possible vector very improbable.

  14. A computer simulation of a fair die generated the latter (see Appendix).

  15. We define the distribution with 5 degrees of freedom because once we have counted the observed frequencies for 5 of the 6 dimensions of our random vector, the frequency of the remaining dimension is fixed: the counts must sum to 50.

References

  • Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd.

  • Greco, D. (2011). Significance testing in theory and practice. British Journal for the Philosophy of Science, 62, 607–637.

  • Hacking, I. (1965). Logic of statistical inference. Cambridge: Cambridge University Press.

  • Hines, W. W., et al. (2003). Probability and statistics in engineering (4th ed.). New York: Wiley.

  • Hogg, R. V., & Craig, A. T. (1995). Introduction to mathematical statistics (5th ed.). Englewood Cliffs, NJ: Prentice Hall.

  • Jeffreys, H. (1961). Theory of probability. Oxford: Oxford University Press.

  • Sober, E. (2008). Evidence and evolution. Cambridge: Cambridge University Press.

  • Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of \(P\) values. Psychonomic Bulletin & Review, 14, 779–804.

Acknowledgments

I am grateful to the anonymous referees for their very helpful comments. I would also like to thank the participants at the ‘Journée de travail en philosophie analytique’ at Laval University.

Author information

Correspondence to Guillaume Rochefort-Maranda.

Appendix

Here is the program that I used to obtain S with R:

library(TeachingDemos)
# Simulate 50 rolls of one fair six-sided die and plot the outcome
dice(rolls = 50, ndice = 1, sides = 6, plot.it = TRUE)
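
For readers without the TeachingDemos package, here is a minimal base-R sketch of an equivalent simulation (the seed is illustrative and does not reproduce the S reported in the paper):

set.seed(1)                                     # illustrative seed, not the one behind S
rolls = sample(1:6, size = 50, replace = TRUE)  # 50 independent rolls of a fair die
table(rolls)                                    # observed frequency of each face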

Here is the program that I used to perform a ‘Goodness of Fit’ test with R:

# Observed frequencies of the six faces over 50 rolls
vect = c(9, 7, 7, 7, 13, 7)
# Probability of each face under the null hypothesis of a fair die
vectprob = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)
# Chi-square goodness-of-fit test against the uniform distribution
chisq.test(vect, p = vectprob)
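
As a sanity check, the statistic can also be computed by hand; this is my own illustrative computation, not part of the original appendix. Under the null hypothesis each face has expected count \(50/6 \approx 8.33\), and the test has \(6 - 1 = 5\) degrees of freedom (see note 15):

expected = rep(50/6, 6)                     # expected counts under H0
stat = sum((vect - expected)^2/expected)    # X-squared, approximately 3.52
pchisq(stat, df = 5, lower.tail = FALSE)    # p value, approximately 0.62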

Here is the program that I used to maximise the multinomial mass function with R:

# The most probable vector of counts for 50 rolls of a fair die is as
# uniform as possible: two faces occur 9 times and four faces occur 8 times.
a = factorial(50)
b = factorial(8)
c = factorial(9)
denom = (c^2)*(b^4)   # 9!^2 * 8!^4
d = (a/denom)         # multinomial coefficient 50!/(9!^2 * 8!^4)
frac = 1/(6^50)       # probability of any particular ordered sequence of 50 rolls
d*frac                # maximum of the multinomial mass function
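
The same value can be obtained directly with the dmultinom function from base R; this one-line check is my addition and not part of the original appendix:

dmultinom(x = c(9, 9, 8, 8, 8, 8), prob = rep(1/6, 6))  # approximately 1.1e-04, equal to d*frac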

About this article

Cite this article

Rochefort-Maranda, G. On the correct interpretation of p values and the importance of random variables. Synthese 193, 1777–1793 (2016). https://doi.org/10.1007/s11229-015-0807-0
