The multiple-comparison trap and the Raven’s paradox—perils of using null hypothesis testing in environmental assessment

Qian, Song S.; Cuffney, Thomas F.

doi:10.1007/s10661-018-6793-1

Published: 18 June 2018

Environmental Monitoring and Assessment, Volume 190, article number 409 (2018)

Abstract

Detecting and quantifying environmental thresholds is frequently an important step in understanding ecological responses to environmental stressors. We discuss two statistical issues often encountered in threshold detection and quantification when statistical null hypothesis testing is used as a main analytical tool. The hidden multiple-comparison trap (leading to a much higher risk of a false detection) and Raven’s paradox (rendering a “detection” meaningless) are often obscured when statistical hypothesis testing is used as part of a more elaborate model, especially models based on computer-intensive methods. Using two examples, we show that the hidden multiple-comparison trap can be exposed using computer simulation to estimate the probability of making a false detection; Raven’s paradox can be avoided by clearly stating the null and alternative hypotheses in scientific terms to substantiate that the rejection of the null is equivalent to proving that the alternative of interest is true. The hidden multiple-comparison trap implies that null hypothesis testing based on a computer-intensive method should be used with caution. The implication of Raven’s paradox requires that we focus on providing evidence supporting the proposed hypothesis or model, rather than seeking evidence against the frequently irrelevant null hypothesis. These two problems, and many others related to null hypothesis testing, suggest that statistical hypothesis testing should be used only as a component of the body of evidence, perhaps as the devil’s advocate.
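
The simulation idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code or either of their two examples; it is a hypothetical setup chosen for illustration. It assumes a response with no threshold at all (independent standard normal noise along a gradient of 40 sites), a two-sample z-test with known variance at each candidate split point, and a "detection" declared whenever the smallest p-value over all candidate splits falls below 0.05. The function names, sample size, and range of candidate splits are all assumptions.

```python
import random
from statistics import NormalDist


def min_p_over_splits(y, splits, sigma=1.0):
    """Smallest two-sided z-test p-value over the candidate split points.

    A split at k compares the mean of y[:k] with the mean of y[k:],
    assuming a known common standard deviation `sigma` (an assumption
    made here so that only the standard library is needed).
    """
    n = len(y)
    total = sum(y)
    std_normal = NormalDist()
    wanted = set(splits)
    best = 1.0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += y[k - 1]  # running sum of y[:k]
        if k not in wanted:
            continue
        mean_left = left_sum / k
        mean_right = (total - left_sum) / (n - k)
        se = sigma * (1.0 / k + 1.0 / (n - k)) ** 0.5
        z = (mean_left - mean_right) / se
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        best = min(best, p)
    return best


def false_detection_rate(splits, n=40, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of null data sets in which some split gives p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Data generated under the null: no threshold anywhere.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if min_p_over_splits(y, splits) < alpha:
            hits += 1
    return hits / n_sim


# One split point fixed in advance versus a search over many split points.
single = false_detection_rate(splits=[20])
scanned = false_detection_rate(splits=range(5, 36))
print(f"pre-specified split: {single:.3f}")
print(f"best of 31 splits:   {scanned:.3f}")
```

With a single split chosen before seeing the data, the estimated false-detection rate should sit near the nominal 0.05. When the split is chosen by searching over many candidates, the rate is expected to be several times higher, even though each individual test is carried out at the 0.05 level. That gap is the hidden multiple-comparison trap in its simplest form.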

Acknowledgments

We thank Ian Waite, Chad Wagner, and two anonymous reviewers for reviewing an early version of the paper. Their comments and recommendations greatly improved the clarity and readability of the paper. Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the US Government.

Author information

Authors and Affiliations

Department of Environmental Sciences, The University of Toledo, Toledo, OH, USA
Song S. Qian
USGS South Atlantic Water Science Center, Raleigh, NC, USA
Thomas F. Cuffney

Corresponding author

Correspondence to Song S. Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Qian, S.S., Cuffney, T.F. The multiple-comparison trap and the Raven’s paradox—perils of using null hypothesis testing in environmental assessment. Environ Monit Assess 190, 409 (2018). https://doi.org/10.1007/s10661-018-6793-1

Received: 28 October 2017
Accepted: 12 June 2018
Published: 18 June 2018
DOI: https://doi.org/10.1007/s10661-018-6793-1
