
The multiple-comparison trap and the Raven’s paradox: perils of using null hypothesis testing in environmental assessment

Environmental Monitoring and Assessment

Abstract

Detecting and quantifying environmental thresholds is frequently an important step in understanding ecological responses to environmental stressors. We discuss two statistical issues often encountered in threshold detection and quantification when statistical null hypothesis testing is used as the main analytical tool. The hidden multiple-comparison trap (leading to a much higher risk of a false detection) and Raven’s paradox (rendering a “detection” meaningless) are often obscured when statistical hypothesis testing is used as part of a more elaborate model, especially a model based on computer-intensive methods. Using two examples, we show that the hidden multiple-comparison trap can be exposed by using computer simulation to estimate the probability of making a false detection, and that Raven’s paradox can be avoided by clearly stating the null and alternative hypotheses in scientific terms to substantiate that rejecting the null is equivalent to proving that the alternative of interest is true. The hidden multiple-comparison trap implies that null hypothesis testing based on a computer-intensive method should be used with caution. Raven’s paradox implies that we should focus on providing evidence supporting the proposed hypothesis or model, rather than seeking evidence against the frequently irrelevant null hypothesis. These two problems, and many others related to null hypothesis testing, suggest that statistical hypothesis testing should be used only as one component of the body of evidence, perhaps as the devil’s advocate.
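The simulation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code or either of their two examples; it is a simplified stand-in under stated assumptions: the response is pure standard-normal noise along the gradient (so no threshold exists), each candidate split is checked with a two-sided z-test with known variance, and the sample size `n=40`, the minimum group size of 5, and all function names are hypothetical choices made for illustration. The sketch compares the false detection rate of one pre-specified split with that of a procedure that scans every candidate split and reports a "threshold" if any split is significant.

```python
import math
import random
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal (about 1.96).
Z_CRIT = NormalDist().inv_cdf(0.975)


def split_z(left_sum, total, k, n):
    """z-statistic comparing the mean of the first k values with the mean
    of the remaining n - k values, assuming a known sigma of 1."""
    diff = left_sum / k - (total - left_sum) / (n - k)
    return diff / math.sqrt(1.0 / k + 1.0 / (n - k))


def scan_z(y, min_group=5):
    """Return (|z| at the middle split, largest |z| over all candidate splits)."""
    n = len(y)
    total = sum(y)
    left_sum = sum(y[:min_group - 1])
    fixed_z = 0.0
    max_z = 0.0
    for k in range(min_group, n - min_group + 1):
        left_sum += y[k - 1]  # left_sum now equals sum(y[:k])
        z = abs(split_z(left_sum, total, k, n))
        if k == n // 2:
            fixed_z = z
        max_z = max(max_z, z)
    return fixed_z, max_z


def false_detection_rates(n_sims=2000, n=40, seed=1):
    """Simulate data with no threshold and estimate how often a 'threshold'
    is declared by (a) one pre-specified split and (b) a scan over all splits."""
    rng = random.Random(seed)
    fixed_hits = 0
    scan_hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # null: no change anywhere
        fixed_z, max_z = scan_z(y)
        fixed_hits += fixed_z > Z_CRIT
        scan_hits += max_z > Z_CRIT
    return fixed_hits / n_sims, scan_hits / n_sims


if __name__ == "__main__":
    fixed_rate, scan_rate = false_detection_rates()
    print(f"pre-specified split: {fixed_rate:.3f}")
    print(f"scan over all splits: {scan_rate:.3f}")
```

Under these assumptions, the pre-specified split should reject at roughly the nominal 5% rate, while the scan should reject far more often, because it implicitly performs one test per candidate split and keeps the most extreme result. The exact inflation depends on the sample size and the set of candidate splits.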


Acknowledgments

We thank Ian Waite, Chad Wagner, and two anonymous reviewers for reviewing an early version of the paper. Their comments and recommendations greatly improved the clarity and readability of the paper. Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the US Government.

Author information

Corresponding author

Correspondence to Song S. Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Qian, S.S., Cuffney, T.F. The multiple-comparison trap and the Raven’s paradox: perils of using null hypothesis testing in environmental assessment. Environ Monit Assess 190, 409 (2018). https://doi.org/10.1007/s10661-018-6793-1

  • DOI: https://doi.org/10.1007/s10661-018-6793-1
