Abstract
Detecting and quantifying environmental thresholds is often an important step in understanding ecological responses to environmental stressors. We discuss two statistical issues frequently encountered in threshold detection and quantification when null hypothesis testing is the main analytical tool. The hidden multiple-comparison trap (which leads to a much higher risk of false detection) and Raven’s paradox (which renders a “detection” meaningless) are often obscured when hypothesis testing is embedded in a more elaborate model, especially one based on computer-intensive methods. Using two examples, we show that the hidden multiple-comparison trap can be exposed by using computer simulation to estimate the probability of a false detection, and that Raven’s paradox can be avoided by clearly stating the null and alternative hypotheses in scientific terms, so as to establish that rejecting the null is equivalent to proving that the alternative of interest is true. The hidden multiple-comparison trap implies that null hypothesis tests based on computer-intensive methods should be used with caution. Raven’s paradox implies that we should focus on providing evidence that supports the proposed hypothesis or model, rather than on seeking evidence against a frequently irrelevant null hypothesis. These two problems, and many others related to null hypothesis testing, suggest that statistical hypothesis testing should be used only as one component of the body of evidence, perhaps in the role of devil’s advocate.
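The abstract states that the hidden multiple-comparison trap can be exposed by simulating the probability of a false detection. The sketch below illustrates that idea under simplifying assumptions that are ours, not the paper’s: the response is pure N(0, 1) noise ordered along a gradient (so no threshold exists), and a hypothetical detector picks the split point with the largest two-sample z statistic and then reports that split’s nominal p-value. The function names and defaults (`n`, `min_size`, `n_sim`, `alpha`) are illustrative; this is not the specific threshold method examined by the authors.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def max_abs_z(y, candidates, sigma=1.0):
    """Largest |z| over candidate split points.

    y is assumed to be sorted along the environmental gradient; the split at
    index k compares the mean of y[:k] with the mean of y[k:] using a
    two-sample z statistic with known standard deviation sigma.
    """
    n = len(y)
    prefix = [0.0]
    for value in y:  # running sums give each split's group means in O(1)
        prefix.append(prefix[-1] + value)
    total = prefix[-1]
    best = 0.0
    for k in candidates:
        diff = prefix[k] / k - (total - prefix[k]) / (n - k)
        z = diff / (sigma * math.sqrt(1.0 / k + 1.0 / (n - k)))
        best = max(best, abs(z))
    return best


def false_detection_rate(n=100, min_size=10, n_sim=2000, alpha=0.05,
                         search=True, seed=1):
    """Estimate P(declaring a threshold) when no threshold exists."""
    rng = random.Random(seed)
    if search:
        # Data-driven choice: try every split leaving >= min_size points per side.
        candidates = range(min_size, n - min_size + 1)
    else:
        # A single split point fixed before looking at the data.
        candidates = [n // 2]
    detections = 0
    for _ in range(n_sim):
        # Null world: the response is pure noise along the gradient.
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # The nominal p-value ignores that the best split was searched for.
        if two_sided_p(max_abs_z(y, candidates)) < alpha:
            detections += 1
    return detections / n_sim


if __name__ == "__main__":
    print("fixed split:   ", false_detection_rate(search=False))
    print("searched split:", false_detection_rate(search=True))
```

With a single pre-specified split, the estimated rate should sit near `alpha`, because the known-variance z test is exact in that case; letting the data choose the split pushes the rate well above `alpha`, even though each individual test looks legitimate. To probe another threshold method in the same way, one would replace `max_abs_z` with that method’s detection rule and keep the null-data simulation loop.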
Acknowledgments
We thank Ian Waite, Chad Wagner, and two anonymous reviewers for reviewing an early version of the paper. Their comments and recommendations greatly improved the clarity and readability of the paper. Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the US Government.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Qian, S.S., Cuffney, T.F. The multiple-comparison trap and the Raven’s paradox—perils of using null hypothesis testing in environmental assessment. Environ Monit Assess 190, 409 (2018). https://doi.org/10.1007/s10661-018-6793-1
DOI: https://doi.org/10.1007/s10661-018-6793-1