What do significance tests really tell us about the environment?

Abstract

Routine application of significance tests does not extract the maximum information from environmental data and can lead to misleading conclusions. There are three reasons: a significant result can often be reached merely by collecting enough samples; a statistically significant result is not necessarily practically significant; and reports of the presence or absence of statistically significant differences for multiple tests are not comparable unless identical sample sizes are used. These problems are demonstrated by application to pH data for grazed and retired fields, and by discussion of significance tests used in recent US regulations for groundwater quality. The advantages of equivalence tests, where the tester must state the difference of practical importance, are discussed and applied to the field pH problem. We recommend that environmental managers and scientists pay more attention to statistical power and decide what constitutes a practical difference. Confidence intervals for the size of the differences, accompanied where necessary by equivalence tests, are the preferred means of addressing the question: “is there a difference of practical significance?”
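The points made in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it uses a large-sample normal approximation with a known common standard deviation, whereas the paper itself may rely on t-based tests. All numbers are hypothetical and chosen only for illustration (an observed mean pH difference of 0.05, a standard deviation of 0.3, and a practically important difference `delta` of 0.2); none come from the paper's field data. The function name `summarize_difference` is likewise an assumption.

```python
import math
from statistics import NormalDist


def summarize_difference(diff, sd, n, delta, alpha=0.05):
    """Summarize a two-group mean difference (equal group size n, known sd).

    Returns (p_value, confidence_interval, equivalent), where:
      p_value    : two-sided test of the null hypothesis "true difference = 0"
      confidence_interval : 100(1 - alpha)% interval for the difference
      equivalent : True if two one-sided tests at level alpha conclude that
                   the difference lies inside (-delta, +delta)
    Normal approximation only; hypothetical helper, not from the paper.
    """
    z = NormalDist()
    se = sd * math.sqrt(2.0 / n)  # standard error of the difference in means

    # Conventional significance test against a difference of zero.
    p_value = 2.0 * (1.0 - z.cdf(abs(diff) / se))

    # Two-sided confidence interval for the size of the difference.
    half_width = z.inv_cdf(1.0 - alpha / 2.0) * se
    ci = (diff - half_width, diff + half_width)

    # Equivalence via two one-sided tests: same as asking whether the
    # 100(1 - 2*alpha)% interval lies wholly inside (-delta, +delta).
    half_tost = z.inv_cdf(1.0 - alpha) * se
    equivalent = (diff - half_tost > -delta) and (diff + half_tost < delta)

    return p_value, ci, equivalent


# Hypothetical inputs: same observed difference, increasing sample size.
results = {n: summarize_difference(0.05, 0.3, n, 0.2) for n in (10, 100, 1000)}

for n, (p, (lo, hi), eq) in results.items():
    print(f"n={n:5d}  p={p:.4f}  95% CI=({lo:+.3f}, {hi:+.3f})  equivalent={eq}")
```

With these assumed inputs, the same small difference is not significant at n = 10 but becomes significant at n = 1000, even though the confidence interval there lies well inside the stated practical difference. The equivalence result, by contrast, depends on the analyst declaring `delta` in advance, which is the decision the abstract asks managers and scientists to make.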

Author information

Corresponding author

Correspondence to Graham B. McBride.

Additional information

PhD candidate.

About this article

Cite this article

McBride, G.B., Loftis, J.C. & Adkins, N.C. What do significance tests really tell us about the environment? Environmental Management 17, 423–432 (1993). https://doi.org/10.1007/BF02394658

Key words

  • Hypothesis tests
  • Statistical significance
  • Statistical power
  • Equivalence test