Environmental Management, Volume 17, Issue 4, pp 423–432

What do significance tests really tell us about the environment?

  • Graham B. McBride
  • Jim C. Loftis
  • Nadine C. Adkins


Routine application of significance tests does not extract the maximum information from environmental data and can lead to misleading conclusions. There are three reasons: a significant result can often be reached merely by collecting enough samples; a statistically significant result is not necessarily practically significant; and reports of the presence or absence of statistically significant differences for multiple tests are not comparable unless identical sample sizes are used. These problems are demonstrated by application to pH data for grazed and retired fields, and by discussion of significance tests used in recent US regulations for groundwater quality. The advantages of equivalence tests, where the tester must state the size of a practically important difference, are discussed and applied to the field pH problem. We recommend that environmental managers and scientists pay more attention to statistical power and decide what constitutes a practical difference. Confidence intervals for the size of the differences, accompanied where necessary by equivalence tests, are the preferred means of addressing the question: “is there a difference of practical significance?”
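The first two points of the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the observed difference, standard deviation, and practical difference below are hypothetical values chosen for illustration, and the calculation assumes two independent groups of equal size with a common, known standard deviation (a normal approximation; a t-based procedure would normally be used with real data). The function names `two_sample_z` and `equivalent` are likewise invented for this example. The observed difference is held fixed while the sample size grows, which is itself a simplification.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sample_z(diff, sd, n, alpha=0.05):
    """Two-sided z-test p-value and (1 - alpha) confidence interval
    for a difference in means between two groups of size n each.

    Assumes a common, known standard deviation sd (illustrative only).
    """
    se = sd * math.sqrt(2.0 / n)          # standard error of the difference
    z = diff / se
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    half = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * se
    return p, (diff - half, diff + half)


def equivalent(diff, sd, n, delta, alpha=0.05):
    """Equivalence check in the style of two one-sided tests:
    declare equivalence when the (1 - 2*alpha) confidence interval
    lies entirely inside (-delta, +delta).
    """
    se = sd * math.sqrt(2.0 / n)
    half = STD_NORMAL.inv_cdf(1.0 - alpha) * se
    return (-delta < diff - half) and (diff + half < delta)


# Hypothetical values, NOT taken from the paper.
OBSERVED_DIFF = 0.05   # observed difference in mean pH
SD = 0.3               # assumed common standard deviation
DELTA = 0.2            # difference judged to be of practical importance

for n in (10, 100, 1000):
    p, (lo, hi) = two_sample_z(OBSERVED_DIFF, SD, n)
    is_equiv = equivalent(OBSERVED_DIFF, SD, n, DELTA)
    print(f"n={n:5d}  p={p:.4f}  95% CI=({lo:+.3f}, {hi:+.3f})  "
          f"within +/-{DELTA}: {is_equiv}")
```

Under these assumed values, the same small difference is not significant at the smaller sample sizes but becomes significant at n = 1000, even though its confidence interval lies well inside the stated practical difference. The interval and the equivalence check convey the size of the effect, which the p-value alone does not.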

Key words

Hypothesis tests; Statistical significance; Statistical power; Equivalence test


Copyright information

© Springer-Verlag New York Inc. 1993

Authors and Affiliations

  • Graham B. McBride (1)
  • Jim C. Loftis (2)
  • Nadine C. Adkins (2)

  1. Water Quality Centre, Ecosystems Division, National Institute of Water and Atmospheric Science, Hamilton, New Zealand
  2. Agricultural and Chemical Engineering Department, Colorado State University, Fort Collins, USA
