Environmental Science and Pollution Research, Volume 22, Issue 18, pp 13990–13999

Ecotoxicology is not normal

A comparison of statistical approaches for analysis of count and proportion data in ecotoxicology
Research Article

Abstract

Ecotoxicologists often encounter count and proportion data, which are rarely normally distributed. To meet the assumptions of the linear model, such data are usually transformed, or non-parametric methods are used if the transformed data still violate the assumptions. Generalized linear models (GLMs) allow such data to be modeled directly, without the need for transformation. Here, we compare the performance of three approaches: two parametric methods, namely (1) the linear model (assuming normality of transformed data) and (2) GLMs (assuming a Poisson, negative binomial, or binomially distributed response), and (3) non-parametric methods. We simulated typical data mimicking low-replicated ecotoxicological experiments for two common data types (counts and proportions from counts). We compared the performance of the different methods in terms of statistical power and Type I error for detecting a general treatment effect and determining the lowest observed effect concentration (LOEC). In addition, we outlined differences using a real-world mesocosm data set. For count data, we found that the quasi-Poisson model yielded the highest power. The negative binomial GLM resulted in increased Type I errors, which could be fixed using the parametric bootstrap. For proportions, binomial GLMs performed better than the linear model, except when determining the LOEC at extremely low sample sizes. The compared non-parametric methods generally had lower power. We recommend that counts in one-factorial experiments be analyzed using quasi-Poisson models and proportions from counts using binomial GLMs. These methods should become standard in ecotoxicology.
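As a rough illustration of the quasi-Poisson approach recommended above, the following sketch computes the F statistic for a general treatment effect in a one-factor design using only the Python standard library. It is not the authors' code. It relies on the fact that, in a one-factor Poisson GLM, the fitted values are the group means, and it assumes the dispersion is estimated from the Pearson chi-square statistic. The function name `quasi_poisson_f` and the example counts are hypothetical.

```python
import math


def quasi_poisson_f(groups):
    """F statistic for a treatment effect in a one-factor quasi-Poisson GLM.

    groups: list of lists of non-negative counts, one inner list per treatment.
    Returns (f_stat, dispersion, (df_numerator, df_denominator)).
    """
    all_y = [y for g in groups for y in g]
    n, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n
    group_means = [sum(g) / len(g) for g in groups]

    def unit_deviance(y, mu):
        # Poisson unit deviance; a fitted mean of 0 implies y == 0,
        # and y * log(y / mu) is taken as 0 when y == 0.
        if mu == 0:
            return 0.0
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term - (y - mu))

    # Null model: one common mean. Full model: one mean per treatment.
    dev_null = sum(unit_deviance(y, grand_mean) for y in all_y)
    dev_full = sum(
        unit_deviance(y, m) for g, m in zip(groups, group_means) for y in g
    )

    # Pearson-based dispersion estimate (assumption: residual df = n - k).
    pearson = sum(
        (y - m) ** 2 / m
        for g, m in zip(groups, group_means)
        for y in g
        if m > 0
    )
    dispersion = pearson / (n - k)

    f_stat = ((dev_null - dev_full) / (k - 1)) / dispersion
    return f_stat, dispersion, (k - 1, n - k)


# Hypothetical counts: control, low, and high concentration, 3 replicates each.
example = [[12, 8, 15], [10, 9, 14], [2, 0, 3]]
f_stat, dispersion, df = quasi_poisson_f(example)
print(f"F = {f_stat:.2f}, dispersion = {dispersion:.2f}, df = {df}")
```

The statistic would be compared against an F distribution with the returned degrees of freedom. The standard library has no F distribution, so the p-value is left to a statistics package.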

Keywords

Generalized linear models, Transformations, Simulation, Power, Type I error

Supplementary material

11356_2015_4579_MOESM1_ESM.pdf (PDF, 93.2 KB)
11356_2015_4579_MOESM2_ESM.pdf (PDF, 173 KB)

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Institute for Environmental Sciences, University of Koblenz-Landau, Landau, Germany
