Statistical evaluation of toxicological assays: Dunnett or Williams test—take both

Abstract

The US National Toxicology Program recommends the parametric multiple comparison procedures of Dunnett and Williams for the evaluation of repeated toxicity studies. For endpoints where either increasing or decreasing effects are of toxicological relevance, we recommend using the two-sided Dunnett test exclusively. For the many other endpoints, where a priori only one direction is of toxicological relevance, we recommend combining the Dunnett and Williams tests. In particular, we recommend the so-called Umbrella-protected Williams test, which offers insights into all relevant monotone and non-monotone alternatives while suffering only a marginal loss in power compared to the Dunnett test. We illustrate the power difference analytically and, using three real data examples covering different endpoint types, compare the approach with the available alternative tests. Nonparametric tests, which are suitable for evaluating skewed or score data, are also considered. Particular attention is given to the different interpretations of the findings revealed by the different tests. The R programs used for the analyses are provided.
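
The recommendation above can be tried out in R along the following lines. This is a minimal sketch and not the authors' code: the data are simulated, the dose levels, group sizes and variable names are hypothetical, and it assumes the multcomp package is installed. The contrast type `UmbrellaWilliams` in multcomp is used as a stand-in for the Umbrella-protected Williams test; whether it matches the authors' procedure in every detail is an assumption.

```r
# Sketch only: simulated data, not one of the paper's three examples.
library(multcomp)  # provides glht(), mcp() and contrMat()

set.seed(1)

# Hypothetical design: a control and three dose groups, 10 animals each
dose <- factor(rep(c("0", "10", "30", "100"), each = 10),
               levels = c("0", "10", "30", "100"))

# Hypothetical endpoint whose group means increase with dose
y <- rnorm(40, mean = rep(c(10, 10.5, 11, 12), each = 10), sd = 1)
dat <- data.frame(dose = dose, y = y)

# One-way ANOVA model on which all contrast tests are based
fit <- aov(y ~ dose, data = dat)

# Two-sided Dunnett test: each dose group versus control
dunnett <- glht(fit, linfct = mcp(dose = "Dunnett"))
summary(dunnett)

# One-sided Williams-type contrast test for an increasing trend
williams <- glht(fit, linfct = mcp(dose = "Williams"),
                 alternative = "greater")
summary(williams)

# Umbrella-type Williams contrasts (assumed stand-in, see lead-in);
# printing the contrast matrix shows which group means are compared
n <- table(dat$dose)
contrMat(n, type = "UmbrellaWilliams")
umbrella <- glht(fit, linfct = mcp(dose = "UmbrellaWilliams"),
                 alternative = "greater")
summary(umbrella)
```

Each `summary()` call reports multiplicity-adjusted p-values for its set of contrasts, so the three outputs can be compared side by side to see how the interpretation changes with the chosen contrast set.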

Acknowledgments

This work was supported in part by the German Science Foundation grant DfG-HO1687 and the EC FP7 program project ESNATS for the last author (LAH).

Conflict of interest

The authors declare that there is no conflict of interest.

Author information

Corresponding author

Correspondence to Ludwig A. Hothorn.

Appendix

In this section, we provide the simple R code used to analyze the examples. Comments are preceded by # and are given for every command used.

Cite this article

Jaki, T., Hothorn, L.A. Statistical evaluation of toxicological assays: Dunnett or Williams test—take both. Arch Toxicol 87, 1901–1910 (2013). https://doi.org/10.1007/s00204-013-1065-x

Keywords

  • Dunnett test
  • Multiple comparisons
  • R program
  • Repeated toxicity studies
  • Umbrella alternative
  • Williams test