Archives of Toxicology, Volume 87, Issue 11, pp 1901–1910

Statistical evaluation of toxicological assays: Dunnett or Williams test—take both

  • Thomas Jaki
  • Ludwig A. Hothorn
Regulatory Toxicology


The US National Toxicology Program recommends the parametric multiple comparison procedures of Dunnett and Williams for the evaluation of repeated toxicity studies. For endpoints where either increasing or decreasing effects are of toxicological relevance, we recommend using the two-sided Dunnett test exclusively. For the many other endpoints where, a priori, only one direction is of toxicological relevance, we recommend combining the Dunnett and Williams tests. In particular, we recommend the so-called umbrella-protected Williams test, which offers insight into all relevant monotone and non-monotone alternatives while suffering only a marginal loss in power compared with the Dunnett test. We illustrate the power difference analytically and, using three real data examples, compare the approach with available alternative tests for different endpoint types. Nonparametric tests, which are suitable for evaluating skewed or score data, are also considered. Particular attention is given to the different interpretations of the findings revealed by the different tests. The R programs used for the analyses are provided.
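To make the distinction between the two procedures concrete, the following minimal Python sketch builds Dunnett-type and Williams-type contrast coefficients for a control plus several dose groups and computes the corresponding pooled-variance t-type statistics. It is an illustration under stated assumptions, not the authors' R programs: the Williams-type rows follow the common multiple-contrast formulation (control versus a sample-size-weighted average of the highest doses), the function names are hypothetical, and the data are invented. Adjusted p-values would additionally require the multivariate t distribution and are not computed here; the umbrella-protected variant is also not covered.

```python
from math import sqrt
from statistics import mean, variance


def dunnett_contrasts(k):
    """One row per dose group: dose i minus control (index 0)."""
    rows = []
    for i in range(1, k + 1):
        row = [0.0] * (k + 1)
        row[0] = -1.0
        row[i] = 1.0
        rows.append(row)
    return rows


def williams_contrasts(n):
    """Williams-type rows: control versus the sample-size-weighted
    average of the highest 1, 2, ..., k dose groups.
    n lists the group sizes, control first (an assumed formulation)."""
    k = len(n) - 1
    rows = []
    for start in range(k, 0, -1):
        total = sum(n[start:])
        row = [0.0] * (k + 1)
        row[0] = -1.0
        for i in range(start, k + 1):
            row[i] = n[i] / total
        rows.append(row)
    return rows


def contrast_statistics(groups, contrasts):
    """t-type statistic for each contrast, using the pooled variance."""
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    df = sum(n) - len(groups)
    pooled_var = sum((len(g) - 1) * variance(g) for g in groups) / df
    stats = []
    for c in contrasts:
        estimate = sum(ci * mi for ci, mi in zip(c, means))
        se = sqrt(pooled_var * sum(ci ** 2 / ni for ci, ni in zip(c, n)))
        stats.append(estimate / se)
    return stats


if __name__ == "__main__":
    # Invented data: control followed by three increasing dose groups.
    groups = [
        [10, 11, 12, 11],
        [11, 12, 13, 12],
        [13, 14, 15, 14],
        [15, 16, 17, 16],
    ]
    sizes = [len(g) for g in groups]
    print("Dunnett :", contrast_statistics(groups, dunnett_contrasts(3)))
    print("Williams:", contrast_statistics(groups, williams_contrasts(sizes)))
```

The Dunnett rows assess each dose against the control individually, whereas the Williams-type rows pool the upper doses and therefore gain sensitivity when the response is monotone in dose. This is the trade-off the abstract refers to.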


Keywords: Dunnett test; Multiple comparisons; R program; Repeated toxicity studies; Umbrella alternative; Williams test



This work was supported in part by the German Science Foundation grant DfG-HO1687 and the EC FP7 program project ESNATS for the last author (LAH).

Conflict of interest

The authors declare that there is no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Medical and Pharmaceutical Statistics Research Unit, Lancaster University, Lancaster, UK
  2. Institut für Biostatistik, Leibniz Universität Hannover, Hannover, Germany
