Comment on response from Milesi et al. to 'Perinatal exposure to a glyphosate-based herbicide impairs female reproductive outcomes and induces second-generation adverse effects in Wistar rats'

Dear Editors, Milesi et al. (2018, 2019) are to be applauded for their willingness to reanalyse the data used in their 2018 paper to take account of litter effects in their three-generation glyphosate study. As proposed by Plewis (2019), they now use appropriate statistical models (linear mixed models, often known as multilevel or hierarchical linear models) to take account of dependence between offspring from the same dam. This approach is more efficient than using litter as the unit of analysis as suggested by Paumgartten (2019) and others. Note that litter effects are just one way in which observational dependence is generated in rodent feeding experiments; cage effects and repeated measures are others. Unfortunately, the reanalyses of Milesi et al. (2019) do not go far enough in allowing for the hierarchical structure of their data, and assertions that their original findings hold up after the reanalyses are not borne out.
Consider first the model that lies behind Table 1, written using algebra rather than the confusing R syntax employed by the authors:

$$y_{ij} = \beta_0 + \beta_1 t_{1j} + \beta_2 t_{2j} + u_j + e_{ij}$$

where $y_{ij}$ is the pre-implantation loss % for each F1 rat; $i = 1 \ldots n_j$ ($\max n_j = 4$) indexes the F1 rats within F0 dams $j$, $j = 1 \ldots 21$; $t_{lj}$ ($l = 1, 2$) are dummy (0/1) fixed effects for the LD and HD treatments; $u_j \sim N(0, \sigma_u^2)$ is the random effect for F0, with $\sigma_u^2$ being the variance between F0 dams; and $e_{ij} \sim N(0, \sigma_e^2)$, where $\sigma_e^2$ is the variance between F1 rats within the F0 dams.
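The data-generating process implied by this two-level model can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the parameter values and the balanced design (7 dams per group, 4 F1 rats per dam) are hypothetical and are not taken from Milesi et al.

```python
import random


def simulate_two_level(beta0, beta1, beta2, sigma_u, sigma_e,
                       dams_per_group=7, pups_per_dam=4, seed=0):
    """Simulate y_ij = beta0 + beta1*t_1j + beta2*t_2j + u_j + e_ij.

    The treatment dummies (t_1j, t_2j) vary only between F0 dams, so every
    F1 rat from the same dam shares both the treatment and the effect u_j.
    """
    rng = random.Random(seed)
    rows = []
    dam_id = 0
    for group, (t1, t2) in (("control", (0, 0)), ("LD", (1, 0)), ("HD", (0, 1))):
        for _ in range(dams_per_group):
            dam_id += 1
            u_j = rng.gauss(0.0, sigma_u)        # random effect of F0 dam j
            for _ in range(pups_per_dam):
                e_ij = rng.gauss(0.0, sigma_e)   # residual for F1 rat i in dam j
                y = beta0 + beta1 * t1 + beta2 * t2 + u_j + e_ij
                rows.append({"dam": dam_id, "group": group, "y": y})
    return rows


# Hypothetical parameter values, chosen only for illustration.
rows = simulate_two_level(beta0=10.0, beta1=2.0, beta2=5.0,
                          sigma_u=3.0, sigma_e=4.0)
print(len(rows), "F1 rats from", len({r["dam"] for r in rows}), "F0 dams")
```

Because the normal model places no bounds on the outcome, simulated values can fall below 0 even though the outcome is a percentage.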
The null hypothesis is that $\beta_1 = \beta_2 = 0$, and this should be tested using a single Wald test rather than using two separate tests as Milesi et al. (2019) do. Nevertheless, they do not find a statistically significant difference between the LD group and the controls, and this is out of line with Fig. 3d (Milesi et al. 2018, p. 2635).
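A single joint Wald test of the two treatment effects might be sketched as below. The statistic is the quadratic form of the two estimates in the inverse of their covariance matrix, referred to a chi-square distribution with 2 degrees of freedom (whose upper-tail probability is exp(-W/2)). The function name and all numerical inputs are hypothetical; they are not estimates from Milesi et al. (2019), and with so few F0 dams the chi-square reference distribution is itself only an approximation.

```python
import math


def wald_test_2df(b1, b2, var1, var2, cov12):
    """Joint Wald test of H0: beta1 = beta2 = 0.

    b1, b2      : estimated LD and HD treatment effects
    var1, var2  : their estimated sampling variances
    cov12       : their estimated sampling covariance
    Returns (W, p), where W = b' V^{-1} b and p = exp(-W / 2), the
    upper-tail probability of a chi-square distribution with 2 df.
    """
    det = var1 * var2 - cov12 ** 2
    if var1 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    w = (var2 * b1 ** 2 - 2.0 * cov12 * b1 * b2 + var1 * b2 ** 2) / det
    return w, math.exp(-w / 2.0)


# Hypothetical inputs. The covariance is positive here because both
# contrasts are taken against the same control group.
w, p = wald_test_2df(b1=1.5, b2=4.0, var1=4.0, var2=4.0, cov12=2.0)
print(f"W = {w:.3f}, p = {p:.4f}")
```

The point of the joint test is that one statistic answers the single question "is there any treatment effect?", rather than two separate tests each answering part of it.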
This comment refers to the original article at https://doi.org/10.1007/s00204-018-2236-6.
This comment refers to the response available at https://doi.org/10.1007/s00204-019-02609-0.
There are a number of points to note about this model and the data used by Milesi et al. (2019) to estimate its parameters. First, the treatments vary only between F0 dams; all F1 offspring of a given dam are exposed to the same treatment. Second, the litter effect can be represented as the proportion of the total variance in the outcome explained by the F0 dams, i.e. $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$, often known as the intra-cluster correlation coefficient. Milesi et al. (2019) do not provide this estimate, which is regrettable because it would be a helpful indicator for the design of future studies, even though a precise estimate of $\sigma_u^2$ might be difficult to obtain with such a small sample (21) of F0 dams. Third, there seems to be unexplained missing data: we would expect an F1 sample size of 28 for each of the three conditions, rather than 25, 20 and 20 as given, and these numbers fall further at F2 to 20, 15 and 13. One implication of the way the F1 rats were apparently housed in cages (four per cage; Milesi et al. 2018, p. 2631) is that the variation in the outcome between F0 dams includes both a genetic effect and an environmental (or cage) effect arising from the co-housing. Ideally, these two effects should be separated by, for example, housing only two F1 rats together. Finally, the outcome variable, $y_{ij}$, is a percentage which might be very small for many F1 rats, suggesting that rather than assuming normality for $e_{ij}$, a binomial model might be more appropriate.
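To illustrate why the intra-cluster correlation matters for design, the sketch below computes it from two variance components and then applies the standard design effect for equal-sized clusters, 1 + (m - 1) × ICC, which is an addition here and not a calculation made in the comment itself. The variance values are hypothetical, not estimates from the glyphosate data.

```python
def intra_cluster_correlation(var_between, var_within):
    """Proportion of total variance lying between clusters (F0 dams)."""
    total = var_between + var_within
    if var_between < 0 or var_within < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return var_between / total


def design_effect(icc, cluster_size):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


# Hypothetical variance components, for illustration only.
icc = intra_cluster_correlation(var_between=9.0, var_within=16.0)
deff = design_effect(icc, cluster_size=4)

# 28 F1 rats per condition is the sample size expected from the design.
print(f"ICC = {icc:.2f}, design effect = {deff:.2f}, "
      f"effective n = {28 / deff:.1f}")
```

Under these assumed values, 28 F1 rats in 7 litters carry roughly the information of half as many independent animals, which is why an estimate of the ICC is useful when planning future studies.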
Turning to Table 2, we find that the authors do not fit the three-level models (F2 nested within F1 nested within F0) that are required, i.e.,

$$y_{ijk} = \beta_0 + \beta_1 t_{1k} + \beta_2 t_{2k} + v_k + u_{jk} + e_{ijk}$$

where $y_{ijk}$ is now the outcome of interest (fetal weight etc.) for each F2 rat; $i = 1 \ldots n_{jk}$ indexes the F2 rats within F1 $j$, $j = 1 \ldots J_k$, and F0 $k$, $k = 1 \ldots K$; $t_{lk}$ are the treatment variables as before; $v_k \sim N(0, \sigma_v^2)$ is now the random effect for F0 dams; $u_{jk} \sim N(0, \sigma_u^2)$ is now the random effect for F1 (within F0); and $e_{ijk} \sim N(0, \sigma_e^2)$, with $\sigma_e^2$ the variance between F2 rats within both the F0 and F1 dams. Models such as these are easily estimated with the appropriate software, and estimates of the variance components can inform future designs.
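The dependence structure implied by the three-level model can be made concrete with a short sketch. Under the model, two F2 rats from the same F1 dam share both the F0 and F1 random effects, while two F2 rats from different F1 dams descended from the same F0 dam share only the F0 effect. The function name and the variance values are hypothetical.

```python
def three_level_correlations(var_f0, var_f1, var_f2):
    """Correlations between F2 outcomes implied by the three-level model.

    var_f0 : variance between F0 dams (sigma_v^2)
    var_f1 : variance between F1 dams within F0 (sigma_u^2)
    var_f2 : variance between F2 rats within F1 (sigma_e^2)
    """
    total = var_f0 + var_f1 + var_f2
    if min(var_f0, var_f1, var_f2) < 0 or total == 0:
        raise ValueError("variances must be non-negative and not all zero")
    same_f1 = (var_f0 + var_f1) / total   # same F1 dam (and hence same F0)
    same_f0_only = var_f0 / total         # different F1 dams, same F0 dam
    return same_f1, same_f0_only


# Hypothetical variance components, for illustration only.
same_f1, same_f0_only = three_level_correlations(var_f0=1.0, var_f1=1.0, var_f2=2.0)
print(f"same F1 dam: {same_f1:.2f}; same F0 dam only: {same_f0_only:.2f}")
```

A model that omits the F0 level in effect treats the second of these correlations as zero, even though the treatments are allocated at the F0 level.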
Even though Milesi et al. (2019) wrongly omit the F0 level from the model underpinning Table 2, their results are still different from those reported in the original paper: there is no longer any treatment effect for placental weight or for foetal length for the low dose. It is plausible to suppose that allowing for the full hierarchical structure would lead to a further reduction in the precision of the estimated treatment effects.
The authors are of course correct to state that no one experiment can be definitive. Arguably, more attention to experimental design is warranted if multi-generational rodent experiments such as this one and others (e.g. Kubsad et al. 2019) are to be appropriately powered. But rather than offer their rats for further experiments, it would be much more helpful to researchers interested in estimating litter effects if they made their data publicly available.

Compliance with ethical standards
Conflict of interest The author declares that he has no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.