We thank Drs. Druwe and Burgoon for their clarification of the Bayesian analysis (Druwe and Burgoon 2016) of the data from the studies by Waalkes and his colleagues (Tokar et al. 2011; Waalkes et al. 2014). The critical issue that we raise, however, is not the statistics but the biology. Spontaneous lung tumors in this strain of mice are extremely common, with incidences reported in the literature ranging from 8.8% to 61.1%, with an average of 21.8% (Manenti et al. 2003). This appears to be related to a susceptibility gene for lung tumors present in many strains of mice, including CD-1 (Manenti et al. 2003). Dr. Waalkes’ laboratory has reported incidences of lung tumors in control mice from their studies ranging from 20% to 42% [20% (Tokar et al. 2012b), 22% (Waalkes et al. 2014), 34% (Tokar et al. 2011), and 42% (Tokar et al. 2012a)]. This variability could be due to a variety of factors, including normal variation in incidences as well as differences in diet, water, and handling procedures, among others. For example, the Waalkes group used a non-certified Ralston Purina rodent diet, 5L79, which has not been analyzed for contaminants (personal communication with Dr. Melanie Hoar, PMI Nutrition International Technical Services), and the sources of its components vary over time depending on availability. One of its components, fish meal (Purina diet composition sheet for 5L79), is rich in arsenic. The Waalkes group did not measure arsenic in their diets (Waalkes et al. 2014). They did reference arsenic levels in a Purina certified diet with similar components, but that does not ensure that the diets used in the reported study had the same arsenic levels.

The tumor incidences in their treated groups at low doses are within the range stated above for their control groups, and the differences from controls are not statistically significant at p < 0.01. Unfortunately, Druwe and Burgoon dismissed the use of p < 0.01 without providing a rationale for doing so. The basis for this statistical criterion was provided by Haseman (1983, 1984) and Haseman et al. (1984, 1986); Dr. Haseman was the chief statistician for the National Toxicology Program bioassay program. He suggested using p < 0.01 for common tumors for the very reason that is evident in the studies by Waalkes and his colleagues: there is wide variability from group to group, making chance variation much more likely. The use of p < 0.01 when dealing with common tumors has been accepted for regulatory purposes by the United States Food and Drug Administration Carcinogenicity Assessment Committee (CAC), in addition to other regulatory agencies around the world. To interpret the results of Waalkes and his colleagues as anything other than chance is to raise serious questions about a very peculiar dose response. To suggest that the higher doses produced a lower incidence of tumors because of toxicity is not consistent with the actual findings, since at even higher doses they had increased incidences of tumors. Furthermore, at all of these doses, there is no evidence of pulmonary toxicity in these mice. In addition, others have not been able to reproduce the lung tumor findings of any of Waalkes’ studies (Garry et al. 2015; Ahlborn et al. 2009; Nohara et al. 2012; Takumi et al. 2015).

Although Druwe and Burgoon might have performed an acceptable Bayesian analysis, it is not biologically relevant to the interpretation of the findings. This is the key point. Taking into account both the statistical analyses and the biology, one can only conclude that there is no effect of inorganic arsenic on the incidence of lung tumors in mice at any of the three low doses reported by Waalkes and his colleagues.