General criticisms
Most comments included general accusations about the quality of the work, such as:
“The study appeared to sweep aside all known benchmarks of scientific good practice and, more importantly, to ignore the minimal standards of scientific and ethical conduct…” (Arjo et al. [11]), and
“Throughout their manuscript, Séralini et al. ignore clear indications that there is something fundamentally wrong in their experimental design” (Grunewald [13]).
Such judgments were widespread. Does the evidence justify them?
Many challenged the study design for not following OECD or EPA protocols for safety testing for tumours. The experiments, however, were not safety tests. A safety test requires a large enough trial not to miss a rare occurrence. No one would be satisfied by a safety test with only 10 animals. These particular OECD and EPA protocols are therefore not relevant. It must also be stressed that differences in experimental protocol do not necessarily imply that one or another is flawed, as Meyer and Hillbeck [7] pointed out.
Arjo et al. identify the basis of poor design thus: “The key flaw in the paper is the poor study design, which is based on the discredited hypothesis that inserting a gene into the genome of a crop species is inherently more likely to produce unintended, unexpected and hazardous characteristics than would be the case using conventional breeding.”
There are several different fundamental errors in this statement, which assumes that an inserted gene can have no adverse action other than the intended action. This is a commonly held assumption and is related to the concept of “substantial equivalence”, but it is factually wrong; the hypothesis in the quote is not discredited by the majority of scientists. This is considered more fully in the “Discussion” section.
A multiplicity of errors
The critics point out a diversity of errors within the Seralini et al. paper. The VIB report concludes with a bullet-pointed list of identified errors including the following:
“There is no mention of how the nutritional balance was kept in the various diets. If you replace 22 or 33 % of the food with maize, you change the percentage or the amount of carbohydrates, protein, fats, fiber, vitamins and so on, in the diet.”
“There is no indication whatsoever of whether the genetically modified maize originating from the land that was sprayed with Roundup contained traces of the degradation products of Roundup and, if so, how much there was, and how this compared to the amount of Roundup that was fed to all the animals”.
Arjo et al. listed “many major critical errors” in their Table 3. For example, under the heading “False or unsubstantiated statements”, they and others give several instances of inadequate reporting of results and of tests that were made. They claim that there is insufficient detail about the feed materials, as in the VIB quote above and as argued by Wager [15]:
“a robust experiment would also include a random, unrelated diet, e.g., one derived from organic maize)….. Critical details on how much food was consumed by each rat are absent, making it impossible to establish any dose/response relationship.”
These comments fail to appreciate that all diets, including the controls, were equally balanced with 33 % maize (GM or not); sources and details of the feed materials are described by Seralini et al., but more detail would have been useful. The GM maize grown with Roundup was as in usual agricultural practice, and maize was tested for contaminating pesticides. It is correct that total feed intake and rat weights were not reported, but they were monitored. Again, more data would be useful.
A common complaint was that neither the concentration and stability of glyphosate in the drinking water nor the amounts consumed by rats were given: Roundup was given ad lib at stated concentrations. Since ad lib provision represents a realistic regime, as it would be for farm animals and for humans, the amounts consumed by rats are not relevant. The use of Roundup is considered separately below.
Many complained that “Control data [was] not always included in the limited cases where data are presented to support the conclusions”. With few exceptions, this is incorrect. All tables and figures other than the photographs of rats with tumours (Figure 3, Seralini et al.) include the relevant control data. (Note that the important Figure 2 is ignored by all, as described below.)
The above are typical examples of criticism that arises from misreading or misunderstanding the paper and selection of mostly minor or irrelevant matters. In many cases, further information would indeed have enriched the paper and might have been provided in a supplementary paper online.
I conclude that these critical comments could have been addressed in a workshop presentation of the results and most complaints could be answered by provision of more data. None are sufficient, however, to invalidate the data and results of Seralini et al.
Most critics comment on the number of deaths in control and experimental groups, for example pointing out that male rats that were given Roundup at the highest concentration actually lived a little longer than the controls. All of these comments, although quoting from Seralini et al.’s Figure 1, which gives the time course of occurrence of deaths, deal mainly with the totals of deaths at the end of the experiment. This, notably, is the fault in Arjo et al.’s Table 1 [11], which, by not mentioning the timing, misrepresents the data. Similarly, the VIB paper [12] quotes only the final end-of-experiment histograms from Figure 1, omitting the timing. While it does seem extraordinary that at the highest doses of Roundup male rats live a few days longer than controls, no significance can be attached to such small differences. (It is just conceivable that the highest concentration of Roundup offers some protection, in a non-monotonic manner; see [16].) Since Seralini et al. themselves draw no essential conclusions from Figure 1, the matter does not justify further concern.
The incidence of tumours
Figure 2 in Seralini et al. shows that treated rats acquired tumours earlier than did the controls. It was Figure 2 with its accompanying photographs of tumours that became the subject of widespread concern, media coverage and controversy. The two most frequent and substantive criticisms of the paper were that the numbers of rats were too small with too few controls and that Sprague Dawley (SD) rats suffer spontaneous tumours anyway, making the conclusions of Seralini et al. meaningless. EFSA, for example, states [14]: “….the observed frequency of tumours is influenced by the natural occurrence of tumours typical of this strain, regardless of any treatment. This is neither taken into account nor discussed in the Séralini et al. (2012) publication.”
Yet most of the critics in the several communications cited above never mention Seralini et al.’s Figure 2 and do not discuss or criticise its findings. One critic who did mention Figure 2 [17] complained: “There are other confusing sentences that also reflect poor editorial work, for example ‘Up to 14 months, no animals in the control groups showed any signs of tumors whilst 10–30 % of treated females per group developed tumors, with the exception of one group (33 % GMO + R).’….I was left wondering if anyone had really read the paper carefully.” Careful reading shows that the quoted sentence is factually correct. Another also found the paper difficult to interpret [18] and questioned whether control rats got tumours because of the absence of photographs of control tumours. This critic concluded: “Numerically, we cannot tell, because they are absent also from Figure 2.” Every section of Figure 2, however, shows a control curve. The critic’s error could be a misreading, perhaps the result of the too rapid reply required by the Science Media Centre (the same critic also misread Table 2, which included the 10 controls, not fewer; the wording here is clear but can be misread). VIB [12], without mentioning or quoting from Figure 2, concludes: “Only once you have increased the size of the groups significantly will the chances that you have divided the animals incorrectly drop considerably. This is the second fundamental flaw in the research design used by Séralini et al. They use far too few animals per treated group.” Their criticism is considered further below under the “Statistical issues” section.
Since Figure 2 is of core importance for the whole paper, I offer my interpretation as follows:
Simple inspection of Figure 2 shows that the curves of appearance of tumours in treated rats rise ahead of those representing the control tumours. The incidence of tumours in control rats is generally consistent with the published literature (such as the five references quoted by Arjo et al., see below) and serves to check against unforeseen local anomalies. It is important to note that there is considerable variation in the literature about the incidence of tumours in control SD rats. Many studies concluding that GM materials are safe have used such variable historical controls. The variation serves to increase the background “noise” and this tends to hide small indications of disease, as reviewed in [19] and explored by Meyer and Hillbeck [7].
As rats age and their pathology increases, so the difference between treated and control rats decreases; in other words, the signal-to-noise ratio of experimental to control reduces. Yet most critics considered only the incidence of tumours at the end of the experiments, that is, when the signal-to-noise ratio must have become low. For example, 1 of the 17 critics in [10] showed ingeniously that the figures for tumour incidence in Seralini et al.’s Table 2 looked like random numbers. The figures he quoted, however, were also end-of-experiment data. Nevertheless, the total of pathologies in the treated animals remained always greater than in the controls, even in this Table 2. Critics (including VIB) fail to discuss the timing of appearance of tumours in Figure 2. It is the timing of appearance of tumours that, to me, is significant and should not be overlooked.
In Figure 2, the tumour incidence in treated and control rats can be seen as the areas enclosed between the respective curves and the x-axis for any chosen time period. The units for these areas can be defined as “accumulated tumour days”. These can be easily summed from the graphs in Figure 2 in order to put tangible figures to otherwise intuitive visualisation. One can then express the results as the ratio of “tumour days” in treated rats to “tumour days” in control rats for any specified period. For example, I added the tumour days in Figure 2 “FEMALES GMO” (top right) by summing together the tumours as they appeared under all three feeding doses of 11, 22, and 33 % GMO (divided by 3 to apply per 10 rats) neglecting any differences between doses. In this way, 30 treated rats are compared to 10 controls. The ratios of tumour days of treated to control rats are shown in Table 1.
The change in the ratio with time provides a measure of internal control. In this example, during the period of 100–650 days, there are nearly 2 ½ times as many tumour days in the treated rats as in the control rats, while in the last 100 of these (550–650 days), the ratio is only 1.92. This suggests that significant results are limited to about 550 days. No statistical analysis could deny that the large ratio of 3.04 up to 550 days is significant.
However, results presented in this way could be subject to serious errors at the early stages when very few tumours occur. For this reason, the total numbers of tumour days are indicated (see also the VIB criticism under the “Statistical issues” section below).
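The “accumulated tumour days” measure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the onset days below are invented for the example and are not read from Figure 2, and the function names (`tumour_days`, `tumour_day_ratio`) are hypothetical. The area under a cumulative tumour curve is computed by letting each tumour contribute the number of days it was present within the chosen window; the pooled treated total is divided by the number of treated groups so that both figures apply per 10 rats.

```python
from typing import Sequence


def tumour_days(onsets: Sequence[float], start: float, end: float) -> float:
    """Area under the cumulative tumour-count curve between start and end.

    Each tumour contributes the days it was present inside the window.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    return sum(max(0.0, end - max(day, start)) for day in onsets)


def tumour_day_ratio(
    treated: Sequence[float],
    control: Sequence[float],
    start: float,
    end: float,
    treated_groups: int = 1,
) -> float:
    """Ratio of treated to control tumour days, both expressed per group."""
    treated_area = tumour_days(treated, start, end) / treated_groups
    control_area = tumour_days(control, start, end)
    if control_area == 0:
        # Too few control tumours in this window for a meaningful ratio.
        raise ValueError("no control tumour days in this window")
    return treated_area / control_area


# Hypothetical onset days (NOT data from Seralini et al.).
control_onsets = [400, 600]                      # one group of 10 rats
treated_onsets = [200, 300, 400, 500, 500, 600]  # three pooled groups of 10

ratio = tumour_day_ratio(treated_onsets, control_onsets, 100, 650, treated_groups=3)
print(f"treated/control tumour-day ratio: {ratio:.2f}")
```

Raising an error when the control area is zero reflects the caution already noted: in early windows with very few tumours the ratio is unstable or undefined, which is why the absolute totals of tumour days should be reported alongside it.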
One can also express the results as time of appearance of the first five tumours per 10 rats (i.e., 50 % occurrence). This is 710 days for controls, 470 to 630 for GMO treated and 470 to 530 for Roundup treated. This provides a convenient way to summarise the findings, akin to an “LD50”.
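The “time to the first five tumours per 10 rats” summary can be expressed as a simple order statistic. The following is a minimal sketch, assuming one has a list of tumour onset days for a single group of 10 rats; the onset days shown are invented for illustration and the function name `time_to_kth_tumour` is hypothetical.

```python
from typing import Optional, Sequence


def time_to_kth_tumour(onsets: Sequence[float], k: int = 5) -> Optional[float]:
    """Day on which the k-th tumour appeared, or None if fewer than k occurred."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(onsets)
    return ordered[k - 1] if len(ordered) >= k else None


# Hypothetical onset days for one group of 10 rats (NOT data from the paper).
example_group = [320, 410, 450, 480, 520, 610]
print(time_to_kth_tumour(example_group))       # day of the fifth tumour
print(time_to_kth_tumour(example_group, k=7))  # None: fewer than 7 tumours
```

Returning `None` rather than a number when the threshold is never reached keeps groups that did not attain 50 % occurrence within the experiment distinct from those that did.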
The above analysis is consistent with the authors’ conclusions that the treated rats developed tumours significantly sooner than did the control rats. The results are also consistent with the critics’ arguments and with the literature, which shows that SD rats get tumours anyway, but only later in life.
I conclude that, contrary to the criticisms, SD rats are the appropriate animal for these experiments precisely because of their susceptibility to late tumorigenesis, in agreement with Meyer and Hillbeck [7]. This idea is taken further in the “Discussion” section.
In summary, Figure 2, which with its accompanying photographs gave rise to the most publicity (yet has been the most ignored by all the critics), acts as the crucial and informative data about the onset of tumours. The figure shows clearly that there is an effect by a GM crop as well as by its associated herbicide. There is no reason to extrapolate these findings, however, beyond the experimental results. The conclusions of Seralini et al. need not apply to other GM systems, as seems to have been assumed by many anti-GMO interests, as well as by most of the critics.
Statistical issues
Among the many criticisms about the statistics used, VIB elegantly describe the possible errors from using too few controls: “The chances that we will have made groups of 10 among the females in which in the one instance two and in the other instance nine animals will spontaneously develop tumors, or four and eight instead of six, are extremely good” and “This is a fundamental error in the research design: there are too few control groups in relation to the treated groups.”
It is of course possible that of the 10 groups of 20 rats (10 for each sex) tested, 1 group has a lower incidence of spontaneous tumours than the rest. Such a group would be an outlier in the distribution. An outlier like this could happen to be chosen by the researchers as a control. The probability of a whole group of 10 rats showing a lower incidence of tumours is low, and there is a 1 in 10 probability that such a group would be chosen as control. In this unlikely case, Seralini et al.’s conclusions would indeed be rendered invalid. This argument seems to answer: “…how, in such a long-term experiment, can you differentiate between tumors that occur spontaneously from those that occur as a consequence of eating genetically modified maize, or drinking Roundup.”
The analysis I give probably answers the above by considering the time course of appearance of tumours. Neglect of the timing of tumour appearance was a universal error of the critics.
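The chance-outlier argument can be given rough numbers with a binomial calculation. This is a sketch under explicit assumptions, not an analysis of the actual data: it treats each rat as developing a spontaneous tumour independently with the same probability, and it takes that probability as 0.6 only because the VIB quotation speaks of “six” in ten as the expected figure. The function names are hypothetical.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial probability that at most k of n animals develop a tumour."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def prob_any_group(q: float, groups: int) -> float:
    """Probability that at least one of several independent groups is such an outlier."""
    return 1 - (1 - q) ** groups


# Assumed spontaneous incidence of 0.6 (illustrative, taken from the VIB wording).
q_low = prob_at_most(2, 10, 0.6)  # a single group of 10 with two or fewer tumours
print(f"one given group with <= 2 tumours: {q_low:.4f}")
print(f"at least one of 10 such groups:    {prob_any_group(q_low, 10):.4f}")
```

Under these assumptions the result depends strongly on the assumed incidence and on the cut-off chosen for “low”, so the figures indicate only the order of magnitude of the risk that a control group is an outlier by chance.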
Seralini et al.’s Figure 5 shows the SEMs of the numerous parameters measured at 15 months (450 days), arranged in order from the highest increase to the highest decrease of a parameter, over control values. It would be difficult to argue that these are without significance. Any experimental biologist, searching for effects and finding results such as Seralini et al. did in Figures 2 and 5, would assume that they had found something sufficiently meaningful to justify further experimentation. Imagine that the experiments had given the opposite results and shown that the onset of tumours was delayed by the treatments: the same critics might have (sceptically) accepted those findings as hints towards finding ways to delay human tumorigenesis!
More animals and statistical analysis might have allowed one to determine a greater or lesser degree of confidence; but certainty was not claimed by Seralini et al., nor was it required, since theirs was not a test for safety. The results in the (largely ignored) Figure 2 leave us with the likelihood that the effects are real and the probability that they are of significance.
Scientific common sense must be the initial way to assess this experimental data; statistical analysis follows when needed, but usually it is better to repeat the experiment with appropriate improvements. In this case, with hindsight, two sets of 10 control rats would have enhanced the evidence. Meanwhile, Seralini et al.’s results remain significant, even if the level of confidence is below 95 %.
Roundup and glyphosate
There was much criticism that glyphosate concentrations and levels of its consumption were not given by the researchers and that only a whole formulation of Roundup was described and used. Arjo et al. took this further suggesting that Seralini et al. “justify the use of commercial formulations rather than the pure active ingredient on the tenuous basis that environmental exposure is to the whole product. The weakness of this argument is apparent from the differing behavior of formulations and their ingredients in terms of environmental stability, mobility in ground water, …” [11].
Roundup is the commercial material being tested; one cannot know a priori whether any toxicity is due to glyphosate or an adjuvant or to their combined actions. It is a common error to assume that, because glyphosate is the active herbicide principle, it is also the material to be tested for other activities; this puts assumption before experiment. There are many reports about the toxicity of glyphosate itself (e.g., [19, 20]) and of its untoward effects on the microbial community in soils [21].
Many others assume that there should be a dose-response relationship, for example: “…highest incidence of tumors supposedly found in the animals administered the lowest dose. These conclusions are not only implausible, but they are entirely discordant with the body of literature already available for glyphosate.” [11]. However, there is a growing body of literature about damage caused by glyphosate, such as [20] quoted above; also, non-monotonic dose-response curves are common in biology [16].
I conclude that the criticisms about the use of Roundup by Seralini et al. were not relevant to the experiment, which was concerned with the material used in agricultural practice, not with glyphosate itself.
The 90-day protocol
The conventional 90-day test period has become the norm, and most deny the need for longer term experiments. Arjo et al. defend the 90-day protocol thus: “This particular rat strain exhibits a 45–80 % incidence of spontaneous tumors in the absence of any exogenous factor, depending on the diet and whether or not fed ad libitum (Prejean et al. 1973; Davis et al. 1956; Keenan et al. 1996; Suzuki et al. 1979; Thompson et al. 1961). The rats normally begin developing these spontaneous tumors after 90 days, and are used in shorter-term experiments to ascertain tumorigenicity,....”
None of the above references given in the Arjo et al. paper anywhere mention 90 days, and none of them give a time scale for the appearance of tumours. Each states, however, that rats develop tumours only in older age, such as “540 days” (Davis et al. and Prejean et al.) or older than “18 months” (Suzuki et al.) and, with a complex chow formulation, “in female rats ranging in age from 11 to 30 months (21.78 average)” (Thompson et al.); Keenan et al. report tumour incidence later in life, reduced under a restricted diet, a similar finding to the older one of Davis.
One has to conclude that the above quoted paragraph is erroneous reporting of all five references. The case for longer term experiments remains. One might note that the protocol of 90 days became the norm after it was set arbitrarily by Monsanto in its applications and has been since adopted by others without question or any apparent justification.
Conclusions from the critical evidence
The criticisms described above contain several serious flaws and omissions, such as the following:
- Misreporting (e.g., of the five references and their association with 90-day trials);
- Misunderstanding (e.g., the omission of Figure 2 and the distortion of the use of Roundup);
- Unwarranted assumptions (e.g., putting presumption before experiment, such as expecting a dose-response effect and the discrediting of a common hypothesis (see below));
- Misreading of data, especially regarding experimental controls (e.g., complaints that there were none in critical places).
The criticisms also included useful comments, such as the dearth of detailed information on food consumption and rats’ weights; these were measured by Seralini et al. but not reported and could have been provided online as supplementary information.
However, as illustrated with the examples given, I could not find any substantive material definitive enough to invalidate the paper. The most substantial and repeated criticisms were either false or could be answered easily. We are left, predominantly, with polemical criticisms that fail scientifically.
This is an unsatisfactory situation for the interests of biotechnologists in gene transfer (GMOs) as well as for any deeper understanding of the issues among biologists and the public. It is worthwhile, therefore, to examine underlying modes of thought that led to such vehement denial.