Introduction

Since the mid-1990s, genetically modified (GM) crops have increasingly been grown commercially around the world, reaching a global area of 189.8 million hectares, particularly in non-EU countries such as the USA, Brazil, Argentina, Canada, and India. The four major cultivated commodity crops are soybean, maize, oilseed rape, and cotton, whilst a wide range of other GM crops, fruits, and vegetables are grown to a lesser extent. The main traits that have been introduced through genetic modification are herbicide tolerance and insect resistance (ISAAA 2017).

In many countries around the world, GM crops are only allowed onto the market after a regulatory approval or consultation procedure, which entails, among other things, a pre-market safety assessment. To assess the safety of foods derived from GM crops, the FAO/WHO Codex Alimentarius established internationally harmonized guidelines (Codex Alimentarius 2008). Central to the safety assessment is the comparison of a given GM crop with a genetically close non-GM counterpart with a history of safe use, focusing on the differences between the two in order to identify both intended and unintended effects of the genetic modification. This entails, among other things, a comprehensive analysis of the compositional characteristics of the GM crop and its comparator, covering nutrients (e.g. amino acids, fatty acids, vitamins and minerals), anti-nutrients, toxins, and other compounds of biological relevance (e.g. phytoestrogens). Based on the differences thus identified and the information already available on the possible safety implications of such changes, it can be determined whether and how the safety assessment should proceed before a conclusion is reached. For newly expressed proteins encoded by the introduced foreign genes, the assessment may, for example, address their potential toxicity and allergenicity through a number of in silico, in vitro or in vivo tests (Codex Alimentarius 2008).

Besides the above-mentioned compositional analysis, testing of whole food products may be considered in exceptional cases. However, foods are complex mixtures and, unlike for purified chemicals, there are limits to the dose ranges that can be tested in experimental animals, which diminishes the sensitivity of such tests (Codex Alimentarius 2008). In the EU, the requirements for the risk assessment of GM food/feed are specified in detail by the Implementing Regulation No. 503/2013 (EU 2013a), which takes into account the Codex Alimentarius guideline for the conduct of food safety assessment of foods derived from GM plants (Codex Alimentarius 2008).

According to the Implementing Regulation No. 503/2013, applicants are required to carry out a 90-day feeding study with whole food/feed for each GMO event to be marketed in the EU, although the scientific arguments presented by various stakeholders did not unanimously support this requirement (see e.g. Devos et al. 2016). Depending on the outcome of previous studies, a 2-year carcinogenicity study with rats may also be requested by the European Food Safety Authority (EFSA) on a case-by-case basis. To prepare for such eventualities, EFSA was asked by the European Commission to provide supplementary guidance on key elements to be considered for a 2-year carcinogenicity trial in rats with whole food/feed. Against this backdrop and to address possible concerns after the publication of a study on the long-term toxicity of Roundup (a glyphosate-based herbicide formulation) and the glyphosate-tolerant genetically modified maize NK603 (Séralini et al. 2012; EFSA 2012), the European Commission funded the 4-year research project G-TwYST (GM Plant Two Year Safety Testing) to address the following issues related to 2-year feeding trials in a stepwise approach: (1) the execution of at least one rat feeding trial with the GM maize NK603 applying EFSA protocols and recommendations, whereby the participating institutions should strictly comply with all applicable international standards and norms concerning feeding trials in close collaboration with EFSA; (2) analysis, reporting and provision of recommendations, in particular as to the scientific justification and added value of such long-term feeding trials with regard to GMO risk assessment.

To achieve the first objective, the G-TwYST partners performed three rat feeding studies with GM maize NK603, both untreated and treated once with Roundup® during its cultivation, as well as the untreated conventional counterpart as control:

  • two 90-day trials for subchronic toxicity testing, one with GM maize inclusion rates of 11 and 33% and one with GM maize inclusion rates of 11, 33 and 50% as well as

  • a combined chronic toxicity/carcinogenicity (2-year) study with GM maize inclusion rates of 11 and 33%,

all three of them based on the OECD Guidelines for the testing of chemicals (OECD 1998, 2009a) and EFSA recommendations (EFSA 2011a, 2013, 2014). A 90-day feeding trial with a maize incorporation rate of 50% was included, since EFSA (2014) proposed it as the reference value for a high maize dose in 90-day studies in rodents, based on a report by Zhu et al. (2013). Furthermore, various stakeholders strongly supported the performance of a 90-day feeding trial with a maize incorporation rate of 50% at the first G-TwYST Stakeholder Meeting held in Vienna, Austria, in December 2014.

In the present study, the results of the three above-mentioned feeding trials are described and discussed. Based on the results obtained, the conclusions and recommendations of the G-TwYST consortium on the design, conduct and analysis of rat feeding studies with whole food/feed, as well as the scientific justification and added value of long-term feeding trials for the GM plant risk assessment are presented.

Materials and methods

Plant material

Given that the NK603 maize had been approved in the EU for import, processing as well as food and feed uses but not for cultivation, the NK603 and control maize plants were cultivated according to good agricultural practice in an experimental field station of the University of Guelph (ON, Canada) in 2014. Two varieties were produced: the GM maize NK603 (Pioneer 8906 R), without and with a Roundup application during cultivation, and its near-isogenic non-GM comparator (Pioneer 8906). About 84,000 plants/ha were planted in May 2014, the non-GM seeds on the 13th of May 2014 and the GM seeds on the 14th of May 2014. Weeds were controlled by applying Primextra® II Magnum® (S-Metolachlor + Atrazine, 3.5 L/ha) on the 20th of May 2014 at all sites and Roundup Transorb® HC (2.5 L/ha, 1.35 kg glyphosate [potassium salt]/ha) at the “Roundup site” on the 20th of June 2014. GM maize and non-GM maize were grown on a farm at a distance of about 1 km from each other to avoid cross-pollination. Glyphosate-treated and untreated fields were separated by a field road. Maize was harvested in November 2014 and kernels were removed from the cobs on-site by machine. The kernels had grain moisture levels in the usual range and were dried in a biological dryer, kept below 60 °C, down to a moisture level of 13–14%.

Diet preparation and analyses

Four tons of GM maize NK603, four tons of Roundup-treated GM maize NK603 and 7.8 tons of the near-isogenic non-GM comparator (maize kernels) were transported by air freight to Germany on the 14th of December 2014 and stored in big bags under ambient climate conditions in a commercial storage facility that held no other grains. No other plant material was stored at the facility. The quality of the kernels was analysed after import. The sampling of the big bags (i.e. kernels from the same variety and treatment) was performed according to Regulation 691/2013 (EU 2013b) by a representative of the Landesuntersuchungsanstalt für das Gesundheits- und Veterinärwesen Sachsen in January 2015.

In autumn 2015, an infestation with the Indian meal moth (Plodia interpunctella) was noticed and the kernels were fumigated with phosphine, which does not leave residues on/in the stored kernels. The big bags were moved to another storage facility and stored under ambient climate conditions between November 2015 and April 2017 (storage conditions: 12.3 ± 6.2 °C, 64.6 ± 7.2% humidity, hourly recorded). In summer 2016, mice were observed in the storage hall, so the kernels were sieved and placed in rigid containers.

Kernels were shipped to ENVIGO/Mucedola srl (Milan, Italy) and milled (mesh size: 1 mm) to prepare the feed. The formulation of the diets was calculated by the company's nutritionist, based on the company's standard feeds and ingredients, to provide balanced nutrition in all feeding trials. The feeds were isoproteic, isocaloric and adjusted to the dietary requirements of the rat strain Wistar Han RCC used in the feeding trials. Besides the milled maize, the formulation mainly consisted of other plant-derived ingredients, including wheat, wheat middlings, soybean meal, soybean oil and a rice protein concentrate, while it did not contain animal-derived ingredients (Supplementary Electronic Material, Table 1). Except for the GM maize, the other ingredients of the diets were intended to be GM-free, but traces of GMO (traits) were frequently detected: this information is documented in the diet analyses file that can be accessed via the internet portal named CADIMA (Central Access Database for Impact Assessment of Crop Genetic Improvement Technologies; http://www.cadima.info). Traces of GMO (traits) were also detected in the commercially available seeds of control maize. The pellets were dried at a temperature of < 50 °C, coded in a blinded fashion and sent to the Slovak Medical University (Bratislava, Slovakia) for the feeding trials as vacuum-packed, γ-irradiated batches (irradiation dose = 25 kGy).

Diets were produced in six batches. A re-coded part of batch 3 of all diet groups was used for the 90-day feeding trial with a GM maize inclusion rate of 11 and 33% (complemented with the near-isogenic variety up to 33% maize in toto), whereas a separate batch of diets was prepared for the 90-day feeding trial with a GM maize inclusion rate of up to 50% (complemented with the near-isogenic variety up to 50% maize in toto). Maize and diet subsamples were retained at the animal feed producing facility (Mucedola srl) for analysis. Irradiated diet samples of > 1.5 kg each were sent to the Julius Kühn-Institut (Quedlinburg, Germany) and stored at − 80 °C. Batches 1, 3 and 5 were analysed. Diet samples for analyses were shipped to RIKILT Wageningen University and Research, where the feed pellets were milled and re-mixed. Subsamples of the milled material were analysed by RIKILT and, for complementary analyses, dispatched to either Covance (Madison, WI, USA) or SGS (Hamburg, Germany). The parameters measured and the methods applied by each of the certified laboratories are listed in the diet analyses file (http://www.cadima.info).

Study design

The study design of the two 90-day feeding trials (Tables 1, 7) was based on the OECD Test Guideline 408 for the testing of chemicals (OECD 1998) and EFSA recommendations on the performance of 90-day rodent feeding trials with whole food/feed (EFSA 2011a, 2014). The study design of the combined chronic toxicity/carcinogenicity feeding trial (Table 11) was based on the OECD Test Guideline 453 (OECD 2009a) and EFSA considerations on the applicability of OECD TG 453 to whole food/feed (EFSA 2013).

As recommended by EFSA (2011a), two animals of the same sex were housed per cage and the cage was taken as the experimental unit. In the 90-day feeding trial with a GM maize NK603 inclusion rate of 11 and 33%, as well as in the combined chronic toxicity/carcinogenicity study, there were five feeding groups (Tables 1, 11), and in the 90-day feeding trial with a GM maize inclusion rate of up to 50% there were eight feeding groups (Table 7). The cages were organized in blocks of 5–8 cages, and the feeding groups were randomized within blocks, i.e. a randomized complete block design was applied in all three feeding trials. There were separate blocks with male and female rats. To keep the units within blocks as homogeneous as possible, the ten heaviest rats of each sex (based on the weights 48 h after arrival) were housed in the first block, the next ten heaviest rats of each sex were housed in block 2, and so on. Except for feed consumption, which was determined per cage, all other parameters were measured in individual animals.
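The blocking scheme described above can be illustrated with a minimal sketch. It assumes five feeding groups and two rats per cage; the function name `assign_blocks`, the group labels, the example weights and the fixed random seed are hypothetical. How the rats within a block were paired into cages is not stated in the text, so pairing rats of adjacent weight rank is an assumption made here.

```python
import random

def assign_blocks(weights, groups, rats_per_cage=2, seed=0):
    """Fill blocks heaviest-first and randomize feeding groups within blocks.

    weights: dict mapping rat id -> body weight (g) 48 h after arrival,
             for rats of one sex.
    groups:  list of feeding-group labels (one cage per group per block).
    Returns a list of blocks; each block is a list of (group, [rat ids]).
    """
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    block_size = len(groups) * rats_per_cage
    ranked = sorted(weights, key=weights.get, reverse=True)
    if len(ranked) % block_size:
        raise ValueError("number of rats must fill whole blocks")
    blocks = []
    for start in range(0, len(ranked), block_size):
        members = ranked[start:start + block_size]
        # Assumption: rats of adjacent weight rank share a cage.
        cages = [members[i:i + rats_per_cage]
                 for i in range(0, block_size, rats_per_cage)]
        order = groups[:]
        rng.shuffle(order)  # random allocation of groups to cages in this block
        blocks.append(list(zip(order, cages)))
    return blocks

# Illustrative data: 20 rats of one sex and five hypothetical group labels.
groups = ["control", "GM11", "GM33", "GM11+R", "GM33+R"]
rats = {f"rat{i:02d}": 150.0 - i for i in range(20)}
blocks = assign_blocks(rats, groups)
```

With these inputs the ten heaviest rats end up in the first block, and every block contains each feeding group exactly once.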

Rat feeding trials

The trials were performed in compliance with GLP in the experimental animal facility at the Department of Toxicology of the Slovak Medical University in Bratislava (Slovakia). Five-week-old male and female specific-pathogen-free Wistar Han RCC (RccHan™:WIST) rats were purchased from Envigo (San Pietro al Natisone, Italy). A large amount of histopathological data has been published on the three main strains of rat used in carcinogenicity studies, i.e. Wistar Han, Sprague–Dawley, and Fischer 344 (Weber 2017). It is well known that Fischer 344 rats are prone to develop spontaneous myeloid leukemia and Leydig cell tumours, while Sprague–Dawley rats show a high incidence of spontaneous pituitary gland adenomas and mammary gland neoplasms (Weber 2017). The Wistar Han rat strain was used to perform the three feeding trials described in this paper because it shows the lowest incidence of spontaneous tumours in most organs when compared to the Fischer 344 and Sprague–Dawley strains.

The feeding trials were started 1 week after delivery of the animals at the animal testing facility. A detailed examination of all animals to verify their health condition (see the section “Periodical health status observations”) was carried out just before the start of the feeding trials. Feed and water were supplied ad libitum; feed was changed once a week and water every day. Feed consumption was determined once weekly during the first 13 weeks, every 2 weeks thereafter and reported as the total amount of feed consumed by two animals in one cage per week or 2 weeks, respectively.

Periodical health status observations

Rats were inspected twice daily for changes in skin, fur, eyes, mucous membranes, occurrence of secretions and excretions as well as activity level and change in behaviour. A detailed physical examination of each animal out of the cage was performed prior to the beginning of the feeding trials, on day 1, once weekly during the first 13 weeks and once monthly thereafter to identify changes in skin, fur, eyes, mucous membranes, occurrence of secretions and excretions, autonomic activity such as lacrimation, piloerection, pupil size, unusual respiratory patterns as well as activity level and change in behaviour. At the end of the feeding trials a functional assessment of changes in gait, posture and response to handling, as well as the presence of clonic or tonic movements or bizarre behaviour (self-mutilation, walking backwards) was carried out. Sensory reactivity to auditory, visual and proprioceptive stimuli was recorded. An ophthalmologic examination of both eyes of all animals in the conscious state was performed prior to the beginning of the feeding trials and 2 weeks before the end of the studies. The eyes and the peribulbar structures were examined macroscopically after pupillary dilatation induced by instillation of a 0.5% tropicamide solution. Each animal was weighed 48 h after its arrival at the experimental animal facility of the Slovak Medical University, on the randomization day (i.e. day − 1), on the first day of the feeding trials, once weekly during the first 13 weeks, once every 2 weeks thereafter and at the end of the studies.

Haematology and clinical biochemistry analyses

In the case of the two 90-day feeding trials, blood samples from the tail vein of 16 males and 16 females per group after 16–18 h fasting were taken at the end of the study for the haematological analyses (with EDTA as anticoagulant) as well as for the clinical biochemistry analyses (without anticoagulant). In the case of the combined chronic toxicity/carcinogenicity study, at the end of months 3 and 6, blood samples from the tail vein of 40 males and 40 females per group after 16–18 h fasting were taken for the haematological and clinical biochemistry analyses in the same way as described above. At months 12 and 24, blood samples were taken from all animals in the different groups.

No later than 4 h after collection of the blood samples, the following haematology parameters were measured using a Sysmex K-4500 automated haematology analyser (Sysmex, Kobe, Japan): white blood cell count (WBC), red blood cell count (RBC), haemoglobin concentration (HGB), haematocrit (HCT), mean cell volume (MCV), mean corpuscular haemoglobin (MCH), mean corpuscular haemoglobin concentration (MCHC), platelet count (PLT) as well as the absolute (LYMA) and relative lymphocyte count (LYMR). For the differential leukocyte count, blood smears were stained with the May–Grünwald and Giemsa–Romanowski dyes and thereafter examined by light microscopy; the percentages of lymphocytes, neutrophils, eosinophils, basophils and monocytes were determined by examining 200 cells.

The parameters alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), albumin (ALB), total protein (TP), glucose (GLU), creatinine (CREA), urea (U), cholesterol (CHOL), triglycerides (TRG), calcium (Ca), chloride (Cl), potassium (K), sodium (Na) and phosphorus (P) were measured in serum within 4 h of collection of the blood samples with an Ortho Clinical Vitros® 250 Chemistry System (Ortho-Clinical Diagnostics, Raritan, NJ, USA). Coagulation parameters were not determined.

Estrous cycle monitoring

17β-Estradiol was measured in female rats in the estrus phase of the estrous cycle. To monitor estrous cycles in adult female rats, vaginal lavages were taken daily for 10 days and the cycles were tracked by vaginal cytology. Vaginal lavages were obtained between 08:00 and 09:00 and examined under a low-power light microscope. Lavages were collected by flushing the entrance of the vagina with physiological saline, and a Giemsa–Romanowski stain was used to visualize the cells by optical microscopy. The stage of the estrous cycle was determined based on the presence of leukocytes (metestrus, diestrus), nucleated epithelial cells (proestrus) and cornified epithelial cells (estrus) (Long and Evans 1922; Caligioni 2009). A female rat that showed a constant 4- or 5-day vaginal estrous cycle was regarded as an animal with a regular estrous cycle. An extended estrus was defined as exhibiting cornified cells with no leukocytes for three or more days, and an extended diestrus was defined as the presence of leukocytes for four or more days (Cooper and Goldman 1999).
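The definitions of extended estrus and extended diestrus given above can be expressed as a small rule-based check. This is a sketch under stated assumptions: the one-letter stage codes and function names are hypothetical, the days are taken to be consecutive, and metestrus and diestrus are both counted as leukocyte-positive days, following the stage criteria listed in the text.

```python
def longest_run(stages, targets):
    """Length of the longest run of consecutive days whose stage is in targets."""
    best = current = 0
    for stage in stages:
        current = current + 1 if stage in targets else 0
        best = max(best, current)
    return best

def classify_cycle(stages):
    """Flag extended estrus/diestrus in a sequence of daily smear stages.

    stages: list of 'P' (proestrus), 'E' (estrus), 'M' (metestrus),
            'D' (diestrus), one entry per day of monitoring.
    """
    return {
        # cornified cells without leukocytes on three or more consecutive days
        "extended_estrus": longest_run(stages, {"E"}) >= 3,
        # leukocytes present on four or more consecutive days
        "extended_diestrus": longest_run(stages, {"M", "D"}) >= 4,
    }

# Ten days of hypothetical smears from three animals.
regular = classify_cycle(list("PEMDPEMDPE"))
long_estrus = classify_cycle(list("PEEEMDPEMD"))
long_diestrus = classify_cycle(list("PEMDDDDPEM"))
```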

Hormone measurements

17β-Estradiol blood levels were measured using the Estradiol (rat) ELISA (EIA-5774) from DRG Instruments (Marburg, Germany). The intra-assay coefficient of variation was 4.1%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 2.0 pg/mL. The cross-reactivity reported by the manufacturer was 4.2% for estrone, 3.8% for 17β-estradiol-3-glucuronide and 3.6% for 17β-estradiol-3-sulphate. Testosterone blood levels were measured using the Testosterone (Rat/Mouse) ELISA (EIA-5179) from DRG Instruments. The intra-assay coefficient of variation was 5.3%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 0.2 ng/mL. The cross-reactivity reported by the manufacturer was 69.6% for dihydrotestosterone and 7.4% for dihydroxyandrosterone. T3 blood levels were measured using the Total T3 RIA Kit (IM 1699) from Beckman Coulter (Brea, CA, USA). The intra-assay coefficient of variation was 9.9%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 0.75 nmol/L. T4 blood levels were measured using the Total T4 RIA Kit (IM 1447) from Beckman Coulter. The intra-assay coefficient of variation was 10.2%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 26 nmol/L.
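For readers unfamiliar with the intra-assay coefficient of variation, the calculation from repeated measurements of a single sample can be sketched as follows. The function name and the replicate values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def intra_assay_cv(replicates):
    """Coefficient of variation (%) of repeated measurements of one sample."""
    # Sample standard deviation divided by the mean, expressed as a percentage.
    return 100.0 * stdev(replicates) / mean(replicates)

# Hypothetical replicate readings of one sample within a single assay run.
example_cv = intra_assay_cv([9.0, 10.0, 11.0])
```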

Urinalysis

In the case of the two 90-day feeding trials, an analysis of urine of 16 males and 16 females per group was performed at the end of the study. In the combined chronic toxicity/carcinogenicity study, an analysis of urine of 20 male and 20 female rats per group was performed at months 3, 6, and 12; at month 24, urine samples from all surviving animals were analysed.

Urine was collected from each individual rat in metabolic cages for 16 h. The parameters total protein, glucose, ketone, leukocyte number, erythrocyte number, bilirubin, urobilinogen, nitrate, and pH were analyzed with Combur10Test® UX test strips (Roche Diagnostics, Mannheim, Germany) and semi-quantitatively evaluated by reflectance photometry with a Urilux S analyzer (Roche Diagnostics). Osmolarity was measured with the Advanced® Model 3300 micro-osmometer from Advanced Instruments (Norwood, MA, USA).

Gross necropsy and histopathology

At the end of the study, rats were anaesthetized after a 16- to 18-h fasting period with 10 mg/kg bw xylazine and 75 mg/kg bw ketamine. Blood samples for the corresponding analyses were taken from the abdominal aorta. Thereafter, the successive necropsy of the thoracic cavity, the abdominal cavity, the genital organs and the head was performed. Moreover, the wet weight of the kidneys, spleen, liver, adrenal glands, heart, thymus, uterus, ovaries, testes, epididymides and brain of all animals was recorded. Organ samples were stored in neutrally buffered 10% formalin, except for the eyes and the male reproductive tissues, which were immersed in Bouin’s solution. The formalin-fixed tissue samples were washed, dehydrated and embedded in paraffin. Thereafter, 4-µm thick sections were stained with haematoxylin and eosin for the light microscopic examination of the tissue structure.

A complete microscopic examination of the brain (including cerebrum, cerebellum and medulla/pons), spinal cord (at the cervical, mid-thoracic and lumbar level), pituitary, thyroid, parathyroid, thymus, oesophagus, salivary glands, stomach, small and large intestines, gut-associated lymphoid tissue (GALT), liver, pancreas, kidneys, adrenals, spleen, heart, trachea and lungs, aorta, tongue, eyes, Harderian gland, lacrimal gland, ovaries, cervix, vagina, uterus, female mammary gland, prostate, testes, seminal vesicles and coagulating gland, epididymides, urinary bladder, mesenteric and mandibular lymph nodes, peripheral nerve (sciatic), skeletal muscle, femur, sternum with bone marrow, and skin was performed.

In-life and necropsy phases were conducted “blind” at the Slovak Medical University (Bratislava, Slovakia). The group allocation was revealed to the histology facility (Wolfgang Baumgärtner, University of Veterinary Medicine Hannover, Germany) before the start of tissue staining and slide preparations, and to the pathologist (Roger Alison Ltd., Lampeter, UK) before the start of the histopathological examination, i.e. histology and histopathology were “non-blind”.

Histology was conducted “non-blind” for operational reasons. If histology had been conducted “blind”, all slides from all animals would have had to be processed, incurring significant penalties in cost and time. “Non-blind” histology allowed processing of only control and high dose group animals, together with macroscopic abnormalities and potential target organs. As the histology laboratory was not involved in the generation of data, this was not considered to have impacted upon the scientific evaluation of the study.

Histopathology was conducted “non-blind” because the consensus of opinion among toxicologic pathologists is that “blind reading” during the initial evaluation of tissues can have a negative impact on both the time it takes to accomplish the microscopic evaluation as well as the quality of the information obtained from the study (Iatropoulos 1984; Newberne and de la Iglesia 1985; SOTP 1986; Prasse et al. 1986; Goodman 1988; House et al. 1992; Crissman et al. 2004). “Blind reading” makes the task of separating treatment-related changes from normal variation more difficult and may result in missing subtle lesions. Awareness of the treatment group assignment allows the pathologist to intensely focus the histopathologic evaluation and to find important, and sometimes subtle, differences between the tissues of treated and untreated animals. “Blind reading” is commonly reserved for targeted review of lesions once they have been identified at the primary evaluation, particularly in a “Pathology Working Group”. In addition, all major published data on background lesions of rats from toxicity and carcinogenicity studies has been derived from “non-blind reading”. Reading the present series of studies “blind” would have significantly decreased the comparability of the findings to published data.

The resulting tissue sections were stained with haematoxylin and eosin at the above-mentioned histology facility. Histopathology examinations were performed in compliance with the principles of Good Laboratory Practice (GLP). The PathData V. 6.2d2 computer system was used for the recording and reporting of histopathology findings. A peer review of the study was conducted at the Test Site by Dr. C. Gopinath, Consultant Toxicological Pathologist. The histological sections and raw data (final signed histopathology report) will be returned to the Study Director for archiving. Other data related to the histopathological evaluation of the study will be archived by the study pathologist for at least 10 years. No data will be discarded after this period without the consent of the Test Facility. In the present report, summary tables with all necropsy and histopathology findings are presented. The complete histopathology reports (Alison 2018a, b, c) can be accessed via the internet portal CADIMA (http://www.cadima.info).

The statistical analysis of the incidence of tumours in the combined chronic toxicity/carcinogenicity study was based on the principles outlined by Peto et al. (1980). Peto’s method corrects for longevity (and hence for the period of time at risk) and applies a statistical approach appropriate to the cause of death (“context of observation”). This statistical analysis was performed on all animals from both phases of the study together. Where appropriate, tumours were also grouped for analysis (McConnell et al. 1986). The full analysis is presented in the PathData Appendix in the corresponding histopathology report (Alison 2018c). Differences in the incidence of neoplasms were considered statistically significant according to the Peto statistical test at p ≤ 0.05 for rare tumours and at p ≤ 0.01 for common tumours (FDA 2001). These analyses were performed using the PathData software.
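The significance thresholds stated above amount to a simple decision rule, sketched here. The function and argument names are hypothetical, and whether a tumour type counts as rare or common is taken as an input because the criterion is not given in the text; the Peto test itself is not reproduced.

```python
def peto_significant(p_value, tumour_is_rare):
    """Apply the thresholds quoted in the text to a Peto test p value."""
    # p <= 0.05 for rare tumours, p <= 0.01 for common tumours
    threshold = 0.05 if tumour_is_rare else 0.01
    return p_value <= threshold

# The same hypothetical p value leads to different conclusions by tumour class.
rare_result = peto_significant(0.03, tumour_is_rare=True)
common_result = peto_significant(0.03, tumour_is_rare=False)
```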

Statistical analysis

This section provides a short description of the methods used for the statistical analyses presented in this paper, with more detailed descriptions of these and some additionally applied methods given in the Electronic Supplementary Material. A full description of all the statistical methods used and the results obtained in the three studies, except for the histopathology, is given in the statistical reports (Goedhart and van der Voet 2017, 2018a, b, c, d, e, f, g, h, i), which can be accessed via the internet. Statistical analyses were performed separately for the three studies, for males and females, and for the four time points (3, 6, 12 and 24 months) in the combined chronic toxicity/carcinogenicity study. Our main interest was to analyse the difference between each of the GM maize feeding groups and the control feeding groups with the same amount of maize.

Data preparation

All parameters except time of death and pH were transformed to the natural logarithmic scale and then averaged to the cage level. This implies that, rather than looking at differences between feeding group means, ratios between the GM feeds and the corresponding control feed are of interest. Since the endpoints uLeu, uHemogl and uKeton had zero values, half of the smallest positive value was added to these observations before taking the logarithm. Data from rats fed diets with non-GM maize in the EU-funded project GRACE (http://www.grace-fp7.eu) were used as historical control data to set equivalence limits according to the method of van der Voet et al. (2017). The GRACE data have been analysed before (Schmidt and Schmidtke 2014; Schmidt et al. 2015a, b, 2016, 2017; Zeljenková et al. 2014, 2016). More details of the data preparation including outlier detection are given in the Electronic Supplementary Material.
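The transformation and averaging steps can be sketched as follows. This is an illustration only: the function names and the example values are hypothetical, and the offset is added to every observation of an endpoint that contains zeros, which is one possible reading of the text.

```python
import math
from collections import defaultdict

def log_cage_means(records):
    """Average natural-log values per cage for one endpoint.

    records: list of (cage_id, value) pairs with values >= 0.
    If any value is zero, half of the smallest positive value is added
    to all observations before taking the logarithm (assumed reading).
    """
    positives = [v for _, v in records if v > 0]
    if not positives:
        raise ValueError("at least one positive value is required")
    has_zero = len(positives) < len(records)
    offset = min(positives) / 2.0 if has_zero else 0.0
    per_cage = defaultdict(list)
    for cage, value in records:
        per_cage[cage].append(math.log(value + offset))
    return {cage: sum(logs) / len(logs) for cage, logs in per_cage.items()}

def ratio_from_log_means(log_mean_test, log_mean_control):
    """A difference on the log scale corresponds to a ratio on the original scale."""
    return math.exp(log_mean_test - log_mean_control)

# Two cages with two animals each (hypothetical values).
means = log_cage_means([("cage1", 2.0), ("cage1", 8.0),
                        ("cage2", 4.0), ("cage2", 4.0)])
ratio = ratio_from_log_means(means["cage1"], means["cage2"])
```

Averaging on the log scale yields the logarithm of the geometric mean per cage, which is why comparisons between feeding groups are naturally expressed as ratios.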

Summary tables of means and standard deviations (SDs), classified by the feeding groups, were prepared on the original non-transformed scale. These tables were obtained by first calculating cage means and then calculating the summary statistics. Extended tables with the number of observations and coefficients of variation (%) are included in the statistical reports (Goedhart and van der Voet 2017, 2018b, c, d, e, f). In the combined chronic toxicity/carcinogenicity study, where measurements were made after 3, 6, 12 and 24 months, the number of cages per feeding group was 35 for the body weights, 20 for the haematology, differential WBC and clinical biochemistry data and 10 for the urine data. The number of cages per feeding group in the two 90-day feeding trials was 8, except for the hormone data with six cages for males and eight cages for females.

Mortality after 2 years

Deaths were rare up to month 12, with at most 3 dead animals per group of 50. Therefore, only the mortality rates at 24 months were statistically analysed. Only the 50 animals per sex and feeding group that were part of the 2-year cohort were included in this analysis, because the other 20 animals were sacrificed after 1 year. Fitting a beta-binomial regression model (Williams 1982) by means of maximum likelihood to the number of dead animals in each cage as response variable revealed that the estimate of the beta-binomial over-dispersion parameter was at its lower bound of 0.0001 for both males and females. This indicates that there was no over-dispersion and, therefore, the ordinary logistic model (McCullagh and Nelder 1989) was used to analyse the number of dead animals per cage. After allowing for differences between blocks, one-sided pairwise Wald tests were performed for the one-sided null hypothesis that the mortality probability of a GM feed is equal to or smaller than the mortality probability of the non-GM control feed. Mortality was also analysed by survival analysis using the procedures KAPLANMEIER, RSTEST, RPROPORTIONAL and RPHFIT and direct programming in GenStat 18 (VSN International 2015).
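As background to the survival analysis, the product-limit (Kaplan–Meier) estimate of the survival curve can be sketched in a few lines. This is not the GenStat implementation used in the study, and it does not cover the logistic model or the regression procedures; the function name and the example data are hypothetical, and animals sacrificed at the scheduled end of the study are treated as censored.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  observation time (e.g. week) for each animal.
    events: True if the animal died at that time, False if censored
            (e.g. scheduled sacrifice).
    Returns a list of (time, survival probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, died in zip(times, events) if ti == t and died)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored animals leave the risk set
    return curve

# Four hypothetical animals: deaths at weeks 60, 90 and 104, one censored at week 80.
curve = kaplan_meier([60, 80, 90, 104], [True, False, True, True])
```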

Growth curves and feed consumption

For each individual rat, growth curves were fitted to the observed weights for restricted periods of time. For the data up to 13 weeks (3 months), an exponential growth curve \(A + B\exp(-\gamma\,\text{Week})\) with growth rate \(\gamma\) was fitted to the observed weights. For the data between weeks 13 and 27 (6 months), and separately for the data between weeks 27 and 52 (12 months), a simple linear regression, \(\text{Weight} = \alpha + \beta\,\text{Week}\), was fitted to the observed weights, and the growth rate was defined as \(\gamma = \log(\beta)\). In general, the growth curves fitted very well and it was, therefore, decided to analyse only the estimated growth rates \(\gamma\), hereafter called growth rate, and the final weights observed after weeks 13, 27 and 52. The latter are hereafter called Weight_13 (or just BodyWeight in the two 90-day feeding trials), Weight_27 and Weight_52. No general growth curves could be fitted for the data between weeks 53 and 104 (24 months) in the combined chronic toxicity/carcinogenicity study, and, therefore, only the final weights of those animals that survived for 24 months, hereafter called Weight_104, were statistically analysed.
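For the linear phase, the growth rate can be obtained from an ordinary least-squares fit, as in the following minimal sketch. The function name and the example weights are hypothetical, and the logarithm is assumed to be the natural logarithm, in line with the transformation used elsewhere in the analysis. The exponential phase would require a nonlinear fit and is not shown.

```python
import math

def linear_growth_rate(weeks, weights):
    """Fit Weight = alpha + beta * Week by least squares; return (alpha, beta, gamma).

    gamma = log(beta) is the growth rate as defined in the text
    (natural logarithm assumed).
    """
    n = len(weeks)
    mean_x = sum(weeks) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in weeks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, weights))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    if beta <= 0:
        raise ValueError("log growth rate is undefined for a non-positive slope")
    return alpha, beta, math.log(beta)

# Hypothetical weights (g) of one rat gaining 2 g per week from week 13 onwards.
alpha, beta, gamma = linear_growth_rate([13, 14, 15, 16], [300.0, 302.0, 304.0, 306.0])
```

Note that this definition only works for animals that gain weight over the period, since the logarithm of a zero or negative slope does not exist.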

Equivalence and difference tests of quantitative endpoints

Traditionally, the first step in the statistical analysis of toxicological data is to determine whether there are statistically significant differences between groups (difference tests). Only in a second step of the classical approach is the toxicological relevance of such statistically significant differences interpreted in the light of historical control data. In this paper, the “toxicological relevance” question is directly addressed using equivalence tests. This procedure has a known probability of a Type 1 error (failing to find a potentially relevant difference). In contrast, the traditional two-step approach has an unknown probability of failing to find a potentially relevant difference, because such a difference might not be statistically significant due to lack of precision. Thus, the one-step approach, which always considers the historical control data (irrespective of the result of the difference test), is preferred to the two-step approach. This is also in line with the EFSA recommendation (EFSA 2011c) that less emphasis should be placed on the reporting of statistical significance and more on statistical point estimation and associated interval estimation, as the latter conveys more information.

Equivalence testing was introduced for GM safety assessment for compositional data in the EFSA guidance for risk assessment of food and feed from GM plants (EFSA 2011b). In the context of 90-day feeding studies in rodents, EFSA (2014) recognized the potential advantages of equivalence testing and recommended further investigation. In response to this issue, an equivalence test was developed in the course of the G-TwYST project (see van der Voet et al. [2017] for a full description and explanation of the approach). This test compares the difference between a test (T) and a control (C) feed, obtained simultaneously in a current study, to the typical differences between reference (R) feeds obtained in one or more historical studies. The equivalence test corrects for between-study differences, and the within-study variation between references R, along with the residual variation, is used to set equivalence limits for the difference between T and C in the current study. The so-called distribution wise equivalence (DWE) criterion is used in this test. The equivalence test employs the concept of desired power (here chosen as 95%) in a simplified situation, where there is no between-reference variation, where the historical and current studies have the same residual variance, and where the current study is assumed to have a sample size as approved by a regulator. Here we assumed regulatory sample sizes equal to the replication, i.e. the number of cages for most parameters in the current studies, e.g. 8 for the 90-day studies.

A critical factor is that the equivalence test of van der Voet et al. (2017) requires historical control data. It is assumed that the natural variability in results between non-GM reference groups can be estimated from previous studies in the same experimental facility. For the 3-month data in the three feeding trials, the GRACE non-GM reference data described in section “Data preparation” serve this purpose. For the data obtained after 6, 12 and 24 months, no corresponding reference data were available. Variance components between and within reference feeds were not much different between 3 and 6 months in the 90-day feeding trial with the GM maize at an inclusion rate of 11 and 33%, and, therefore, the 3-month GRACE data were still used as a reference for the 6-month data. The data variation was generally larger after 12 and 24 months; therefore, no equivalence tests are reported here for these data, and the assessment of the biological relevance was in these cases based on the expert opinion of the study coordinator. Details of the equivalence test calculations are given in the Electronic Supplementary Material.

The equivalence test results in a confidence interval for the so-called equivalence limit scaled difference (ELSD), which can be used both for difference and for equivalence testing. The hypothesis of no difference is rejected in case the interval does not contain zero, while the non-equivalence hypothesis is rejected in favour of an “equivalence” conclusion when the interval fully lies inside the interval (− 1, 1). When the ELSD point estimate is between − 1 and + 1, but one or both of the confidence limits lie outside this interval, the conclusion, in the terminology of EFSA, is that “equivalence is more likely than not”. “Non-equivalence is more likely than not” when the ELSD point estimate is lower than − 1 or higher than + 1. As a further help for interpretation of the ELSD graphs, it can be noted that confidence intervals for ELSD will become wider when the within-group variation in the current study is higher than in the historical data. Such higher variation may be treatment-induced or just the consequence of a lower analytical precision in the current study.
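The decision rules above can be summarized in a small Python sketch that takes the lower confidence limit, point estimate and upper confidence limit of one ELSD interval. The function name is hypothetical, only the outcomes named in the text are distinguished, and the handling of values lying exactly on a limit is an assumption.

```python
def classify_elsd(lower, estimate, upper):
    """Return (difference_significant, equivalence_conclusion) for one ELSD interval."""
    if not lower <= estimate <= upper:
        raise ValueError("expected lower <= estimate <= upper")
    # Difference test: reject 'no difference' when the interval excludes zero
    difference_significant = not (lower <= 0.0 <= upper)
    # Equivalence test: compare the interval with the limits -1 and +1
    if -1.0 < lower and upper < 1.0:
        conclusion = "equivalence"
    elif -1.0 <= estimate <= 1.0:
        conclusion = "equivalence more likely than not"
    else:
        conclusion = "non-equivalence more likely than not"
    return difference_significant, conclusion

# Hypothetical intervals, not results from the study
example_a = classify_elsd(0.1, 0.4, 0.8)    # significant difference, yet equivalent
example_b = classify_elsd(-0.3, 0.6, 1.2)   # no significant difference, upper limit above +1
```

The first example shows why the two tests answer different questions: an interval can exclude zero and still lie fully within the equivalence limits.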

Classical statistical methods for continuous data were applied in line with OECD and EFSA approaches, and very similar to the approaches followed in the GRACE project (Schmidt and Schmidtke 2014; Schmidt et al. 2015a, b). OECD Test Guidelines require numerical results to be evaluated by an appropriate and acceptable statistical method, but give no further guidance on statistical analysis. More detailed guidance, although strictly only meant for chronic and carcinogenicity studies of single compounds, not of complex whole foods, is provided in chapter 4 of the OECD Guidance Document 116 (OECD 2014). A classical analysis of variance was performed on the cage means after log-transforming the data. This was done in the statistical program R. The analysis of variance was performed according to the randomized block design employing the model “Block + Treatment”, where Treatment defines the five feeding groups. For details see the Electronic Supplementary Material. Significance results of the different tests comparing GM maize-fed groups with the control group have been indicated in the tables with means and SDs.
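To make the “Block + Treatment” model concrete, the following Python sketch computes the F statistic for the treatment effect from log-transformed cage means in a balanced randomized block layout. It is only an illustration: it assumes exactly one cage mean per block and feeding group, it does not convert the F statistic into a p value, and the function name and data are hypothetical. The study itself used R, as described in the Electronic Supplementary Material.

```python
import math

def rcbd_anova(cage_means):
    """F statistic for Treatment in the model Block + Treatment on log cage means.

    cage_means[i][j] is the cage mean for block i and treatment j.
    Assumes a balanced layout with one cage per cell (hypothetical).
    """
    logs = [[math.log(v) for v in row] for row in cage_means]
    n_blocks, n_treat = len(logs), len(logs[0])
    grand = sum(map(sum, logs)) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in logs]
    treat_means = [sum(row[j] for row in logs) / n_blocks for j in range(n_treat)]
    # Sums of squares for blocks, treatments and the residual
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((v - grand) ** 2 for row in logs for v in row)
    ss_resid = ss_total - ss_block - ss_treat
    df_treat = n_treat - 1
    df_resid = (n_blocks - 1) * (n_treat - 1)
    f_stat = (ss_treat / df_treat) / (ss_resid / df_resid)
    return f_stat, df_treat, df_resid

# Hypothetical cage means: 3 blocks (rows) x 3 feeding groups (columns)
table = [[100.0, 104.0, 98.0],
         [120.0, 123.0, 121.0],
         [90.0, 95.0, 89.0]]
f_stat, df_treat, df_resid = rcbd_anova(table)
```

On the log scale, a multiplicative shift of all values in one block is absorbed by the block term and leaves the treatment F statistic unchanged, which is the purpose of allowing for differences between blocks.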

All tests were performed by comparing two-sided 95% confidence intervals to values representing equivalence limits or strict equality (no difference).

Stakeholder involvement and access to data

A key characteristic of the G-TwYST project was to allow for the involvement of a broad range of stakeholders and to ensure transparency of the research conducted, including access to the data. Stakeholder consultations were conducted not just on the results but also a priori on the study plans. The main stakeholder groups targeted were competent authorities, industry, civil society organizations, and researchers interested or experienced in animal feeding studies with GM food/feed. The geographical focus was on Europe.

In a first round, the draft study plans were subjected to a stakeholder consultation. The comments on these drafts, which had been received in the course of a stakeholder workshop and afterwards in writing, were discussed by the study team and taken into account when finalizing the study plans. As a result of the discussions during the stakeholder workshop, a 90-day feeding trial with a GM maize inclusion rate of up to 50%, as well as an analysis of hormone levels in blood samples of male (testosterone, T3 and T4) and female rats (17β-estradiol, T3 and T4) fed the GM maize at inclusion rates of 11 and 33% were performed. Study team members answered the stakeholder comments and questions in a written form, so that the stakeholders could verify if and how their comments had been taken into consideration and understand the underlying reasons for doing so.

In a similar way, the draft results and conclusions were subjected to stakeholder scrutiny. This final round included three subsequent steps, one workshop and two rounds of written comments: one round of written comments focused on the results of the 90-day studies. The stakeholder workshop and another round of written comments focused on the combined chronic toxicity and carcinogenicity study. To facilitate the consultation process, the documents including all full-length statistical as well as histopathology draft reports and draft interpretations were made available to registered participants that had signed a Non-Disclosure Agreement. Again, all stakeholder comments were considered when finalizing interpretation and conclusions, and written responses were prepared by the study team.

Maximum transparency of the process was established by publishing all stakeholder comments, as well as the written responses of the study team members, in comprehensive consultation reports on the project website, alongside the draft and revised study plans (http://www.g-twyst.eu).

For each consultation round more than 700 stakeholders were invited by e-mail. Stakeholder participants were not selected in any way: all interested stakeholder representatives could participate. Overall 70 stakeholders from 19 countries participated in one or more consultation stages. A total of 158 stakeholder comments were received in written form and responded to by the study team. In each consultation stage, representatives of all main stakeholder groups targeted were involved and actively contributed.

Access to the raw data

In line with the G-TwYST transparency policy any interested person will have access to the raw data obtained in the frame of the G-TwYST project, including the clinical, ophthalmological, body weight, haematology, clinical biochemistry, organ weight, necropsy and histopathology data presented in this study, through the internet portal CADIMA (http://www.cadima.info).

Results

Feed composition analysis

The main components of the diets are listed in Table 1 of the Electronic Supplementary Material. The amount of soybean meal was restricted to a level of 5% in all diets, since soybean meal contains high amounts of isoflavones, which per se induce estrogenic effects (Eisenbrand 2008) and could, therefore, interfere with the outcome of the feeding trials. Soybean meal is also an important protein source; to compensate the lower delivery of proteins by soybean meal, a rice protein concentrate was included in all diets.

The compositional analysis of the diets used in the three feeding trials performed with the GM maize NK603 in the course of the G-TwYST project can be accessed via the internet portal CADIMA (http://www.cadima.info). Briefly, six batches of all diets were prepared, and batches 1, 3 and 5 of all diets were analysed. All batches showed similar levels of the proximates (ash, total carbohydrates, fat, protein), starch, fibres, amino acids, fatty acids, minerals, vitamins, sugars, anti-nutrients and other secondary metabolites, except for the phenylpropanoids caffeic acid, ferulic acid and p-coumaric acid, the B vitamin niacin and the isoflavones daidzin and glycitin, which were nominally higher in batch 1 than in batches 3 and 5 of all diets. The differences in the phenylpropanoids are due to the fact that Covance analysed the bound + free phenylpropanoids in batch 1, while SGS analysed the free phenylpropanoids in batches 3 and 5. The higher levels of niacin, daidzin and glycitin in batch 1 are due to differences in the methodology applied by the different laboratories involved. None of the detected differences were considered to affect the health of the rats in any way. The levels of the above-mentioned proximates and compounds in the separate batch of diets prepared for the 90-day feeding trial with a GM maize inclusion rate of up to 50% were similar to those measured in batch 3 of all diets, which was used for the 90-day feeding trial with a GM maize inclusion rate of 11 and 33%.

Low and similar amounts of polychlorinated dibenzo-p-dioxins and dibenzofurans, polychlorinated biphenyls, polycyclic aromatic hydrocarbons, mycotoxins and nitrosamines were detected in all analyzed diet batches. Residues of the pesticides 2-phenylphenol, cypermethrin, deltamethrin, tetramethrin, ethoxyquin, piperonyl butoxide, pirimiphos-methyl, N-desethyl-pirimiphos-methyl and propiconazole were detected in all diets. The glyphosate levels in the near-isogenic non-GM maize (Pioneer 8906) kernels, as well as in the untreated and Roundup-treated GM maize NK603 (Pioneer 8906 R) kernels were ≤ 16 ppb, while the glyphosate levels ranged between about 30 and 140 ppb in batches 1, 3 and 5 of all diets, including the control diet (Table 3, Electronic Supplementary Material). Glyphosate levels were below the limit of detection in the 50% control, 11% NK603, 50% NK603, 11% NK603 + Roundup and 50% NK603 + Roundup diets, while glyphosate was detected in the 33% control and 33% NK603 + Roundup diets used for the 90-day feeding trial with the GM maize NK603 at an inclusion rate of up to 50% (Table 4, Electronic Supplementary Material). The amount of the major glyphosate metabolite aminomethylphosphonic acid (AMPA) in all diets was below the limit of quantification (< 100 ppb). The detected levels of contaminants and pesticide residues were well below regulatory limits and, therefore, none of them were considered to affect the health of the rats in any way.

As expected, the NK603 event was detected in all analyzed batches of the diets containing 11, 33 or 50% of the GM maize NK603 using a highly specific PCR assay amplifying a DNA fragment overlapping the transition between inserted and genomic maize DNA in the NK603 maize (Tables 3, 4, Electronic Supplementary Material). Moreover, the NK603 event was not detected in batch 1 of the control diet, whereas the batches 3 and 5 of the control diets as well as the separate control diet batch with a 50% maize inclusion rate contained non-quantifiable traces of the NK603 event (Tables 3, 4, Electronic Supplementary Material). These NK603 maize traces were considered not to influence the feeding trials in any way.

Following irradiation, microorganisms such as coliforms, Enterobacteriaceae, yeast and molds were not detected in the diets.

90-day feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diets

The study design is shown in Table 1. There was one unscheduled death: a male rat fed the 33% NK603 diet was sacrificed moribund on day 51 due to a malignant lymphoma, which was considered not to be treatment-related, since it was only observed in 1 out of 16 young animals having been fed the diet for less than 2 months. The rat body weights as well as the feed consumption during the 90-day feeding period in all five experimental groups are shown in Figs. 1 and 2. Statistically significant differences were observed in 5 of the 24 comparisons regarding final body weight, growth rate and feed consumption in male and female rats over four NK603 groups. The equivalence tests (in which the data obtained in the present study were compared with the historical control data of the animal testing facility, i.e. the data obtained from the GRACE project feeding trials) showed “equivalence” for 22 of the 24 comparisons, i.e. ELSD intervals were in the equivalence area between − 1 and + 1. The upper ELSD confidence limit was outside the interval [− 1, + 1] for the growth rate in female rats fed the 11% GM maize NK603 or the 33% GM maize NK603 + Roundup diet; therefore, in these two cases the appropriate conclusion is “equivalence more likely than not” (Fig. 2b).

Table 1 Study design of the 90-day feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diet
Fig. 1 Male and female rat body weight development (upper charts) and feed consumption (lower charts) during the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

Fig. 2 Equivalence testing of the parameters measured in male (a) and female rats (b) fed the diets with 11% GM maize NK603 (NK11−), 33% GM maize NK603 (NK33−), 11% GM maize NK603 + Roundup (NK11+) or 33% GM maize NK603 + Roundup (NK33+) versus the rats fed the control diet. For estimates (square symbols) left of zero, the parameter in rats fed the GM maize feed has a smaller mean than that in rats fed the control feed. Endpoints labelled with a yellow background have a larger residual variance compared to the historical studies (variance ratio > 150%). Fuchsia coloured symbols denote a significant difference

The data on the haematological parameters measured in the blood samples of the male and female rats are shown in Table 2 and Fig. 2. The MCHC was significantly lower in male rats fed the 11% NK603 diet and the 11% NK603 + Roundup (with a Roundup application during cultivation) diet when compared to the control value based on the 95% confidence interval. The percentage lymphocytes (determined in the differential leukocyte analysis) was significantly lower and the percentage neutrophils significantly higher in male rats fed the 33% NK603 diet. Apart from these four statistically significant differences in male rats, no significant differences were observed in the case of the other 108 comparisons with the control group involving the 14 haematological parameters in male and female rats over 4 NK603 groups. The equivalence tests showed “equivalence” in 89 of the 104 comparisons with the control group for 13 haematological parameters (note that there were no historical control data for 1 parameter, LYMR) in male and female rats over 4 NK603 groups, and “equivalence more likely than not” in the case of the remaining 15 comparisons. One or both ELSD confidence limits were outside the interval [− 1, + 1] for the percentage lymphocytes in the male rats fed with any of the 4 GM maize diets (Fig. 2a), for HCT and the percentage lymphocytes in the female rats fed with any of the 4 GM maize diets, for RBC in female rats fed the 33% NK603 diet and for the percentage eosinophils in female rats fed the 11% NK603 + Roundup and the 33% NK603 + Roundup diets (Fig. 2b). These were all cases in which the within-group data variance was larger than the within-variance of the historical control data.

Table 2 Haematology parameters (mean ± SD) in the serum of male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The data on the clinical biochemical parameters measured in the blood samples of the male and female rats are shown in Table 3 and Fig. 2. In male rats, K levels were significantly higher in animals fed the 33% NK603 diet, ALT activity was significantly lower and ALB, TP, CREA and K levels were significantly higher in animals fed the 11% NK603 + Roundup diet and ALP activity was significantly higher in animals fed the 33% NK603 + Roundup diet when compared to the corresponding control diets. In female rats, GLU and Cl levels were significantly lower and the TAG level significantly higher if compared to the corresponding control values. Apart from these 10 significances, the other 110 comparisons of the 15 clinical biochemical parameters with the control group in male and female rats over four NK603 groups showed no statistically significant differences. The equivalence tests showed “equivalence” for all 120 comparisons involving clinical biochemical parameters, i.e. all ELSD intervals were fully inside the interval [− 1, + 1] (Fig. 2a, b).

Table 3 Clinical biochemistry parameters (mean ± SD) in the serum of male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The testosterone, T3 and T4 levels in the serum of male rats are shown in Table 4. The testosterone and T3 levels in the four groups fed the NK603 diet were not significantly different from those in the corresponding control group, whereas the T4 level was significantly lower in the group fed the 33% NK603 diet if compared to the control group. The 17β-estradiol, T3 and T4 levels in the serum of female rats are shown in Table 5. The 17β-estradiol levels in the four groups fed the NK603 diet were significantly lower than that in the control group. The T3 levels in the four groups fed the NK603 diet were not significantly different from that in the control group, while the T4 level was significantly lower in the 33% NK603 + Roundup group than in the control group.

Table 4 Testosterone, T3 and T4 levels (mean ± SD) in the serum of male Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet
Table 5 17β-Estradiol, T3 and T4 levels (mean ± SD) in the serum of female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The urinalysis data are summarized in the statistical report by Goedhart and van der Voet (2017). The urine pH was significantly lower in male rats fed the 33% NK603 and 33% NK603 + Roundup diets than in the control group, while the ketone level was significantly lower in female rats fed the 33% NK603 + Roundup diet than in the control group. Apart from these 3 significances, the other 45 comparisons with the control group involving six urinalysis parameters in male and female rats over 4 NK603 groups showed no significant differences.

The relative organ weights in male and female rats are shown in Table 6 and Fig. 2. The relative thymus weight in male rats fed the 11% NK603 + Roundup diet was significantly higher than that in the control group. In female rats, the relative adrenal gland weight in the groups fed the 11% NK603, 33% NK603 and the 11% NK603 + Roundup diets, the relative ovary weight in the animals fed the 11% NK603 and the 33% NK603 + Roundup diets, as well as the relative spleen weight in rats fed the 11% NK603 + Roundup diet were significantly lower when compared to the corresponding control groups. Apart from these 6 significances, the other 66 comparisons with the control group involving the 10 relative organ weights in male and female rats over 4 NK603 groups showed no significant differences. The equivalence tests showed that one or both ELSD confidence limits, but not the point estimates, were outside the interval [− 1, + 1] for the relative kidney weight in female rats fed the 11% NK603 and the 33% NK603 diets as well as for the relative uterus weight in the animals fed the 33% NK603 + Roundup diet (Fig. 2a, b). In these three cases, the conclusion is, therefore, “equivalence more likely than not”, whereas in the case of all other 69 comparisons with the control group involving the 10 organ weight parameters in male and female rats over 4 NK603 groups the conclusion is “equivalence”.

Table 6 Relative organ weights in male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The summary tables with all necropsy findings observed in male and female rats are given in Table 5 of the Electronic Supplementary Material. There were no treatment-related necropsy findings following the feeding of NK603 or NK603 + Roundup to rats for 90 days. The summary tables with all histopathological findings observed in male and female rats are given in Tables 6 and 7 of the Electronic Supplementary Material, respectively. There were no treatment-related histopathological findings following the feeding of NK603 or NK603 + Roundup to rats for 90 days.

90-day feeding trial with the GM maize NK603 at an inclusion rate of up to 50% in the diets

The study design is shown in Table 7, while the body weight and the feed consumption of the rats are shown in Fig. 3a, b, respectively, with a summary of the group comparisons in Fig. 4a–d. In male rats fed NK603 at the 50% inclusion rate, the body weight as well as the feed consumption during the 90-day feeding period was significantly decreased relative to the 50% control diet. In female rats fed NK603 + Roundup at an inclusion rate of 11%, the body weight as well as the feed consumption during the 90-day feeding period was significantly increased compared to the control group with an inclusion rate of 50%. The growth rate was significantly increased for female rats fed NK603 at an inclusion rate of 50%. Apart from these 5 significances, the other 49 comparisons involving the body weight, feed consumption and growth rate of male and female rats over 9 types of comparison (see headers in Fig. 4) showed no significant differences. The equivalence tests showed “equivalence” for all 54 comparisons, i.e. all ELSD intervals were fully inside the interval [− 1, + 1] (Fig. 4a–d).

Table 7 Study design of the 90-day feeding trial with the GM maize NK603 at an inclusion rate of up to 50% in the diet
Fig. 3 Male and female rat body weight development (upper charts) and feed consumption (lower charts) during the 90-day feeding trial with GM maize NK603 at an inclusion rate of up to 50% in the diet

Fig. 4 a Equivalence testing of 50% GM maize feeds versus the corresponding 50% non-GM control maize feed for male rats. b Equivalence testing of 50% maize feeds versus the corresponding 33% maize feeds and of 33% GM maize feeds versus the 33% non-GM maize feed for male rats. c Equivalence testing of 50% GM maize feeds versus the corresponding 50% non-GM control maize feed for female rats. d Equivalence testing of 50% maize feeds versus the corresponding 33% maize feeds and of 33% GM maize feeds versus the 33% non-GM maize feed for female rats. In all subplots, for estimates on the left of zero, the first mentioned feed has a smaller mean than the last mentioned feed. Endpoints labelled with a yellow background have a larger residual variance compared to the historical studies (variance ratio > 150%). Fuchsia coloured symbols denote a significant difference

The data on the haematological parameters measured in the blood samples of the male and female rats are shown in Table 8 and Fig. 4. In male rats, HGB in the groups fed the 11% NK603 and the 11% NK603 + Roundup diets, PLT in the groups fed the 50% NK603 and the 50% NK603 + Roundup diets and HCT in the group fed the 11% NK603 + Roundup diet were significantly higher than in the 50% control group; MCV and MCH were significantly lower in the group fed the 33% NK603 + Roundup diet than in the 33% control group; and MCV and MCH were significantly higher in the group fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet. Moreover, the percentage lymphocytes was significantly higher in male rats fed the 50% control diet than in the group fed the 33% control diet. The percentage neutrophils was significantly lower in male rats fed the 50% NK603 diet than in the groups fed the 50% control diet or the 33% NK603 diet. In female rats, RBC was significantly higher and MCV significantly lower in the group fed the 33% NK603 + Roundup diet than in the group fed the 33% control diet, while PLT was significantly lower in the group fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet. Apart from these 14 significances, the other 238 comparisons involving the 14 haematological parameters in male and female rats over nine types of comparison (see headers in Fig. 4) showed no significant differences.

Table 8 Haematology parameters (mean ± SD) in the serum of male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of up to 50% in the diet

The differential white blood count data in the present study were much more variable than the historical control data. This indicates that the analytical methods used in the current and historical study were not comparable, and therefore, the historical data were not considered suitable to be used for equivalence tests (see details in Goedhart and van der Voet 2018i). The equivalence tests for all 162 comparisons involving the remaining 9 haematological parameters showed “equivalence”, i.e. all ELSD intervals were fully inside the interval [− 1, + 1] (Fig. 4a–d).

The data on the clinical biochemical parameters in male and female rats are shown in Table 9 and Fig. 4. In male rats, the TP level was significantly lower in the group fed the 50% control diet than in the group fed the 33% control diet, whereas ALB and K levels were significantly higher in the group fed the 11% NK603 diet than in the group fed the 50% control diet. UREA was significantly lower in the group fed the 33% NK603 diet than in the 33% control group, while it was significantly higher in the 11% NK603 + Roundup group than in the 50% control group. In female rats, ALT was significantly higher in the group fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet, whereas it was significantly lower in the group fed the 33% NK603 + Roundup diet than in the 33% control group. ALB was significantly lower in the group fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet. CHOL was significantly lower in the 50% NK603 group than in the 33% NK603 group and the 33% NK603 + Roundup group; moreover, CHOL was significantly lower in the 50% NK603 + Roundup group than in the 50% control group. TAG levels were significantly lower in the group fed the 50% NK603 + Roundup diet than in the group fed the 50% control diet and the 33% NK603 + Roundup diet. CREA was significantly lower in the group fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet. The Ca level was significantly lower in the group fed the 50% NK603 + Roundup diet than in the group fed the 50% control diet and in the group fed the 33% NK603 + Roundup diet when compared to the group fed the 33% control diet. Apart from these 16 significances, the other 254 comparisons involving the 15 clinical biochemical parameters in male and female rats over nine types of comparison (see headers in Fig. 4) showed no significant differences. The equivalence tests showed “equivalence” for 266 comparisons involving the 15 biochemical parameters, whereas the conclusion was “equivalence more likely than not” in four cases (Fig. 4a–d). One of the ELSD confidence limits was outside the interval [− 1, + 1] for CHOL in female rats fed the 50% NK603 + Roundup diet relative to the 50% control diet and the 33% NK603 + Roundup diet, for CHOL in female rats fed the 50% NK603 diet relative to the 33% NK603 diet and for P levels in female rats fed the 33% NK603 + Roundup diet relative to the 33% control diet (Fig. 4c, d).

Table 9 Clinical biochemistry parameters (mean ± SD) in the serum of male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of up to 50% in the diet

The urinalysis data are summarized in the statistical report by Goedhart and van der Voet (2018f). The urine pH was significantly lower in male rats fed the 50% NK603 + Roundup diet than in the group fed the 33% NK603 + Roundup diet. In female rats, the urine volume and pH were significantly lower in the group fed the 50% NK603 diet than in the group fed the 33% NK603 diet, while the urine volume/body weight was significantly higher in the group fed the 33% NK603 diet than in the group fed the 33% control diet and the urine osmolarity was significantly lower in the group fed the 33% NK603 diet than in the group fed the 33% control diet. Apart from these 5 significances, the other 103 comparisons regarding the 6 urinalysis parameters showed no significant differences.

The relative organ weights in male and female rats are shown in Table 10 and Fig. 4. In male rats fed the 50% NK603 diet, the relative brain and testis weights were significantly higher than those in the 50% control group and the relative heart and thymus weights were significantly higher than those in the 33% NK603 group. The relative heart weight in the 33% NK603 group was significantly lower than in the 33% control group. In female rats fed the 50% NK603 + Roundup diet, the relative kidney and ovary weights were significantly higher than those in the 50% control group and the relative heart and ovary weights were significantly higher than those in the 33% NK603 + Roundup group. The relative thymus weight in the 11% NK603 + Roundup group was significantly lower than in the 50% control group. Apart from these 10 significances, the other 152 comparisons involving the 10 relative organ weights in male and female rats over 9 types of comparison (see headers in Fig. 4) showed no significant differences. The equivalence tests showed “equivalence” for all but one of the 162 comparisons, and “equivalence more likely than not” for relative kidney weight in female rats fed the 50% NK603 + Roundup diet relative to the 50% control diet (Fig. 4c).

Table 10 Relative organ weights in male and female Wistar Han RCC rats in the 90-day feeding trial with GM maize NK603 at an inclusion rate of up to 50% in the diet

The summary tables with all necropsy findings observed in male and female rats are listed in the Supplementary Electronic Material’s Tables 8 and 9, respectively; those with all histopathological findings observed in male and female rats are listed in the Supplementary Electronic Material’s Tables 10 and 11, respectively. There were no treatment-related necropsy or histopathological findings following the feeding of NK603 or NK603 + Roundup to rats at an inclusion rate of up to 50% for 90 days.

Combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diets

The study design is shown in Table 11. The male and female rat mortality rates in the five experimental groups are shown in Table 12. The mortality rate of the male rats fed the 33% NK603 + Roundup diet was significantly higher than that of the corresponding control group (p = 0.03 in a one-sided test). In contrast, the female rats fed the 33% NK603 + Roundup diet showed a lower, though not significantly lower, mortality rate than the corresponding control group (p = 0.07 if the test had been performed for decreased rather than increased mortality).

Table 11 Study design of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diet
Table 12 Male and female rat mortality rate in the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The body weight of the male and female rats surviving until the end of the 2-year feeding period is shown in Fig. 5. The male rats fed the NK603 + Roundup diet at an inclusion rate of 33% showed a significantly higher body weight when compared to the corresponding control group in the second half of the experimental period, while the body weight of the female rats was similar in all five experimental groups (Fig. 5). The mean feed consumption of male and female rats surviving until the end of the 2-year feeding period was similar to the respective control groups (Fig. 6). The equivalence tests were only performed for the data after 3 and 6 months, since there was only one set of control data from a 1-year feeding trial performed in the course of the preceding GRACE project, but no historical control data from a 2-year feeding trial at the animal housing facility. The equivalence tests showed “equivalence” for body weight and mean feed consumption at t = 3 and t = 6 months in male and female rats, i.e. all ELSD intervals were fully inside the interval [− 1, + 1]. Figures for all parameter groups are shown in Goedhart and van der Voet (2018a, i).

Fig. 5 Mean body weight of all male and female rats surviving until the points in time plotted in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

Fig. 6 Mean feed consumption of all male and female rats surviving until the points in time plotted in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The data on the haematological parameters measured in the blood samples of the male rats are shown in Table 13. At t = 3 months, WBC were significantly higher in the group fed the 33% NK603 diet than in the control group. At t = 6 months, WBC and LYMA were significantly higher in the groups fed the 33% NK603 and 11% NK603 + Roundup diets than in the control group, while the percentage monocytes was significantly lower in the group fed the 11% NK603 diet than in the control group. At t = 12 months, HCT was significantly higher and MCHC was significantly lower in the group fed the 11% NK603 + Roundup diet than in the control group, while PLT was significantly higher in the groups fed the 33% NK603 and the 33% NK603 + Roundup diets than in the control group. The data on the haematological parameters measured in the blood samples of the female rats are shown in Table 14. At t = 6 months, PLT was significantly higher in the group fed the 11% NK603 + Roundup diet than in the control group, whereas the percentage monocytes was significantly lower in the group fed the 33% NK603 + Roundup diet than in the control group. At t = 12 months, MCH was significantly lower in the groups fed the 11% NK603 + Roundup and the 33% NK603 + Roundup diets and MCHC was significantly lower in the group fed the 11% NK603 + Roundup diet than in the control group. Moreover, the percentage lymphocytes was significantly lower and the percentage neutrophils significantly higher in the group fed the 11% NK603 diet than in the control group. At t = 24 months, the percentage eosinophils was significantly higher in the group fed the 33% NK603 diet than in the control group. Apart from these 21 cases of significance, the other 427 comparisons involving the 14 haematological parameters in male and female rats across 4 points in time and 4 NK603 groups showed no significant differences between the NK603 groups and the control group.

Table 13 Haematology parameters (mean ± SD) in the serum of male Wistar Han RCC rats in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet
Table 14 Haematology parameters (mean ± SD) in the serum of female Wistar Han RCC rats in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The differential white blood count data in the present study were much more variable than the historical reference data. This indicates that the analytical methods used in the current and historical studies were not comparable, and therefore, the historical data were not considered suitable for equivalence tests (see details in Goedhart and van der Voet 2018i). The equivalence tests for the remaining 9 haematological parameters showed “equivalence” for 131 comparisons, “equivalence more likely than not” for 12 comparisons, and “non-equivalence more likely than not” for one of the 144 comparisons with the control group. The latter case, in which the ELSD point estimate was outside the equivalence area between − 1 and + 1, was the percentage monocytes in female rats fed the 33% NK603 + Roundup diet at t = 6 months. Furthermore, one or both ELSD confidence limits were outside the interval [− 1, + 1] for MCV in male rats fed any NK603 diet at t = 3 months and male rats fed the 33% NK603 or the 33% NK603 + Roundup diet at t = 6 months, for MCH in male rats fed any NK603 diet at t = 6 months, as well as for MCHC in female rats fed the 33% NK603 + Roundup diet at t = 3 months or the 11% NK603 + Roundup diet or the 33% NK603 diet at t = 6 months.
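The verbal categories used for the equivalence tests can be read as a simple rule on the ELSD point estimate and its confidence limits relative to the interval [− 1, + 1]. The following Python sketch is illustrative only. The function name `classify_elsd`, the inclusive treatment of the boundaries and the fourth category ("non-equivalence", for an interval lying entirely outside [− 1, + 1]) are assumptions made here; the actual procedure is the one described in van der Voet et al. (2017).

```python
def classify_elsd(lower, estimate, upper, limit=1.0):
    """Classify one comparison from its ELSD estimate and confidence limits.

    Hypothetical helper for illustration; not the project's software.
    """
    if not lower <= estimate <= upper:
        raise ValueError("expected lower <= estimate <= upper")
    # Whole confidence interval inside [-limit, +limit].
    if -limit <= lower and upper <= limit:
        return "equivalence"
    # Point estimate inside, but at least one confidence limit outside.
    if -limit <= estimate <= limit:
        return "equivalence more likely than not"
    # Point estimate outside, interval still overlapping [-limit, +limit].
    if lower <= limit and upper >= -limit:
        return "non-equivalence more likely than not"
    # Assumed fourth category: interval entirely outside [-limit, +limit].
    return "non-equivalence"


# Made-up intervals, one per category (not data from this study).
examples = [(-0.4, 0.1, 0.6), (-0.2, 0.7, 1.3), (0.5, 1.2, 1.9), (1.1, 1.5, 2.0)]
labels = [classify_elsd(*e) for e in examples]
```

Under this reading, the single "non-equivalence more likely than not" case reported above corresponds to the third branch: the point estimate lies outside the equivalence area, while the confidence interval still overlaps it.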

The data on the clinical biochemical parameters in male rats are shown in Table 15. The TAG levels were significantly higher in the group fed the 11% NK603 + Roundup diet than in the control group at t = 6 and 12 months, while UREA was significantly higher in the group fed the 11% NK603 diet than in the control group at t = 24 months. Cl was significantly lower in the groups fed the 33% NK603, 11% NK603 + Roundup and 33% NK603 + Roundup diets than in the control group at t = 12 months, whereas K was significantly higher in the group fed the 11% NK603 + Roundup diet than in the control group at t = 24 months and P was significantly lower in the group fed the 11% NK603 diet than in the control group at t = 12 months. The data on the clinical biochemical parameters in female rats are shown in Table 16. ALT was significantly lower in the group fed the 11% NK603 + Roundup diet than in the control group at t = 12 months and AST was significantly higher in the group fed the 33% NK603 + Roundup diet than in the control group at t = 6 months. TAG was significantly lower in the group fed the 11% NK603 + Roundup diet than in the control group at t = 6 months. CREA was significantly higher in the group fed the 11% NK603 diet than in the control group at t = 12 months and was significantly higher in the groups fed the 33% NK603, 11% NK603 + Roundup and 33% NK603 + Roundup diets than in the control group at t = 24 months. UREA was significantly higher in all four groups fed the NK603-containing diets than in the control group at t = 12 months. Cl was significantly higher in the group fed the 11% NK603 + Roundup diet than in the control group at t = 3 months, while it was significantly lower in the group fed the 11% NK603 diet than in the control group at t = 24 months. K was significantly higher in the group fed the 33% NK603 diet than in the control group at t = 3 months and in the group fed the 11% NK603 diet than in the control group at t = 24 months. Na was significantly higher in the group fed the 11% NK603 + Roundup diet than in the control group at t = 3 months and in the groups fed the 11% NK603 and 33% NK603 diets than in the control group at t = 12 months. P was significantly higher in the group fed the 33% NK603 diet than in the control group at t = 3 months. Apart from these 30 cases of significance, the other 450 comparisons involving the 15 clinical biochemical parameters for males and females over 4 time points and 4 NK603 groups showed no significant differences between the NK603-fed groups and the group fed the control diet. Likewise, the equivalence tests at t = 3 and t = 6 months showed “equivalence” for 225 comparisons involving the 15 clinical biochemical parameters, and “equivalence more likely than not” for 15 comparisons: one or both ELSD confidence limits were outside the interval [− 1, + 1] for CHOL in female rats fed the 33% NK603, the 11% NK603 + Roundup or the 33% NK603 + Roundup diet at t = 3 months, or fed any NK603 diet at t = 6 months, and P levels in female rats fed the 33% NK603 diet at t = 3 months.

Table 15 Clinical biochemistry parameters (mean ± SD) in the serum of male Wistar Han RCC rats in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet
Table 16 Clinical biochemistry parameters (mean ± SD) in the serum of female Wistar Han RCC rats in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet

The urinalysis data are summarized in the statistical reports by Goedhart and van der Voet (2018a, b, c, d, e). In male rats, at t = 3 months, the urine volume was significantly lower and the urine leukocyte number as well as the urine ketone level were significantly higher in the group fed the 33% NK603 diet than in the control group. At t = 6 months, the urine volume was significantly lower in the group fed the 33% NK603 diet than in the control group. At t = 12 months, the urine volume and the urine volume/body weight were significantly lower and the urine osmolarity and the urine ketone level were significantly higher in the group fed the 33% NK603 diet than in the control group, while the urine volume, urine volume/body weight and urine leukocyte number were significantly lower in the group fed the 11% NK603 diet than in the control group. At t = 24 months, the urine ketone level was significantly higher in the group fed the 33% NK603 + Roundup diet than in the control group. In female rats, at t = 3 months, the urine ketone level was higher in the group fed the 11% NK603 + Roundup diet than in the control group, while the urine volume/body weight was lower in the group fed the 33% NK603 + Roundup diet than in the control group. At t = 24 months, the urine pH was higher in the group fed the 33% NK603 diet than in the control group. Apart from these 13 cases of significance, the other 243 comparisons involving 8 urinalysis parameters in male and female rats over 4 points in time and 4 NK603 groups showed no significant differences between the NK603-fed groups and the group fed the control diet.

The relative organ weights in male and female rats at t = 12 months are shown in Table 17. In male rats fed the 11% NK603 diet, the relative epididymides weight was significantly higher than that in the control group. In female rats, the relative brain weight was significantly higher in the four groups fed the NK603 diets, whereas the relative kidney weight was significantly higher in the groups fed the 11% NK603 + Roundup and the 33% NK603 + Roundup diets than in the control group. Apart from these 7 cases of significance, the other 57 comparisons involving the 9 relative organ weights over 4 time points and 4 NK603 groups for males and females were not statistically significant between the NK603 fed groups and the group fed the control diet.

Table 17 Relative organ weights in male and female Wistar Han RCC rats in the combined chronic toxicity/carcinogenicity feeding trial with GM maize NK603 at an inclusion rate of 11 and 33% in the diet (at t = 12 months)

The complete histopathology report is available via the internet portal CADIMA. As previously mentioned, the mortality rate of the male rats fed the 33% NK603 + Roundup diet was significantly higher than that of the corresponding control group. Tables 18 and 19 list the causes of premature death in male rats fed the control diet and the 33% NK603 + Roundup diet, respectively. The most common cause of premature death in both groups was pituitary pars anterior adenoma (12 in the control group and 17 in the group fed the 33% NK603 + Roundup diet). The next most common cause of premature death was chronic progressive nephropathy of the kidney (1 in the control group and 3 in the group fed the 33% NK603 + Roundup diet).

Table 18 Causes of premature death in male rats fed the control diet in the course of the combined chronic toxicity/carcinogenicity feeding trial
Table 19 Causes of premature death in male rats fed the 33% NK603 + Roundup diet in the course of the combined chronic toxicity/carcinogenicity feeding trial

The necropsy findings in the chronic toxicity phase recorded in this study were considered to be within the normal range of background alterations seen in untreated animals of this age and strain (McInnes 2012; Blankenship and Skaggs 2013; Cesta et al. 2014). The macroscopic findings in the brain, pituitary gland and mammary glands are listed in Table 20. The deformation of the brain correlated with the microscopic findings of compression by a pituitary neoplasm and mammary nodules and masses correlated with the microscopic findings of mammary neoplasia (see below). All microscopic findings observed in the rats in the chronic toxicity phase are listed in the Supplementary Electronic Material’s Table 12. The microscopic findings in the pituitary gland and mammary glands are listed in Table 21. There were no statistically significant differences in the number of macroscopic and microscopic findings between control rats and the NK603-fed rats.

Table 20 Macroscopic findings in the brain, pituitary gland and mammary glands in the chronic toxicity phase of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603
Table 21 Microscopic findings in the pituitary gland and mammary glands in the chronic toxicity phase of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603

The necropsy findings in the carcinogenicity phase recorded in this study were considered to be within the normal range of background alterations seen in untreated animals of this age and strain (McInnes 2012; Cesta et al. 2014). The macroscopic findings in the brain, pituitary gland, mammary glands and thymus are listed in Table 22. The deformation of the brain correlated with microscopic findings of compression by a pituitary neoplasm (see below). The findings in the thymus (enlarged and nodule) correlated with microscopic findings of thymoma (see below). The mammary nodules and masses correlated with microscopic findings of mammary neoplasia (see below).

Table 22 Macroscopic findings in the brain, pituitary gland, mammary glands and thymus in the carcinogenicity phase of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603

All non-neoplastic microscopic findings observed in the rats in the carcinogenicity phase are listed in the Supplementary Electronic Material’s Table 13. There were no treatment-related differences in the number of non-neoplastic microscopic findings between control rats and NK603-fed rats. The neoplastic lesions observed in male and female rats in the chronic toxicity phase of the combined chronic toxicity/carcinogenicity feeding trial with NK603 are shown in Table 23. All neoplastic microscopic findings observed in the rats fed the 33% control, 33% NK603 and 33% NK603 + Roundup diets in the carcinogenicity phase are listed in Table 24. There were no treatment-related differences in the number of neoplastic microscopic findings between control rats and NK603-fed rats. The observed increase in the number of benign thymomas in the female group fed NK603 + Roundup at an inclusion rate of 33% was not statistically significant when compared to the control group, nor was it significant when benign and malignant thymomas were analyzed in combination. The same was true for pituitary gland and mammary gland tumours.

Table 23 Neoplastic lesions in male and female rats fed the 33% control, the 33% NK603 and the 33% NK603 + Roundup diets in the chronic toxicity phase of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603
Table 24 Neoplastic lesions in male and female rats fed the 33% control, the 33% NK603 and the 33% NK603 + Roundup diets in the carcinogenicity phase of the combined chronic toxicity/carcinogenicity feeding trial with the GM maize NK603

Discussion

The GM maize NK603 produced for the NK603 + Roundup diets to perform the three feeding trials was treated once with the glyphosate-containing herbicide Roundup Transorb® HC as described in Materials and Methods following the principles of good agricultural practice and in conformity with the requirements of the KBBE-2013-FEEDTRIALS call “Two-year carcinogenicity rat feeding study with maize NK603”. During the final conference of the G-TwYST project, it was argued that the maize should have been exposed to much higher concentrations of the herbicide to induce toxic effects in the rats during the feeding trials, but this would have contradicted the principles of good agricultural practice. Moreover, it should be noted that the toxicity testing of glyphosate as such was not within the scope of the above-mentioned call. The toxicological evaluation of the active substance glyphosate by EFSA relied on a large number of studies (EFSA 2015). Glyphosate did not show a genotoxic potential and there was no evidence of carcinogenicity in rat or mice studies. Considering a number of long-term studies in rats, an overall long-term No Observed Adverse Effect Level (NOAEL) of 100 mg/kg body weight/day was obtained. Based on an overall maternal and developmental NOAEL of 50 mg/kg body weight/day obtained from several developmental toxicity studies in rabbits, an acceptable daily intake (ADI) value of 0.5 mg/kg body weight/day as well as an acute reference dose (ARfD) of 0.5 mg/kg body weight were derived (EFSA 2015).
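The ADI quoted above is consistent with the rabbit NOAEL if a 100-fold uncertainty factor (UF) is applied. The factor itself is not stated in the text above and is an assumption here, taken as the conventional default of 10 for interspecies differences times 10 for variability among humans:

```latex
\[
\mathrm{ADI} \;=\; \frac{\mathrm{NOAEL}}{\mathrm{UF}}
\;=\; \frac{50\ \mathrm{mg/kg\ bw/day}}{100}
\;=\; 0.5\ \mathrm{mg/kg\ bw/day}
\]
```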

The glyphosate levels in the near-isogenic non-GM maize (Pioneer 8906) kernels as well as in the untreated and Roundup-treated GM maize NK603 (Pioneer 8906 R) kernels were ≤ 16 µg/kg, which is far below the maximum residue level for glyphosate in maize (1 mg/kg) in the European Union (EU-Pesticides Database, accessed on the 5th of February 2019). All diets, including the control diet, were contaminated with low levels of glyphosate (about 30–140 µg/kg) not considered to be adverse. The source of these low levels of the contaminant is not known; they might derive from constituents in the diet other than the maize.

The choice of the rat strain for the G-TwYST feeding trials proved to be extremely important for the type of studies to be performed. In the study by Séralini et al. (2012), the Sprague–Dawley rat strain was chosen and a very high incidence of mammary gland tumors in female animals was reported. However, the female Sprague–Dawley rats are not suitable to demonstrate treatment-related mammary carcinogenesis because of the very high incidence of spontaneously formed mammary gland tumors (Weber 2017). In the present study, the incidence of mammary gland neoplasms in female rats fed the control diet was 44% (i.e. 7 adenocarcinomas, 2 adenomas and 13 fibroadenomas out of 50 rats). Therefore, the Wistar Han RCC rat strain used in the present study was a suitable choice to test if the GM maize NK603 had the ability to stimulate mammary gland carcinogenesis. It should be noted that the Wistar Han RCC rat strain has previously been used in numerous studies as an experimental model of chemically induced mammary gland carcinogenesis (e.g. Abd-Ellatef et al. 2017; Ganaie et al. 2017; Smina et al. 2017).

In the 90-day feeding trial with the GM maize NK603 at an inclusion rate of 11 and 33% in the diets, no adverse changes were observed regarding the parameters listed in the OECD Test Guideline 408 (OECD 1998) including the health status of the animals, the body weights, the haematological and clinical biochemical parameters, the relative organ weights as well as the necropsy and histopathology findings. Whilst some parameters showed statistically significant differences between the various NK603 maize-fed groups and the control group, “equivalence” or “equivalence more likely than not” could almost always be established for differences between the data of these groups and the historical control data. Moreover, the differences between test groups and the control group were generally small and/or not dose-related further indicating that they reflect normal variation and are not biologically relevant. Furthermore, there was no correlation between the haematological parameters, the clinical biochemical parameters and the relative organ weights showing significant differences and the necropsy and histopathology findings.

The T4 level was significantly lower in the sera of male rats fed the 33% NK603 diet and in the sera of female rats fed the 33% NK603 + Roundup diet when compared to the corresponding control groups, but there were no histopathological changes in the thyroid gland, so that these changes in the T4 levels are considered not to be adverse. The 17β-estradiol levels in the four groups of female rats fed the GM maize NK603 were significantly lower than that in the control group. However, these differences were not accompanied by histopathological alterations in any of the analyzed estrogen-sensitive tissues/organs, so that they are considered not to be adverse.

In the 90-day feeding trial with the GM maize NK603 at an inclusion rate of up to 50% in the diets, no adverse changes related to the feeding of the NK603 maize were observed regarding the parameters listed in the OECD Test Guideline 408 (OECD 1998) including the health status of the animals, the haematological and clinical biochemical parameters, the relative organ weights as well as the necropsy and histopathology findings. EFSA (2014) proposed an inclusion rate of 50% as a high maize dose in 90-day studies in rodents, based on a report by Zhu et al. (2013). In the present study, there was no indication of a nutritional imbalance if the maize inclusion rate was increased from 33 to 50% in a subchronic feeding trial, whereby in this case the diet composition had to be rebalanced. However, since it has been described that the subchronic and chronic feeding of high-protein diets leads to renal damage in rats (Rao 2002; Haseman et al. 2003; Wakefield et al. 2011; Aparicio et al. 2013), the inclusion rate of 33% maize in the diets was chosen by the G-TwYST consortium as the “high dose” of maize for the combined chronic toxicity/carcinogenicity study.

Mortality was significantly increased in male rats fed the 33% NK603 + Roundup diet when compared to the control group and was related to the increased number of deaths from pituitary neoplasia (12 male rats fed the control diet versus 17 male rats fed the 33% NK603 + Roundup diet), the most common cause of death in all groups including controls. However, it is important to mention that there was no effect of the 33% NK603 + Roundup diet on the overall incidence of pituitary neoplasia, i.e. the incidence of pituitary pars anterior adenomas in male rats fed the control and the 33% NK603 + Roundup diets was 62 and 58%, respectively. The increased mortality observed between the 12th and 24th month of the feeding trial in male rats fed the 33% NK603 + Roundup diet coincided with a strong increase in the body weight of these rats in this period of time. These results are in accordance with previous studies showing that ad libitum feeding of common high calorie diets to rodents in long-term studies results in higher body weights, an earlier onset and a higher incidence of spontaneous neoplasms such as pituitary gland tumors, as well as a reduced survival of these animals and that these effects are not observed in rats undergoing a dietary restriction (Keenan et al. 1995a, b; Roe et al. 1995; NTP 1997; Nold et al. 2001). Therefore, it is concluded that the increased mortality associated with a higher body weight and an increased incidence of pituitary neoplasms in male rats fed the 33% NK603 + Roundup diet is not specifically related to the feeding of this diet (Keenan et al. 1995a, b; Nold et al. 2001). In female rats fed the 33% NK603 + Roundup diet, the mortality rate was decreased when compared to the control female rats, but this difference was not statistically significant (p = 0.07). It should be noted that the overall incidence of pituitary neoplasms in control and GM maize NK603-fed male and female rats did not differ significantly.

The necropsy findings in the chronic toxicity and carcinogenicity phase of the 2-year feeding trial were considered to be within the normal range of background alterations seen in untreated animals of this age and strain and their incidence was similar in the control group and the groups fed the GM maize NK603. Moreover, the microscopic findings in the chronic toxicity and carcinogenicity phase of the 2-year feeding trial were consistent with spontaneously occurring findings described in the literature, the findings were distributed randomly among groups, and/or their appearance was similar to findings found in controls. The thymoma is a typical Wistar rat-related neoplasm, with females more affected than males (Murray et al. 1985; Poteracki and Walsh 1998; Weber 2017). The data obtained in the present study corroborate these earlier findings, whereby there was no statistically significant difference in the incidence of benign thymomas in the female control group and the female group fed the GM maize NK603 treated with Roundup at an inclusion rate of 33% or when benign and malignant thymomas were analyzed in combination. Based on the results obtained in the three feeding trials performed in the course of the G-TwYST project, it is concluded that there were no adverse findings related to the feeding with the NK603 or NK603 + Roundup diets to rats for up to 2 years.

In the study by Séralini et al. (2012), the most frequently occurring anatomical pathologies were observed in the kidneys, liver and digestive tract of male Sprague–Dawley rats and in the pituitary gland and mammary glands, including mammary tumors, in female Sprague–Dawley rats. Moreover, Séralini et al. (2012) indicated that the diets containing the untreated GM maize NK603 and the Roundup-treated GM maize NK603 contained significantly lower levels of the phenolic acids caffeic acid and ferulic acid than the control diet and hypothesised that this decrease could explain the higher tumor incidence observed in the GM maize NK603-fed rats due to less protection afforded by these compounds. In the present study, there were no statistically significant differences in the number of non-neoplastic and neoplastic findings in the above-mentioned organs/tissues between Wistar rats fed the control diet and those fed the GM maize NK603-containing diets. In line with these findings, the clinical biochemical parameters of hepatic (ALP, ALT, AST, TP and ALB) and renal function (CREA, UREA and ions) remained, with single exceptions, unaltered. Furthermore, the levels of caffeic acid were similar in the three batches of control and GM maize NK603-containing diets. This was also the case for ferulic acid, although much higher levels of this compound were measured in batch 1 than in batches 3 and 5 of all analyzed diets, owing to the different extraction methods applied by the two laboratories subcontracted to perform the analyses.

The protocols outlined in the OECD Test Guidelines 408 for subchronic toxicity testing and 453 for a combined chronic toxicity/carcinogenicity testing in rodents (OECD 1998, 2009) have been designed for the testing of single chemicals and provide standard procedures to identify health hazards resulting from the repeated exposure to chemicals for 90 days, 1 year and 2 years. The OECD Test Guidelines recommend the use of at least 10 animals/sex and group for subchronic toxicity testing, 20 animals/sex and group for chronic toxicity testing and at least 50 animals/sex and group for carcinogenicity testing. Chemicals can be administered individually to rodents at doses several multiples higher than the amount of the chemicals to which humans are exposed to test whether they may lead to toxicity. Whole food/feed contains a mixture of constituents and can only be administered to rodents at rather limited levels to avoid a nutritional imbalance. Therefore, it is unlikely that substances present in small amounts and with a low toxic potential in whole food/feed will cause any observable effects in animal feeding trials (EFSA GMO Panel Working Group on Animal Feeding Trials 2008).

The G-TwYST project has demonstrated that in principle, following the approach presented in van der Voet et al. (2017), reference groups of animals from previous similar trials in the same experimental facility fed with non-GM plant material considered to be safe may be used to define a regular bandwidth for each endpoint. In practice, it may be difficult to apply this approach, e.g. when the variability of analytical methods is not stable between the historical and current studies, such as for the differential white blood count data in this study, or when no historical data are available at all, as for the 2-year feeding trial. Traditional statistical tests applied in toxicological studies to find differences between test and control groups cannot make a distinction between biologically relevant and non-relevant effect sizes, and EFSA has recommended to put more emphasis on the use of confidence intervals (EFSA 2011c). Based on a confidence interval approach, equivalence tests can be used to show that a test group is within the bandwidth defined from historical data obtained in recent studies conducted in the same testing facility (“proof of safety”).

Equivalence bandwidths are ideally based on a large pool of historical data to capture the whole bandwidth of “safe” values. In this study, the available historical control database from the same test facility was limited with respect to the number of non-GM reference groups. This limitation leads to uncertainties, and to wider expected equivalence regions than would be obtained with a larger historical database. However, the consequences of limited data are taken into account in the equivalence tests, effectively by allowing larger differences to be equivalent. It may be noted that in many cases adverse effect limits are expected to lie far outside the equivalence region; thus, our approach is conservative and does not exclude toxic effects too easily.

An alternative approach is to define bandwidths using toxicologically defined quantifications of adverse effects. This was performed for nine endpoints, and was reported by Goedhart and van der Voet (2017, 2018a, f). However, tentative toxicologically defined bandwidths are often only available for a few measured endpoints in animal studies, and, therefore, an equivalence assessment based on historical data was preferred.

In the above-mentioned context, the non-GM data obtained in the course of the 90-day and 1-year rat feeding trials performed in the preceding GRACE project were used as historical control data for the equivalence testing in the G-TwYST project. G-TwYST proposed and applied a statistical method for equivalence testing that accounts for statistical uncertainties in both the current 90-day feeding trials and the chronic toxicity testing phase of the 2-year feeding trial as well as in the historical reference data (van der Voet et al. 2017; Goedhart and van der Voet 2017, 2018a, f). Across 1424 equivalence tests and 3472 difference tests, the cases in which equivalence could not be concluded at the 95% confidence level and the cases in which significant differences were observed correspond to no more than about 5% of cases, which is the expected percentage for statistical tests using a 5% error level. Moreover, most of the cases in which equivalence could not be concluded at the 95% confidence level were related to a larger within-group variation of the measurements in the current studies as compared to the historical control data, which may be an analytical rather than a toxicological reason for a wide confidence interval.
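To make the 5% benchmark concrete, the following short Python sketch computes how many of the tests would be expected to give a finding by chance alone at a 5% error level. It is simple bookkeeping based on the test counts quoted above; it assumes the nominal error level applies to every test and ignores the correlation between endpoints measured on the same animals.

```python
ERROR_LEVEL = 0.05  # nominal error level per test
N_TESTS = {"equivalence": 1424, "difference": 3472}  # counts quoted in the text


def expected_count(n_tests, error_level=ERROR_LEVEL):
    """Expected number of chance findings among n_tests tests."""
    return n_tests * error_level


for name, n in N_TESTS.items():
    print(f"{name}: about {expected_count(n):.1f} of {n} tests expected by chance")
```

Under these assumptions, roughly 71 equivalence tests and 174 difference tests would be expected to give a finding without any underlying effect.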

G-TwYST performed a power analysis for quantitative findings in 90-day studies, estimating the effect sizes that could be detected with 80% power (Goedhart and van der Voet 2018h). For parameters in the list of determinations as advocated by the OECD Test Guideline 408 for testing chemicals (OECD 1998) and the EFSA Guidance Document (EFSA 2011b), such as body/organ weights, haematology and clinical chemistry parameters, a design with 8 cages per group in a 90-day study is appropriate for more than 80% of the quantitative parameters if deviations of 1.3-fold (+ 30% or − 23%) or more have to be detected with at least 80% power.
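The relation between within-group variability, group size and power can be sketched in Python as follows. This is an illustration under stated assumptions and not the power analysis of Goedhart and van der Voet (2018h): it assumes log-normally distributed data, a two-sided two-group comparison at a 5% level, and a normal approximation that is somewhat optimistic for groups as small as 8 units. The function name `approx_power` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(fold_change, cv, n_per_group, alpha=0.05):
    """Approximate two-sided power to detect a fold change between two groups.

    cv is the within-group coefficient of variation (e.g. 0.10 for 10%);
    n_per_group is the number of experimental units (e.g. cages) per group.
    """
    sigma = sqrt(log(1 + cv ** 2))        # log-scale SD of a log-normal variable
    delta = abs(log(fold_change))         # effect size on the log scale
    se = sigma * sqrt(2 / n_per_group)    # SE of the difference of two group means
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(delta / se - z_crit) + nd.cdf(-delta / se - z_crit)


# A 1.3-fold increase (+30%) and its reciprocal (about -23%) are symmetric on the log scale.
decrease_percent = (1 / 1.3 - 1) * 100

for cv in (0.01, 0.10, 0.20, 0.44):
    print(f"CV {cv:.0%}: power {approx_power(1.3, cv, 8):.2f}")
```

Because the effect is evaluated on the log scale, an increase of 30% and a decrease of about 23% are equally hard to detect, which is why the two figures are quoted together.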

However, the number of quantitative parameters as required to be measured by OECD (1998) is 40–50, and the within-group variability in these parameters was found in the G-TwYST studies to vary considerably, between 1% and 44% expressed as a coefficient of variation. The 10–20% of the parameters with a relatively high variability would require larger sample sizes to attain the same power as for the other parameters. In this study, we found relatively large variability for five parameters in both sexes (monocytes, eosinophils, BIL, TAG and HGB), and additionally for five parameters in female rats only (WBC, LYMA, neutrophils, ALT and uterus weight) (Goedhart and van der Voet 2018h).

G-TwYST performed a whole food/feed combined chronic toxicity/carcinogenicity study with 50 animals/sex and group for the carcinogenicity phase, which is the minimal number requested by the OECD Test Guideline 453 (2009). A power analysis for qualitative (yes/no) findings, such as deaths and histopathological results, shows that with such a design only large differences in single incidences can be detected with at least 80% power (Goedhart and van der Voet 2018h). Increasing the number of animals is of limited help and is therefore not a suggested direction for future animal studies. For example, if the control incidence is 1%, a ninefold rather than a 16-fold increase can be detected with 100 instead of 50 animals/sex and group, and, if the control incidence is 10%, a 2.4-fold rather than a 3.2-fold increase can be detected. These levels of change are often larger than what toxicologists would judge as minimal relevant values. The low sensitivity is inherent to the qualitative character of the findings. These considerations point to the limitations of using whole food/feed studies as an untargeted screening approach and underline the necessity to perform such studies with clearly targeted hypotheses. It is concluded that, apart from setting effect sizes of interest, a prior selection should be made of those parameters for which a sufficiently high power is needed, before a power analysis can be helpful to set the sample sizes for a 90-day animal study in line with the OECD Test Guideline 408 (1998) and EFSA documents (EFSA 2011b, 2014). The potential usefulness of such studies lies in the possibility to consider and interpret patterns of findings rather than in an analysis of single endpoints.
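The low sensitivity for yes/no findings can be explored with a small exact calculation. The Python sketch below computes the power of a one-sided Fisher exact test at a 5% level for two equally sized groups. The choice of test is an assumption of this sketch; the source does not state which test underlies the reported fold changes, so the printed values need not reproduce the 80% figure exactly. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def binom_pmf(k, n, p):
    """Probability of k affected animals out of n at incidence p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


@lru_cache(maxsize=None)
def fisher_one_sided_p(x_ctrl, x_test, n):
    """One-sided Fisher exact p-value for an increased incidence in the
    test group, with n animals in each group."""
    total = x_ctrl + x_test
    tail = sum(comb(n, k) * comb(n, total - k)
               for k in range(x_test, min(n, total) + 1))
    return tail / comb(2 * n, total)


def exact_power(p_ctrl, fold, n, alpha=0.05):
    """Probability that the test is significant when the true incidence in
    the test group is fold times the control incidence."""
    p_test = min(1.0, p_ctrl * fold)
    power = 0.0
    for x_ctrl in range(n + 1):
        w_ctrl = binom_pmf(x_ctrl, n, p_ctrl)
        for x_test in range(n + 1):
            if fisher_one_sided_p(x_ctrl, x_test, n) <= alpha:
                power += w_ctrl * binom_pmf(x_test, n, p_test)
    return power


# Scenarios quoted in the text: (control incidence, fold increase, animals per group).
for p_ctrl, fold, n in [(0.01, 16, 50), (0.01, 9, 100), (0.10, 3.2, 50), (0.10, 2.4, 100)]:
    print(f"control {p_ctrl:.0%}, {fold}-fold, n={n}: power {exact_power(p_ctrl, fold, n):.2f}")
```

Doubling the group size lowers the detectable fold change only modestly, because the information in a binary endpoint grows slowly with the number of animals when the control incidence is low.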

G-TwYST has confirmed that the variability of quantitative measurements in a combined chronic toxicity/carcinogenicity study is in general higher for the measurements after 12 and 24 months as compared to 3 or 6 months (Goedhart and van der Voet 2018a, g). Therefore, the statistical power to detect a certain difference decreases as the duration of the study increases, and larger samples may be needed for GM plant risk assessment based on 12 or 24 months data. In terms of clinical chemistry and haematology measures carried out at the 2-year termination period, scientific opinion is split as to their actual value since the concurrent appearance of age-related diseases, such as end-stage kidney disease and hormonal senescence, tends to confound the interpretation by increasing the variability in serum parameters associated with such effects (Young et al. 2011). For this reason, OECD TG 453 leaves the option of whether or not to carry out clinical chemistry and haematology measures after 24 months to the discretion of the study director (OECD 2009). Similar endpoints show less variability at the 12-month than at the 24-month termination point, but nevertheless show increased variability when compared to those measured at the 90-day point due to the early appearance of age-related disease in some animals.

The G-TwYST project provided a broad set of data indicating that rat feeding trials with whole food/feed, performed for the risk assessment of a GM plant, did not identify hazards due to the GM maize NK603. This outcome is fully in line with the earlier risk assessment published by EFSA (EFSA 2003). The obligatory risk assessment included, in addition to the detailed safety assessment of the newly introduced trait, a comparative assessment focusing on potential unintended effects of the genetic modification. This comparative assessment included: (1) a detailed molecular characterisation of the genetic modification, i.e. of the inserted sequence as well as of the place of insertion; (2) a phenotypic and agronomic comparison of the GM variety and its conventional comparator; and (3) a compositional analysis of the GM variety in relation to its conventional counterpart as well as other non-GM varieties (reference varieties) (EFSA 2011b).

The 3Rs (Replacement, Reduction and Refinement) are guiding principles for a more ethical use of animals in laboratory experiments and were first described by Russell and Burch (1959). From the 3Rs perspective, it is particularly important to point out that the added value of long-term animal studies with whole food/feed should be carefully evaluated given the very high number of animals needed (in the case of the G-TwYST project: 720, 172 and 268 in the combined chronic toxicity/carcinogenicity study, the 90-day feeding trial with the 11 and 33% inclusion rates of the GM maize NK603 and the 90-day feeding trial with the 50% inclusion rate of the GM maize NK603, respectively).

In the original assessment of the GM maize NK603, there were no indications that the NK603 maize was not as safe as other maize varieties currently on the European market. As a result of this lack of hazards, there were no triggers, based on general toxicological principles, to perform additional animal feeding studies. Therefore, the G-TwYST 90-day and long-term animal studies were conducted in the absence of a specific hypothesis. The G-TwYST data from 90-day and long-term rodent feeding studies confirmed that there are no hazards, thereby supporting the results from the initial risk assessment performed by EFSA.

In conclusion, in the European GRACE and G-TwYST projects a series of animal feeding trials were performed (Zeljenková et al. 2014, 2016; this study). This series of studies neither delivered a scientific basis for the 90-day animal feeding trial demanded by the European Commission to be performed for each new GM plant variety nor did it indicate that untargeted, extended feeding studies with rats fed GM plant material are of value for a final confirmation of safety. Thus, an added value of animal studies relative to the available non-animal studies for the risk assessment of GM plants (EFSA Scientific Committee et al. 2017) was not substantiated.