Plant material
Given that the NK603 maize had been approved in the EU for import, processing, and food and feed uses but not for cultivation, the NK603 and control maize plants were cultivated according to good agricultural practice at an experimental field station of the University of Guelph (ON, Canada) in 2014. Two varieties were produced: the GM maize NK603 (Pioneer 8906 R), without and with a Roundup application during cultivation, and its near-isogenic non-GM comparator (Pioneer 8906). About 84,000 plants/ha were planted in May 2014, the non-GM seeds on the 13th of May 2014 and the GM seeds on the 14th of May 2014. Weeds were controlled by applying Primextra® II Magnum® (S-Metolachlor + Atrazine, 3.5 L/ha) on the 20th of May 2014 at all sites and Roundup Transorb® HC (2.5 L/ha, 1.35 kg glyphosate [potassium salt]/ha) at the “Roundup site” on the 20th of June 2014. GM maize and non-GM maize were grown on a farm at a distance of about 1 km from each other to avoid cross-pollination. Glyphosate-treated and untreated fields were separated by a field road. Maize was harvested in November 2014 and kernels were removed from the cobs on-site by machine. The kernels had grain moisture levels in the usual range and were dried in a biological dryer, kept below 60 °C, down to a moisture level of 13–14%.
Diet preparation and analyses
Four tons of GM maize NK603, four tons of Roundup-treated GM maize NK603 and 7.8 tons of the near-isogenic non-GM comparator (maize kernels) were transported on the 14th of December 2014 to Germany (air freight) and stored in big bags in a commercial storage facility not storing other grains and under ambient climate conditions. No other plant material was stored at the facility. The quality of the kernels was analysed after import. The sampling of the big bags (i.e. kernels from the same variety and treatment) was performed according to Regulation 691/2013 (EU 2013b) by a representative of the Landesuntersuchungsanstalt für das Gesundheits- und Veterinärwesen Sachsen in January 2015.
In autumn 2015, an infestation with the Indian meal moth (Plodia interpunctella) was noticed and the kernels were fumigated with phosphine, which does not leave residues on/in the stored kernels. The big bags were moved to another storage facility and stored under ambient climate conditions between November 2015 and April 2017 (storage conditions: 12.3 ± 6.2 °C, 64.6 ± 7.2% humidity, hourly recorded). In summer 2016, mice were observed in the storage hall; the kernels were therefore sieved and placed in rigid containers.
Kernels were shipped to ENVIGO/Mucedola srl (Milan, Italy) and milled (mesh size: 1 mm) to prepare the feed. The formulation of the diets was calculated by the nutritionist of the company based on their standard feeds and ingredients for all feeding trials to provide a balanced nutrition. The feeds were isoproteic, isocaloric and adjusted to the dietary requirements of the rat strain Wistar Han RCC used in the feeding trials. Besides the milled maize, the formulation mainly consisted of other plant-derived ingredients, including wheat, wheat middlings, soybean meal, soybean oil and a rice protein concentrate, while it did not contain animal-derived ingredients (Supplementary Electronic Material, Table 1). Except for the GM maize, the other ingredients of the diets were supposed to be GM-free, but traces of GMO (traits) were frequently detected: this information is documented in the diet analyses file that can be accessed via the internet portal named CADIMA (Central Access Database for Impact Assessment of Crop Genetic Improvement Technologies; http://www.cadima.info). Traces of GMO (traits) were also detected in the commercially available seeds of control maize. The pellets were dried at a temperature of < 50 °C, coded in a blinded fashion and sent to the Slovak Medical University (Bratislava, Slovakia) for the feeding trials as vacuum-packed, γ-irradiated batches (irradiation dose = 25 kGy).
Diets were produced in six batches. A re-coded part of batch 3 of all diet groups was used for the 90-day feeding trial with a GM maize inclusion rate of 11 and 33% (complemented with the near-isogenic variety up to 33% maize in toto), whereas a separate batch of diets was prepared for the 90-day feeding trial with a GM maize inclusion rate of up to 50% (complemented with the near-isogenic variety up to 50% maize in toto). Maize and diet subsamples were retained at the animal feed producing facility (Mucedola srl) for analysis. Irradiated diet samples of > 1.5 kg each were sent to the Julius Kühn-Institut (Quedlinburg, Germany) and stored at − 80 °C. Batches 1, 3 and 5 were analysed. Diet samples for analyses were shipped to RIKILT Wageningen University and Research, where the feed pellets were milled and re-mixed. Subsamples of the milled material were analysed by RIKILT and, for complementary analyses, dispatched to either Covance (Madison, WI, USA) or SGS (Hamburg, Germany). The parameters measured and the methods applied by each of the certified laboratories are listed in the diet analyses file (http://www.cadima.info).
Study design
The study design of the two 90-day feeding trials (Tables 1, 7) was based on the OECD Test Guideline 408 for the testing of chemicals (OECD 1998) and EFSA recommendations on the performance of 90-day rodent feeding trials with whole food/feed (EFSA 2011a, 2014). The study design of the combined chronic toxicity/carcinogenicity feeding trial (Table 11) was based on the OECD Test Guideline 453 (OECD 2009) and EFSA considerations on the applicability of OECD TG 453 to whole food/feed (EFSA 2013).
As recommended by EFSA (2011a), two animals of the same gender were housed per cage and the cage was taken as the experimental unit. In the 90-day feeding trial with a GM maize NK603 inclusion rate of 11 and 33%, as well as in the combined chronic toxicity/carcinogenicity study there were five feeding groups (Tables 1, 11) and in the 90-day feeding trial with a GM maize inclusion rate of up to 50% there were eight feeding groups (Table 7). The cages were organized in blocks of 5–8 cages, and the feeding groups were randomized within blocks, i.e. a completely randomized block design was applied in all three feeding trials. There were separate blocks with male and female rats. To keep the units within blocks as homogeneous as possible, the ten heaviest rats of each sex (based on the weights 48 h after arrival) were housed in the first block, the next ten heaviest rats of each sex were housed in block 2, and so on. Except for feed consumption, which was determined per cage, all other parameters were measured in individual animals.
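The weight-based blocking and within-block randomization described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used at the test facility: the function name, the `seed` parameter and the way animals are paired into cages within a block are assumptions, since the text specifies only that blocks were filled in order of descending body weight and that feeding groups were randomized within blocks.

```python
import random

def assign_blocks(weights, groups, animals_per_cage=2, seed=0):
    """Sketch of weight-ranked blocking for the animals of one sex.

    weights: dict mapping animal id -> body weight (48 h after arrival).
    groups:  list of feeding-group labels (one cage per group per block).
    Returns a list of blocks; each block maps group label -> list of animal ids.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    ranked = sorted(weights, key=weights.get, reverse=True)  # heaviest first
    block_size = len(groups) * animals_per_cage
    blocks = []
    for start in range(0, len(ranked), block_size):
        animals = ranked[start:start + block_size]
        rng.shuffle(animals)      # assumption: random pairing into cages
        order = list(groups)
        rng.shuffle(order)        # feeding groups randomized within the block
        blocks.append({
            group: animals[i * animals_per_cage:(i + 1) * animals_per_cage]
            for i, group in enumerate(order)
        })
    return blocks

# Hypothetical example: 20 animals of one sex, five placeholder group labels.
example_weights = {f"rat{i}": 100 + i for i in range(20)}
example_groups = ["G1", "G2", "G3", "G4", "G5"]
example_blocks = assign_blocks(example_weights, example_groups)
```

With five feeding groups and two animals per cage, each block holds ten animals, which matches the "ten heaviest rats" rule stated above.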
Rat feeding trials
The trials were performed in compliance with GLP in the experimental animal facility at the Department of Toxicology of the Slovak Medical University in Bratislava (Slovakia). Five-week-old male and female specific-pathogen-free Wistar Han RCC (RccHan™:WIST) rats were purchased from Envigo (San Pietro al Natisone, Italy). A large amount of histopathological data has been published on the three main strains of rat used in carcinogenicity studies, i.e. Wistar Han, Sprague–Dawley, and Fischer 344 (Weber 2017). It is well known that Fischer 344 rats are prone to develop spontaneous myeloid leukemia and Leydig cell tumours, while Sprague–Dawley rats show high incidence rates in the case of spontaneous pituitary gland adenomas and mammary gland neoplasms (Weber 2017). The Wistar Han rat strain was used to perform the three feeding trials described in this paper because it shows the lowest incidence of spontaneous tumours in most organs when compared to the Fischer 344 and Sprague–Dawley strains.
The feeding trials were started 1 week after delivery of the animals at the animal testing facility. A detailed examination of all animals to verify their health condition (see the section “Periodical health status observations”) was carried out just before the start of the feeding trials. Feed and water were supplied ad libitum; feed was changed once a week and water every day. Feed consumption was determined once weekly during the first 13 weeks, every 2 weeks thereafter and reported as the total amount of feed consumed by two animals in one cage per week or 2 weeks, respectively.
Periodical health status observations
Rats were inspected twice daily for changes in skin, fur, eyes, mucous membranes, occurrence of secretions and excretions as well as activity level and change in behaviour. A detailed physical examination of each animal out of the cage was performed prior to the beginning of the feeding trials, on day 1, once weekly during the first 13 weeks and once monthly thereafter to identify changes in skin, fur, eyes, mucous membranes, occurrence of secretions and excretions, autonomic activity such as lacrimation, piloerection, pupil size, unusual respiratory patterns as well as activity level and change in behaviour. At the end of the feeding trials a functional assessment of changes in gait, posture and response to handling, as well as the presence of clonic or tonic movements or bizarre behaviour (self-mutilation, walking backwards) was carried out. Sensory reactivity to auditory, visual and proprioceptive stimuli was recorded. An ophthalmologic examination of both eyes of all animals in the conscious state was performed prior to the beginning of the feeding trials and 2 weeks before the end of the studies. The eyes and the peribulbar structures were examined macroscopically after pupillary dilatation induced by instillation of a 0.5% tropicamide solution. Each animal was weighed 48 h after its arrival at the experimental animal facility of the Slovak Medical University, on the randomization day (i.e. day − 1), on the first day of the feeding trials, once weekly during the first 13 weeks, once every 2 weeks thereafter and at the end of the studies.
Haematology and clinical biochemistry analyses
In the case of the two 90-day feeding trials, blood samples from the tail vein of 16 males and 16 females per group after 16–18 h fasting were taken at the end of the study for the haematological analyses (with EDTA as anticoagulant) as well as for the clinical biochemistry analyses (without anticoagulant). In the case of the combined chronic toxicity/carcinogenicity study, at the end of months 3 and 6, blood samples from the tail vein of 40 males and 40 females per group after 16–18 h fasting were taken for the haematological and clinical biochemistry analyses in the same way as described above. At months 12 and 24, blood samples were taken from all animals in the different groups.
No later than 4 h after collection of the blood samples, the following haematology parameters were measured using a Sysmex K-4500 automated haematology analyser (Sysmex, Kobe, Japan): white blood cell count (WBC), red blood cell count (RBC), haemoglobin concentration (HGB), haematocrit (HCT), mean cell volume (MCV), mean corpuscular haemoglobin (MCH), mean corpuscular haemoglobin concentration (MCHC), platelet count (PLT) as well as the absolute (LYMA) and relative lymphocyte count (LYMR). For the differential leukocyte count, blood smears were stained with the May–Grünwald and Giemsa–Romanowski dyes and thereafter examined by light microscopy; the percentages of lymphocytes, neutrophils, eosinophils, basophils and monocytes were determined by examining 200 cells.
The parameters alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), albumin (ALB), total protein (TP), glucose (GLU), creatinine (CREA), urea (U), cholesterol (CHOL), triglycerides (TRG), calcium (Ca), chloride (Cl), potassium (K), sodium (Na) and phosphorus (P) were measured in serum no later than 4 h after collection of the blood samples with an Ortho Clinical Vitros® 250 Chemistry System (Ortho-Clinical Diagnostics, Raritan, NJ, USA). Coagulation parameters were not determined.
Estrous cycle monitoring
17β-Estradiol was measured in female rats that were in the estrus phase of the estrous cycle. To monitor estrous cycles in adult female rats, vaginal lavages were taken daily for 10 days and the estrous cycles were tracked by vaginal cytology. Vaginal lavages were obtained daily between 08:00 and 09:00 and examined under a low-power light microscope. Lavages were collected by flushing the entrance of the vagina with physiological saline, and a Giemsa–Romanowski stain was used to visualize the cells by optical microscopy. The stage of the estrous cycle was determined based on the presence of leukocytes (metestrus, diestrus), nucleated epithelial cells (proestrus) and cornified epithelial cells (estrus) (Long and Evans 1922; Caligioni 2009). A female rat that showed a constant 4- or 5-day vaginal estrous cycle was regarded as an animal with a regular estrous cycle. An extended estrus was defined as exhibiting cornified cells with no leukocytes for three or more days, and an extended diestrus was defined as the presence of leukocytes for four or more days (Cooper and Goldman 1999).
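The definitions of extended estrus and extended diestrus given above amount to run-length rules on the daily smear record. The following sketch illustrates them under stated assumptions: each day is reduced to a single stage label (a hypothetical encoding), leukocytes are taken to be present on metestrus and diestrus days as described in the text, and the function names are illustrative rather than part of the study protocol.

```python
from itertools import groupby

# Stages in which leukocytes are present in the smear, as described in the text.
LEUKOCYTE_STAGES = {"metestrus", "diestrus"}

def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    return max(
        (sum(1 for _ in run) for flag, run in groupby(flags) if flag),
        default=0,
    )

def classify_smears(stages):
    """Apply the extended-estrus and extended-diestrus rules to daily stages.

    stages: list of daily stage labels, e.g. "proestrus", "estrus",
            "metestrus", "diestrus" (hypothetical encoding of the cytology).
    """
    estrus_run = longest_run([s == "estrus" for s in stages])
    leukocyte_run = longest_run([s in LEUKOCYTE_STAGES for s in stages])
    return {
        "extended_estrus": estrus_run >= 3,       # cornified cells for >= 3 days
        "extended_diestrus": leukocyte_run >= 4,  # leukocytes for >= 4 days
    }

# Hypothetical 10-day records.
regular = ["proestrus", "estrus", "metestrus", "diestrus"] * 2 + ["proestrus", "estrus"]
prolonged = ["proestrus", "estrus", "estrus", "estrus", "metestrus",
             "diestrus", "diestrus", "diestrus", "proestrus", "estrus"]
```

Checking whether a cycle is regular (a constant 4- or 5-day length) would require the intervals between successive estrus days and is not covered by this sketch.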
Hormone measurements
17β-Estradiol blood levels were measured using the Estradiol (rat) ELISA (EIA-5774) from DRG Instruments (Marburg, Germany). The intra-assay coefficient of variation was 4.1%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 2.0 pg/mL. The cross-reactivity reported by the manufacturer was 4.2% for estrone, 3.8% for 17β-estradiol-3-glucuronide and 3.6% for 17β-estradiol-3-sulphate. Testosterone blood levels were measured using the Testosterone (Rat/Mouse) ELISA (EIA-5179) from DRG Instruments. The intra-assay coefficient of variation was 5.3%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 0.2 ng/mL. The cross-reactivity reported by the manufacturer was 69.6% for dihydrotestosterone and 7.4% for dihydroxyandrosterone. T3 blood levels were measured using the Total T3 RIA Kit (IM 1699) from Beckman Coulter (Brea, CA, USA). The intra-assay coefficient of variation was 9.9%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 0.75 nmol/L. T4 blood levels were measured using the Total T4 RIA Kit (IM 1447) from Beckman Coulter. The intra-assay coefficient of variation was 10.2%, calculated by measuring one rat sample 20 times in one test routine. The lower detection limit was < 26 nmol/L.
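For orientation, an intra-assay coefficient of variation of the kind reported above is the standard deviation of the replicate readings expressed as a percentage of their mean. The sketch below assumes the sample standard deviation (n − 1 denominator); the text does not state which estimator was used, and the replicate values shown are invented for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Coefficient of variation in percent: 100 * SD / mean.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return 100.0 * sd / mean

# Invented replicate readings of a single sample (not study data).
replicates = [9.0, 10.0, 11.0]
cv_percent = coefficient_of_variation(replicates)
```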
Urinalysis
In the case of the two 90-day feeding trials, an analysis of urine of 16 males and 16 females per group was performed at the end of the study. In the combined chronic toxicity/carcinogenicity study, an analysis of urine of 20 male and 20 female rats per group was performed at months 3, 6, and 12; at month 24, urine samples from all surviving animals were analysed.
Urine was collected from each individual rat in metabolic cages for 16 h. The parameters total protein, glucose, ketone, leukocyte number, erythrocyte number, bilirubin, urobilinogen, nitrate, and pH were analyzed with Combur10Test® UX test strips (Roche Diagnostics, Mannheim, Germany) and semi-quantitatively evaluated by reflectance photometry with a Urilux S analyzer (Roche Diagnostics). Osmolarity was measured with the Advanced® Model 3300 micro-osmometer from Advanced Instruments (Norwood, MA, USA).
Gross necropsy and histopathology
At the end of the study, rats were anaesthetized after a 16- to 18-h fasting period with 10 mg/kg bw xylazine and 75 mg/kg bw ketamine. Blood samples for the corresponding analyses were taken from the abdominal aorta. Thereafter, the successive necropsy of the thoracic cavity, the abdominal cavity, the genital organs and the head was performed. Moreover, the wet weight of the kidneys, spleen, liver, adrenal glands, heart, thymus, uterus, ovaries, testes, epididymides and brain of all animals was recorded. Organ samples were stored in neutrally buffered 10% formalin, except for the eyes and the male reproductive tissues, which were immersed in Bouin’s solution. The formalin-fixed tissue samples were washed, dehydrated and embedded in paraffin. Thereafter, 4-µm thick sections were stained with haematoxylin and eosin for the light microscopic examination of the tissue structure.
A complete microscopic examination of the brain (including cerebrum, cerebellum and medulla/pons), spinal cord (at the cervical, mid-thoracic and lumbar level), pituitary, thyroid, parathyroid, thymus, oesophagus, salivary glands, stomach, small and large intestines, gut-associated lymphoid tissue (GALT), liver, pancreas, kidneys, adrenals, spleen, heart, trachea and lungs, aorta, tongue, eyes, Harderian gland, lacrimal gland, ovaries, cervix, vagina, uterus, female mammary gland, prostate, testes, seminal vesicles and coagulating gland, epididymides, urinary bladder, mesenteric and mandibular lymph nodes, peripheral nerve (sciatic), skeletal muscle, femur, sternum with bone marrow, and skin was performed.
In-life and necropsy phases were conducted “blind” at the Slovak Medical University (Bratislava, Slovakia). The group allocation was revealed to the histology facility (Wolfgang Baumgärtner, University of Veterinary Medicine Hannover, Germany) before the start of tissue staining and slide preparations, and to the pathologist (Roger Alison Ltd., Lampeter, UK) before the start of the histopathological examination, i.e. histology and histopathology were “non-blind”.
Histology was conducted “non-blind” for operational reasons. If histology had been conducted “blind”, all slides from all animals would have had to be processed, incurring significant penalties in cost and time. “Non-blind” histology allowed processing of only control and high dose group animals, together with macroscopic abnormalities and potential target organs. As the histology laboratory was not involved in the generation of data, this was not considered to have impacted upon the scientific evaluation of the study.
Histopathology was conducted “non-blind” because the consensus of opinion among toxicologic pathologists is that “blind reading” during the initial evaluation of tissues can have a negative impact on both the time it takes to accomplish the microscopic evaluation as well as the quality of the information obtained from the study (Iatropoulos 1984; Newberne and de la Iglesia 1985; SOTP 1986; Prasse et al. 1986; Goodman 1988; House et al. 1992; Crissman et al. 2004). “Blind reading” makes the task of separating treatment-related changes from normal variation more difficult and may result in missing subtle lesions. Awareness of the treatment group assignment allows the pathologist to intensely focus the histopathologic evaluation and to find important, and sometimes subtle, differences between the tissues of treated and untreated animals. “Blind reading” is commonly reserved for targeted review of lesions once they have been identified at the primary evaluation, particularly in a “Pathology Working Group”. In addition, all major published data on background lesions of rats from toxicity and carcinogenicity studies has been derived from “non-blind reading”. Reading the present series of studies “blind” would have significantly decreased the comparability of the findings to published data.
The resulting tissue sections were stained by hematoxylin and eosin at the above-mentioned histology facility. Histopathology examinations were performed in compliance with the principles of Good Laboratory Practice (GLP). The PathData V. 6.2d2 computer system was used for the recording and reporting of histopathology findings. A peer review of the study was conducted at the Test Site by Dr. C. Gopinath, Consultant Toxicological Pathologist. The histological sections and raw data (final signed histopathology report) will be returned to the Study Director for archiving. Other data related to the histopathological evaluation of the study will be archived by the study pathologist for at least 10 years. No data will be discarded after this period without the consent of the Test Facility. In the present report, summary tables with all necropsy and histopathology findings are presented. The complete histopathology reports (Alison 2018a, b, c) can be accessed via the internet portal CADIMA (http://www.cadima.info).
The statistical analysis of the incidence of tumors in the combined chronic toxicity/carcinogenicity study was based on the principles outlined by Peto et al. (1980). Peto’s method corrects for longevity (and hence for the period of time at risk) and applies a statistical approach appropriate to the cause of death (“context of observation”). This statistical analysis was performed on all animals from both phases of the study together. Where appropriate, tumors were also grouped for analysis (McConnell et al. 1986). The full analysis is presented in the PathData Appendix in the corresponding histopathology report (Alison 2018c). Neoplasms were considered statistically significant according to the Peto statistical test at p ≤ 0.05 for rare tumors and at p ≤ 0.01 for common tumors (FDA 2001). These analyses were performed using the PathData software.
Statistical analysis
This section provides a short description of the methods used for the statistical analyses presented in this paper, with more detailed descriptions of these and some additionally applied methods given in the Electronic Supplementary Material. A full description of all the statistical methods used and the results obtained in the three studies, except for the histopathology, is given in the statistical reports (Goedhart and van der Voet 2017, 2018a, b, c, d, e, f, g, h, i), which can be accessed via the internet. Statistical analyses were performed separately for the three studies, for males and females, and for the four time points (3, 6, 12 and 24 months) in the combined chronic toxicity/carcinogenicity study. Our main interest was to analyse the difference between each of the GM maize feeding groups and the control feeding groups with the same amount of maize.
Data preparation
All parameters except time of death and pH were transformed to the natural logarithmic scale and then averaged to the cage level. This implies that, rather than looking at differences between feeding group means, ratios between the GM feeds and the corresponding control feed are of interest. Since the endpoints uLeu, uHemogl and uKeton had zero values, half of the smallest positive value was added to these observations before taking the logarithm. Data from rats fed diets with non-GM maize in the EU-funded project GRACE (http://www.grace-fp7.eu) were used as historical control data to set equivalence limits according to the method of van der Voet et al. (2017). The GRACE data have been analysed before (Schmidt and Schmidtke 2014; Schmidt et al. 2015a, b, 2016, 2017; Zeljenková et al. 2014, 2016). More details of the data preparation including outlier detection are given in the Electronic Supplementary Material.
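The transformation and cage-level averaging described above can be illustrated with a short sketch. It rests on two assumptions that are not confirmed by the text: the offset (half of the smallest positive value) is added to every observation of an endpoint that contains zeros, and the cage mean is taken over the log-transformed individual values. The function name and the data layout are hypothetical.

```python
import math
from collections import defaultdict

def log_cage_means(records):
    """Natural-log transform individual values, then average per cage.

    records: list of (cage_id, value) pairs for a single endpoint.
    If the endpoint contains zeros, half of the smallest positive value
    is added to all observations before taking logs (assumed reading).
    Returns a dict mapping cage_id -> mean of the log-transformed values.
    """
    values = [value for _, value in records]
    offset = 0.0
    if any(value == 0 for value in values):
        offset = min(value for value in values if value > 0) / 2
    by_cage = defaultdict(list)
    for cage, value in records:
        by_cage[cage].append(math.log(value + offset))
    return {cage: sum(logs) / len(logs) for cage, logs in by_cage.items()}

# Invented values: two animals in cage "c1", two in cage "c2".
means = log_cage_means([("c1", 2.0), ("c1", 8.0), ("c2", 4.0), ("c2", 4.0)])
# A difference of log-scale means back-transforms to a ratio of geometric means.
ratio = math.exp(means["c1"] - means["c2"])
```

In this invented example both cages have a geometric mean of 4, so the back-transformed ratio is 1, which is the sense in which differences on the log scale are read as ratios between feeds.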
Summary tables of means and standard deviations (SDs), classified by the feeding groups, were prepared on the original non-transformed scale. These tables were obtained by first calculating cage means and then calculating the summary statistics. Extended tables with the number of observations and coefficients of variation (%) are included in the statistical reports (Goedhart and van der Voet 2017, 2018b, c, d, e, f). In the combined chronic toxicity/carcinogenicity study, where measurements were made after 3, 6, 12 and 24 months, the number of cages per feeding group was 35 for the body weights, 20 for the haematology, differential WBC and clinical biochemistry data and 10 for the urine data. The number of cages per feeding group in the two 90-day feeding trials was 8, except for the hormone data with six cages for males and eight cages for females.
Mortality after 2 years
Death events of animals were rare up to month 12, with at most 3 dead animals per group of 50. Therefore, only the mortality rates at 24 months were statistically analysed. Only the 50 animals per sex and feeding group that were part of the 2-year cohort were statistically analysed, because the other 20 animals were sacrificed after 1 year. Fitting a beta-binomial regression model (Williams 1982) by means of maximum likelihood to the number of dead animals in each cage as response variable revealed that the estimate of the beta-binomial over-dispersion parameter equals its bounded value of 0.0001 for both males and females. This indicates that there was no over-dispersion and, therefore, the ordinary logistic model (McCullagh and Nelder 1989) was used to analyse the number of dead animals per cage. After allowing for differences between blocks, one-sided pairwise Wald tests were performed for the one-sided null hypothesis that the mortality probability of a GM feed is equal to or smaller than the mortality probability of the non-GM control feed. Mortality was also analysed by survival analysis using the procedures KAPLANMEIER, RSTEST, RPROPORTIONAL and RPHFIT and direct programming in GenStat 18 (VSN International 2015).
Growth curves and feed consumption
For each individual rat, growth curves were fitted to the observed weights for restricted periods of time. For the data up to 13 weeks (3 months), an exponential growth curve \(A + B\exp(-\gamma\,\text{Week})\) with growth rate \(\gamma\) was fitted to the observed weights. For the data between weeks 13 and 27 (6 months), and separately for the data between weeks 27 and 52 (12 months), a simple linear regression, \(\text{Weight} = \alpha + \beta\,\text{Week}\), was fitted to the observed weights, and the growth rate was defined as \(\gamma = \log(\beta)\). In general, the growth curves fitted very well and it was, therefore, decided to only analyse the estimated growth rates \(\gamma\), further called growth rate, and the final weights observed after week 13, 27 and 52. The latter are further called Weight_13 (or just BodyWeight in the two 90-day feeding trials), Weight_27 and Weight_52. No general growth curves could be fitted for the data between weeks 53 and 104 (24 months) in the combined chronic toxicity/carcinogenicity study, and, therefore, only the final weights for those animals that survived for 24 months, further called Weight_104, were statistically analysed.
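The two growth-rate definitions above can be sketched in a few lines. This is an illustration under stated assumptions, not the fitting routine used in the study: the exponential curve is fitted by profiling \(\gamma\) over a fixed grid (for a given \(\gamma\) the model is linear in \(A\) and \(B\)), the grid range and step are arbitrary choices, and all function names are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def fit_exponential(weeks, weights, gammas=None):
    """Fit Weight = A + B*exp(-gamma*Week) by a grid search over gamma.

    Assumption: gamma lies in the grid 0.001 ... 1.000 (step 0.001).
    Returns (A, B, gamma) for the grid value with the smallest residual sum.
    """
    if gammas is None:
        gammas = [k / 1000 for k in range(1, 1001)]
    best = None
    for gamma in gammas:
        a, b, rss = linear_fit([math.exp(-gamma * w) for w in weeks], weights)
        if best is None or rss < best[3]:
            best = (a, b, gamma, rss)
    return best[:3]

def linear_growth_rate(weeks, weights):
    """Growth rate gamma = log(beta) from Weight = alpha + beta*Week.

    Only defined for a positive slope beta.
    """
    _, beta, _ = linear_fit(weeks, weights)
    return math.log(beta)

# Invented, noise-free weights for one animal (not study data).
early_weeks = list(range(14))
early_weights = [400 - 300 * math.exp(-0.25 * w) for w in early_weeks]
A, B, gamma_early = fit_exponential(early_weeks, early_weights)

late_weeks = list(range(13, 28))
late_weights = [300 + 2.0 * w for w in late_weeks]
gamma_late = linear_growth_rate(late_weeks, late_weights)
```

A dedicated nonlinear least-squares routine would normally replace the grid search; the grid is used here only to keep the sketch free of external dependencies.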
Equivalence and difference tests of quantitative endpoints
Traditionally, the first step in the statistical analysis of toxicological data is to determine if there are statistically significant differences between groups (difference tests). Only in a second step of the classical approach, the toxicological relevance of such statistically significant differences is interpreted in the light of historical control data. In this paper, the “toxicological relevance” question is directly addressed using equivalence tests. This procedure has a known probability of a Type 1 error (failing to find a potential relevant difference). In contrast, the traditional two-step approach has an unknown probability of failing to find a potential relevant difference because such a difference might not be statistically significant due to lack of precision. Thus, the one-step approach, which always considers the historical control data (irrespective of the result of the difference test), is preferred to the two-step approach. This is also in line with the EFSA recommendation (EFSA 2011c) stating that less emphasis should be placed on the reporting of statistical significance and more on statistical point estimation and associated interval estimations as more information can be presented using the latter.
Equivalence testing was introduced for GM safety assessment for compositional data in the EFSA guidance for risk assessment of food and feed from GM plants (EFSA 2011b). In the context of 90-day feeding studies in rodents, EFSA (2014) recognized the potential advantages of equivalence testing and recommended further investigation. In response to this issue, an equivalence test was developed in the course of the G-TwYST project (see van der Voet et al. [2017] for a full description and explanation of the approach). This test compares the difference between a test (T) and a control (C) feed, obtained simultaneously in a current study, to the typical differences between reference (R) feeds obtained in one or more historical studies. The equivalence test corrects for between-study differences, and the within-study variation between references R, along with the residual variation, is used to set equivalence limits for the difference between T and C in the current study. The so-called distribution wise equivalence (DWE) criterion is used in this test. The equivalence test employs the concept of desired power (here chosen as 95%) in a simplified situation, where there is no between-reference variation, where the historical and current studies have the same residual variance, and where the current study is assumed to have a sample size as approved by a regulator. Here we assumed regulatory sample sizes equal to the replication, i.e. the number of cages for most parameters in the current studies, e.g. 8 for the 90-day studies.
A critical factor is that the equivalence test of van der Voet et al. (2017) requires historical control data. It is assumed that the natural variability in results between non-GM reference groups can be estimated from previous studies in the same experimental facility. For the 3-month data in the three feeding trials, the GRACE non-GM reference data described in section “Data preparation” serve this purpose. For the data obtained after 6, 12 and 24 months, no corresponding reference data were available. Variance components between and within reference feeds were not much different between 3 and 6 months in the 90-day feeding trial with the GM maize at an inclusion rate of 11 and 33%, and, therefore, the 3-month GRACE data were still used as a reference for the 6-month data. The data variation was generally larger after 12 and 24 months; therefore, no equivalence tests are reported here for these data, and the assessment of the biological relevance was in these cases based on the expert opinion of the study coordinator. Details of the equivalence test calculations are given in the Electronic Supplementary Material.
The equivalence test results in a confidence interval for the so-called equivalence limit scaled difference (ELSD), which can be used both for difference and for equivalence testing. The hypothesis of no difference is rejected in case the interval does not contain zero, while the non-equivalence hypothesis is rejected in favour of an “equivalence” conclusion when the interval fully lies inside the interval (− 1, 1). When the ELSD point estimate is between − 1 and + 1, but one or both of the confidence limits lie outside this interval, the conclusion, in the terminology of EFSA, is that “equivalence is more likely than not”. “Non-equivalence is more likely than not” when the ELSD point estimate is lower than − 1 or higher than + 1. As a further help for interpretation of the ELSD graphs, it can be noted that confidence intervals for ELSD will become wider when the within-group variation in the current study is higher than in the historical data. Such higher variation may be treatment-induced or just the consequence of a lower analytical precision in the current study.
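The decision rules for the ELSD confidence interval can be written out as a small function. This sketch covers only the three equivalence outcomes named in the text; the function name, the return labels and the treatment of values lying exactly on a boundary are assumptions.

```python
def classify_elsd(estimate, lower, upper):
    """Interpret an ELSD point estimate and its confidence interval.

    Returns (difference, equivalence), where `difference` is True when the
    interval excludes zero and `equivalence` is one of three text labels.
    """
    difference = not (lower <= 0.0 <= upper)
    if lower > -1.0 and upper < 1.0:
        equivalence = "equivalent"  # interval fully inside (-1, 1)
    elif -1.0 <= estimate <= 1.0:
        equivalence = "equivalence more likely than not"
    else:
        equivalence = "non-equivalence more likely than not"
    return difference, equivalence

# Invented intervals for illustration (not study results).
inside = classify_elsd(0.2, -0.1, 0.5)
straddling = classify_elsd(0.6, 0.2, 1.3)
outside = classify_elsd(1.4, 0.9, 1.9)
```

The second invented case shows that a statistically significant difference (interval excluding zero) and a conclusion of "equivalence more likely than not" can occur together, because the two tests answer different questions.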
Classical statistical methods for continuous data were applied in line with OECD and EFSA approaches, and very similar to the approaches followed in the GRACE project (Schmidt and Schmidtke 2014; Schmidt et al. 2015a, b). OECD Test Guidelines require numerical results to be evaluated by an appropriate and acceptable statistical method, but give no further guidance on statistical analysis. More detailed guidance, although strictly only meant for chronic and carcinogenicity studies of single compounds, not of complex whole foods, is provided in chapter 4 of the OECD Guidance Document 116 (OECD 2014). A classical analysis of variance was performed on the cage means after log-transforming the data. This was done in the statistical program R. The analysis of variance was performed according to the randomized block design employing the model “Block + Treatment”, where Treatment defines the five feeding groups. For details see the Electronic Supplementary Material. Significance results of the different tests comparing GM maize-fed groups with the control group have been indicated in the tables with means and SDs.
All tests were performed by comparing two-sided 95% confidence intervals to values representing equivalence limits or strict equality (no difference).
Stakeholder involvement and access to data
A key characteristic of the G-TwYST project was to allow for the involvement of a broad range of stakeholders and to ensure transparency of the research conducted, including access to the data. Stakeholder consultations were conducted not just on the results but also a priori on the study plans. The main stakeholder groups targeted were competent authorities, industry, civil society organizations, and researchers interested or experienced in animal feeding studies with GM food/feed. The geographical focus was on Europe.
In a first round, the draft study plans were subjected to a stakeholder consultation. The comments on these drafts, which were received during a stakeholder workshop and afterwards in writing, were discussed by the study team and taken into account when finalizing the study plans. As a result of the discussions during the stakeholder workshop, a 90-day feeding trial with a GM maize inclusion rate of up to 50% was performed, as well as an analysis of hormone levels in blood samples of male (testosterone, T3 and T4) and female rats (17β-estradiol, T3 and T4) fed the GM maize at inclusion rates of 11 and 33%. Study team members answered the stakeholder comments and questions in writing, so that the stakeholders could verify whether and how their comments had been taken into consideration and understand the underlying reasons.
In a similar way, the draft results and conclusions were subjected to stakeholder scrutiny. This final round comprised three successive steps: one workshop and two rounds of written comments. One round of written comments focused on the results of the 90-day studies; the stakeholder workshop and the other round of written comments focused on the combined chronic toxicity and carcinogenicity study. To facilitate the consultation process, the documents, including all full-length statistical and histopathology draft reports and draft interpretations, were made available to registered participants who had signed a Non-Disclosure Agreement. Again, all stakeholder comments were considered when finalizing the interpretation and conclusions, and written responses were prepared by the study team.
Maximum transparency of the process was ensured by publishing all stakeholder comments and the written responses of the study team members on the project website in comprehensive consultation reports, alongside the draft and revised study plans (http://www.g-twyst.eu).
For each consultation round, more than 700 stakeholders were invited by e-mail. Stakeholder participants were not selected in any way: all interested stakeholder representatives could participate. Overall, 70 stakeholders from 19 countries participated in one or more consultation stages. A total of 158 stakeholder comments were received in writing and responded to by the study team. In each consultation stage, representatives of all main stakeholder groups targeted were involved and actively contributed.
Access to the raw data
In line with the G-TwYST transparency policy, any interested person will have access to the raw data obtained within the G-TwYST project, including the clinical, ophthalmological, body weight, haematology, clinical biochemistry, organ weight, necropsy and histopathology data presented in this study, through the internet portal CADIMA (http://www.cadima.info).