Post-market environmental monitoring of genetically modified organisms (GMOs) after their deliberate release is a legal obligation in the European Union (Directive 2001/18/EC and Regulation (EC) No 1829/2003). It was introduced to identify direct or indirect, immediate and/or delayed adverse effects of GMOs after their release on the market. The monitoring of environmental effects is subdivided into two parts: case-specific monitoring and general surveillance (GS). Case-specific monitoring deals with adverse effects that were identified in the environmental risk assessment and is not always required. For a crop containing a Bt gene that protects it against herbivory by beetle larvae, case-specific monitoring could, for instance, include observations on non-target insects in the field. GS is aimed at the detection of unexpected adverse effects of the GMO and is always required in the EU (Rotteveel and den Nijs 2009). GS should be carried out without any specific hypothesis about adverse effects of the GMO. Various bioindustries now cooperate in EuropaBio (http://www.europabio.org), and this organization aims to standardize GS and process the data as they come in. The farmer questionnaire that is part of the GS plan now typically asks for the: “General impression of the occurrence of wildlife (mammals, birds, and insects) in fields of the GM crop compared to a conventional crop”. Respondents can answer with “As usual”, “More” or “Less”. By referring to the “usual” situation, comparison with a control treatment is avoided, and the phrasing seems a smart way to spot unforeseen effects while keeping paperwork to a minimum. However, one can ask whether the results obtained in this way answer the initial question about possible adverse effects of the GMO. I would argue that they do not.

In support of the current GS plan, Schmidt et al. (2006) argued that the ordinal variable with three categories (less, usual and more) could be changed into a variable with two categories (usual + more and less). The reasoning is that farmers with historical knowledge know the baseline and the 95% confidence limits between which population means fluctuate. Schmidt et al. (2006) argued that without any effect of the GM crop, farmers would be expected to score population numbers as less than usual in 5% of the cases, and that this “baseline threshold” could serve as a null hypothesis against the alternative hypothesis that the situation is worse in more than 5% of the cases. This idea of a “baseline threshold” is not yet clearly developed and asks rather a lot from the respondents. There is, however, a simple alternative: it will always be possible to use a binomial test on the number of answers for “less” and “more”, neglecting the “usuals”, under the null hypothesis that the number of answers in these two categories is equal. Such a basic statistical test is still useful for the purpose of GS.
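The binomial test on the “less” and “more” answers can be sketched in a few lines. This is a minimal illustration, not part of any existing GS plan: the function name `sign_test_p_value` and the answer counts in the example are hypothetical, and the test is the exact two-sided version under the null hypothesis that “less” and “more” are equally likely.

```python
from math import comb


def sign_test_p_value(n_less: int, n_more: int) -> float:
    """Two-sided exact binomial test of H0: P("less") = P("more") = 0.5.

    Answers of "as usual" are ignored, as described in the text.
    """
    n = n_less + n_more
    if n == 0:
        return 1.0  # no informative answers, so no evidence against H0
    k = min(n_less, n_more)
    # Probability, under p = 0.5, of a split at least this lopsided in one tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1
    return min(1.0, 2 * tail)


# Hypothetical questionnaire outcome: 60 "as usual", 30 "less", 10 "more"
p = sign_test_p_value(n_less=30, n_more=10)
print(f"two-sided p-value: {p:.4f}")
```

A small p-value only indicates that “less” and “more” are unbalanced; it says nothing about the cause of the imbalance.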

The major problem with the proposed method for GS is, however, that it leaves out the controls. Populations fluctuate in time, and only over a very long period can we estimate the real average population size with some accuracy. For a farmer, 10 years of experience with a specific conventional crop would already be a long period. When such a farmer switches to a GM crop, he or she would base the answer in the questionnaire on the experience of the last 10 years. When farmers then score “less” for 10 years in a row, we can be confident that wildlife is decreasing in their fields. However, we do not know the cause. The GM crop may have a negative effect on wildlife. But it might also be that the GM crop has no negative effect and that the decrease is due to environmental change, for instance, because the climate has become warmer. The same methodological problem applies when the farmer scores “more” for the GM crop for 10 years in a row, but in that case we can at least be content that biodiversity is on the rise and temper our worries about unforeseen environmental effects of GM crops. Thus, with the current method for GS, any other negative effect causing a decrease of wildlife becomes confounded with the effect of using a GM versus a conventional crop. This is neither in the interest of the bioindustry nor of those who are concerned about the biosafety of GMOs. An additional worry might be that respondents value things that happened in the “good old days” more positively than what happens now. This psychological effect could also lead to many choices for “less” in the questionnaires, even when the real situation does not change. An example from a completely different field may help to illustrate my point. Suppose people were asked over the next ten years whether crime is now better, worse or as usual, compared to the previous decade. It is quite likely that the people scoring “worse” will outnumber those that score “better”.
However, this answer may reflect the changing attitude of people as they grow older, or what they read in the newspapers; it does not necessarily reflect crime rates. For that we need real numbers, and this also applies to GS of GMOs.

The control or “usual” situation is not defined in the farmer questionnaires, and I interpreted it as historical knowledge from a decade of growing a certain crop. One might also ask the farmer to compare with the situation on the field of a neighbour who cultivates a conventional crop. While this solves the previous problem of confounding effects, it is not very practical. When a GM crop is more profitable, the neighbour will most likely opt for the same GM crop. Furthermore, a farmer will make observations on birds, wildlife and insects while working on his own field. It is not likely that he will spend as much time on the other side of the fence. Finally, the comparison between neighbouring fields will not tell us anything about the time course of population numbers.

I would therefore argue for collecting data on GM and conventional crops simultaneously, so that there is a well-defined control. Quantitative data are best, and one could ask farmers, for instance, how many birds or large or small mammals they spotted and what their best estimate would be for the number of ladybirds on a representative 10 × 10 m sampling plot at the end of the season. When both farmers with GM crops and farmers with conventional crops are included in the survey, a decade of GS will result in two parallel time series for the two crops, and it will be possible to separate the effect of time from the effect of growing a GM or non-GM crop. Per farmer, my proposed method hardly increases the effort needed to complete the questionnaire. There will be extra costs for including farmers with conventional crops in the monitoring program, but these costs can be kept to a minimum by making this group smaller than the group of farmers with a GM crop. Long-term data sets are hard to come by, and even the Farm Scale Evaluations (Hails 2002), which were set up in a statistically rigorous way with adjoining GM and non-GM crops, lasted for only four consecutive years. Farmers are likely to switch between GM and non-GM crops and to grow alternative crops, as the economic situation requires. Although this complicates the statistics, it is still possible to analyze the dependent variable, the abundance of a non-target organism in a certain year, as a function of a number of factors and their interactions (for some guidance see Perry et al. 2009). These factors will include whether the crop is GM or conventional, which crop species is grown, and relevant agronomic factors (Züghart et al. 2008), all measured in the same year. Additional factors may include the same factors measured in the previous year or an even earlier one. Data can be analyzed by standard ANOVA or Generalized Linear Models, which are now routinely available (Crawley 2007).
The whole analysis breaks down, however, when the dependent variable is measured in an inappropriate way. The indiscriminate “wildlife” category should be dropped and subdivided into smaller, more coherent groups like birds, large mammals, small mammals, ladybirds and other insects. The answers in the questionnaire should reflect population numbers, preferably measured on a metric scale (Schmidt et al. 2006; Perry et al. 2009), and such data should not be scaled with respect to an ill-defined control.
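How a simultaneous control separates the effect of time from the effect of the crop can be illustrated with a small sketch. It is a deliberately simplified stand-in for the ANOVA or Generalized Linear Model analysis mentioned above, not a replacement for it: it assumes strictly positive counts, data for both crops in the same years, and at least two years per series. The function names and all numbers are hypothetical. In the example, abundance declines by 10% per year in both crops, so a farmer comparing with the past would keep scoring “less”, although the crop itself has no effect.

```python
from math import log
from statistics import mean


def crop_effect(gm_counts, conv_counts):
    """Mean yearly difference in log mean abundance, GM minus conventional.

    Both arguments map year -> list of counts reported by farmers.
    Only years present in both series are used, so a trend that is
    shared by both crops cancels out of the comparison.
    """
    years = sorted(set(gm_counts) & set(conv_counts))
    if not years:
        raise ValueError("no year with both GM and conventional data")
    diffs = [log(mean(gm_counts[y])) - log(mean(conv_counts[y])) for y in years]
    return mean(diffs)


def time_trend(counts):
    """Least-squares slope of log mean abundance against year (one series)."""
    years = sorted(counts)
    ys = [log(mean(counts[y])) for y in years]
    x_bar, y_bar = mean(years), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, ys))
    return sxy / sxx


# Hypothetical data: a 10% yearly decline in both crops (e.g. environmental change)
years = range(2010, 2020)
conv = {y: [100 * 0.9 ** (y - 2010)] * 3 for y in years}      # 3 control farmers
gm_same = {y: [100 * 0.9 ** (y - 2010)] * 5 for y in years}   # GM crop, no effect
gm_lower = {y: [80 * 0.9 ** (y - 2010)] * 5 for y in years}   # GM crop, 20% lower

print("trend within GM series:", time_trend(gm_same))          # negative slope
print("crop effect, no GM effect:", crop_effect(gm_same, conv))
print("crop effect, GM 20% lower:", crop_effect(gm_lower, conv))
```

The decline within the GM series alone cannot be attributed to the crop; only the contrast with the conventional series in the same years isolates it. Note that the control group in the sketch is smaller than the GM group, as suggested in the text.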

With respect to detecting ecological effects, GS can, in my opinion, develop in two directions. We can conclude that the current method is a paper tiger that we can do without and focus, as in the US, on specific monitoring by experts. Alternatively, we can make the small changes I proposed and use the farmers’ expertise in the best possible way. Both routes would be better than the present method of scaling towards an ill-defined control.