# Comparing the effects of calibration and climate errors on a statistical crop model and a process-based crop model

DOI: 10.1007/s10584-014-1264-3

- Cite this article as:
- Watson, J., Challinor, A.J., Fricker, T.E. et al. Climatic Change (2015) 132: 93. doi:10.1007/s10584-014-1264-3


## Abstract

Understanding the relationship between climate and crop productivity is a key component of projections of future food production, and hence assessments of food security. Climate models and crop yield datasets have errors, but the effects of these errors on regional scale crop models are not well characterized or understood. In this study we compare the effect of synthetic errors in temperature and precipitation observations on the hindcast skill of a process-based crop model and a statistical crop model. We find that errors in temperature data have a significantly stronger influence on both models than errors in precipitation. We also identify key differences in the responses of these models to different types of input data error. Statistical and process-based model responses differ depending on whether synthetic errors are overestimates or underestimates. We also investigate the impact of crop yield calibration data on model skill for both models, using datasets of yield at three different spatial scales. Whilst important for both models, the statistical model is more strongly influenced by crop yield scale than the process-based crop model. However, our results question the value of high resolution yield data for improving the skill of crop models; we find that a focus on accuracy is likely to be of greater value. For both crop models, and for all three spatial scales of yield calibration data, we found that model skill is greatest where growing area is above 10-15 %. Thus information on area harvested would appear to be a priority for data collection efforts. These results are important for three reasons. First, understanding how different crop models rely on different characteristics of temperature, precipitation and crop yield data allows us to match the model type to the available data. Second, we can prioritize where improvements in climate and crop yield data should be directed. Third, as better climate and crop yield data become available, we can predict how crop model skill should improve.

## 1 Introduction

While knowledge of crop physiology comes from experiments at the field scale, climate models have skill at the regional scale. Regional scale crop models have been developed as principled frameworks for upscaling field scale knowledge to the regional scale, in order to help capture and explore the key crop-climate processes. While regional scale crop models can differ significantly in their structure and assumptions, they all rely on the quality of available climate and crop yield data. ‘Quality’ of this data is not necessarily a matter of higher temporal/spatial resolution. Rather it depends on whether the model-significant statistics of the input data accurately reflect reality.

The projected response of crops to climate variability and change can vary significantly according to the methodology chosen (Challinor et al. 2014). This variation can be ascribed to three causes: structural differences between crop models, differences in crop calibration data, and differences in weather inputs. Structural differences in models result from the choice of parameterisations for representing crop growth and development (White et al. 2011). These choices are often related to the spatial scale for which the model is designed (Challinor and Wheeler 2008). Choices regarding model calibration are also related to the spatial scale of the assessment: regional-scale models typically have less crop growth and development data for calibration. Calibration and application of models at regional-scales invariably involves simplifying spatial heterogeneity, and can therefore result in aggregation error (Hansen and Jones 2000).

Disentangling these three sources of uncertainty is not trivial. Efforts to separate structural model uncertainty from calibration uncertainty have begun, and show promise (Asseng et al. 2013). Frameworks for measuring and interpreting climate model uncertainty have, at least for the case of climate change, a somewhat longer history (Ramirez-Villegas et al. 2013). Recent work includes assessments of the uncertainty associated with bias correction of climate model output (Hawkins et al. 2013b; Koehler et al. 2013). In order to identify the precise sources of climate-induced uncertainty in crop yield, recent studies have also systematically perturbed weather inputs (Berg et al. 2010; Lobell 2013; Watson and Challinor 2013).

There are significant structural differences between process-based and statistical crop models. The latter can do an excellent job of reproducing historical temperature-induced yield variation at regional scales (Hawkins et al. 2013a), whilst the former are useful for determining the causes of yield variation (Lobell et al. 2013), and may be more robust to non-stationarity in the relationships between weather and crop yield. Comparisons between process-based and statistical models are at an early stage. Estes et al. (2013) found that a statistical model produces larger climate losses than a process-based model for maize and wheat in South Africa, leading them to recommend increased intercomparison of these two types of model.

This study draws on the research described above in order to examine interactions between structural model uncertainty, input weather uncertainty, and input calibration uncertainty. We apply systematic perturbations to observed weather and record the impact on the skill of two regional-scale models: one process based and one statistical. In order to assess calibration uncertainty, three calibration configurations are used. Each configuration uses yield at one of three spatial scales, plus crop harvested area. Thus we define “calibration” as any use of any crop observational data to improve crop model results.

By analyzing the effects of calibration uncertainty and systematic weather errors using two structurally distinct crop models, this study quantifies how different model types can be influenced by different input data uncertainties. This information is important for determining whether model and input data are fit for purpose in a given study, and informs the allocation of resources for future improvements. In addition, analyses such as this study provide information on how crop model skill may improve as weather and calibration data improve.

## 2 Materials and methods

Statistical crop models are typically designed for a particular study. Thus, to perform a direct comparison between a statistical crop model and a process-based crop model, this analysis replicated the maize crop hindcast scenario of Hawkins et al. (2013a). A process-based crop model was ported to the same scenario. Both these models are described below.

### 2.1 Statistical crop model

The statistical crop model follows Hawkins et al. (2013a), modelling yield as a function of year *t*:

*Y*(*t*) = *g*(*t*) + *β*_{X} *X*(*t*) + *β*_{P} *P*(*t*) + *e*(*t*),

where *Y* is maize yield, *X* is the number of days above a temperature threshold (32 °C) for the June-August growing season, *P* is mean precipitation for June-August, *g* is the expected yield given average precipitation and no hot days, and *e* is a stochastic error term. The *β* parameters and *g* are trained using a penalized likelihood function, with *g* being a cubic regression spline. For further details of this model, see Hawkins et al. (2013a). Since some studies assume a linear technology trend (e.g., Lobell and Asner 2003; de Wit and van Diepen 2007), we analyzed the model just described (*STAT*_{nonlinear}), as well as the case where a linear trend is used for *g*(*t*) (*STAT*_{linear}).
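The linear-trend variant of this regression can be sketched in a few lines. The sketch below is illustrative only (not the authors' code): the function names are hypothetical, and the penalized cubic spline *g*(*t*) of the nonlinear variant is replaced by a simple linear trend, as in *STAT*_{linear}.

```python
import numpy as np

def fit_stat_linear(years, hot_days, mean_precip, yields):
    """Fit Y(t) = a + b*t + beta_X * X(t) + beta_P * P(t) by ordinary
    least squares. (STAT_nonlinear instead fits g(t) as a penalized
    cubic regression spline, which is not sketched here.)"""
    A = np.column_stack([np.ones(len(years)),
                         np.asarray(years, dtype=float),
                         hot_days, mean_precip])
    coef, *_ = np.linalg.lstsq(A, yields, rcond=None)
    return coef  # [a, b, beta_X, beta_P]

def predict_stat_linear(coef, year, hot_days, mean_precip):
    """Predict yield for one year from the fitted coefficients."""
    a, b, bx, bp = coef
    return a + b * year + bx * hot_days + bp * mean_precip
```

A model of this kind is fit independently at each grid cell, using the cell's June-August weather statistics and the regridded yield observations.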

The historical daily temperature and precipitation data was obtained from the E-OBS gridded observational dataset (Haylock et al. 2008). Whereas Hawkins et al. used E-OBS version 5.0 on a 0.5° x 0.5° grid, we used version 7.0 on a 0.25° x 0.25° latitude/longitude grid. This higher resolution grid provided 1,035 locations in the study region.

Country scale yield observations were obtained from EUROSTAT,^{1} and regional and departmental scale observations were from AGRESTE - Statistique Agricole Annuelle, obtained via ARVALIS - Institut du végétal. A total of 22 regions and 96 departments were used in this study. Yield observations covering all three of these spatial scales were only available for 1980–2007, so this study only investigated that time period. To allow a direct comparison of input datasets with the GLAM crop model (described below), the yield data was linearly detrended with the level set to that of the start year.
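The detrending step, with the level reset to that of the start year, can be sketched as follows (an illustrative helper, not the authors' code):

```python
import numpy as np

def detrend_to_start_year(years, yields):
    """Remove a linear technology trend from a yield series, then set
    the level to that of the start year, so detrended yields are
    comparable across the study period."""
    b, a = np.polyfit(years, yields, 1)            # trend: a + b * year
    trend = a + b * np.asarray(years, dtype=float)
    return yields - trend + (a + b * years[0])     # re-level to start year
```

Anchoring the level to the start year keeps the detrended series in physically meaningful units rather than as anomalies about zero.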

The statistical crop model was run at each location on the E-OBS grid, so each of the kg ha^{−1} yield observation datasets was regridded to the E-OBS data’s resolution. The weighted mean of harvest area was calculated for each E-OBS grid cell according to the dataset provided by Monfreda et al. (2008).
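The area-weighted averaging used in this regridding reduces to a simple weighted mean over the source cells overlapping each target cell; the helper below is illustrative (the overlap weights themselves would come from the grid geometry):

```python
import numpy as np

def area_weighted_mean(values, overlap_areas):
    """Weighted mean of source-cell values overlapping one target grid
    cell, weighted by their overlap areas (illustrative helper)."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(overlap_areas, dtype=float)
    return float((v * w).sum() / w.sum())
```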

### 2.2 Process-based crop model

We used the General Large Area Model for annual crops (GLAM; Challinor et al. 2004) as a case study of a process-based regional-scale crop model. This model has been used to simulate a range of crop types in both present-day and future climates (Koehler et al. 2013; Challinor et al. 2005). The maize version of the model was adapted from an African model developed by Greatrex (2012). GLAM relies on values for precipitation, minimum and maximum temperature, solar radiation, CO_{2} level, planting window and soil hydrological properties to simulate crop development and yield. Calibration of the model is performed by adjusting the yield gap parameter (YGP) such that the difference between simulated and observed mean yields is minimized. Maize accounts for approximately 56 % of the irrigated area of France (data retrieved from AQUASTAT provided by the FAO),^{2} but the ratio of irrigated to non-irrigated maize area varies significantly across the country.^{3} Since we wanted to compare the effects of systematic error in both temperature and precipitation, two configurations of GLAM were analyzed – one with irrigation (*GLAM*_{irr}) and one that was rainfed (*GLAM*_{rfd}).

GLAM was run on the same E-OBS grid as the statistical model. The daily temperature and precipitation values for the study period 1980–2007 were taken from the E-OBS dataset as described in Section 2.1. The yield and harvest area input data were identical to that used for the statistical model. The CO_{2} concentration was set to 357.07 ppm, the value observed at Mauna Loa for the mid-baseline of 1993 (Tans and Keeling data retrieved 2013). The solar radiation data for the time period was taken from the ECMWF’s ERA-Interim reanalysis,^{4} and was regridded to the E-OBS data resolution using the area weighted average. The planting window was set according to the dataset produced by Sacks et al. (2010). The soil hydrological values of saturated volume, lower limit volume, and drained upper limit were taken from the WISE Soil Database for Crop Simulation Models version 1.1 (Romero et al. 2012).

The YGP value was calibrated using increments of 0.01. The remaining GLAM parameters were set to their default values with the exception of transpiration efficiency (TE) and the maximum value of normalized TE (TEN_MAX). These parameters can have a significant impact on simulated yield. The value of TE was set to 5.45 Pa and TEN_MAX was set to 6.0. These values were taken from Tallec et al. (2013), and are more realistic for the temperate region of this study than the default values. Unlike the statistical model, GLAM simulates processes such as planting and emergence, so the start of simulation was set to April (Birch et al. 2003). Initial tests of this experimental setup showed that GLAM was consistent with the maize development periods for this region as described by Sacks et al. (2010).
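The YGP calibration amounts to a grid search in increments of 0.01 for the value that minimizes the difference between simulated and observed mean yield. The sketch below is a stand-in for illustration only: `simulate_mean_yield` is a hypothetical callable representing a GLAM run at a given YGP, and GLAM's actual calibration routine is internal to the model.

```python
import numpy as np

def calibrate_ygp(simulate_mean_yield, observed_mean_yield):
    """Grid-search the yield gap parameter (YGP) in 0.01 increments,
    minimizing |simulated mean yield - observed mean yield|.
    `simulate_mean_yield(ygp)` is a hypothetical stand-in for a model run."""
    candidates = np.round(np.arange(0.01, 1.01, 0.01), 2)
    errors = [abs(simulate_mean_yield(y) - observed_mean_yield)
              for y in candidates]
    return float(candidates[int(np.argmin(errors))])
```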

### 2.3 Simulating errors in climate data

In order to compare the effect of systematic climate data errors on these two types of crop models, we deconstruct the temperature datasets described in Sections 2.1 and 2.2 into terms that represent different time scales, and then perturb these terms to alter the mean and variance of the data at different temporal scales. The statistical model uses temperature and precipitation data for the June-August season, while GLAM uses April-November. To ensure that both models were presented with identical input perturbations, the transformations described below were applied for the full year January-December.

Let *z*(*y*, *m*, *d*) be the observed value of a given temperature variable on day *d* in month *m* of year *y*. The complete 1980–2007 time series was then deconstructed as:

*z*(*y*, *m*, *d*) = *μ* + *α*_{y} + *β*_{m} + *γ*_{ym} + *δ*_{ymd},

where *μ* is the overall mean, *α*_{y} is the average deviation for year *y*, *β*_{m} is the deviation for month *m* averaged over all years (so *β*_{1}…*β*_{12} represents the mean seasonal cycle), *γ*_{ym} is the year-dependent deviation from the mean seasonal cycle, and *δ*_{ymd} is the daily deviation from the monthly mean. Figure 2 illustrates the components of this deconstruction for a sample of minimum daily temperatures.
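This deconstruction can be computed directly with NumPy. The sketch below is illustrative (not the authors' code) and assumes a regular array of years × 12 months × a fixed number of days per month:

```python
import numpy as np

def deconstruct(z):
    """Split a daily array z[year, month, day] into
    z = mu + alpha_y + beta_m + gamma_ym + delta_ymd."""
    mu = z.mean()
    alpha = z.mean(axis=(1, 2)) - mu                       # yearly deviations
    beta = z.mean(axis=(0, 2)) - mu                        # mean seasonal cycle
    monthly = z.mean(axis=2)                               # year-month means
    gamma = monthly - mu - alpha[:, None] - beta[None, :]  # year-dependent seasonal dev.
    delta = z - monthly[:, :, None]                        # daily dev. from monthly mean
    return mu, alpha, beta, gamma, delta
```

Summing the five components reproduces the original series exactly, which is what makes multiplicative perturbation of individual terms well defined.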

A transformed time series *z*^{∗} is produced by applying *θ* = (*μ*_{θ}, *α*_{θ}, *β*_{θ}, *γ*_{θ}, *δ*_{θ}):

*z*^{∗}(*y*, *m*, *d*) = *μ*_{θ}*μ* + *α*_{θ}*α*_{y} + *β*_{θ}*β*_{m} + *γ*_{θ}*γ*_{ym} + *δ*_{θ}*δ*_{ymd},

i.e., each of the parameters in *θ* is a multiplicative adjustment to its respective term in the deconstruction. The original time series is recovered by setting *θ* = (1, 1, 1, 1, 1). Figure 3 shows examples of these adjustments made to the time series of Fig. 2.
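Rebuilding a perturbed series from the deconstructed components is then a matter of scaling and broadcasting; an illustrative sketch (with the components supplied as NumPy arrays):

```python
import numpy as np

def apply_theta(components, theta):
    """Rebuild z* with multiplicative adjustments
    theta = (mu_t, alpha_t, beta_t, gamma_t, delta_t);
    theta = (1, 1, 1, 1, 1) recovers the original series."""
    mu, alpha, beta, gamma, delta = components
    mt, at, bt, gt, dt = theta
    return (mt * mu
            + at * alpha[:, None, None]
            + bt * beta[None, :, None]
            + gt * gamma[:, :, None]
            + dt * delta)
```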

Precipitation *P* was perturbed using a different scheme, so that (1) transformed datasets could not contain negative values and (2) the pattern of days with no precipitation was retained, i.e., only the intensities of days with precipitation are perturbed. In this scheme the logarithm of the monthly means was used, and daily fluctuations are dealt with separately. That is, we let

*z*(*y*, *m*) = *log*[*P*(*y*, *m*, ⋅)],

where *P*(*y*, *m*, ⋅) denotes the monthly mean of daily precipitation, and deconstruct as

*z*(*y*, *m*) = *μ* + *α*_{y} + *β*_{m} + *γ*_{ym}.

We then apply the *θ* parameters as for temperature, but omit the *δ* terms, to get *z*^{∗}(*y*, *m*). A transformed daily precipitation time series *P*^{∗} is recovered by setting

*P*^{∗}(*y*, *m*, *d*) = *P*(*y*, *m*, *d*) exp[*z*^{∗}(*y*, *m*)] / exp[*z*(*y*, *m*)].

A small fixed value was substituted where *P*^{∗}(*y*, *m*, *d*) would be undefined (i.e., *P*(*y*, *m*, ⋅) = 0). This value is insignificant with respect to the simulation of maize in both crop models.
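The core property of the scheme (dry days stay dry, wet-day intensities rescale, no negative values) can be illustrated for a single month. This is a simplified sketch under the assumption that the perturbation acts as a multiplicative factor on the log of the monthly mean; it is not the authors' code.

```python
import numpy as np

def perturb_precip_month(daily_p, theta_scale):
    """Rescale one month of daily precipitation so its (log) monthly
    mean is adjusted by theta_scale, preserving the dry-day pattern.
    Zeros remain zero and no negative values can be produced."""
    p = np.asarray(daily_p, dtype=float)
    mean = p.mean()
    if mean == 0.0:                       # all-dry month: nothing to rescale
        return p
    z = np.log(mean)
    z_star = theta_scale * z              # multiplicative adjustment in log space
    factor = np.exp(z_star) / np.exp(z)   # ratio of transformed to original mean
    return p * factor                     # wet-day intensities scale; dry days stay 0
```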

Values for *θ* were chosen to encompass a wide range of systematic errors that may occur in climate model datasets, so as to inform future assessments of whether a dataset is fit for purpose. We chose values for *μ*_{θ} giving ±45 % of the observed *μ*. In this study’s region for the reference period 1970–1999, the maximum difference in mean maximum summer temperature found between E-OBS observations and the CMIP3 ensemble (Meehl et al. 2007) was 9 °C, which is 37 % of the observed mean (Hawkins et al. 2013b). Note that an analysis of the more recent CMIP5 climate model ensemble (Taylor et al. 2012) indicates that its range of ensemble spread is not reduced compared to CMIP3 (Knutti and Sedláček 2013). It is difficult to obtain systematic errors in the standard deviation of climate models at a range of temporal scales. Climate models have the potential to have both higher and lower variance than observations depending on the selected scale. Thus, to ensure we evaluated the effect of a wide range of systematic errors, we compared crop model performance with *z*^{∗} datasets ranging from no variance through to 3x the respective values of E-OBS for *α*_{θ}, *β*_{θ}, *γ*_{θ} and *δ*_{θ}. In order to provide a set of reference crop model sensitivities that directly compare the relative importance of temperature and precipitation errors, we transform precipitation with the same *θ* values as for temperature. Note that the aim is not to directly equate numeric changes made to the temperature and precipitation timeseries, but rather to compare the effects of changes in relative errors of their statistical components.

The effect of error in the overall mean *μ* was tested for a range of values of *μ*_{θ} (i.e., *θ* = (*μ*_{θ}, 1, 1, 1, 1)). The effects of these errors were assessed in terms of *ΔRMSE* and *ΔCCOEF*, defined below. *ΔRMSE* was chosen to measure the accuracy of predictions, but since this metric heavily penalizes models that correctly capture the weather / yield relationship but incorrectly predict mean yield,^{5} *ΔRMSE* was assessed in tandem with *ΔCCOEF*, which measures the correlation between observed and simulated yields. Each crop model was calibrated using weather and crop data for the period 1980–2002, and *ΔRMSE* and *ΔCCOEF* were defined as follows for the period 2003–2007. Let *RMSE*_{baseline} denote the RMSE of simulated yield and observed yield, and *RMSE*_{transformed} denote the RMSE of simulated yield and observed yield after the transformation has been applied. Similarly, let *CCOEF*_{baseline} denote the correlation coefficient (CCOEF) of simulated yield and observed yield, and *CCOEF*_{transformed} denote the CCOEF of simulated yield and observed yield after the transformation has been applied. Then:

*ΔRMSE* = *RMSE*_{transformed} − *RMSE*_{baseline}, and *ΔCCOEF* = *CCOEF*_{transformed} − *CCOEF*_{baseline}.

*ΔRMSE* and *ΔCCOEF* were calculated for each E-OBS grid cell using *STAT*_{nonlinear}, *STAT*_{linear}, *GLAM*_{rfd} and *GLAM*_{irr}, and each assessment was repeated using country, regional and departmental scale yield calibration data.
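The two skill-change metrics are straightforward to compute; a minimal sketch (sign convention: positive *ΔRMSE* indicates a loss of accuracy after the transformation):

```python
import numpy as np

def delta_metrics(obs, sim_baseline, sim_transformed):
    """Compute dRMSE = RMSE_transformed - RMSE_baseline and
    dCCOEF = CCOEF_transformed - CCOEF_baseline for one grid cell."""
    def rmse(a, b):
        return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))
    def ccoef(a, b):
        return float(np.corrcoef(a, b)[0, 1])
    d_rmse = rmse(obs, sim_transformed) - rmse(obs, sim_baseline)
    d_ccoef = ccoef(obs, sim_transformed) - ccoef(obs, sim_baseline)
    return d_rmse, d_ccoef
```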

### 2.4 Overview of study design

First, *RMSE*_{baseline} and *CCOEF*_{baseline} of crop model simulations were measured for each of the 1,035 E-OBS grid locations, using:

- 4 crop model types (*STAT*_{nonlinear}, *STAT*_{linear}, *GLAM*_{irr}, *GLAM*_{rfd}), and
- 3 yield data sources (country, regional, and departmental scale observations).

Second, we measured *ΔRMSE* and *ΔCCOEF* for each of these combinations under the climatic transformations described in Section 2.3.

## 3 Results and discussion

Differences in the response to calibration and weather errors were found between the statistical crop models and the process-based crop models. Also, some key data features were critically important to both model types. These results are discussed below.

### 3.1 The importance of harvest area

Figure 4 shows *RMSE*_{baseline} and *CCOEF*_{baseline} for each crop model configuration (*STAT*_{linear}, *STAT*_{nonlinear}, *GLAM*_{irr} and *GLAM*_{rfd}) for each E-OBS grid cell. This data is plotted against the harvest area of the respective grid cell (*x*-axis) and separated by the scale of the yield calibration data. The figure shows a clear relationship between harvest area and model skill. Regions with low reported harvest areas are relatively more likely to give poor results with regional scale crop models when compared to regions with higher harvest areas. This relationship is robust across both structurally distinct crop model types, which are based on very different maize development and calibration assumptions. The effect is also robust across all three scales of the yield observations, which each have distinct statistical variance.

These crop models rely on a signal existing between the model-relevant weather statistics and the model-relevant maize yield statistics. The strength of this signal in turn relies on the statistics of *both* the crop and weather datasets to accurately reflect conditions at the scale the crop model is being run. The lower the harvest area of a given grid cell, the more likely it is that reported kg ha^{−1} crop yields are the result of local sub-grid conditions not reflected in the interpolated grid cell weather. Thus the increasing likelihood of poor model performance in grid cells with low maize harvest area indicates an increasing mismatch between grid cell weather statistics and local crop yield observations.

The spatial scale of yield calibration data also had an effect on crop model skill. At any given value of harvest area, higher resolution yield data generally results in poorer model skill. This effect is particularly evident for GLAM at harvest areas < 10 %, where distinct differences in model skill can be seen depending on the spatial scale of the yield calibration data (see the CCOEF panels for *GLAM*_{irr} and *GLAM*_{rfd} in Fig. 4). This finding is contrary to the expectation that aggregation error decreases as spatial resolution increases, since the weather and the yield data are more closely matched to those experienced in reality (Hansen and Jones 2000). If model skill only relied on the spatial compatibility of weather and crop observations, we would expect departmental scale calibration data to outperform country scale data, since departments are still larger than the E-OBS grid cells. A possible explanation is that location-specific variations, such as management practices and weather conditions, cancel out over large areas, thus resulting in a stronger weather/yield relationship in the aggregated data. This finding appears to question the value of high resolution yield data for improving the skill of crop models. A focus on accuracy is likely to be of greater value. However, since these models were not optimized for local conditions, further work is clearly needed before definitive recommendations can be made.

### 3.2 The effect of simulated climate errors

*STAT*_{linear} and *STAT*_{nonlinear} were similar in their responses to simulated errors, as were *GLAM*_{rfd} and *GLAM*_{irr}. We therefore only show the responses of *STAT*_{nonlinear} and *GLAM*_{rfd}.

Both the statistical model and GLAM were significantly more influenced by transformations applied to temperature than those applied to precipitation. The fact that both model types exhibited this effect indicates that this is a model-independent result. Previous studies have also indicated a stronger maize yield response to temperature than to precipitation (Lobell et al. 2013; Hawkins et al. 2013a). This result can in part be explained by the method used to transform precipitation data. As described in Section 2.3, precipitation data was perturbed differently to temperature, in order to retain the pattern of days with rainfall, and to disallow negative values. By only adjusting the intensity of observed rainfall events, the overall impact of precipitation transformations can be less than those of temperature. Reductions in mean precipitation by up to 42 % did not significantly impact the skill of either model type – this is not a water-limited scenario. The statistical model’s performance was affected by precipitation changes in average yearly deviation, and year dependent deviations from the seasonal cycle.

The responses to temperature transformations differed significantly between GLAM and the statistical models. *GLAM* was predominantly affected by transformations in climatic mean (*μ*) and monthly deviations (*β*_{m}) with low and high values of *μ*_{θ} and *β*_{θ}, respectively. Transformations resulting in over-estimation typically resulted in increases in GLAM *ΔRMSE*, for all scales of yield observations (country, regional and departmental). An exception is the case for increases in the daily deviations from the monthly mean (*δ*_{ymd}) when country level crop yield data is used. This yield dataset was by definition common to all the grid cells analyzed, and exhibited the lowest standard deviation of the yield datasets (Fig. 1).

The statistical model was predominantly influenced by reductions in overall mean (*μ*), monthly deviations (*β*_{m} and *γ*_{ym}) and daily deviations from the monthly mean (*δ*_{ymd}). Increases in transformations also impacted the performance of the statistical model, but this was highly dependent on the scale of crop yield observations used. Increased overall mean increased statistical model RMSE, but only in the case of country scale yield observations. Higher resolution yield observations resulted in the statistical model exhibiting no negative change in skill for increases in overall mean. Positive changes to the average yearly deviation (*α*_{y}) resulted in greater loss of model skill as crop yield resolution increased. Positive changes in monthly deviations and daily deviations from the monthly mean also resulted in some loss of model skill, but again only where country level yield observations were used. Overall, the statistical model was significantly more sensitive than GLAM to the spatial scale of the yield calibration data.

The statistical model generally exhibited greater variance than GLAM across the grid cells analyzed (Fig. 7). Unlike the statistical model, GLAM simulates processes in addition to statistical interactions between temperature, precipitation and crop yield, making it less susceptible to these transformations. However, large reductions in the overall mean (*μ*) and monthly deviations (*β*_{m}) of temperature did result in significant variation in GLAM’s response.

Care must be taken in drawing conclusions about the relative quality of these two models using these results. Resistance or sensitivity to transformations is not an indication of model quality, but rather a pointer to what data errors should be of concern to impacts modellers. Importantly, the relative effects of (1) individual weather data transformations, and (2) the scale of yield calibration data, differs depending on model type.

### 3.3 Comparison of statistical and process-based models

Statistical models can test for relatively simple relationships and extrapolate based on observed relationships. Process-based models contain more assumptions (i.e. physiological processes and crop-climate relationships). The statistical model is directly influenced by the loss of statistically significant information in temperature, precipitation and yield data, while the process-based model is affected by altered process interactions resulting from such errors. The main difference between the two models used in this study is that the statistical models are designed to be entirely fit to data, whereas GLAM is more constrained in its behaviour by the processes that comprise the model. For either type of model, errors in relevant input data characteristics will result in the model failing when used out of sample. The degree to which such errors are relevant to each model is not clear a priori. Identifying the data characteristics that each model is sensitive to, and quantifying their effects, are pre-requisites for answering our research questions. Three observations are relevant:

First, the yield calibration data, in the absence of weather perturbations, affect the two models in different ways. GLAM RMSE is more consistent across the three different yield calibration datasets than the statistical models (Fig. 4). In contrast, for CCOEF at low harvest areas, the converse is true. This is not simply the result of more degrees of freedom resulting in lower RMSE, since the non-linear statistical model performs slightly worse than the linear model. Similarly, the greater number of degrees of freedom in GLAM do not automatically result in an improved CCOEF – here GLAM is constrained by the processes it simulates.

Second, susceptibility to weather errors varies according to the model, the variable and the timescale of perturbation. Overall, GLAM is less susceptible to errors in precipitation than the statistical model (Fig. 6). The calibration process in GLAM results in some precipitation bias being corrected, even though this is not the primary aim of calibration (Challinor et al. 2005). Temperature perturbations tend to produce larger errors. The sign and frequency of perturbations that produce the greatest impact tends to differ between the two models. In the statistical model, reducing the amplitude of temperature variation (*α*_{θ}, *β*_{θ}, *γ*_{θ} and *δ*_{θ} < 1) produces the largest errors. In GLAM, it is predominantly increases in amplitude that have the greatest impact on skill. GLAM is particularly prone to errors in mean temperature (*μ*_{θ}) and the magnitude of the seasonal cycle of temperature (*β*_{θ}).

Third, the spatial coherence of model response across grid cells differs between GLAM and the statistical models (Fig. 7). Errors in GLAM in response to weather perturbations are generally more spatially systematic (across grid cells) than when those same perturbations are introduced to the statistical models. This result reflects the fact that the spatial calibration of GLAM is restricted to one single parameter, whereas the statistical models have more degrees of freedom across space. This suggests that weather biases may be easier to correct in a process-based model than a statistical model, and agrees with previous work calling for greater attention to measurement error in statistical crop models (Lobell 2013).

## 4 Conclusions

Care must be used in interpreting these results outside the context of this study. A single crop was analyzed in a single temperate country, and models were not optimized for local conditions beyond the use of their automatic calibration routines. However, particularly where similar results were found using different model structures, some general comments can be made.

The process-based models and the statistical models used in this study were found to be susceptible to different types of input data error. These two model types make different choices of model design and calibration, so this result is not surprising. The contribution here is in the quantification of the effects that different errors have on these model types. The next step is to identify to what extent widely used crop model inputs, such as global climate models, exhibit such errors.

For both regional scale crop model types, and for all three spatial scales of yield calibration data, we found that model skill is most reliable where growing area is above 10-15 %. Thus information on area harvested would appear to be a priority for data collection efforts.

The GLAM process-based model can compensate for some loss in weather information and was found to be resilient to differences in yield data resolution, while mean statistical model response was resilient to overestimation errors. These differing responses to input data error raise the intriguing possibility of using process-based and statistical models in tandem to improve crop yield predictions.

Biases and errors in temperature, precipitation and yield input data influence the results of crop models. Consequently, the management advice given on the basis of these models is subject to influence by these errors. Understanding the detailed impact of different error types helps improve crop yield projections, and can guide efforts to improve models and datasets. The methodology introduced by this study can be applied to a range of impact modelling scenarios that utilize daily weather data. Extending this study to assess the potential impact of data errors on different model types, crops, and locations, would provide modelers with key information for designing and evaluating impacts studies.

EUROSTAT: http://www.eea.europa.eu/data-and-maps/figures/ds_resolveuid/5B48E834-22A2-42D5-A247-6E8B71CCCA36

For example, a model with a flat response close to the mean can have a lower *ΔRMSE* than a model that correctly predicts yield response but with an incorrect mean.

## Acknowledgments

The authors were supported by the UK NERC EQUIP project. JW and AC were also supported by the European MACSUR Knowledge Hub. Thanks to David Gouache for providing the departmental maize yield observations for France. Thanks also to Ed Hawkins, Daniel Smith and Karina Williams for useful discussion, and to Helen Greatrex and Tom Osborne for contributing the maize version of GLAM.

We acknowledge the E-OBS dataset from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com) and the data providers in the ECA&D project (http://www.ecad.eu).