1 Introduction

Climate change (expressed in the short term as extreme weather phenomena) and the increased demand for food are two of the most important future global challenges for agricultural systems. The Foresight Report (2011) projected that global food demand is likely to increase by 70% by 2050 due to population growth and shifts in consumer attitudes and preferences.

The development, growth and yield of crops are influenced by seasonal patterns in rainfall and temperature; any future alteration of these may therefore have significant impacts on agricultural production (Knox et al. 2010a; Falloon and Betts 2010; Murphy et al. 2009). Projected increases in daily temperature and atmospheric humidity can also increase the risk of agricultural pests and diseases, while sea level rise may degrade the land available for agricultural activities (Daccache et al. 2011; Knox et al. 2010b).

According to the Department for Environment, Food and Rural Affairs (Defra, UK), agricultural Total Factor Productivity (TFP) experienced a significant drop during 2007–2013, mainly due to the frequent occurrence of extreme weather phenomena such as floods (2007, 2012, 2013) and persistent drought periods (2010, 2011, 2012). According to Defra (2013), these variations in observed agricultural TFP are due to weather phenomena and disease outbreaks, which are usually beyond the control of farm managers. However, we argue that such variations in the physical environment should be isolated when the interest lies in measuring farm performance alone (e.g. policies aiming to identify farmers who could potentially perform better regardless of the weather conditions).

For example, the combination of high daytime temperatures and reduced rainfall during May–July would directly affect the hydrological status of areas with a high concentration of arable and horticultural farms (e.g. East Anglia), through increased demand for water abstraction licences from agriculture and other competing industries and through reduced water availability (Defra 2009; Environment Agency 2008, 2011).

A number of research papers on determining the performance of a production unit have proposed both parametric and non-parametric empirical models to account for spatial heterogeneity (Vidoli and Canello 2016), such as climatic conditions, topography and socio-economic aspects. Within the parametric approaches, the most frequently used framework considers contextual variables that are presupposed to affect efficiency (Areal et al. 2012a; Barrios and Lavado 2010; Hughes et al. 2011; Pede Valerien et al. 2018). In addition, Sherlund et al. (2002) demonstrated that omitting potentially relevant physical environment variables causes bias, which is absorbed into the composite error \(\left( {v - u} \right)\) and hence into the efficiency estimate computed from the non-negative \(u\) term. The non-parametric literature, however, lacks specific contributions that account for spatial variation in the agricultural sector. The usual approach is the two-step approach (Simar and Wilson 2007), in which the effect of contextual variables is isolated by (1) estimating the efficiency of production units and (2) regressing these efficiency estimates on a set of environmental variables. However, spatial dependence is not controlled for in the first stage, and hence bias is not avoided in the second stage of the estimation process. An alternative approach incorporates spatial dependence into the efficiency term and the probabilistic formulation of the nonparametric conditional measure (Bădin et al. 2012; Daraio and Simar 2007; Jeong et al. 2010), which also accounts for spatial heterogeneity, but within a set of contextual variables rather than a set of specific territorial characteristics.

This paper contributes to the nonparametric production efficiency literature by proposing a specification that accounts for spatial differences among production units. We analyse the relevance of considering fluctuations in the attributes of the physical environment (rainfall and growing season variations) in the specification of the Data Envelopment Analysis (DEA) linear programme. We do this by including rainfall and the growing season length as inputs in the production function. This enables us to (a) account for differences in the physical environment between farms in our sample and, consequently, (b) determine the significance of variations in climatic conditions for the technical efficiency estimates of arable farms. We treat rainfall and the length of the growing season as fixed production factors (i.e. non-discretionary input variables), which is equivalent to saying that farm owners/managers have no management control over them.

The rationale is that, to improve extension services and advice for farming systems, differences in the environmental conditions faced by each individual farm should be considered, so as to ensure the homogeneity of the benchmarking sample and hence reduce bias in the technical efficiency estimates.

2 Materials and methods

2.1 The UK arable sector and data requirements

The total land allocated to crop production in England over the period 2006 to 2016 averaged 4.8 million hectares, with a growing season extending from early spring to the middle of autumn. Over the same period, approximately 3.9 million hectares were cropped (arable crops, cereals, oilseeds, potatoes, horticultural crops). Cereal production in 2009 was characterised by high but declining variable and fixed costs, together with declining sales values as the growing season progressed, and growers' performance was influenced by the timing of their purchases and sales (Lang 2010). The 2010 harvest year had similar variable costs, but fertiliser expenditure was reduced. Furthermore, crop and straw prices for 2010 were higher and, consequently, gross margins exceeded those of recent years (Lang 2011).

For the purposes of the DEA model, we used a balanced sample of 245 cereal farms from the Farm Business Survey (FBS)Footnote 1 for the production years 2009 and 2010 in England and Wales. The sample of cereal farms satisfies the homogeneity requirement of the DEA model with respect to the structure of production (crop mix), and the climatic and physical characteristics are accounted for. This enables the performance of individual farming systems to be benchmarked through the modelling process.

The production technology of each farm in the sample was defined by the agricultural area under crop production, the various crop production costs (including fertiliser, crop protection, seed and other agricultural costs), the labour hours spent in agriculture per year, the rainfall level and the growing season length.Footnote 2 Since arable farms generate output from various crops, the farm business gross margin was used as the single output of each farm in the sample. In addition, in the short run the farm business gross margin approximates the profit-maximising goal of the producer, given the input set and the technology available to each farm in the sample.

The climate in England during the 2009 growing period for crops (April–September) was characterised by favourable spring growing conditions but a wet July in England and Wales. The 2010 production year, including the growing period, was very dry and warm, which had a negative impact on the crops drilled in spring. The dry weather broke suddenly at the onset of harvest, with more than twice the average July rainfall in most regions.

We combined rainfall data for England over 2009–2010 with spatial information using the 10 km grid reference of each cereal farm in the sample (Fig. 1a, b). This allowed us to assign an average rainfall level during the growing season to each farm. Figure 1b shows that rainfall levels in 2010 were lower during the April–August period (the timing of the construction and production phases of the crop); moisture conditions for crop development were thus less favourable than in 2009 (Fig. 1a). We would expect incorporating rainfall into the technical efficiency analysis to have a greater impact on farm efficiency levels in 2009 than in 2010 (i.e. a conventional efficiency analysis would be more biased when differences in rainfall across the sample are large, as they are in 2009). In addition, data available from the Met Office were used to assign an estimate of the length of the growing season to each farm in the sample. Specifically, the growing season length is the period (in days) bounded by a daily mean temperature above 5 °C for more than 5 consecutive days and below 5 °C for more than 5 consecutive days (after 1 July). Both the rainfall and the growing season data were derived from UK weather observations held at the Met Office.Footnote 3
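The growing season definition above can be expressed as a small function. This is a minimal sketch, not the Met Office's own implementation: it assumes one calendar year of daily mean temperatures indexed from 1 January (so that 1 July is index 181 in a non-leap year), reads "more than 5 consecutive days" as a run of at least 6 days, and takes the first day of the warm run as the start and the day before the first qualifying cold run after 1 July as the end. The function names are hypothetical.

```python
import numpy as np


def first_run_start(mask, run_length, start_index=0):
    """Index of the first day of the first run of `run_length` consecutive
    True values at or after `start_index`, or None if no such run exists."""
    count = 0
    for i in range(start_index, len(mask)):
        count = count + 1 if mask[i] else 0
        if count == run_length:
            return i - run_length + 1
    return None


def growing_season_length(daily_mean_temp, threshold=5.0, min_run=6,
                          july_first_index=181):
    """Growing season length in days for one year of daily mean temperatures.

    Assumed conventions (hypothetical): the season starts on the first day of
    the first run of `min_run` days above `threshold` and ends the day before
    the first run of `min_run` days below `threshold` starting on or after
    1 July.
    """
    t = np.asarray(daily_mean_temp, dtype=float)
    start = first_run_start(t > threshold, min_run)
    if start is None:
        return 0  # no qualifying warm spell in the year
    end = first_run_start(t < threshold, min_run,
                          max(july_first_index, start + min_run))
    if end is None:
        end = len(t)  # the season lasts until the end of the series
    return end - start


if __name__ == "__main__":
    # Synthetic year: cold for 60 days, warm for 240 days, then cold again.
    temps = np.r_[np.full(60, 2.0), np.full(240, 12.0), np.full(65, 2.0)]
    print(growing_season_length(temps))  # 240
```

Short warm or cold spells (fewer than six days) do not open or close the season under this reading, which is the point of the consecutive-day requirement.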

Fig. 1

Map of rainfall at 10 km square level for 2009 (a) and 2010 (b)

Table 1 presents the mean, SD and the minimum and maximum values of the inputs and outputs used in the modelling process to describe the production technology of the sample for the years 2009 and 2010.

Table 1 Descriptive statistics of the FBS variables used in the DEA LP models

2.2 Methodology

Farm efficiency levels are estimated using a conventional (CNV) and a sub-vector (SBV) DEA model to examine the significance of environmental parameters such as rainfall and the growing season length in benchmarking farming systems. We estimate the efficiency levels for 2009 and 2010 to examine whether the ranking of farms based on efficiency scores changes between models.

The SBV DEA model allows the researcher to account for input and output variables that are outside the control of the farm manager (non-discretionary variables). The model therefore distinguishes two types of variables: (a) discretionary, where the farm owner/manager has significant control over the mix of production inputs and the planned production output, and (b) non-discretionary, where the farm owner/manager cannot decide on the proportion of inputs to be used or the outputs to be produced. As suggested by Lilienfeld and Asmild (2007), integrating non-discretionary variables into the DEA model allows the proportional input reduction or output expansion to be estimated only for those variables or production outcomes that are under the direct control of the farm owner/manager. By contrast, a conventional DEA model that fails to account for non-discretionary variables yields an efficiency estimate for each decision-making unit that implies an equiproportional reduction or increase of all inputs and outputs simultaneously, even though some of the variables are beyond the farmer's control (e.g. rainfall, a non-discretionary input). The sub-vector DEA efficiency analysis model was introduced by Kopp (1981) and Färe et al. (1983). Agricultural efficiency studies that have used the SBV DEA model include Piot-Lepetit et al. (1997), where land and agricultural workforce were treated as non-discretionary inputs, and Lansink et al. (2002) and Lansink and Silva (2003), where the SBV model was used to obtain technical efficiency estimates for a group of inputs and to measure energy technical efficiency, respectively. Asmild and Hougaard (2006) employed DEA models based on the SBV variation to compare the economic and environmental performance of Danish pig farms, treating revenue and environmental variables alternately as non-discretionary variables in the two SBV models.

2.3 Modelling the discretionary and non-discretionary variables in DEA

Non-parametric DEA models can be input- or output-oriented, and different assumptions can be made regarding returns to scale. We assume variable returns to scale (VRS) (Banker et al. 1984) and solve the DEA model as an input-oriented model. Hence, the efficiency estimates derived from the model identify the maximum equiproportional reduction in inputs that leaves each farm's individual output level unchanged.

Both the rainfall level and the length of the growing season enter the input side of the DEA model because of their direct impact on crop yields. However, to treat rainfall and the growing season as non-discretionary variables, the DEA model constraints are modified accordingly (see Eqs. 2, 3 and 4). This is done within an input-oriented DEA model, in which the most efficient farms are those able to maintain their individual output levels while using the minimum amount of inputs, given the variations in the physical environment. The production frontier is defined by linear and convex combinations of the best performers, and the relative position of the remaining farms to this estimated frontier is used to measure their efficiency scores (DEA is a benchmarking technique). Further developments and a detailed discussion of the various DEA techniques and models are available in Cooper et al. (2006).

Additionally, to explore whether the ranking of farms differs depending on whether variations in the physical environment are considered, a CNV DEA model and a non-discretionary (SBV) DEA model are employed and their individual efficiency scores compared. A non-discretionary variable in a DEA framework can be defined as one that cannot be modified, or is at least held constant, in the short run.

Formally, assume that \(N\) farms are observed and that each farm \(i = \left\{ {1, \ldots ,N} \right\}\) uses \(J\) inputs \(x_{j}\) \(\left( {j = 1, \ldots ,J} \right)\) to produce \(S\) outputs \(y_{r} \left( {r = 1, \ldots ,S} \right)\). An input-oriented DEA model with all inputs variable (the conventional CNV model) can then be formulated as:

$$min_{{\theta ,\lambda^{i} }} \theta_{CNV}^{'}$$
(1)
$$s.t.\quad \theta x_{ji}^{{\prime }} \ge \mathop \sum \limits_{i = 1}^{n} \lambda^{i} x_{ji}$$
(1.1)
$$y_{ri}^{{\prime }} \le \mathop \sum \limits_{i = 1}^{n} \lambda^{i} y_{ri}$$
(1.2)
$$\lambda^{i} \ge 0$$
(1.3)
$$\mathop \sum \limits_{i = 1}^{n} \lambda^{i} = 1$$
(1.4)

where \(\theta_{CNV}^{'}\) is a scalar representing the efficiency estimate of each of the \(n\) farms in the model. The optimal value of \(\theta_{CNV}\) ranges between 0 and 1, with \(\theta_{i} = 1\) indicating a farm on the frontier (efficient). The value \(\theta_{i}\) cannot exceed unity because it is a ratio of Euclidean distances from the origin: the distance to the farm's radial projection on the production frontier over the distance to the observed input vector.
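Model (1) is a standard linear programme that is solved once per farm. The sketch below shows one possible Python implementation, assuming SciPy's `scipy.optimize.linprog` with the HiGHS method; the function name `dea_vrs_input` and the toy data are hypothetical. The decision vector stacks \(\theta\) and the \(n\) intensity weights \(\lambda^{i}\), and each constraint of model (1) becomes one row of the inequality or equality matrices.

```python
import numpy as np
from scipy.optimize import linprog


def dea_vrs_input(X, Y):
    """Input-oriented VRS DEA scores, model (1), for every farm.

    X: (n, J) inputs, Y: (n, S) outputs. Returns the scores theta and the
    (n, n) matrix of optimal weights, one row per evaluated farm.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, J = X.shape
    S = Y.shape[1]
    scores = np.empty(n)
    lambdas = np.empty((n, n))
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                  # variables: [theta, lambda_1..lambda_n]
        A_in = np.hstack([-X[o][:, None], X.T])      # sum_i lambda_i x_ji - theta x_jo <= 0  (1.1)
        A_out = np.hstack([np.zeros((S, 1)), -Y.T])  # -sum_i lambda_i y_ri <= -y_ro           (1.2)
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.r_[np.zeros(J), -Y[o]]
        A_eq = np.r_[0.0, np.ones(n)][None, :]       # sum_i lambda_i = 1 (VRS)                (1.4)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n + 1),  # theta >= 0, lambda_i >= 0               (1.3)
                      method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for farm {o}: {res.message}")
        scores[o], lambdas[o] = res.x[0], res.x[1:]
    return scores, lambdas


if __name__ == "__main__":
    # Toy data (hypothetical): one input, one output, four farms.
    X = np.array([[2.0], [4.0], [6.0], [4.0]])
    Y = np.array([[1.0], [2.0], [3.0], [1.0]])
    scores, lambdas = dea_vrs_input(X, Y)
    print(np.round(scores, 3))  # farms 1-3 lie on the frontier; farm 4 scores 0.5
```

In the toy data, farm 4 produces the same output as farm 1 with twice the input, so its score is 0.5: it could in principle halve its input use.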

The above formulation and structure of the DEA LP need to be adjusted when the inputs are split into a set of discretionary inputs \(DI \subset \left\{ {1, \ldots ,J} \right\}\) and a set of non-discretionary inputs \(NDI = \left\{ {1, \ldots ,F} \right\}\), where no element of NDI is an element of DI.

Considering both the DI and NDI variables the production technology set \(P_{SBV}\) can be defined as follows:

$$P_{SBV} = \left\{ {\left( {x_{DIji} , x_{NDIji} ,y_{ri} } \right) |x_{DIji } \,and\,x_{NDIji} \,can\,produce\,y_{ri} } \right\}$$
(2)

We solve the DEA model using the approach of Bogetoft and Otto (2010) for cases where both DI and NDI variables exist:

$$\theta \left( {\left( {x_{DIji} , x_{NDIji} ,y_{ri} } \right);P_{SBV} } \right) = min_{\theta } \left\{ {\theta |\left( {\theta x_{DIji} , x_{NDIji} ,y_{ri} } \right) \in P_{SBV} } \right\}$$
(3)

The constraints of the DEA LP are therefore adjusted so that only the discretionary inputs are equiproportionally reduced. Hence, the input-oriented DEA efficiency measure accounting for rainfall variations for observation \(x^{{\prime }}\), \(\theta^{{\prime }}\), is estimated by the following LP model:

$$min_{{\theta ,\lambda^{i} }} \theta_{SBV}^{'}$$
(4)
$$s.t.\quad \theta x_{DIji}^{\prime } \ge \mathop \sum \limits_{i = 1}^{n} \lambda^{i} x_{DIji} \quad j \in DI$$
(4.1)
$$x_{NDIji}^{{\prime }} \ge \mathop \sum \limits_{i = 1}^{n} \lambda^{i} x_{NDIji} \quad j \in NDI$$
(4.2)
$$y_{ri}^{\prime } \le \mathop \sum \limits_{i = 1}^{n} \lambda^{i} y_{ri}$$
(4.3)
$$\lambda^{i} \ge 0$$
(4.4)
$$\mathop \sum \limits_{i = 1}^{n} \lambda^{i} = 1$$
(4.5)

According to Bogetoft and Otto (2010), model (4) can be rewritten in an equivalent form that is convenient for solving the LP, by treating the NDI inputs as negative outputs in an input-oriented model:

$$min_{{\theta ,\lambda^{i} }} \theta_{SBV}^{\prime }$$
(5)
$$s.t.\quad \theta x_{DIji}^{\prime } \ge \mathop \sum \limits_{i = 1}^{n} \lambda^{i} x_{DIji} \quad j \in DI$$
(5.1)
$$- x_{NDIji}^{\prime } \le \mathop \sum \limits_{i = 1}^{n} \lambda^{i} ( - x_{NDIji} )\quad j \in NDI$$
(5.2)
$$y_{ri}^{\prime } \le \mathop \sum \limits_{i = 1}^{n} \lambda^{i} y_{ri}$$
(5.3)
$$\lambda^{i} \ge 0$$
(5.4)
$$\mathop \sum \limits_{i = 1}^{n} \lambda^{i} = 1$$
(5.5)

In model (5), \(x_{DIji}\) is the \(j\)th discretionary input for farm \(i\), \(x_{NDIji}\) is the \(j\)th non-discretionary input for farm \(i\) and \(y_{ri}\) is the \(r\)th output for farm \(i\), with \(i = \left( {1, \ldots N} \right), j = \left( {1, \ldots J} \right) \,and\,r = \left( {1, \ldots S} \right)\). The optimal value \(\theta_{SBV}\) ranges between 0 and 1 and is the SBV efficiency estimate. It indicates the equiproportional reduction of the \(x_{DIji}\) inputs, with output levels held constant, relative to the reference farm on the frontier. Constraint 5.1 bounds the proportionally reduced discretionary inputs, while constraint 5.2 bounds the non-discretionary inputs without scaling them, so that \(\theta_{SBV}\) is optimised with respect to the input mix of the peer farms on the frontier. Constraint 5.3 ensures that the output of the \(i\)th farm is not greater than the frontier output. Together, the constraints in Eqs. 5.1, 5.2 and 5.3 ensure that the optimal solution belongs to the production possibility set. The convexity constraint in Eq. 5.5 imposes VRS on the model: unlike under the constant returns to scale (CRS) assumption, an increase in inputs need not lead to a proportional change in outputs. This is considered appropriate since the aim is to measure the impact of physical performance on pure technical efficiency (Banker et al. 1984) rather than on gross efficiency under the CRS assumption (Charnes et al. 1978).
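Following the negative-output device of model (5), SBV scores can be obtained from an ordinary input-oriented VRS solver by appending the negated non-discretionary inputs to the output matrix: the output-type constraint \(-x_{NDI}^{\prime} \le \sum \lambda^{i}(-x_{NDI})\) is the same as \(\sum \lambda^{i} x_{NDI} \le x_{NDI}^{\prime}\), so the NDI columns restrict the admissible peers without being scaled by \(\theta\). This is a minimal sketch assuming SciPy's `linprog` (HiGHS); the function names and toy numbers (one discretionary input, one output and a hypothetical rainfall column) are illustrative only. Because the NDI constraint can only shrink the set of admissible peers, a farm's SBV score cannot be lower than its score from the same model with the NDI columns omitted.

```python
import numpy as np
from scipy.optimize import linprog


def dea_vrs_input(X, Y):
    """Input-oriented VRS DEA: radial contraction of every column of X."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, J = X.shape
    S = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        c = np.r_[1.0, np.zeros(n)]                                # minimise theta
        A_ub = np.vstack([np.hstack([-X[o][:, None], X.T]),        # X'lambda <= theta x_o
                          np.hstack([np.zeros((S, 1)), -Y.T])])    # Y'lambda >= y_o
        b_ub = np.r_[np.zeros(J), -Y[o]]
        A_eq = np.r_[0.0, np.ones(n)][None, :]                     # sum lambda = 1 (VRS)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n + 1), method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for farm {o}: {res.message}")
        scores[o] = res.x[0]
    return scores


def dea_vrs_input_sbv(X_di, X_ndi, Y):
    """SBV scores: only X_di is contracted; X_ndi enters as negative outputs."""
    Y_aug = np.hstack([np.asarray(Y, dtype=float), -np.asarray(X_ndi, dtype=float)])
    return dea_vrs_input(X_di, Y_aug)


if __name__ == "__main__":
    X_di = np.array([[2.0], [4.0], [6.0], [4.0]])   # a discretionary cost input
    X_ndi = np.array([[3.0], [1.0], [1.0], [1.0]])  # hypothetical rainfall index
    Y = np.array([[1.0], [2.0], [3.0], [1.0]])
    print(dea_vrs_input(X_di, Y))                   # rainfall ignored
    print(dea_vrs_input_sbv(X_di, X_ndi, Y))        # rainfall as NDI input
```

In this toy example, farm 4 scores 0.5 when rainfall is ignored, because it is benchmarked against farm 1; once rainfall enters as a non-discretionary input, the wetter farm 1 is no longer an admissible peer and farm 4 scores 1.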

2.4 Peer units in DEA models

The right hand sides in DEA programmes of Eqs. (1) and (4), \((\mathop \sum \nolimits_{i = 1}^{n} \lambda^{i} x_{ji} , \mathop \sum \nolimits_{i = 1}^{n} \lambda^{i} y_{ri} )\) and \((\mathop \sum \nolimits_{i = 1}^{n} \lambda^{i} x_{DIji} , \mathop \sum \nolimits_{i = 1}^{n} \lambda^{i} x_{NDIji} , \mathop \sum \nolimits_{i = 1}^{n} \lambda^{i} y_{ri} )\) respectively define the reference decision making units (i.e. farms) for the CNV and SBV models against which we compare farm \(x^{{\prime }}\). Those farms with positive lambdas (i.e. weights) are identified in the DEA literature as the peer units, i.e.

$$Peer\,units = \left\{ {i \in \left\{ {1, \ldots ,n} \right\} |\lambda^{i} > 0} \right\}$$

and hence DEA “identifies explicit real peer-units for every evaluated unit” (Cooper et al. 2007). For a given farm, the peer units can generally be regarded as a signal of modelling quality (Bogetoft and Otto 2010).
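Given the optimal weight vector \(\lambda\) for one evaluated farm (for instance, the weight part of an LP solution of model (1) or (4)), the peer set defined above is simply the set of indices with a positive weight. A minimal sketch; the function names are hypothetical, and the tolerance is an assumption used to screen out the tiny positive values that LP solvers can return for weights that are zero in exact arithmetic.

```python
import numpy as np


def peer_units(lambdas, tol=1e-8):
    """Indices i with lambda_i > tol (the peer units of the evaluated farm)."""
    lam = np.asarray(lambdas, dtype=float)
    return [int(i) for i in np.flatnonzero(lam > tol)]


def peer_weights(lambdas, tol=1e-8):
    """Map each peer index to its intensity weight."""
    lam = np.asarray(lambdas, dtype=float)
    return {i: float(lam[i]) for i in peer_units(lam, tol)}


if __name__ == "__main__":
    lam = [0.0, 0.3, 1e-12, 0.7]   # hypothetical solver output for one farm
    print(peer_units(lam))         # [1, 3]
    print(peer_weights(lam))       # {1: 0.3, 3: 0.7}
```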

2.5 Bootstrapping in DEA: correct the bias in DEA estimators

Since DEA efficiency estimation is based on a production possibility set (PPS) derived from finite samples rather than on the true (unobserved) production frontier, the efficiency measures may be affected by sampling variation, meaning that the estimated distance functions to the frontier are potentially mismeasured (Balcombe et al. 2008a; Simar and Wilson 1998).

A segment of the DEA literature has focused on providing a rigorous theoretical framework for establishing the statistical properties of DEA estimators (Banker 1993; Kneip et al. 1998; Korostelev et al. 1995). According to Simar and Wilson (1998, 2000, 2007), bootstrapping is the most suitable method for obtaining the statistical properties of DEA estimators. However, care should be taken in choosing the type of bootstrap, since results can be inconsistent in some cases: when DEA efficiency estimates are close to one, naive resampling from the original data can make the estimated confidence intervals inconsistent (i.e. the upper limit is above unity). Bootstrapping is based on the idea that the true sampling distribution can be simulated by mimicking the Data Generating Process (DGP) (Balcombe et al. 2008b). In the case of DEA, the DGP is used to generate a pseudo-data set, which is then used to re-estimate the DEA distance functions. The higher the number of bootstrap replicates [more than 2000 (Simar and Wilson 1998, 2007)], the better the approximation of the true sampling distribution. Accordingly, the bootstrap Algorithm #2 of Simar and Wilson (2007) is used to obtain robust DEA estimators and confidence intervals. A detailed presentation of the algorithm used to bootstrap DEA estimates is available in Simar and Wilson (1998). In addition, we used the bootstrapped (bias-corrected) efficiency scores to obtain a full ranking of the farms in the sample. This addresses the problem of not being able to rank farms on the frontier (efficient farms all have a score of unity), since bias-corrected efficiency scores also discriminate between efficient farms (Simar and Wilson 1998).
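The bootstrap idea described above can be sketched as follows. This is a simplified, homogeneous smoothed bootstrap in the spirit of Simar and Wilson (1998), not a reproduction of the Algorithm #2 of Simar and Wilson (2007) used in this paper. Scores are resampled, perturbed with a Gaussian kernel and reflected at unity so that pseudo scores never exceed one; pseudo input vectors are built by moving each farm's frontier projection inward by its pseudo score; and the original farms are re-evaluated against the pseudo frontier. The rule-of-thumb bandwidth, the function names and the use of SciPy's `linprog` are assumptions for illustration. The bias-corrected score is \(\hat{\theta}\) minus the estimated bias, where the bias is the mean bootstrap score minus \(\hat{\theta}\).

```python
import numpy as np
from scipy.optimize import linprog


def dea_vrs_input(X_eval, Y_eval, X_ref, Y_ref):
    """Input-oriented VRS scores of the rows of (X_eval, Y_eval) against the
    frontier spanned by the reference set (X_ref, Y_ref)."""
    n_ref, J = X_ref.shape
    S = Y_ref.shape[1]
    scores = np.empty(len(X_eval))
    for o in range(len(X_eval)):
        c = np.r_[1.0, np.zeros(n_ref)]
        A_ub = np.vstack([np.hstack([-X_eval[o][:, None], X_ref.T]),
                          np.hstack([np.zeros((S, 1)), -Y_ref.T])])
        b_ub = np.r_[np.zeros(J), -Y_eval[o]]
        A_eq = np.r_[0.0, np.ones(n_ref)][None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n_ref + 1), method="highs")
        scores[o] = res.x[0]
    return scores


def smoothed_bootstrap(X, Y, B=2000, alpha=0.05, seed=0):
    """Original scores, bias-corrected scores and (1 - alpha) intervals."""
    rng = np.random.default_rng(seed)
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = len(X)
    theta = dea_vrs_input(X, Y, X, Y)
    reflected = np.r_[theta, 2.0 - theta]               # scores mirrored around 1
    h = 1.06 * reflected.std() * len(reflected) ** -0.2  # assumed rule-of-thumb bandwidth
    sigma2 = theta.var() + 1e-12
    boot = np.empty((B, n))
    for b in range(B):
        beta = rng.choice(theta, size=n, replace=True)
        tilde = beta + h * rng.standard_normal(n)
        tilde = np.where(tilde > 1.0, 2.0 - tilde, tilde)  # reflect at the frontier
        star = beta.mean() + (tilde - beta.mean()) / np.sqrt(1.0 + h**2 / sigma2)
        star = np.clip(star, 1e-6, 1.0)
        X_pseudo = X * (theta / star)[:, None]  # pseudo inputs behind the frontier
        boot[b] = dea_vrs_input(X, Y, X_pseudo, Y)
    bias = boot.mean(axis=0) - theta
    q_lo, q_hi = np.quantile(boot - theta, [alpha / 2, 1 - alpha / 2], axis=0)
    return {"theta": theta, "bias_corrected": theta - bias,
            "lower": theta - q_hi, "upper": theta - q_lo}
```

In practice B should follow the recommendation above (more than 2000 replicates); small values such as B = 20 are only useful for checking that the code runs. Because the pseudo frontier lies inside the estimated one, bootstrap scores are (up to solver tolerance) never below the original scores, so the bias-corrected scores and upper interval limits do not exceed the original scores.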

2.6 The coefficient of separation

The coefficient of separation (CoS), introduced by Latruffe et al. (2005) and further developed by Gocht and Balcombe (2006), is a useful summary statistic that expresses the degree of overlap between confidence intervals. It is estimated by counting, for each farm in turn, the farms that are significantly more efficient than it; that is, the farms whose lower bound is strictly greater than the upper bound (at a given significance level) of the farm in question.

In particular, let \(\tilde{N}_{\hat{N}}\) be the number of farms that are “significantly” more efficient than at least \(\hat{N}\) other farms, where \(\hat{N} = 1,2, \ldots , N - 1\) and \(N\) is the total number of farms. Under perfect separation, we would observe

$$\tilde{N}_{\hat{N}} = N - \hat{N}$$

for \(\hat{N} = 1,2, \ldots , N - 1\). Noting the identity

$$\frac{2}{N^{2}}\sum\limits_{\hat{N} = 1}^{N - 1} \left( N - \hat{N} \right) + \frac{1}{N} = 1,$$

a CoS can be constructed as

$$CoS = \frac{2}{N^{2}}\sum\limits_{\hat{N} = 1}^{N - 1} \tilde{N}_{\hat{N}} + \frac{1}{N}$$

Under perfect separation, the identity above implies that the CoS equals unity:

$$CoS = \frac{2}{N^{2}}\sum\limits_{\hat{N} = 1}^{N - 1} \left( N - \hat{N} \right) + \frac{1}{N} = 1$$

The statistic gives an approximate estimate of the percentage of the sample that is significantly less efficient than a given percentage of the sample, once the sample has been ranked. According to Gocht and Balcombe (2006), “the smaller the CoS (at a given level of significance), the less we can differentiate between farm efficiencies, given the confidence intervals obtained by the bootstrapped”.
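Summing the counts \(\tilde{N}\) over \(\hat{N} = 1, \ldots, N-1\) counts every ordered pair of farms in which one farm is significantly more efficient than the other (a farm that beats \(c\) others contributes once for each \(\hat{N} \le c\)), so the CoS can be computed directly from the bootstrap confidence bounds. A minimal sketch, assuming arrays of per-farm lower and upper bounds at a common significance level; the function name is hypothetical.

```python
import numpy as np


def coefficient_of_separation(lower, upper):
    """CoS = 2 / N^2 * (number of pairs with lower[i] > upper[j]) + 1 / N."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    N = len(lower)
    # significantly_better[i, j] is True when farm i is significantly
    # more efficient than farm j (its lower bound exceeds j's upper bound).
    significantly_better = lower[:, None] > upper[None, :]
    return 2.0 * significantly_better.sum() / N**2 + 1.0 / N


if __name__ == "__main__":
    lo = np.arange(5, dtype=float)
    print(coefficient_of_separation(lo, lo + 0.5))               # perfect separation, 1.0
    print(coefficient_of_separation(np.zeros(5), np.ones(5)))    # full overlap, 1 / N
```

With completely overlapping intervals no farm can be distinguished from any other and the CoS falls to its minimum of \(1/N\).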

3 Results

The mean technical efficiency over the 2009 and 2010 production years for the SBV model (in which rainfall and growing season length are incorporated into the analysis as fixed variables) is 0.82 and 0.80, respectively. For the CNV DEA model, the average technical efficiency is 0.74 for 2009 and 0.76 for 2010. Further information on the distribution of the efficiency estimates, as well as the average efficiency for each year for both the CNV and SBV models, is available in Table 2. Comparing the distribution of farms relative to the best-performing farms in the sample for both 2009 and 2010 shows that, in the SBV model, the distribution becomes increasingly skewed towards the higher efficiency rankings. This is clear in Fig. 2, which plots the kernel density estimates for both years and all four models. The right-hand side of the panel presents the plots derived from the CNV model, while the left-hand side plots the bias-corrected efficiency scores for 2009 and 2010, respectively. In all cases the SBV model is skewed towards unity. This means that incorporating the additional restrictions into the DEA LP to account for the non-discretionary input variables ensures that each farm is only compared with other farms in the sample facing comparable environmental conditions. Consequently, the farms' efficiency levels obtained from the SBV model are not distorted by varying rainfall levels, making the benchmark “fairer” and less misleading. For instance, the mean efficiency score for the CNV model in 2009 (θ = 0.74) would indicate a proportional potential input saving of 26%; however, once the environmental characteristics (rainfall and length of growing season) are accounted for, the potential input saving is 18% (Table 2). For 2010, a year with less variation in rainfall across the sample (i.e. a more homogeneous sample regarding rainfall), the mean efficiency score for the CNV model (0.76) would indicate a proportional potential input saving of 24%, whereas the SBV model finds this saving to be 20%.

Table 2 Distribution of the DEA efficiency estimates for the CNV and the non-discretionary DEA model
Fig. 2

Kernel density estimates of the two DEA models for the 2009 and 2010 production years (Original and Bias corrected efficiency scores)

Technical efficiency in 2010 decreased by 2.4% for the SBV model (the model adjusted for physical characteristics) and increased by 2.7% for the CNV model relative to 2009 levels. Technical efficiency under the CNV model is lower in 2009, which might indicate the impact of the increase in input prices for fertilisers and soil improvements during that yearFootnote 4 (Table 2).
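The input-saving and year-on-year figures quoted in this section follow directly from the reported mean scores: a mean score \(\theta\) implies a potential proportional input saving of \(1-\theta\), and the relative change between years is \((\theta_{2010}-\theta_{2009})/\theta_{2009}\). The short check below recomputes them from the means reported above; it is only an arithmetic illustration, uses no farm-level data, and the function name is hypothetical.

```python
def summarise(mean_scores):
    """For each model: potential input saving (1 - theta) per year and the
    relative change in mean efficiency from 2009 to 2010, in percent."""
    summary = {}
    for model, by_year in mean_scores.items():
        saving = {year: round(1.0 - theta, 2) for year, theta in by_year.items()}
        change_pct = round(100.0 * (by_year[2010] - by_year[2009]) / by_year[2009], 1)
        summary[model] = {"input_saving": saving, "change_pct": change_pct}
    return summary


# Mean technical efficiency scores as reported in the Results section.
MEAN_SCORES = {"CNV": {2009: 0.74, 2010: 0.76}, "SBV": {2009: 0.82, 2010: 0.80}}

if __name__ == "__main__":
    for model, values in summarise(MEAN_SCORES).items():
        print(model, values)
```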

Figure 2 shows that, when accounting for variability in environmental conditions (rainfall, growing season length), the efficiency levels across cereal farms in the sample are similar to those derived from the CNV model, in which physical variability is not accounted for. Nevertheless, this does not mean that individual farms have similar scores under the CNV and SBV models or are ranked in the same position, which is relevant information for policy making.

We found that, when measuring farm efficiency performance, environmental conditions that are not under the farmer's control (physical characteristics) matter for the relative rankings between farms (Areal et al. 2012b; Henderson and Kingwell 2005). Differences in the relative ranking of farms between the SBV and CNV models indicate that the CNV model fails to correctly assess the relative performance of each farm and to account for the effect of annual variation in rainfall on production efficiency. To examine this, the bootstrapped DEA efficiency scores are used to rank the farms in the sample according to the outputs of the CNV and SBV models. The results of the CNV and SBV bootstrapped models are summarised in Table 3. In addition, the bootstrapped DEA model allows confidence intervals to be constructed, which enable us to draw conclusions about the statistical significance of changes in ranking between the two models.

Table 3 Distribution of the mean of the bias corrected technical efficiency estimates for the CNV and SBV DEA models

Comparing the CNV and SBV models for 2009, only 2% of the farms retained the same ranking; the remainder of the sample had either a positive or a negative change in ranking. In particular, 34.32% of the farms moved up in the ranking, while 63.8% moved down, when the CNV and SBV models are compared (a positive change in ranking indicates a movement towards the technical efficiency frontier in the SBV model, while a negative change indicates the opposite). Table 4 shows the farms, the bias-corrected efficiency scores and the change in ranking from the CNV to the SBV model. The largest positive change, 218 positions, is for farm 25 (CNV efficiency score 0.45, SBV efficiency score 0.88). The largest negative change in ranking is recorded for farm 226, which drops 66 positions according to the SBV efficiency scores, from 8th to 74th (CNV efficiency score 0.88, SBV efficiency score 0.83). These extreme changes in ranking have significant implications for the proportional reduction of inputs in the farming systems. In the first case (positive change), the SBV model suggests a 12% proportional reduction in inputs, compared with the 55% reduction suggested by the CNV model; in the second case (negative change), the SBV model suggests a larger proportional reduction in inputs (17%) than the CNV model (12%). Similar results are obtained for 2010 (2% of the farms had no change in ranking, 29% a positive change and 69% a negative change). The farms with the five highest changes in ranking are presented in Table 5, and a more detailed table is available as online supplementary material.
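The ranking comparison can be reproduced from two vectors of bias-corrected scores. A minimal sketch, assuming `scipy.stats.rankdata` for ranking (rank 1 = most efficient; giving tied farms the lowest shared rank is a hypothetical convention), with the change defined as CNV rank minus SBV rank so that, as in the text, a positive value means the farm moved towards the frontier under the SBV model (e.g. a farm moving from 8th to 74th gives 8 - 74 = -66). The function name and the example scores are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata


def ranking_change(cnv_scores, sbv_scores):
    """Per-farm change in rank (CNV rank minus SBV rank) and the shares of
    farms with a positive, zero and negative change."""
    rank_cnv = rankdata(-np.asarray(cnv_scores, dtype=float), method="min")
    rank_sbv = rankdata(-np.asarray(sbv_scores, dtype=float), method="min")
    change = (rank_cnv - rank_sbv).astype(int)
    shares = {
        "positive": float(np.mean(change > 0)),
        "none": float(np.mean(change == 0)),
        "negative": float(np.mean(change < 0)),
    }
    return change, shares


if __name__ == "__main__":
    change, shares = ranking_change([0.9, 0.8, 0.7, 0.6], [0.85, 0.95, 0.7, 0.9])
    print(change.tolist())  # [-2, 1, -1, 2]
    print(shares)
```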

Table 4 The five highest and lowest changes in ranking for 2009
Table 5 The five highest and lowest changes in ranking for 2010

In Fig. 3 the ranking positions of the farms in the sample are plotted in descending order (from higher to lower efficiency score) based on the bias-corrected efficiency scores of the CNV model for 2009 and 2010. The ranking position of each farm under the SBV model is then plotted in relation to the efficiency score derived from the CNV model. Fig. 3 thus shows the change in the ranking position of individual farms between the CNV and SBV models. When a farm's ranking position does not change, the points overlap; a positive change in ranking (the farm has moved towards the SBV frontier) appears as a shift to the left, while a negative change in ranking appears as a shift to the right (the farm has moved away from the frontier). As expected, we find relatively larger changes in ranking under a more heterogeneous scenario regarding rainfall (2009) than under a scenario with smaller differences in rainfall across the sample (2010) (see Fig. 1). However, changes in ranking occur in both cases. The detailed distribution of changes in ranking with respect to the range of efficiency scores for the two years is presented in Table 6.

Fig. 3

Change in the ranking position for the sample in 2009

Table 6 Distribution of change in ranking position of farms between the efficiency scores of the CNV and the SBV model

To investigate the degree of differentiation between farm efficiencies, we calculate the CoS, which gives an approximate indication of the percentage of farms in the sample that are less efficient than a given percentage of the sample, after the sample is ranked (Gocht and Balcombe 2006). The smaller the CoS (at a given significance level), the less we can differentiate between farm efficiencies, given the confidence intervals derived from the bootstrap. As Table 7 shows, the highest CoS is obtained for the conventional DEA model with the 2009 harvest-year data (CoS = 0.61), while the lowest is observed for the sub-vector model with the 2009 data (CoS = 0.47). This is an important, though not surprising, finding. Not accounting for environmental characteristics that influence production (e.g. rainfall) in technical efficiency analysis leads to greater discrimination between farms in terms of their performance level. Once all farms are compared in a ‘fairer’ way (i.e. accounting for rainfall), the differences between them are less evident. The SBV model therefore provides a ‘fairer’ farm efficiency ranking, especially under heterogeneous physical conditions. Figure 4 shows the confidence intervals and point estimates of the two models for 2009 and 2010.

Table 7 Coefficient of Separation for the different model assumptions in 2009 and 2010 harvest years

Fig. 4 Confidence intervals and point estimates for the CNV and SBV models in 2009 and 2010

4 Discussion: Conclusions

A standard (conventional) DEA model and a non-discretionary (sub-vector) DEA model were employed to measure farm-level technical efficiency for 245 cereal farms in England during the 2009 and 2010 production years. The non-discretionary model incorporated measurements of annual rainfall and length of the growing season for each farm into the model constraints. This accounts for variations in the physical environment that directly affect the production capacity of the farm. These modifications to the constraints of the DEA model (SBV) ensure that each farm is benchmarked against farms with similar physical environment characteristics, so that farms are compared within a more homogeneous sample. Further, considering rainfall and the length of the growing season in the analysis allows us to control for spatial heterogeneity in the model. This avoids issues associated with unobserved heterogeneity, such as model misspecification. Hence, the model accounts for spatial heterogeneity (peer comparisons, see Sect. 2.4) by comparing farms with the same physical environment conditions relevant to crop establishment and development. Spatial heterogeneity has been introduced into stochastic frontier models in recent work by Schmidt et al. (2009), Areal et al. (2012a), Glass et al. (2014), Glass et al. (2016), Gil et al. (2017), Adetutu et al. (2015) and Vidoli et al. (2016). Most recently, Pede et al. (2018) demonstrated the direct relationship between spatial dependence (Anselin 2002) and the technical efficiency of farms. A different approach that accounts for both spatial dependence and spatial heterogeneity is presented in the recent works of Andreano et al. (2017), Billé et al. (2017) and Billé et al. (2018).
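
The constraint structure described above can be written as an input-oriented linear programme, using one common form of the non-discretionary model. This is a sketch under assumed notation and assumed variable returns to scale, imposed by the convexity constraint. The exact orientation and returns-to-scale assumption of the SBV specification in Sect. 2 may differ. In the notation, farm 0 is evaluated among n farms. The y_{rj} are outputs, and the x_{ij} for i ∈ D are discretionary inputs. The z_{kj} for k ∈ N are the non-discretionary factors, here annual rainfall and length of the growing season.

```latex
\begin{aligned}
\min_{\theta,\,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j\, y_{rj} \ge y_{r0}, && r = 1,\dots,s \\
& \sum_{j=1}^{n} \lambda_j\, x_{ij} \le \theta\, x_{i0}, && i \in D \\
& \sum_{j=1}^{n} \lambda_j\, z_{kj} \le z_{k0}, && k \in N \\
& \sum_{j=1}^{n} \lambda_j = 1, \quad \lambda_j \ge 0, && j = 1,\dots,n
\end{aligned}
```

Only the discretionary inputs are contracted by θ. The rainfall and growing-season constraints are left unscaled, so the composite peer of farm 0 cannot enjoy more rainfall or a longer growing season than farm 0 itself. Removing these constraints gives a conventional model that ignores the physical environment.
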
In the area of non-parametric efficiency analysis, Vidoli and Canello (2016) recently proposed a framework for incorporating spatial dependence into nonparametric efficiency models. Their framework accounts for the spatial proximity of peers rather than for the relationship between inputs, outputs and the set of contextual exogenous factors that directly affect production capacity. However, the literature highlighted above focuses on ways to account for unobserved heterogeneity when information such as climatic conditions is not available. By contrast, we incorporate this key information directly into the model (i.e. rainfall and the length of the growing season). We argue and show that the physical environment (contextual factors exogenous to farm management) has a significant impact on technical efficiency and productivity measurements, which has important policy implications. The relevant information used here is available from national meteorological offices. It can easily be combined with other sources of data (e.g. farm surveys), as long as these contain some type of geographical information.

As the results show, the non-discretionary model adjusts the ranking of the farms that were previously benchmarked within an unfavourable environment. It maintains the same ranking for the remaining farms in the sample, consistent with the findings of Henderson and Kingwell (2005).

Our results suggest that standard efficiency analyses that do not account for environmental conditions such as rainfall levels may produce biased measurements of farm performance and, consequently, misleading policy advice. For instance, suppose the policy aim were to raise productivity by identifying less efficient farms and providing them with the necessary management support. The standard approach might then fail to identify the “right” farms in need of technical/knowledge support. This is in line with Rahman and Hasan (2008), who suggested that an extended framework of analysis is required to evaluate the production performance of farmers. Such a framework avoids an upward bias in inefficiency estimates, which would lead to the allocation of resources to less than optimal uses. Moreover, the inclusion of rainfall and length of the growing season in the SBV model accounts for an important parameter in agricultural production. Rainfall improves the soil’s production capacity and enables the effective use of chemical fertilisers and other inputs. The aim of considering variations in rainfall intensity within the model is to capture the negative impacts that excessive rainfall can cause in the production system (e.g. flooding, nutrient loss) and hence lower efficiency (Olayide et al. 2016). Based on this information, policy makers could allocate resources to less efficient farmers in order to intervene and prevent the negative impacts of agricultural rainfall-runoff and soil nutrient loss. In addition, such an approach would contribute to the characterisation of regimes that support a detailed understanding of the production environment of farming systems. It would also provide information and guidance for policy design and the development of extension services (Billé et al. 2018).
Hence, a spatially adjusted DEA model could provide further insight into the leading and lagging performances of farming systems located in Less Favoured Areas (LFAs). It could therefore help to better understand the physical environment and its impact on production performance (Hoang 2013). When the model specification includes additional characteristics of the physical environment, such as soil conditions, carbon storage capacity and field aspects, policies can be developed that target landscapes more suitable for agricultural production within the sustainable intensification framework (Gadanakis et al. 2015a). In this way, policy interventions may simultaneously aim at increasing productivity while enhancing biodiversity and the provision of ecosystem services (Strohbach et al. 2015). Furthermore, the proposed model considers both the characteristics of the farming system and the spatial heterogeneity associated with the physical environment, both of which determine the production capacity of the farm. Hence, the methodology provided here can be used to design extension services and/or policies that support the sustainable development of the farm system. As discussed by Gadanakis et al. (2015a, b), a non-discretionary DEA model could be used to design policy interventions that address increasing challenges in water availability. These challenges arise from variations in weather patterns and from the growing demand for water in agriculture, driven by continuous population growth and hence food demand. Accounting for variations in the physical environment allows for a better allocation of resources. It helps to isolate the needs of different agricultural systems (i.e. rainfed, irrigated and supplementary irrigated systems, the latter for cash crops in temperate climates). It can therefore provide guidance for the design of water abstraction regulations in different geographical locations.

Significant changes in farm efficiency ranking were observed in both the 2009 and 2010 production years. In particular, in each of the two periods the ranking changed positively for approximately 33% of the farms in the sample, while a negative change was observed for approximately 65% of the farms. The remaining 2% of the farms retained their ranking position in both periods. Hence, we conclude that environmental conditions should be treated as non-discretionary inputs in the production function. Doing so accounts for variations in exogenous parameters and ensures the homogeneity of the benchmarking sample (Henderson and Kingwell 2005).

This conclusion is also supported by the CoS statistic, which showed that treating rainfall as a non-discretionary input increases the number of fully efficient farms and the technical efficiency level of the farms below the frontier. Therefore, to reduce bias in technical efficiency estimates and to improve management advice for farming systems, variations in environmental conditions should be considered in order to secure a homogeneous benchmarking sample. Technical efficiency analysis that fails to account for physical parameters significantly affecting crop production (i.e. rainfall and length of the growing season) will contribute to an unintentional discrimination between farms regarding their performance level (Rahman and Hasan 2008; Sherlund et al. 2002). Once all farms are compared in a ‘fairer’ way (i.e. accounting for rainfall), differences between them are not as evident. Future work will need to focus on isolating exogenous factors linked to the physical environment and on the characteristics of human capital. The level of education, the adoption of innovative technologies and the behaviour of farmers are important parameters linked to the production efficiency of farming systems. Accounting for these variations in efficiency analysis will improve understanding of the specific needs of farmers below the frontier (inefficient). It will also clarify which policy interventions are required for them to fully exploit the production technologies employed by their peers on the frontier (efficient) (Pede et al. 2018).

Our findings show that estimates of technical efficiency can be improved when the DEA model accounts for environmental production conditions. Thus, future work should consider extending the approach of Simar and Wilson (2007) to explore the determinants of inefficiency. Such an extension would account both for physical environment characteristics (production function) and for crop-specific managerial characteristics (education, farm business specialisation, timing of the various agricultural operations, scale of operation, agri-environmental payments, etc.). This would enable further policy recommendations that simultaneously account for spatial variations and farm-specific characteristics to improve the production performance of farming systems within the concept of sustainable intensification. In turn, this would improve the economic performance of farming systems while reducing the environmental pressures they generate. Furthermore, although it is not within the scope of this paper, an analysis of returns to scale for the adjusted DEA model (SBV) could provide pathways for long-term improvement and planning. Such pathways could be used to strategically position a farm in relation to the long-run average cost curve and hence improve economic efficiency and productivity.