Introduction

Low agricultural productivity, recurrent food shortages and high prevalence of food insecurity in sub-Saharan Africa (SSA) have led to repeated calls to intensify agriculture, with a particular focus on addressing the widespread soil fertility depletion in agricultural lands (UN Millennium Project 2005; Sanchez 2010; Shapouri et al. 2010; Andriesse and Giller 2015; Binswanger-Mkhize and Savastano 2017). Sustainable agricultural intensification is viewed as a prerequisite for combatting food insecurity and reversing the trend of natural resource degradation (Tittonell and Giller 2013; Vanlauwe et al. 2014; Zurek et al. 2015), and an increased use of mineral fertilizers is considered to be an essential part of the solution (IFDC 2006; Sanchez 2010; Holden 2018). Despite efforts to enhance the use of fertilizers in the region (Druilhe and Barreiro-Hurle 2012; Jayne et al. 2018), average application rates remain very low, with recent studies reporting an average fertilizer use of around 14 kg ha−1 (Bonilla Cedrez et al. 2020), though there is wide variability between countries, with averages of some countries surpassing 50 kg ha−1 (Liverpool-Tasie et al. 2017; Sheahan and Barrett 2017). While access to fertilizers remains a main constraint to their widespread use by smallholder farmers, the production risk associated with poor crop responses caused by variable weather conditions (Mafongoya et al. 2007) and/or by local edaphic constraints (e.g. limited soil rootable zone or water holding capacity and soil organic matter), i.e. the so-called non-responsive soils (Vanlauwe et al. 2010), could discourage farmers from investing in fertilizers (Holden 2018; Schut and Giller 2020). A lack of crop response to the application of fertilizers represents an obvious economic loss to farmers and, if enduring, may make fertilizer application unattractive to farmers and potentially harmful to the environment.
Determining the rate of incidence of non-response to fertilizer is needed to understand the magnitude of the problem, and this requires on-farm observations on the variability in yield responses. While there is a diverse literature reporting on response variability observed in on-farm trials performed at different spatial scales across SSA (Tittonell et al. 2007; Kihara et al. 2016; Zingore et al. 2007; Ronner et al. 2016; Njoroge et al. 2017; Ichami et al. 2019; Roobroeck et al. 2021; Garba et al. 2018; Wortmann et al. 2017), few studies quantify the proportion of fields that fail to show an appreciable response in a given year. Two methodological issues make the quantification of non-response in on-farm data more challenging than it may seem. First, quantifying inadequate yield response in a dataset on fertilizer responses requires a measure against which observations can be compared. For single nutrient fertilizers, such a measure is provided by the agronomic efficiency (AE), the amount of extra produce per quantity of nutrient applied, which is commonly reported when assessing response to inputs (Olk et al. 1999; Ngome et al. 2013; Kaizzi et al. 2012; Vanlauwe et al. 2016; Kamanga et al. 2014; Xu et al. 2014; Adiele et al. 2020). However, an equivalent metric does not exist for multi-nutrient fertilizers, which are typically used in on-farm trials and by farmers in SSA, often with varying rates for the different nutrients, and which are expected to elicit different yield responses to the same total amount of fertilizer. One solution is to restrict comparisons to cases where the same fertilizer is applied, but this obviously limits the scope and applicability of such analyses, given that various types of fertilizers are used in SSA.
Another option is to look at economic efficiencies only, since these can be calculated on any type of fertilizer (Jayne and Rashid 2013), but the variation in response is then determined to a large extent by differences in input prices (Bonilla Cedrez et al. 2020), which can vary over space and time, therefore requiring additional estimates of agronomic response for proper interpretation and translation to current conditions.

The second issue relates to the lack of on-site replication that tends to characterize on-farm trials (Bielders and Gérard 2015; Njoroge et al. 2017; Shehu et al. 2018). Regardless of how response is quantified, the observed variation in response not only reflects field-level variation due to rainfall, soil nutrient status or other biotic and abiotic factors, but is also determined by variation between experimental plots caused by random agronomic and experimental factors that are not repeatable at the field scale. The lack of on-field replicates implies that the random plot-level variation, which will be referred to as residual variation here, is confounded with the field-to-field variation, leading to an overestimation of the latter (Vanlauwe et al. 2016) and, consequently, to inflated estimates of the proportion of non-response. Simply stated, even if all fields in a study have the same positive response to inputs, large residual variation will cause a proportion of control-treatment comparisons to yield negative observed responses by random chance. Only when the amount of residual variation is known or can be estimated from on-site replicates is it possible to determine what proportion of fields are truly non-responsive in a given season (Vanlauwe et al. 2016).
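The size of this effect can be illustrated with a short calculation. The following is a minimal sketch, assuming that every field has the same true response and that the residuals of the control and fertilized plots are independent and normally distributed with the same standard deviation. The function name and the numbers in the usage note are illustrative and are not taken from the analyses reported here.

```python
import math


def apparent_nonresponse(true_response, residual_sd):
    """Probability that an unreplicated control-treatment pair shows an
    observed response <= 0 although the true response is identical on
    every field.

    Assumption: independent, normally distributed plot residuals with the
    same standard deviation on both plots, so that the observed difference
    has standard deviation sqrt(2) * residual_sd.
    """
    sd_diff = math.sqrt(2.0) * residual_sd
    z = -true_response / sd_diff
    # Standard normal cumulative distribution function evaluated at z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Under these assumptions, a uniform true response of 1000 kg ha−1 combined with a plot-level residual standard deviation of 500 kg ha−1 gives `apparent_nonresponse(1000, 500)` of roughly 0.08, i.e. about 8% of fields would appear non-responsive by chance alone.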

Together, the inability to account for residual variation and the lack of general measures of fertilizer response may thwart efforts to quantify the extent of non-response in SSA. Here we applied two simple approaches to address one or both limitations. The first uses published averages of residual variation to obtain corrected estimates of response variation from sets of non-replicated on-farm trials. The second aims at overcoming both limitations simultaneously by using a random regression approach that, under simplified assumptions, measures response from the yield increase as a linear function of a general measure of the fertilizer application intensity (Janssen 1998, 2011) while estimating the residual variation as the deviation from this linear relationship.

The main objective of this paper is to provide an estimate of the prevalence of non-response to fertilizers in trials performed across SSA using various types of fertilizers, accounting for residual variation. In addition, we performed spatial analyses of the results to evaluate if inferred effects of climatic and edaphic factors suggest the existence of repeatable patterns of non-response. The latter is of relevance since a lack of trial repetitions on the same field over different seasons does not allow effects of location-specific factors to be directly assessed.

Methods

Dataset

The on-farm fertilizer response data included in this study were obtained from projects that were on-going or completed at the time of acquisition. Within projects, on-farm trials performed in a single season on the same crop were grouped into separate collections of trials, called studies here. The main criterion for selection of trials to include in the dataset was the presence of a control and a fertilizer treatment conducted side by side. No further selection was made with regard to the type of fertilizer used, and the dataset therefore included fertilizers of diverse nutrient compositions. Both published and non-published data were assembled per project, and generally included geographic coordinates. Crops evaluated in the studies were cereals (maize, sorghum) or legumes (soybean, bush bean, climbing bean, groundnut, cowpea). In total, 41 studies (14 for cereals and 27 for legumes) were included, from 11 countries including 6 for cereals and 10 for legumes, with data for specific countries and crops covering one to three separate seasons (Table S1). In total, 515 fields were included for cereals and 3930 for legumes, though one project conducted in four States of Nigeria accounted for more than half (2578) of the legume fields. In fertilizer treatments, the ranges of N and P rates were 100–140 kg N ha−1 and 30–50 kg P ha−1 for cereals, and 0–36 kg N ha−1 and 18–69 kg P ha−1 for legumes.

Measures of fertilizer response

Absolute response

As mentioned above, fertilizer response can be assessed using the agronomic efficiency (AE) (Hutton et al. 1956; Vanlauwe et al. 2010; Ichami et al. 2019), calculated as the yield increase per unit of nutrient applied in the fertilizer or: \(\frac{{{\Delta }Y}}{{F_{appl} }}\), with \({\Delta }Y = y_{f} - y_{c}\); where \({F}_{appl}\) is the quantity of specific nutrient applied (usually in kg ha−1) and \(y_{f}\) and \(y_{c}\) are the yields with and without the application of that nutrient (usually in kg ha−1). Since this measure does not extend to multi-nutrient fertilizers, we defined the absolute response as \({\Delta }Y = y_{f} - y_{c}\). Although useful for describing the response to any fertilizer, single or multi-nutrients, it has little comparative value, since its magnitude depends on the specifics of the fertilizer (amount and composition).
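As an illustration of these two definitions, a minimal sketch is given below. The function names and the example values are assumptions introduced for illustration only.

```python
def absolute_response(y_fert, y_control):
    """Absolute response (kg/ha): fertilized yield minus control yield."""
    return y_fert - y_control


def agronomic_efficiency(y_fert, y_control, nutrient_applied):
    """Agronomic efficiency: extra yield (kg/ha) per kg/ha of a single
    applied nutrient. Only defined for a positive application rate."""
    if nutrient_applied <= 0:
        raise ValueError("nutrient_applied must be positive")
    return absolute_response(y_fert, y_control) / nutrient_applied
```

For example, a fertilized yield of 3200 kg ha−1 against a control yield of 2000 kg ha−1 with 60 kg N ha−1 applied corresponds to an absolute response of 1200 kg ha−1 and an AE of 20 kg grain per kg N.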

Relative response

One way to overcome the challenge of obtaining comparable values of response across different fertilizer formulations, is to define \(F_{appl}\) such that it accounts for differences in fertilizer nutrient composition to adequately express the total amount of applied nutrients simultaneously. Here, we refer to such a general measure as fertilizer application intensity, to express the fact that a single measure of the magnitude of application is used. We adopted an agronomic measure of application intensity that is available in the literature (Janssen 1998, 2011). Based on a popular framework for quantifying soil fertility (QUEFTS, Janssen et al. 1990), the so-called Crop Nutrient Equivalent (CNE) expresses the total nutrient input as the equivalent amount (kg) of nitrogen that would need to be applied to achieve an equivalent yield response if other nutrients and water were not limiting (i.e. available in balanced proportions). Using CNE as measure of fertilizer application intensity, it is possible to obtain a universal definition of fertilizer response as the additional yield (in kg) per unit of CNE (in kg N equivalent).

The calculation of CNE, while simple, requires agreement on parameter values, and its interpretation depends on several assumptions on crop nutrient responses implicit in the QUEFTS framework (Janssen et al. 1990, see supplement S1a for details). Within this framework, CNE is expected to have an approximately linear relation with yield for balanced fertilizer, which is convenient when using a regression approach to estimate the fertilizer response as described below. Although a strictly linear response to balanced fertilizer may not occur in practice, and alternative nutrient response functions do not share this property (e.g. Greenwood et al. 1971), we expect deviations from linearity to be moderate at the nutrient levels considered here. Measures derived from non-linear response functions could be proposed as an alternative, but would require additional agreement on efficiency parameters and reference levels for soil nutrients.

Estimating field-specific response and its variability

To estimate the extent of non-response to fertilizers, it is imperative to estimate field-level response and its variability, and to separate this variation by accounting for plot-level residual variation as much as possible. Since individual fields are typically not replicated across years, it is important to emphasise that field-level variation represents response variation among fields in a given year, and provides no measure of variation in long-term responsiveness of specific fields or locations. Both statistical methods used here are based on the use of linear mixed models (Henderson 1982), which have the advantage over standard general linear models because, in addition to the residual error term, they can contain other normally distributed random effects. In our case, this offers the possibility of modelling field-level variation in fertilizer response separately from the residual variation. This means that response variation inferred from the data, represented by field-level random effects, can be larger or smaller depending on the magnitude of the plot-level residual variation. For the same amount of observed response variation, larger residual variation will cause the model to infer less variation at the field level. The inferred values of field-level response and their variation can therefore be corrected for plot-level residual variation. Such correction is missing when using standard linear models or observed differences between control and treated plots, leading to overestimation of field-level variation.

For the absolute response calculated per study, we applied a relatively crude method to adjust for plot-level residual variation by using fixed values derived from existing studies that included some form of field-level replication. Based on average values in the literature, the plot-level residual variation (i.e. the residual error) was set to 697 kg ha−1 for cereals (Njoroge et al. 2017; ten Berge et al. 2019; De Laune et al. under review; Kamanga et al. 2014) and 250 kg ha−1 for legumes (Ronner et al. 2016; van Heerwaarden et al. 2018). For comparison, we also used a model where the residual error was fixed at 0, which corresponds to the observed paired differences between control and fertilized plots, without correction for plot-level residual variation.
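The logic of correcting with a fixed residual value can be sketched with a simple method-of-moments calculation. This is not the linear mixed model used here, but a simplified stand-in that makes the direction of the correction visible. It assumes that true field-level responses are normally distributed and that `residual_sd_diff` (a hypothetical parameter name) is the standard deviation of the residual on an observed paired difference.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def corrected_nonresponse(differences, residual_sd_diff):
    """Estimated proportion of fields with a true response <= 0.

    differences: observed paired differences (fertilized - control), kg/ha.
    residual_sd_diff: assumed residual SD of a paired difference; 0 gives
    the uncorrected estimate under the normal assumption.
    """
    mean = statistics.fmean(differences)
    observed_var = statistics.variance(differences)
    # Remove the residual part from the observed variance (floored at 0)
    field_var = max(0.0, observed_var - residual_sd_diff ** 2)
    if field_var == 0.0:
        # No field-level variation left: all fields share the mean response
        return 0.0 if mean > 0 else 1.0
    return normal_cdf(-mean / math.sqrt(field_var))
```

In this sketch, a larger assumed residual value leaves less variance attributed to fields and, for a positive mean response, a smaller estimated proportion of non-response, which is the same qualitative behaviour as described for the mixed model.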

For the relative response (as a function of fertilizer application intensity as measured by CNE), a slightly more sophisticated approach was used in which the plot-level residual variation was estimated from the dataset itself (see van Heerwaarden et al (2018) for a description of a similar approach and the Supplement S1b for details). Briefly, since part of the trials in the dataset contain several blends or rates of fertilizer on the same field, it was possible to apply a regression approach to quantify the fertilizer response using CNE as a covariate. Conceptually, on each field a regression line was fit to model yield as a linear function of CNE. The inferred slope for each field was then taken as a general measure of response, equivalent to the agronomic efficiency. Under the assumption that the yield is indeed linear with respect to CNE, the field-level deviations from each regression line can be considered as the residual, and can be used to estimate the plot-level residual variation. By using a mixed linear model and incorporating the field-level slopes as a random effect, two things are achieved. First, these statistical models are robust to unbalanced data and an estimate of plot-level residual variation is obtained even if not all fields have more than two fertilizer treatments. Second, the variation in slopes inferred from such a model is automatically adjusted for the amount of residual variation, providing a more accurate assessment of the actual field-level response variability and non-response than would be obtained from standard regression models. Although attractive, it is important to point out two caveats of this approach. First, the assumption of a linear relation between yield and fertilizer application intensity is likely to be commonly violated to some extent, which means that plot-level residual variation may be overestimated and, consequently, field-level response variation may be underestimated. Second, we currently assume a single level of residual variation for the entire dataset, an assumption that if violated could lead to inaccurate estimates of field-level variation in some areas.
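The conceptual steps of the regression approach can be sketched as follows. This is a simplified illustration using separate ordinary least squares fits per field and a pooled residual variance, not the random regression mixed model itself, and it does not adjust the slopes for residual variation. The data layout (a dictionary mapping a field identifier to a list of (CNE, yield) pairs) and all names are assumptions made for the example.

```python
def fit_line(points):
    """Ordinary least squares fit of yield on CNE for one field.

    points: list of (cne, yield) pairs with at least two distinct CNE values.
    Returns (slope, intercept, residual_sum_of_squares).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct CNE values per field")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in points)
    return slope, intercept, rss


def field_slopes_and_residual_variance(trials):
    """Per-field slopes (kg grain per kg CNE) and pooled residual variance.

    Only fields with more than two plots carry information on the residual
    variance; fields with two plots still contribute a slope.
    """
    slopes = {}
    rss_total = 0.0
    df_total = 0
    for field, points in trials.items():
        slope, _, rss = fit_line(points)
        slopes[field] = slope
        if len(points) > 2:
            rss_total += rss
            df_total += len(points) - 2
    residual_var = rss_total / df_total if df_total > 0 else None
    return slopes, residual_var
```

In the mixed model, the field-level slopes are additionally treated as random effects, so that their inferred variation is adjusted for the residual variance estimated in this way; that step is deliberately left out of the sketch.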

Spatial representativeness and geospatial patterns in responsiveness

Two types of spatial analysis were performed on the dataset: an evaluation of potential spatial bias in the selection of our trial sites and an analysis of the geospatial patterns of responsiveness. All georeferenced trial locations were linked to geospatial information consisting of freely available spatial raster layers, namely a set of 250 m resolution maps of predicted topsoil properties and soil nutrient levels (Hengl et al. 2015, 2017) and a crop mask produced by the African Soil Information Service project (AfSIS) and 30 s resolution maps representing bioclimatic variables (Fick and Hijmans 2017) (See Supplement Table S2 for details). For the crop mask, only pixels with a larger than 50% probability of being under crop cover were retained as cropped sites.

The evaluation of spatial representativeness was performed as follows: for both crop types, a training dataset was compiled by combining the trial locations, and associated geospatial data, with an equal number of non-trial locations sampled at random from the crop mask sites. A random forest model (Breiman 2001) was then fit to this training data, resulting in a predictive model for the probability that a new location is classified as a trial location. This model was then applied to all retained cropped sites, where the site-specific probability of being classified as a trial location was used as a probability weight determining the chance that a location ends up in a random sample subject to the same spatial and environmental biases as the current set of trial locations. Site selection bias is expected to cause a skewed distribution of these probabilities, since sites with high environmental similarity to the trial locations would have the highest selection probabilities, whereas in the absence of such bias the distribution of selection probabilities should be uniform.

We used this principle to quantify spatial and environmental sampling bias by resampling all cropped sites with replacement and quantifying the proportion of sites that ended up in the final sample. In the case of uniform selection probabilities, the expected proportion of sites that end up in a sample of size n is 63%. This follows from the fact that random site selection can be treated as a set of Bernoulli trials for which the probability of inclusion of each individual site is given by:

$$p_{selected} = 1 - \left( {1 - \frac{1}{n}} \right)^{n} \approx 1 - e^{ - 1} = 0.632$$

Spatial bias in the set of trial locations was therefore quantified by comparing the actual proportion of sites that ended up in the sample to this theoretical value. Proportions below 0.63 are evidence of spatial bias.
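The resampling principle can be illustrated with a small simulation. This is a minimal sketch in which the selection weights are made up for illustration; in the analysis they are the random forest probabilities of being a trial site. Function names are assumptions.

```python
import random


def expected_uniform_proportion(n):
    """Expected share of n sites drawn at least once when n sites are
    sampled with replacement with equal probabilities."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def sampled_proportion(weights, rng):
    """Share of distinct sites that end up in one weighted resample of the
    same size as the number of sites."""
    n = len(weights)
    drawn = rng.choices(range(n), weights=weights, k=n)
    return len(set(drawn)) / n


rng = random.Random(1)
n_sites = 10000
uniform = sampled_proportion([1.0] * n_sites, rng)
# Strongly skewed, purely illustrative weights: a few sites dominate
skewed = sampled_proportion([1.0 / (i + 1) for i in range(n_sites)], rng)
```

With equal weights the sampled share stays close to 0.63, whereas the skewed weights concentrate the draws on a limited number of sites and the share falls well below that value.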

The second analysis aimed to establish if predictable geospatial patterns were present in the fertilizer responses. The estimated field-level relative fertilizer responses were first tested for spatial structure using Moran's test for spatial autocorrelation (implemented in the spdep package). Association with geospatial variables was evaluated by fitting a random forest model with all variables and comparing its predictive ability with that of a model with geographic coordinates as the only explanatory variables. Predictive ability was defined as the correlation between the out-of-bag (OOB) predictions and the observed fertilizer response vector.
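For readers unfamiliar with the statistic underlying Moran's test, a minimal sketch of Moran's I is given below. It computes the statistic only and does not reproduce the significance test or the neighbourhood definitions of the spdep implementation. The chain-shaped neighbourhood helper is a toy assumption used to keep the example small.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix
    (list of lists), where weights[i][j] > 0 marks j as a neighbour of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    if denom == 0 or w_sum == 0:
        raise ValueError("values must vary and weights must not all be zero")
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_sum) * (num / denom)


def chain_weights(n):
    """Toy neighbourhood: sites on a line, each adjacent to its neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
```

A smooth gradient along the line gives a positive value (similar neighbours), whereas strictly alternating values give a negative value (dissimilar neighbours).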

Defining threshold for non-response

We define non-response simply as cases where the yield differences between fertilized and unfertilized treatments are not significantly different from 0 or significantly lower than 0. For the regression approach used here, this translates to a zero or negative slope with respect to the fertilizer application intensity (CNE).

Results

Mean response, response variability, and prevalence of non-response

Absolute response

Absolute responses to inputs in cereals averaged 1365 kg ha−1 (Table 1), ranging from 599 to 2279 kg ha−1 for sorghum in Mali in 2009 and maize in Malawi in 2011 respectively (Supplement Table S3a). In legumes, the average response to applied fertilizers was 252 kg ha−1 (Table 1) with a range from −27 kg ha−1 in groundnut in Zimbabwe to 671 kg ha−1 in climbing bean in Rwanda in 2012 (Supplement Table S4a).

Table 1 Means of the proportion (%) of non-responsive fields as defined by absolute agronomic response, and relative agronomic response for cereals and legumes based on a dataset from 41 on-farm studies conducted in sub-Saharan Africa. The mean predicted values are indicated in grey (mid). Q2.5 and Q97.5 indicate the lower and upper 95% confidence limits respectively, averaged over studies

Variations in the absolute response for different studies are shown for cereals and legumes in Fig. 1, and Supplement Figures S1 and S2. For cereals, the proportion of non-responsive fields was generally very low when taking into account the residual variation, with a mean of only 0.9% and an average 95% confidence interval from 0.4 to 6.5% (Table 1). In fact, for the majority of studies, the mean and the lower confidence limit for the percentage of non-response were zero (Table S3b). The largest proportion of non-response was observed in trials in Kenya in 2004, the only study in which the lower confidence boundary of non-response was above 0 (6.8%). Not surprisingly, ignoring the residual variation led to higher estimates of non-response, with a mean of 4.9% and a confidence interval from 2.2 to 16.1% (Table 1, Supplement Table S3a).

Fig. 1 Cumulative distributions, with 95% confidence intervals, of predicted absolute fertilizer response for maize in Malawi (mz.mlw09) and soybean in Ghana (Sy.gh11). Black represents the “empirical” distribution assuming zero residual error. Red represents the distribution under the assumption of a residual error of 697 kg ha−1 for cereals and 250 kg ha−1 for legumes. mz.mlw09 refers to maize grown in the 2009 season in Malawi; Sy.gh11 to soybean grown in the 2011 season in Ghana. Other studies are presented in Supplement Figure S1 for cereals and Figure S2 for legumes

For legumes, the mean proportion of absolute, residual-corrected non-response was relatively high (7.4%), with a confidence interval from 2.0 to 27.8% (Table 1). The highest proportion of non-response (50%) was found in the groundnut study in Zimbabwe, associated with a mean response close to 0 (Table S4b). Only 5 out of 27 studies had a proportion of non-response above 5% at the lower confidence limit, whereas 6 studies had zero percent non-response at the upper confidence limit. Not correcting for residual error again led to a substantially higher mean proportion of non-response of 17%, with a confidence interval from 11.8 to 33.7% (Table 1, Supplement Table S4a).

Relative response

In terms of the relative response to the fertilizer application intensity defined by CNE, the mean response was 5.8 kg grain kg−1 CNE for cereals, ranging from 2.8 to 9.8 kg grain kg−1 CNE (Table 1, Supplement Table S5). The corresponding values for legumes were 4.9 kg grain kg−1 CNE with a range of −0.67 to 13.5 kg grain kg−1 CNE (Table 1, Supplement Table S6).

Variations in the relative response for different studies are shown in Fig. 2, and in Supplement Figure S3 for cereals and Figure S4 for legumes, and indicate that, for all cereal studies, the lower confidence boundary of non-response was 0.

Fig. 2 Cumulative distributions, with 95% confidence intervals, of predicted relative fertilizer response for maize in Malawi (mz.mlw09) and soybean in Ghana (Sy.gh11). mz.mlw09 refers to maize grown in the 2009 season in Malawi; Sy.gh11 to soybean grown in the 2011 season in Ghana. Other studies are presented in Supplement Figure S3 for cereals and Figure S4 for legumes

The estimated proportions of non-response in cereals were very low, with a mean of 0% and a confidence interval of 0 to 7.5% (Table 1). This upper limit was similar to that of the absolute response corrected with fixed residual error. For legumes, the mean proportion of non-response was 15.9%, somewhat higher compared to that in residual error-corrected absolute response, but not significantly so considering the width of the 95% confidence interval.

Spatial representativeness and spatial patterns in response

A spatial bias was evident in the selection of both cereal and legume trial locations but was more pronounced in the latter (Fig. 3). In both cases, the probability of being a trial site was highest around the actual trial locations, but for legumes, the large number of trial locations belonging to a single soybean study caused a clear bias towards Nigeria. This was reflected in a representativity measure of only 35 out of 63 percent for legumes, compared to 51 out of 63 percent for cereals, indicating that a spatial sampling bias was present. The extent to which this spatial bias is expected to affect the overall estimates of non-response would depend on the relation between response estimates and geospatial factors. After correcting for individual study, the response variation showed evidence of spatial auto-correlation in legumes (p < 0.0001) but not in cereals (p = 0.5249). In both cases, random forest predictions, using the full set of geospatial covariates, explained only a negligible amount of variation in relative response, 0.5 and 4% for cereals and legumes respectively, the same prediction accuracies as observed for a model with latitude and longitude only. This implies that there was no predictable spatial or environmental variation in the observed fertilizer responses in our data and suggests that there is no reason for estimates of non-response to be different if our study included trials located elsewhere. Hence, under the assumption of representativity, we attempted to provide a rough estimate of the total cropping area in sub-Saharan Africa that would be expected to be non-responsive to fertilizer application. Taking published estimates of total cropland areas planted to cereals as 52 million ha (van Ittersum et al. 2016) and to legumes as 27 million ha (Abate et al. 2012), and average levels of non-response (absolute with correction for residual error, and CNE relative response combined) of 0.5 and 12% found in the present study, respectively for cereals and legumes, the point estimate of the total area of non-response to fertilizers for the two crop types would be 260,000 ha for cereals and 3,240,000 ha for legumes.

Fig. 3 Probability of cropped locations of sub-Saharan Africa to be a trial site based on the locations of trials used in the studies for cereals or legumes. A = cereals, B = legumes. Black crosses indicate the location of trials used in the studies

When ignoring plot-level residual variation (absolute response without accounting for residual error or empirical response), the average non-response of 4.9% for cereals and 17% for legumes would lead to corresponding point estimates of total non-responsive area of 2,548,000 ha and 4,590,000 ha respectively.
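The area estimates in this section follow from multiplying the cropland area by the proportion of non-response. The short sketch below reproduces that arithmetic with the figures quoted in the text; the function name is an assumption.

```python
def nonresponsive_area(cropland_ha, nonresponse_percent):
    """Point estimate of non-responsive area (ha) from a cropland area (ha)
    and a proportion of non-response expressed in percent."""
    return cropland_ha * nonresponse_percent / 100.0


CEREAL_AREA_HA = 52_000_000  # van Ittersum et al. 2016, as quoted in the text
LEGUME_AREA_HA = 27_000_000  # Abate et al. 2012, as quoted in the text

# With correction for plot-level residual variation
cereals_corrected = nonresponsive_area(CEREAL_AREA_HA, 0.5)
legumes_corrected = nonresponsive_area(LEGUME_AREA_HA, 12)

# Without correction (empirical response)
cereals_empirical = nonresponsive_area(CEREAL_AREA_HA, 4.9)
legumes_empirical = nonresponsive_area(LEGUME_AREA_HA, 17)
```

These are point estimates only and inherit the wide confidence intervals of the underlying proportions of non-response.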

Discussion

Prevalence of non-response to fertilizers

The present study used different statistical approaches to quantify the field-level variability in crop response to fertilizers from a collection of nonreplicated on-farm trials in SSA.

Our results show that while significant variations in response exist for both crops, actual agronomic non-response is relatively rare among on-farm trials and most probably below 1% and 15% of fields respectively for cereals and legumes. For cereals, accounting for plot-level residual error, either by fixing it to published values or by estimating it using our regression approach, produced low estimates of non-response compared to simple, empirical estimates of non-response, which do not consider the residual error. Studies have reported agronomic non-responses for maize in the range of 10–21% of fields (Shehu et al. 2018; Ichami et al. 2019; Kihara et al. 2016). These proportions are higher than the mean in our study (Table 1) but mostly within the upper confidence limits obtained when residual error is not considered (Supplement Table S3a). Without residual error correction, only 3 studies had a mean proportion of non-response within the reported range, whereas none of the studies was in that range when the residual error was considered (Supplement Table S3a&b). It therefore seems that, although non-response exists, its magnitude may be lower than often reported once residual variation is considered in the analysis.

The occurrence of agronomic non-response was more pronounced in legumes regardless of the approach used, and the wide confidence intervals make the estimates from all approaches relatively similar. Studies reporting non-response to fertilizers in legumes are rare and information on the topic is scant. In on-farm fertilizer trials conducted in DR Congo, Kenya, Nigeria and Tanzania, Roobroeck et al (2021) reported 18–62% non-responsive fields for soybean, when non-response was defined as a failure to increase the yield of the unfertilized control by more than 150 kg ha−1. Ronner et al (2016), using a 10% yield increase as the increase needed for a treatment effect to be visible to farmers, reported that 10–40% of fields did not reach that benchmark in Northern Nigeria. These ranges would probably not differ much if a common threshold for non-response were used. Defining a universal benchmark for response could improve the estimation of the occurrence of non-response in legumes.

Regardless of the crop, the range in the proportions of non-response estimated using the regression approach, with CNE as measure of general fertilizer application intensity, was similar to that when a fixed residual error was considered (Table 1). The regression-based approach solves two limitations to the quantification of fertilizer response variation. First, it implements a single measure of fertilizer application intensity that can be used for data on all types of fertilizers (single, multi-nutrient). Second, it avoids inflating variability by estimating plot-level residual variation from the regression model. Although attractive, there are obvious limitations to this approach. First, by using a single measure of application intensity the interpretation of response is not always straightforward, since a soil may be unresponsive for single nutrients only, requiring caution when interpreting results. Second, the use of CNE implies assumptions on crop nutrition, including a linear response at lower nutrient rates, that may not be entirely accurate and involve crop-specific parameters that may be adjusted over time, potentially making published values of agronomic efficiency obsolete. Nonetheless, we consider this approach to have promise as a basis for diagnosing general problems that inhibit crop responses to nutrient applications using the type of nonreplicated data that is typically available in the African smallholder context.

Representativeness of trial locations and spatial patterns of non-response

One may question the extent to which our reported proportions of non-response are representative for sub-Saharan Africa, considering that our trial locations only cover a relatively small portion of the target region. Indeed, on one hand, we found a rather strong spatial bias in our sample of trials, particularly in the case of legumes. On the other hand, we could not predict any meaningful amount of response variation, after correcting for individual studies, based on climatic and environmental covariables. This lack of obvious dependence on environmental factors suggests that data from other parts of the region can be expected to yield similar outcomes, although results may vary considerably between individual studies, especially for legumes. Therefore, accepting the current average level of non-response as representative, we can provide a rough estimate of non-responsive cultivated area as 260,000 ha and 3,240,000 ha for cereals and legumes respectively, though there was a wide confidence interval for the proportion of non-response. Failing to account for plot-level residual variation (i.e. relying on simple empirical response data) would result in estimates that are about tenfold higher for cereals and 1.4-fold higher for legumes.

The fact that variation in responses to fertilizers was not explained by any climatic or topsoil factors suggests that the variation primarily reflects transient weather or field-level soil effects, implying that the data used do not allow specific areas of high or low fertilizer efficiency to be identified and targeted, something that has been reported in other studies (Ronner et al. 2016). In fact, studies trying to identify the biophysical properties (soil, rainfall) that cause non-response to fertilizers have generated inconsistent results, attributed to the interactions between factors (Roobroeck et al. 2021; Kihara et al. 2016). Zingore et al (2007) reported that low soil organic C was the main cause of the non-response in their study. In contrast, non-responsive fields in Kihara et al (2016) and Shehu et al (2018) had relatively higher soil organic C than the responsive fields, even though they did not show high yields in the unfertilized treatments. The two studies attributed non-response to imbalanced soil nutrients, including secondary and micronutrients, but other factors that were not considered likely also play an important role.

Conclusions

The two approaches used here demonstrate that it is possible to account for plot-level residual variation in non-replicated on-farm trials, and that using a general measure of fertilizer application intensity allows for joint analysis and comparison of disparate datasets on nutrient responses. Our study also identifies some clear limitations that further research will hopefully help to overcome. First, the estimates of plot-level residual variation would improve by making specific adjustments to on-farm trial designs for this purpose. The need for sufficient coverage of geographic and farming systems heterogeneity thereby has to be balanced with that for estimates of residual variation. Options include the inclusion of duplicate plots for certain treatments or increasing the number of evaluated nutrient levels. Even applying a small number of on-site replications across the study region could be an option in this regard. Second, while our regression approach for estimating fertilizer response is attractive, it works best when the relation between the measure of application intensity and yield is expected to be linear. While this may be true for CNE under some assumptions, there is a clear need to look critically at what type of nutrient response functions might perform best in this regard and validate them empirically if possible. Third, the relatively strong spatial bias reported for our dataset is a reminder of the need for better sampling design when setting up on-farm nutrient response trials. Logistic and organizational constraints very often lead to clustered and spatially unrepresentative trial sites. While this does not necessarily invalidate the outcomes, it is obvious that a systematic sampling approach that ensures proper representation of a pre-defined target area would hold many advantages in terms of analysis and extrapolation of results, an aspect that some recent initiatives are considering (e.g. the African Cassava Agronomy Initiative, https://acai-project.org) and which should be widely adopted.