1 Introduction

Economic development in Zambia is strongly linked to productivity growth in agriculture and sustainable management of farming systems (IAPRI 2020). Approximately 75% of the population relies on smallholder farming for their livelihoods (MoA/CSO 2019). Maize (Zea mays L.) is the main staple food crop in the country, as in other Southern African countries (Smale 1995), with a harvested area of approximately 1 Mha, and provides 50–90% of the caloric intake of the national population. Maize production in Zambia is associated with low use of mineral fertilizers and low adoption of other sustainable intensification practices (e.g., conservation agriculture and improved maize-legume cropping systems; Arslan et al. 2014). Poor soil fertility and adverse effects of increased climate variability reduce farmers’ financial resource base (Komarek et al. 2019) and contribute to low adaptive capacity of maize-based farming systems in the country (Cairns et al. 2013).

Smallholder farming systems in sub-Saharan Africa are highly diverse and farm typologies have proven useful to identify farms with different levels of resource endowments and livelihood strategies (Tittonell et al. 2010). The same is true in Zambia where approximately 1.6 million farmers are considered small scale with 70% having farm sizes below 2 ha, 25% having farm sizes between 2 and 5 ha, and 5% having farm sizes between 5 and 20 ha (Ngoma et al. 2019), and where poor subsistence farming co-exists with more market-oriented emerging commercial farming (Alvarez et al. 2018). Grain legumes are often produced alongside maize (Mwila et al. 2021) and livestock is kept in dry land areas of Southern and Western provinces characterized by low and erratic rainfall. Identifying different farm types is a means to consider farmers’ socio-economic context and resource endowment when promoting agricultural technologies (e.g., Jayne et al. 2019) and an important first step to target technologies for different farm types (Berre et al. 2017).

Yield gaps of rain-fed crops are defined as the difference between the water-limited yield (Yw) and the actual yield (Ya) observed in farmers’ fields (van Ittersum et al. 2013). Yw is defined as the maximum yield that can be obtained under rain-fed conditions in a well-defined biophysical environment and without nutrient limitations or yield reductions due to pests, diseases, or weeds. Currently, Ya for maize in Zambia ranges between 1.4 and 3.0 t ha⁻¹, which is considerably lower than a Yw of 8–15 t ha⁻¹ that could be achieved with best agronomic practices (Figure 1; van Ittersum et al. 2016). Yield gap decomposition is a means to unpack the causes behind yield gaps as it identifies the key crop management factors limiting or reducing Ya (Silva et al. 2017). The resource yield gap indicates the scope to increase Ya through higher amounts of inputs, whereas the efficiency yield gap indicates the scope to increase Ya through fine-tuning current management practices and technologies in terms of the time, space, and application form of these inputs. The technology yield gap indicates the possible yield increases beyond current best performing technologies on-farm. This decomposition is important to derive policy recommendations and prioritize research and development interventions towards increasing maize yields in existing cropland as food security and biodiversity conservation are dependent on such improvements.

Fig. 1

Maize yield gaps in Eastern Zambia. Maize plants on the left refer to an on-farm baby trial under good agronomic management (i.e., timely sowing, high plant population, hybrid maize variety, and proper fertilizer inputs). Maize plants on the right show crop performance under actual farm management. Credits: J.V. Silva, February 2022.

This is the first study providing a yield gap decomposition for maize in Southern Africa and combining it with farm typology delineation to identify what interventions are needed, where, and for which farm types to narrow existing yield gaps. We hypothesized that the magnitude and the determinants of the yield gap differ across farm types with different production orientations and resource endowments. The main objective of this study was thus to characterize farm diversity across maize-based farming systems in Zambia, and to identify the key factors limiting maize yields across the diversity of farms in the country. The analyses built upon a nationally representative household survey covering the 2010/11, 2013/14, and 2017/18 growing seasons (Figure 2; IAPRI 2012, 2015, 2019). Multivariate statistical techniques were used to construct the farm typology (Alvarez et al. 2018) and yield gaps were decomposed using a combination of frontier analysis and crop modeling (Silva et al. 2017). The latter was used to simulate Yw and estimate the nitrogen (N) rates needed to reach it, which were then used to benchmark maize yields and N rates observed in farmers’ fields.

Fig. 2

Spatial distribution of the households included in the Rural Agricultural Livelihoods Survey (RALS) across Zambia. Background layer displays the total annual rainfall (in mm) average over the period 2000–2019. Source: Climate Hazards Group Infra-Red Precipitation with Station data (CHIRPS; Funk et al. 2015).

2 Materials and methods

2.1 Rural Agricultural Livelihoods Survey (RALS)

Data from the Rural Agricultural Livelihoods Survey (RALS) were used to identify the main farm types engaged in maize production and to determine the drivers of maize yield variability in Zambia. The RALS comprises a panel of households interviewed over three different periods and is statistically representative of the rural population at the province and national levels. The surveys were conducted by the Indaba Agricultural Policy Research Institute (IAPRI) in collaboration with the Ministry of Agriculture and the Zambia Statistics Agency. The first round of RALS was conducted in May/June 2012, the second in June/July 2015, and the third in June/July 2019. The months when the RALS were conducted coincide with the harvesting period of the previous agricultural production season and with the agricultural marketing season. A total of 8839, 7934, and 7241 households were surveyed in 2012, 2015, and 2019, respectively, with 6531 panel households interviewed in all three waves.

The spatial distribution of households included in the RALS is provided in Figure 2. The survey requested information on farm(er) characteristics and on field-specific crop management practices, thus meeting the requirements for yield gap decomposition (Beza et al. 2017). A unimodal rainfall regime with one wet season lasting from November to April in each year was observed across the country (Herrmann and Mohr 2011). Yet, annual rainfall was lowest in the Southern and Western regions of Zambia, with an average between 600 and 800 mm per year, intermediate in the central regions, with an average between 800 and 1200 mm per year, and highest in the Northern regions, with an average above 1200 mm per year (Figure 2).

Secondary data were retrieved from spatial products using the GPS coordinates of the individual households. Climatic data were retrieved from the climate zone scheme of the Global Yield Gap Atlas (GYGA) and comprised three variables: growing degree days, temperature seasonality, and aridity index (Van Wart et al. 2013). Soil data on clay, silt, and sand contents, pH in water, and exchangeable acidity were retrieved from SoilGrids at 250 m resolution (Hengl et al. 2017), and data on rooting depth and soil available water were retrieved from AfSIS-GYGA (Leenaars et al. 2015). Simulated water-limited yields for maize were retrieved from GYGA. Rainfall data were obtained from Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS; Funk et al. 2015) and used to determine the dekad corresponding to the onset of the rains for each of the growing seasons surveyed. The onset of the rains was defined as the first dekad with a cumulative rainfall equal to or greater than 25 mm between the months of September and December (Hachigonta et al. 2008).
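The onset rule above can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes daily rainfall totals as input, assumes that rainfall is summed within each dekad (days 1–10, 11–20, and 21 to month end), and the function names `dekad_of` and `onset_dekad` are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def dekad_of(day: date) -> tuple[int, int, int]:
    """Return (year, month, dekad index 1-3) for a calendar day."""
    # Days 21 to month end all belong to the third dekad.
    return (day.year, day.month, min((day.day - 1) // 10 + 1, 3))


def onset_dekad(daily_rain_mm: dict[date, float], season_year: int,
                threshold_mm: float = 25.0):
    """First dekad between September and December whose rainfall total
    is equal to or greater than the threshold (assumed within-dekad sum)."""
    totals: dict[tuple[int, int, int], float] = {}
    day = date(season_year, 9, 1)
    while day <= date(season_year, 12, 31):
        key = dekad_of(day)
        totals[key] = totals.get(key, 0.0) + daily_rain_mm.get(day, 0.0)
        day += timedelta(days=1)
    for key in sorted(totals):  # chronological order
        if totals[key] >= threshold_mm:
            return key
    return None  # no dekad reached the threshold


# Hypothetical daily rainfall records (mm), for illustration only.
rain = {date(2017, 10, 3): 5.0, date(2017, 11, 12): 10.0, date(2017, 11, 15): 20.0}
print(onset_dekad(rain, 2017))  # (2017, 11, 2): second dekad of November
```

Sowing dates expressed in weeks after the onset of the rains can then be derived by differencing the sowing date and the start date of the returned dekad.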

2.2 Farm typology delineation

The farm typology was constructed using principal component analysis (PCA) followed by hierarchical clustering (HC; Alvarez et al. 2018) on the pooled data. PCA is a technique used to reduce the number of dimensions in a dataset to a few synthetic and uncorrelated variables called principal components. The principal components are linear combinations of the original variables, which can be conceptualized as the directions of high-dimensional data that capture the maximum amount of variance and project it onto a smaller dimensional subspace. The principal components retained for analysis were those with an eigenvalue greater than one. PCA was conducted in R using the dudi.pca() function of the ade4 package (Dray and Dufour 2007). HC refers to the hierarchical decomposition of the data based on group similarities and was then applied to a distance matrix calculated for the principal components selected following the PCA. Similarities between clusters were calculated using the Ward method. The final number of clusters was identified through visual inspection of the resulting dendrogram aiming to reach not less than three and not more than five clusters. HC was conducted with the hclust() function of the R stats package (R Core Team 2013).

Thirteen variables aggregated at the farm level were used to construct the farm typology, seven of which were structural variables (i.e., describing the structure of the household, variables that tend to remain constant from one season to the next) and six of which were functional variables (i.e., describing the performance of the household). The farm(er) characteristics included in the typology were the age of the household head (years), household size (#), and area of owned cultivated land (ha) at the time of the surveys. Resource endowments were captured with variables referring to the cash available to each household (ZMW), farm assets calculated as the sum of the assets owned by each household multiplied by their respective economic value (in Zambian Kwacha, ZMW), total cultivated land in ha, and livestock ownership in tropical livestock units (TLU; Jahnke 1982) for each survey year. The total amount of maize produced, sold and bought per farm (all in kg) and the area cultivated with maize and legumes (both in ha) were included to assess the level of engagement of each farm in maize and legume production, whereas the total fertilizer use at farm level (in kg) was included to assess the level of agricultural intensification of each farm. Variables were screened for outliers and standardized using the scale() function in R to avoid the influence of different levels of variation due to the unit of measurement of each variable. The mean value of each variable was compared for each farm type and the number of households per farm type were summarized per province and per year.

2.3 Yield gap decomposition

2.3.1 Concepts and definitions

Yield gap decomposition (Silva et al. 2017) relies on four yield levels to diagnose agronomic constraints in cropping systems at regional level (Doré et al. 1997). In addition to Yw and Ya (van Ittersum et al. 2013), the highest farmers’ yield (YHF) is defined as the average of farmers’ yields above the 90th percentile, whereas the technically efficient yield (YTEx) is defined as the maximum yield that can be achieved for a given input level in a well-defined biophysical environment. The efficiency yield gap refers to the difference between YTEx and Ya and is explained by suboptimal crop management in relation to time, space, and form of inputs applied. The resource yield gap refers to the difference between YHF and YTEx and is explained by suboptimal amounts of inputs applied. The technology yield gap refers to the difference between Yw and YHF and is explained by low input use and the lack of use of specific technologies. The feasible yield (Yf) was also considered to unpack the contribution of suboptimal input use (i.e., resource yield gaps) and variety choice to the technology yield gap. Yf is defined as the maximum yield with available technology and best-practice management but with no economic constraints (van Dijk et al. 2017).
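The definitions above reduce to simple differences between consecutive yield levels. The sketch below, in Python, shows that arithmetic under the assumption that all four yield levels are already known for a field or group of fields; the function names and the input values are hypothetical and chosen only for illustration.

```python
def decompose_yield_gap(ya, ytex, yhf, yw):
    """Split the total yield gap (Yw - Ya) into its three components (t/ha)."""
    gaps = {
        "efficiency": ytex - ya,   # YTEx - Ya
        "resource": yhf - ytex,    # YHF - YTEx
        "technology": yw - yhf,    # Yw - YHF
    }
    gaps["total"] = yw - ya        # equals the sum of the three components
    return gaps


def as_share_of_yw(gaps, yw):
    """Express each yield gap as a percentage of Yw."""
    return {name: 100.0 * value / yw for name, value in gaps.items()}


# Hypothetical yield levels in t/ha, not taken from the survey data.
gaps = decompose_yield_gap(ya=2.0, ytex=3.5, yhf=5.0, yw=10.0)
print(gaps)
print(as_share_of_yw(gaps, yw=10.0))
```

Because the components are nested differences, they always add up to the total yield gap, which is a useful consistency check when the yield levels come from different sources (survey data, frontier estimates, and crop model simulations).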

2.3.2 Stochastic frontier analysis

Stochastic frontiers account for two random errors, vit (random noise) and uit (technical inefficiency), assumed to be independently distributed from each other when estimating production functions (Kumbhakar and Lovell 2000). A Cobb-Douglas functional form (Equation 1), comprising only first-order terms in the production frontier, was used to describe the relationship between maize yield and a vector of agronomic relevant variables defined according to principles of production ecology (van Ittersum and Rabbinge 1997). A translog functional form was also fitted to test the effect of second-order terms (i.e., squared and interactions) on maize yield. The results of the translog functional form are presented in Supplementary Material given the large number of estimated parameters (Supplementary Table 3). Inefficiency effects, i.e., the drivers of the efficiency yield gap, were also estimated through a one-step estimation of the production frontier and the second-stage regression (Equation 2; Battese and Coelli 1995), as follows:

$$ \ln y_{it} = \alpha_{0} + \sum_{k=1}^{K} \beta_{k} \ln x_{kit} + v_{it} - u_{it} $$
(1)
$$ u_{it} = \sum_{j=1}^{J} \delta_{j} \ln z_{jit} + \epsilon_{it} $$
(2)
$$ v_{it} \sim N(0, \sigma_{v}^{2}) $$
(3)
$$ u_{it} \sim N^{+}\left( \sum_{j=1}^{J} \delta_{j} \ln z_{jit}, \sigma_{u}^{2} \right) $$
(4)
$$ \text{Eff. Yg}_{it} = 1 - \exp(-u_{it}) $$
(5)
$$ \text{Y}_{\text{TEx},it} = y_{it} \times \exp(-u_{it})^{-1} $$
(6)

where yit represents the maize yield in field i and in year t, xkit is a vector of agronomic inputs k used on field i in year t, and α0 and βk are parameters to be estimated. The vector zjit comprises the j crop management drivers of the efficiency yield gap in field i and in year t. YTEx and Yf were estimated for each field using the Cobb-Douglas model described earlier (Equations 1 and 6), but without considering inefficiency effects. Model parameters were estimated for the pooled data and for each farm type with maximum likelihood using the sfa() function of the R package frontier (Coelli and Henningsen 2013). Continuous variables were ln-transformed prior to the analysis and data were used as a cross-section rather than as a panel, hence technological change and time-(in)variant technical efficiency were not assessed.
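Equations 5 and 6 can be applied directly once an estimate of the inefficiency term is available for a field. The Python sketch below assumes that such an estimate of uit has already been obtained from a fitted frontier model (for instance, from the R package frontier); the function names and the example values are hypothetical.

```python
import math


def efficiency_yield_gap(u):
    """Equation 5: efficiency yield gap as a fraction of YTEx."""
    return 1.0 - math.exp(-u)


def technically_efficient_yield(y, u):
    """Equation 6: YTEx = y * exp(-u)^-1, i.e. the actual yield divided
    by the technical efficiency exp(-u)."""
    return y / math.exp(-u)


# Hypothetical field: actual yield of 2.4 t/ha and an inefficiency term of 0.5.
ya, u = 2.4, 0.5
ytex = technically_efficient_yield(ya, u)
print(ytex, efficiency_yield_gap(u))

# Consistency check: removing the efficiency yield gap from YTEx recovers Ya.
assert math.isclose(ytex * (1.0 - efficiency_yield_gap(u)), ya)
```

A field with uit = 0 lies on the frontier, so its efficiency yield gap is zero and YTEx equals the observed yield.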

The vector of inputs xkit was designed to capture the effect of growth-defining, growth-limiting, and growth-reducing factors on maize yield (Silva et al. 2017). Growth-defining factors were controlled for with the following variables: growing degree days considering a base temperature of 0 °C (Van Wart et al. 2013), temperature seasonality defined as the standard deviation of average monthly temperatures (Van Wart et al. 2013), seed rates (kg ha⁻¹), replanting (yes or no), and variety type (open-pollinated, hybrid, or unknown). Growth-limiting factors related to water included variety classification according to drought tolerance (yes, no, or unknown), aridity index defined as the ratio between total annual precipitation and annual total potential evapotranspiration (Van Wart et al. 2013), soil rooting depth and soil available water (Leenaars et al. 2015), soil texture class constructed based on spatial predictions of clay, silt, and sand contents (Hengl et al. 2017), location of the field in a wetland (yes or no), and presence of erosion or flood control practices (yes or no). Growth-limiting factors related to nutrients included the rate of N applied (kg N ha⁻¹), pH in water, and exchangeable acidity (Hengl et al. 2017). Finally, growth-reducing factors were captured with the number of weeding operations (none or one, two, and three or more), herbicide use (yes or no), and insecticide use (yes or no). Sowing date, expressed in weeks after the onset of the rains, and date of the first weeding operation, expressed in weeks after sowing, were included in the model as inefficiency effects. The variance inflation factors indicated no multicollinearity between the considered variables.

The Cobb-Douglas frontier model without inefficiency effects was used to predict Yf for specific values of some of the input variables. To do so, seed rate was set at 25 kg ha⁻¹, which is the recommended seed rate for maize in Zambia. N application rate was set at 250 kg N ha⁻¹, which is the minimum N requirement for a target of 80% of Yw in the high rainfall areas of Zambia (www.yieldgap.org). It was further assumed that drought tolerant hybrid maize varieties were used in combination with replanting of maize seedlings, herbicides, and insecticides. The estimation of Yf further assumed that fields with a pH in water below 6.5 were corrected to a pH in water of 6.5 and that fields with exchangeable acidity above 0.2 cmol+ kg⁻¹ were corrected to that level in fields with pH below 6.5.

2.3.3 Distribution of actual yields

Farmers’ fields were categorized as highest, average, and lowest yielding fields based on the distribution of Ya observed for a given variety type and climate zone x soil type combination. Highest yielding fields were identified as those with Ya above the 90th percentile. Average yielding fields were identified as those with Ya between the 10th and the 90th percentiles and lowest yielding fields as those with Ya below the 10th percentile. Highest (YHF), average (YAF) and lowest farmers’ yields (YLF) were calculated as the average Ya for the fields in each respective group. The field classification was specific to each of three variety types and to each unique climate zone (Van Wart et al. 2013) and soil type (Hengl et al. 2015), so genotype and biophysical factors were controlled for when comparing maize yields and management practices across the different fields.
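The percentile-based classification described above can be sketched in Python for a single group of fields, i.e., one variety type × climate zone × soil type combination. This is an illustration rather than the authors' implementation: the function name `classify_fields` is hypothetical, the yields are made up, and the "inclusive" percentile method is an assumption because the text does not state how percentiles were interpolated.

```python
from statistics import mean, quantiles


def classify_fields(yields):
    """Return the mean yield of the lowest, average, and highest yielding
    fields within one group, using the 10th and 90th percentiles as cut-offs."""
    cuts = quantiles(yields, n=10, method="inclusive")  # 9 decile cut points
    p10, p90 = cuts[0], cuts[-1]
    groups = {"lowest": [], "average": [], "highest": []}
    for y in yields:
        if y > p90:
            groups["highest"].append(y)   # contributes to YHF
        elif y < p10:
            groups["lowest"].append(y)    # contributes to YLF
        else:
            groups["average"].append(y)   # contributes to YAF
    return {name: mean(vals) for name, vals in groups.items() if vals}


# Hypothetical yields (t/ha) for twenty fields in one group: 0.5, 1.0, ..., 10.0
field_yields = [i * 0.5 for i in range(1, 21)]
print(classify_fields(field_yields))
```

In practice the function would be applied separately to each group, so that genotype and biophysical conditions are held constant when comparing YHF, YAF, and YLF.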

2.3.4 Global Yield Gap Atlas (GYGA)

Yw for rain-fed maize across Zambia was obtained from GYGA. Maize Yw in Zambia was simulated with the HybridMaize crop model (Yang et al. 2004) for the period 2001–2010 (see www.yieldgap.org/Zambia for further details). The average Yw data over the period 2001–2010 for a given climate zone was used here to benchmark Ya in farmers’ fields and the technology yield gap was then calculated as the difference between Yw and YHF for unique climate zone x soil type x variety combinations. It was not possible to make use of year-specific Yw data for the same growing seasons in which the surveys were conducted due to lack of Yw data for the growing seasons surveyed, which introduces uncertainties in the magnitude of the overall yield gap estimated, particularly in regions with erratic rainfall. Therefore, coefficients of variation of maize Yw were computed to better characterize inter-annual yield variability across Zambia. The N rates needed to reach 80% of Yw were also retrieved from GYGA (ten Berge et al. 2019) to benchmark N used in farmers’ fields.

3 Results

3.1 Maize-based farming systems in Zambia

Rural agricultural households across Zambia cultivate on average 2.2 ha of land and own 4.5 tropical livestock units (TLU; Figure 3A and B). Yet, the median values were considerably lower, with 50% of the surveyed households cultivating less than 1.6 ha and owning less than 1.1 TLU. Maize was cultivated throughout the country with an average and median maize area share of 67% of the total cultivated land (Figure 3C). This corresponds to an average maize area per farm of about 1.4 ha. Fertilizer use across the country was on average 140 kg ha⁻¹ of cultivated land, with 50% of the surveyed farms using less than 110 kg of fertilizer per ha of cultivated land across the three survey periods (Figure 3D).

Fig. 3

Main characteristics of farming systems in Zambia and their variability at national level and per province: (A) cultivated land in ha, (B) livestock ownership in tropical livestock units, (C) proportion of the cultivated land occupied by maize in %, and (D) fertilizer used per ha of cultivated land. Data for the entire country are highlighted in dark gray. Asterisks show the mean value across the farm-year combinations of each province.

There were wide variations in total cultivated land, livestock ownership, maize share of cultivated cropland, and total fertilizer use across the different provinces (Figure 3 and Supplementary Table 1). The average total cultivated land was larger than the national average in the Southern (3.4 ha), Central (2.8 ha), and Eastern provinces (2.4 ha), and lower in all other provinces (1.4–2.1 ha; Figure 3A). The same was true for livestock ownership, which was on average 11.9, 5.5, and 4.4 TLU in the Southern, Central, and Eastern provinces, respectively, and much lower in all other provinces, notably those in the Northern part of the country (Figure 3B). Maize represented more than 50% of the cultivated land for at least 50% of the surveyed farms in all provinces (Figure 3C). The average maize share of cultivated cropland was above 80% in the provinces of Lusaka and Copperbelt, between 70 and 75% in the Southern, Northwestern, and Central provinces, and about 60% in the Eastern, Muchinga, and Luapula provinces. The maize share of cultivated cropland was lowest in the Northern province, ca. 55% of the total cultivated land. Finally, fertilizer use was below the national average in the Southern, Eastern, and Western provinces (50–100 kg ha⁻¹), and slightly above the national average in the other provinces (Figure 3D).

Table 1 Maize yield potential (Yp) and water-limited yield (Yw) for eight weather stations located across Zambia. Means and coefficients of variation (CV) are provided for the years 2000–2010. ‘N requirements’ refer to the minimum N rates needed to reach 80% of Yw in each location, averaged over the same period as the yield ceilings. Source: www.yieldgap.org; van Ittersum et al. (2016); ten Berge et al. (2019).

3.2 Farm types and importance of maize

The farm typology was constructed using principal component analysis (PCA) followed by hierarchical clustering (HC). Four principal components had an eigenvalue greater than one and were retained for further analysis. These four principal components explained approximately 60% of the cumulative variance in the data. Three clusters were identified in the dissimilarity dendrogram of the HC analysis, corresponding to three distinct farm types. In short, Farm Type 1 (FT1) exhibited a low dependency on maize production and consumption, Farm Type 2 (FT2) were net buyers of maize and exhibited low levels of maize area and production, and Farm Type 3 (FT3) were market-oriented maize producers engaged in agricultural activities, as indicated by the large number of livestock kept and large amount of fertilizer used (Figure 4 and Supplementary Table 2).

Fig. 4

Radar charts represent all studied quantitative variables on individual axes starting from the same central point for each farm type. The variables displayed were used in the principal component analysis followed by hierarchical clustering to delineate the farm typology for the pooled data. Data are scaled with the average value of each variable for all farm types (cf. Supplementary Table 2). The spatial and temporal distribution of the farm types is provided in Supplementary Figures 1 and 2, respectively. Abbreviations: ‘HH’ = household, ‘TLU’ = tropical livestock units.

Table 2 Parameter estimates of the stochastic frontier model fitted for maize yield in Zambia during the growing seasons of 2010/11, 2013/14, and 2017/18. The same model was fitted to the pooled sample (Zambia) and each of the farm types identified (Figure 4). Reference values: Year = ‘2012’, Replant = ‘No’, Variety = ‘OPV’, Drought tolerant = ‘No’, Soil = ‘Clay’, Wetland = ‘No’, Erosion/Flood = ‘No’, Weeding = ‘One or none’, Herbicide use = ‘No’, Insecticide use = ‘No’. Units: WFO = week from onset of rains; WAS = week after sowing. Significance is indicated by the codes: ‘***’ 0.1%, ‘**’ 1%, ‘*’ 5%, ‘#’ 10%. n.a. = not applicable.

The age of the household head did not vary significantly across farm types (Figure 4) whereas household size was lower for FT1 (5.5 individuals), intermediate for FT2 (7.2 individuals), and higher for FT3 (8.2 individuals). FT1 owned 1.5 TLU and cultivated a total of 1.4 ha, 0.8 ha of which were allocated to maize and 0.3 ha to legumes, and used 140 kg of fertilizer per farm per year. FT1 produced an average of 1500 kg of maize, sold 600 kg of maize, and bought 50 kg of maize per farm per year. FT2 had access to 2.7 TLU and cultivated a total of 1.3 ha, of which 0.8 and 0.1 ha were cultivated with maize and legumes, respectively. Fertilizer use was lower in FT2 than in FT1 (Figure 4) with a rate of 90 kg fertilizer per farm per year, and so was maize production and maize sold (Figure 4), with an average of 1000 kg and 250 kg per farm per year, respectively. FT3 used 600 kg of fertilizer, produced 6500 kg of maize, sold 1600 kg of maize, and purchased 80 kg of maize per farm per year.

There were slight differences in the spatial distribution of the three farm types (Supplementary Figure 1). In Western province, nearly 70% of the farms were classified as FT2 and only 10% of the farms were classified as FT3. By contrast, in Southern and Central provinces as much as 50% of the farms were classified as FT3 whereas 20% and 30% were classified as FT1 and FT2, respectively. In Luapula, Muchinga, Northern, and Northwestern provinces, 35–40% of the farms were classified as either FT1 or FT3. Farms were evenly distributed amongst farm types (ca. 30% per farm type), in the Eastern and Copperbelt provinces. There were no major changes in farm type classification for single farms over time (Supplementary Figure 2): out of 5238 farm-year combinations, 715 were classified as FT3, 412 as FT2, and 209 as FT1 in the three rounds of the survey. Other changes in farm type classification were not consistent and were likely to reflect fluctuations in farm performance over time.

3.3 Yields and yield gaps of rain-fed maize

Maize Ya across all farm-year combinations analyzed ranged between nil and 9.0 t ha⁻¹ (Figure 5). Ya was smaller and more variable in the 2019 harvest year than in the 2012 and 2015 harvest years, with average values of 2.6, 2.4, and 2.2 t ha⁻¹ and a coefficient of variation (CV) of 67, 67, and 77% during the 2012, 2015, and 2019 harvest years, respectively (Figure 5A). There were also clear differences in the distribution of Ya across agro-ecological zones, farm types, and variety types. Ya was smallest and most variable in agro-ecology IIb (mean = 1.3 t ha⁻¹, CV = 82%) and greatest and least variable in agro-ecology III (2.7 t ha⁻¹, 61%), with intermediate values observed in agro-ecology IIa and I (Figure 5B). Ya was also smallest and most variable for FT2 (1.8 t ha⁻¹, 76%), intermediate for FT1 (2.4 t ha⁻¹, 66%), and greatest and least variable for FT3 (2.9 t ha⁻¹, 61%; Figure 5C). Finally, Ya was on average 1.9 and 2.9 t ha⁻¹, with a CV of 73 and 61%, for open-pollinated and hybrid maize varieties, respectively (Figure 5D).

Fig. 5

Maize actual yield variability across years (A), agro-ecology zones (AEZ, B), farm types (C), and variety types (D), and maize yield response to seed rate (E) and N applied (F). Lines in (A)–(D) display empirical cumulative distribution functions. Mean values (and coefficients of variation) are as follows: 2.6 t ha⁻¹ (67.0%) for year 2012; 2.4 t ha⁻¹ (66.6%) for year 2015; 2.2 t ha⁻¹ (76.8%) for year 2019; 2.1 t ha⁻¹ (74.0%) for AEZ I; 2.4 t ha⁻¹ (70.3%) for AEZ IIa; 1.1 t ha⁻¹ (82.0%) for AEZ IIb; 2.7 t ha⁻¹ (61.4%) for AEZ III; 2.4 t ha⁻¹ (66.2%) for farm type 1; 1.8 t ha⁻¹ (76.2%) for farm type 2; 2.9 t ha⁻¹ (61.3%) for farm type 3; 2.9 t ha⁻¹ (60.9%) for hybrid varieties; 1.9 t ha⁻¹ (73.3%) for open-pollinated varieties. Data in (E) and (F) are aggregated per household × field type, and lines display statistically significant ordinary least squares regressions (quadratic for seed rate and linear for N) fitted to highest (YHF), average (YAF), and lowest yielding fields (YLF).

Simulated yield potential (Yp) ranged between 13 and 19 t ha⁻¹ in the Southern and Northern provinces, respectively, without a clear spatial distribution across the country (Table 1). Conversely, Yw was greatest and least variable in the Northern, Luapula, and Muchinga provinces, intermediate in the Eastern and Central provinces, and smallest and most variable in the Southern and Western provinces (Table 1). Yw was on average 18 t ha⁻¹ in the Northern province, 13 t ha⁻¹ in the Eastern province, and 9.5 t ha⁻¹ in the Western and Southern provinces. The CV of Yw was 5, 30, and 45% for the Northern, Eastern, and Western and Southern provinces, respectively (Table 1). The difference between Yp and Yw indicates the yield gap due to water limitations, whose magnitude increased along a North-South gradient (Table 1) characterized by lower and more erratic rainfall (Figure 2). N rates needed to reach 80% of Yw were greater than 250 kg N ha⁻¹ in the Northern, Luapula, and Muchinga provinces, ca. 230 kg N ha⁻¹ in the Eastern province, and about 170 kg N ha⁻¹ in the Western and Southern provinces (Table 1).

Yield gap closure (i.e., the ratio between Ya and Yw) was on average 21% of Yw and varied with agro-ecological zone, province, and farm type (Figure 6). Yield gap closure was greatest in agro-ecology I (35% of Yw), intermediate in agro-ecology IIa (23% of Yw), and smallest in agro-ecologies IIb and III (15% of Yw; Figure 6A and D). Yield gap closure per province was similar to that per agro-ecology (Figure 6B and E) because most of the Southern province is in agro-ecology I, the Central and Eastern provinces are in agro-ecology IIa, the Western province is in agro-ecology IIb, and the Northern, Northwestern, Luapula, Muchinga, and Copperbelt provinces are in agro-ecology III. Finally, yield gap closure was on average 30% of Yw for FT3, 20% of Yw for FT1, and only 15% of Yw for FT2 (Figure 6C and F).

Fig. 6

Maize yields and yield gaps in Zambia disaggregated by agro-ecological zones (A-D), provinces (B-E), and farm types (C-F). Panels in the top row display data in absolute terms (t ha− 1) and panels in the bottom row display data in relative terms (% of Yw). Codes: ‘AE’ = agro-ecological zone, ‘FT’ = farm type, ‘Efficiency Yg’ = efficiency yield gap, ‘Resource YgYHF’ = resource yield gap considering the highest farmers’ yields (YHF) as benchmark, ‘Resource YgYf’ = resource yield gap considering the feasible yield (Yf) as benchmark, ‘Technology Yg’ = technology yield gap.

Most of the yield gap was attributed to the technology yield gap, which accounted for 7.2 t ha⁻¹ (50% of Yw) on average, yet narrowing efficiency and resource yield gaps could more than double Ya for maize in Zambia (Figure 6). The efficiency yield gap was on average 1.6 t ha⁻¹ (14% of Yw) and the resource yield gap was on average 1.7 t ha⁻¹ (16% of Yw), which means that fine-tuning current crop management practices and increasing input use to the level of highest yielding fields can increase yields from the current 2.4 t ha⁻¹ to 5.7 t ha⁻¹. The resource yield gap considering the feasible yield (i.e., maximum yield with available technology and best-practice management but with no economic constraints) as ceiling was small with an average of 1.0 t ha⁻¹ (7% of Yw). This means that resource-use efficiency in farmers’ fields is low and must be improved to realize the yield gains associated with increased input use and better technology. The large technology yield gap is thus a result of suboptimal input use compared to what is needed to reach Yw and of low resource-use efficiency of current farm practices.

There were slight differences between agro-ecological zones and provinces in the relative contribution of each yield gap to the overall yield gap (Figure 6). For instance, the relative contribution of the technology yield gap to the total yield gap was less than 10% of Yw in the Southern province (which is part of agro-ecological zone I; Figure 6D and E), whereas the relative contributions of the efficiency and resource yield gaps were ca. 20% and 30% of Yw, respectively. In Lusaka province (with areas also part of agro-ecological zone I), each of the three intermediate yield gaps accounted for ca. 20% of the total yield gap. The differences in the relative contribution of the efficiency, resource, and technology yield gaps to the overall yield gap between these two provinces (Southern and Lusaka) and the other provinces are likely attributable to the low water-limited yield simulated, and hence small technology yield gap in absolute terms, for the Southern and Lusaka provinces (and respective agro-ecological zone, Figure 6A and B). There were also slight differences in the causes of yield gaps for the different farm types (Figure 6C and F): the efficiency yield gap was slightly greater for FT3 (i.e., market-oriented maize farms) than for FT1 and FT2, whereas the opposite was true for the resource yield gap.

3.4 Determinants of maize yield variability

The stochastic frontier model fitted to the pooled data revealed that seed rate, variety type, aridity index, soil available water, and herbicide use were the key drivers of maize yield variability (Table 2). The seed rate had a significant positive effect on Ya, with a 1% increase in seed rate resulting in a 0.33% increase in Ya. There was also a significant effect of variety on Ya, with hybrid varieties yielding ca. 13% more than open-pollinated varieties. The effects of temperature seasonality and replanting on Ya were also statistically significant, but small. Aridity index and soil available water had a significant positive effect on Ya, with a 1% increase in these variables resulting in 0.50% and 0.20% increases in Ya, respectively. Ya in loamy sand soils was significantly lower (135%) than in clay soils, and adoption of erosion and flood control practices increased Ya by 5%. N applied had a significant positive effect on Ya whereas exchangeable acidity had a significant negative effect on Ya, but in both cases the effect was small. Herbicide use had a significant positive effect on Ya, resulting in 12.5% greater Ya compared to fields where herbicides were not used. Finally, Ya was significantly lower in 2015 and in 2019 than in 2012 (cf. Figure 5A). The time of the first weeding, measured in number of days after sowing, had a significant negative effect on the efficiency yield gap, meaning that smaller efficiency yield gaps were observed when the first weeding was done at later dates, but again the effect was small.
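The "1% increase in input gives b% increase in Ya" statements above follow from reading the coefficients on log-transformed inputs as elasticities. A minimal sketch of that reading, assuming a Cobb-Douglas (log-log) form and using the seed-rate elasticity quoted above; the function names are hypothetical, and the exp(b) − 1 conversion for dummy variables is the standard log-linear rule, not necessarily the one used to obtain the percentages in Table 2.

```python
import math


def pct_change_in_yield(elasticity, pct_change_in_input):
    """Exact % change in y for a given % change in x when y is proportional to x**elasticity."""
    return ((1 + pct_change_in_input / 100) ** elasticity - 1) * 100


def dummy_effect_pct(coefficient):
    """% difference in y implied by a 0/1 dummy coefficient in a log-linear model."""
    return (math.exp(coefficient) - 1) * 100


# Seed-rate elasticity of 0.33 as quoted in the text.
print(round(pct_change_in_yield(0.33, 1), 2))    # small change: close to the elasticity itself
print(round(pct_change_in_yield(0.33, 10), 2))   # larger change: slightly below 10 x 0.33
```

The elasticity reading is a local approximation; for large changes in an input the exact response is somewhat below the elasticity multiplied by the percentage change whenever the elasticity is between 0 and 1.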

The significance level and magnitude of the first-order terms derived from the survey data were comparable in both the Cobb-Douglas and translog stochastic frontier models (Supplementary Table 3). Yet, variables derived from secondary sources (temperature seasonality, aridity index, rooting depth, soil available water, pH in water, and exchangeable acidity) showed contrasting signs and different effect sizes (Supplementary Table 3). Quadratic terms were statistically significant for all continuous variables, except soil available water (Supplementary Table 3), indicating a quadratic effect of seed rate on Ya and a quadratic positive effect of N applied on Ya (cf. Figure 5E and F). There were negative interactions between seed rate and growing degree days, aridity index, and N applied, and positive interactions between seed rate and temperature seasonality and pH in water. N applied showed a negative interaction with growing degree days, seed rate, rooting depth, and soil available water, meaning that maize yield response to N decreased with increases in these variables.

The effect of seed rate and N applied on maize yield was further investigated for highest, average, and lowest yielding fields. Maize yield ranged between 0 and 1.5 t ha−1, 1.5 and 4.0 t ha−1, and 4.0 and 9.0 t ha−1 for lowest, average, and highest yielding fields, respectively (Figure 5E and F). Seed and N rates were lowest in lowest yielding fields (16 kg ha−1 and 54 kg N ha−1), intermediate for average yielding fields (23 kg ha−1 and 84 kg N ha−1), and greatest for highest yielding fields (25 kg ha−1 and 100 kg N ha−1). There were no major differences in yield and input use for the different farm types across highest, average, and lowest yielding fields (data not shown). The quadratic effect of seed rate on yield was significant for highest and average yielding fields, but not for lowest yielding fields (Figure 5E), whereas the effect of N applied on yield was linear and positive for lowest, average, and highest yielding fields (Figure 5F). Yield response to N was greatest, intermediate, and smallest for average, highest, and lowest yielding fields, respectively.

The drivers of maize yield variability for each farm type were largely comparable to those observed for the pooled data (Table 2), as opposed to the results obtained for Northern, Eastern, and Southern provinces (Supplementary Table 4). For all farm types, seed rate, aridity index, soil available water, and N applied had a significant positive effect on Ya, and Ya was significantly smaller in 2019 than in 2012. Variety type and herbicide use had a positive effect on Ya for FT1 and FT3, and fields weeded three or more times yielded 15% more for FT1, and 9% less for FT3, than fields weeded once or not weeded. Increasing temperature seasonality by 1% translated into increases in Ya of 28% for FT1, replanted fields yielded 11% less than non-replanted fields for FT3, and fields where erosion or flood control practices were adopted for FT2 had 11% greater Ya than fields where these practices were not adopted. Also for FT2, fields weeded twice yielded 10% more than fields with one or no weeding operations. The effects of soil type on Ya were not consistent across farm types. The seed rate and N applied had a significant positive effect on maize yield, with a similar effect size, independently of the province (Supplementary Table 4), and the effect of biophysical variables (e.g., aridity index and soil available water) was not significant when the model was fitted per province (Supplementary Table 4).

4 Discussion

Agricultural productivity must increase in sub-Saharan Africa with a view towards improved food security and reduced food imports with minimum crop expansion in biodiversity and carbon-rich natural habitats (e.g., Giller et al. 2021a; Jayne and Sanchez 2021; Giller 2020; Keating et al. 2014). Zambia is no exception to this narrative (Figure 1), where narrowing yield gaps up to 80% of Yw is needed for the country to reach cereal self-sufficiency by 2050 without cropland expansion (van Ittersum et al. 2016). Yield gap closure for rain-fed maize across Zambia is only ca. 20% of Yw (Figure 6), which is similar to that of other crops in other countries across sub-Saharan Africa (van Ittersum et al. 2016; Tittonell and Giller 2013). The large yield gap of rain-fed maize in Zambia is mostly attributed to the technology yield gap (Figure 6), indicating that more efficient production methods are needed to narrow maize yield gaps. Yet, narrowing efficiency and resource yield gaps through fine-tuning current farm practices could more than double current yields (Figure 6). The latter can be achieved through improved timeliness and precision of management operations and through increases in input use to levels observed in highest yielding fields (Figures 5E and 5F). Similar findings regarding the relative importance of efficiency, resource, and technology yield gaps were reported for cereal farming systems in Eastern Africa (Silva et al. 2019, 2021; Assefa et al. 2020; van Dijk et al. 2017), pointing to the need for making inputs available to farmers at the right amount, cost, and time, and for targeting and packaging technologies in ways that increase adoption at farm level.

Seed and N rates, variety, weed control, and sowing date were the most important management drivers of maize yield variability in Zambia (Table 2). All these are well-known drivers of maize yield variability in Eastern and Southern Africa (e.g., Burke et al. 2020; Assefa et al. 2020). First, seed rate and variety type had a large impact on maize yield, with a 1% increase in seed rate resulting in a ca. 0.35% increase in maize yield and hybrid varieties yielding 12% more than traditional OPVs (Table 2). Seed rate might well be a proxy for plant population, a key factor controlling maize productivity in Southern Africa (Nyagumbo et al. under review). Second, the timing of the first weeding operation was an important driver of the efficiency yield gap (Table 2), reflecting the importance of timely weeding at the start of the growing season for maize productivity. Third, N fertilizer rate had a linear positive effect on maize yield (Figure 5F; Table 2), but the effect size was small due to the low amounts of N applied by farmers. In fact, the range of N application rates observed in farmers’ fields was considerably lower than that needed to reach 80% of Yw (i.e., 170–320 kg N ha−1; Table 1). Such large N application rates are out of reach for most smallholders in the country, and may well not be profitable or desirable under prevailing conditions (e.g., input-output markets, infrastructure, and soil acidity). Lastly, the effect of timely sowing on maize productivity was very much related to the onset of the rains (Supplementary Figures 3 and 4), and appropriate-scale mechanization can contribute to timely and more precise sowing across the region (Baudron et al. 2015).

The drivers of maize yield variability were largely consistent across farm types (Table 2), but the importance of maize for rural livelihoods across Zambia was farm-type specific (Figure 4). This means that interventions aiming to narrow maize yield gaps will likely benefit the different farm types differently. For instance, boosting maize productivity can be a suitable ‘stepping up’ strategy for market-oriented maize farms (FT3), who achieve the highest maize yields in Zambia (Figure 6C). Targeting interventions to this type of farm might well be the most effective way to increase maize production at national level. Conversely, farms with low levels of assets (FT1 and FT2, Figure 4), for whom ‘stepping out’ of maize production through investments in new on-farm activities or off-farm activities is likely more suitable, do not seem to have the productive capacity to intensify maize production in the short-term. Yet, increasing maize yields would be more beneficial for FT2 than for FT1 given the large dependency on bought maize of the former (Figure 4). Clearly, strategies aiming to narrow maize yield gaps must thus be complemented with a suite of pro-poor policies and investments tailored to specific farm types. This will be crucial to stimulate and embed smallholder agriculture into a broader rural development program that can provide social safety nets in the absence of livelihood options off-farm (Giller et al. 2021a).

Maize production in Zambia takes place across a gradient of agro-ecological conditions, which in turn have a considerable impact on yield gaps and their causes throughout the country (Figure 6; Supplementary Table 4). For instance, our analysis indicates that a 1% increase in soil available water translates into ca. 0.20% greater maize yield and that a 1% decrease in exchangeable acidity results in a 0.02% increase in maize yield across the pooled sample (Supplementary Table 4). Water is indeed a key limiting factor to production in the semi-arid areas of Southern and Western Zambia (Table 1, Figure 2; Ngoma et al. 2021) whereas soil acidity is known to be a major constraint to agricultural production in the humid areas of Northern Zambia (Pelletier et al. 2020; Burke et al. 2017; Pauw 1994). These biophysical constraints may impact the adoption of mineral fertilizers to narrow resource yield gaps due to the risks involved in areas with low and erratic rainfall and the low nutrient-use efficiency in areas with acid soils, both with implications beyond maize farming in Zambia. Erratic rainfall is widespread across much of Eastern and Southern Africa (Muthoni et al. 2019) whereas soil acidity (defined here as low pH areas with high levels of exchangeable acidity) affects over half of all countries in sub-Saharan Africa (Silva et al., in preparation). These results support the revision of the subsidy program by the Government of Zambia (Morgan et al. 2019) to make it possible for farmers to access mechanized services and inputs (e.g., seeds, fertilizers, and lime) and to strengthen extension systems to deliver timely and site-specific agronomic recommendations (Jayne et al. 2018). This is crucial to improve soil health and sustainably intensify maize production in the country.

Further research is needed to understand how fertilizer use is influenced by climate variability and to identify profitable soil water conservation technologies for semi-arid areas. A range of new technologies building on previous conservation agriculture research (e.g., improved legume systems with strip, double, relay, and intercropping, green manure cover crops, and agroforestry species) are currently being tested on-farm in Zambia to address these challenges. For humid areas, it is crucial to revisit past research on soil acidity to assess the returns-on-investment associated with liming or acid soil management strategies (CIMMYT 2021; Burke et al. 2017). Simulated yield ceilings across the continent, and respective N rates needed to reach such yields (Table 1; van Ittersum et al. 2016), should also be thoroughly tested against empirical data as they are well above maximum yields reported in agronomic experiments under controlled conditions (see Masuka et al. 2017; Mupangwa et al. 2017 for examples in Zambia).

High rainfall variability makes rain-fed farming across Eastern and Southern Africa a risky activity for smallholders. Site-specific recommendations must thus consider year-to-year variation in profitability and smallholders’ risk profile to cope with uncertain yield response to inputs (Descheemaeker et al. 2016), as these are known to constrain farmers’ willingness to invest in technologies. More attention must be paid to incorporating the effects of rainfall variability and soil properties on yield response to inputs to better explain the adoption of technologies (Chamberlin et al. 2021; Burke et al. 2017), which appear to be profitable on average, but have high variance in outcomes over time. The role of non-information constraints, such as alternative uses of labor at critical periods (Silva et al. 2019; Kamanga et al. 2014), in the adoption of improved crop management practices also needs to be explored as these can limit the timely management needed to narrow yield gaps. Small farm sizes are another important constraint to technology adoption and intensification of crop production in African smallholder farming systems (Harris and Orr 2014), as narrowing yield gaps on small farms is often not enough to ensure food self-sufficiency or a living income at household level (Giller et al. 2021b).

5 Conclusion

Maize is the dominant crop in Zambian farming systems, which range from mixed crop-livestock systems in semi-arid areas of the Southern and Western provinces to mixed maize systems in the rest of the country. This study combined for the first time a farm typology delineation with yield gap decomposition to gain insights into what interventions are needed, where, and for which farm types, to increase maize production in Zambia. Three farm types were identified, including households for which maize is a marginal crop, households which are net buyers of maize, and households which are market-oriented maize producers. Maize yield gap closure across the country was only 20% of the water-limited yield (Yw), corresponding to 2.4 t ha−1, and was slightly larger for market-oriented maize farms. For nearly all agro-ecological regions, provinces, and farm types, about half of the yield gap was attributed to current technologies used by farmers not reaching their full agronomic potential. Yet, improving current technologies in terms of timeliness and precision of operations and increasing input use, particularly mineral fertilizers, could more than double current yields. Doing so requires targeted approaches for technology intervention, e.g., by focusing on market-oriented maize producers, accompanied by carefully designed policy interventions ensuring other households benefit from other value chains or off-farm opportunities. If profitable, adoption of practices that increase soil moisture in semi-arid areas, such as conservation agriculture, and management of soil acidity in humid areas are key to improve yield response to mineral fertilizers. Two avenues can facilitate the foregoing policy levers. First, the current national subsidy program needs to be flexible enough to make it possible for farmers to access mechanized services and inputs. Second, the extension systems need to be strengthened to help farmers cope with risk and uncertain crop yield response to inputs in areas with high rainfall variability. Further research is needed to better understand the profitability of maize production under rain-fed conditions and to disseminate technologies that can reduce the vulnerability of farmers to inter-annual rainfall variability. Blanket, one-size-fits-all recommendations should be avoided when promoting sustainable intensification practices aiming to increase yields in the country.