Unravelling the variability and causes of smallholder maize yield gaps in Ethiopia

Ethiopia has achieved the second highest maize yield in sub-Saharan Africa. Yet, farmers’ maize yields are still much lower than on-farm and on-station trial yields, and only ca. 20% of the estimated water-limited potential yield. This article provides a comprehensive national level analysis of the drivers of maize yields in Ethiopia, by decomposing yield gaps into efficiency, resource and technology components, and accounting for a broad set of detailed input and crop management choices. Stochastic frontier analysis was combined with concepts of production ecology to estimate and explain technically efficient yields, the efficiency yield gap and the resource yield gap. The technology yield gap was estimated based on water-limited potential yields from the Global Yield Gap Atlas. The relative magnitudes of the efficiency, resource and technology yield gaps differed across farming systems; they ranged from 15% (1.6 t/ha) to 21% (1.9 t/ha), 12% (1.3 t/ha) to 25% (2.3 t/ha) and 54% (4.8 t/ha) to 73% (7.8 t/ha), respectively. Factors that reduce the efficiency yield gap include: income from non-farm sources, value of productive assets, education and plot distance from home. The resource yield gap can be explained by sub-optimal input use, from a yield perspective. The technology yield gap comprised the largest share of the total yield gap, partly due to limited use of fertilizer and improved seeds. We conclude that targeted but integrated policy design and implementation is required to narrow the overall maize yield gap and improve food security.


Introduction
Population growth and changing consumption patterns have increased global food demand and are threatening food security in the developing world (Dzanku et al. 2015;Godfray et al. 2010;Tittonell & Giller 2013). Agricultural extensification (area expansion) and intensification (increase in production per unit of land) are major avenues of response to the growing food demand (Licker et al. 2010). However, land is a scarce resource, meaning that agricultural uses must compete with alternative uses of land, including industrial, residential and conservation uses (Godfray et al. 2010). Moreover, extensification comes with various environmental costs: greenhouse gas emissions, competition with biodiversity aims and resource depletion (Cassman 1999;Struik et al. 2014). Agricultural intensification that narrows yield gaps through sustainable productivity gains is a central component of strategies for increasing food production and food security in regions with projected demand increases (Dzanku et al. 2015;Lobell et al. 2009;Struik et al. 2014). Compared to other regions of the developing world, crop production in sub-Saharan Africa (SSA) is characterized by a large yield gap (Affholder et al. 2013;Dzanku et al. 2015;Tittonell & Giller 2013;van Ittersum et al. 2016), i.e. the difference between the potential and the actual farmers' yield for a given biophysical environment (van Ittersum & Rabbinge 1997). In Ethiopia, the yield gap in staple crops has been directly implicated in food shortages and the country's dependence on food imports and aid (Abate et al. 2015;Mann & Warner 2015). In the last decade, however, crop yields have improved, particularly for maize. Ethiopia has achieved the second highest average maize yield in SSA, with more than 3 t/ha (Abate et al. 2015).

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12571-019-00981-4) contains supplementary material, which is available to authorized users.
Maize has become one of the five major cereals (along with wheat, teff, barley and sorghum) in terms of production volume, area coverage and household consumption (Abate et al. 2015;CSA & WB 2015). It occupies about 2 million ha, the second largest production area after teff. Roughly nine million smallholders account for 95% of the national maize production (Abate et al. 2015;CSA 2012;Taffesse et al., 2012). An estimated 77% of maize production goes to producer households' own consumption (CSA & WB 2013;CSA & WB 2015).
Ethiopia's yield improvements have been attributed to the use of modern maize varieties and mineral fertilizers, and to improved access to extension services. These improvements have contributed to decreased household poverty (Zeng et al. 2015) and improved food security (Abate et al. 2015). Yet, actual maize yields are still far behind on-farm and on-station trial yields (Kassie et al., 2014), and only ca. 20% of the water-limited potential yield (van Ittersum et al. 2016). This implies a large potential to increase maize yield and improve food security in the country.
Narrowing the yield gap requires identifying and explaining the factors that determine it at farm level. Because these factors are area-specific, they need to be studied for a given context and time (Neumann et al. 2010;Tittonell & Giller 2013;Silva et al. 2017).
The existing literature on maize yield and yield gap analysis shows that various factors contribute to yield gaps in SSA. Kihara et al. (2015) assessed the maize yield gap in northern Tanzania and showed that timing and agronomic practices, including plant density, manure application and appropriate varietal choice, determined differences in maize yield. A meta-analysis in 12 SSA countries showed that interactions are important, as maize yield responses to organic and inorganic fertilizers varied with soil clay content, elevation and mean annual precipitation (Sileshi et al. 2010). The variability in responses to nitrogen has also been investigated by other researchers (e.g. Vanlauwe et al. 2011). Tittonell et al. (2008) attributed the maize yield gap in Kenya to variability in soils and differences in farmers' management decisions, mainly input quantity and timing. Seyoum et al. (1998) showed that access to extension services and socioeconomic factors affected maize production efficiency in Ethiopia. Beshir et al. (2012) performed a farm-level efficiency analysis in the north-eastern highlands of Ethiopia, including labour, land and capital as major inputs. While these studies provided important insights, they focused on either agronomic or economic variables. To conduct a more comprehensive and informative yield gap analysis, however, economic and agronomic perspectives must be better integrated (Lobell et al. 2009;van Dijk et al. 2017). In addition, to contribute to long-term goals regarding food security (van Ittersum et al. 2016;Sachs et al., 2019), an understanding of the entire yield gap is important. Historical yield progress in Europe, Australia, Southeast Asia and the United States (Anderson 2010;Richards et al. 2014;Fischer et al. 2014) shows that removing inefficiencies has been among the first steps, but that increased input use and the adoption of new technologies have clearly played a major role in increasing yields and narrowing yield gaps to ca. 20-30% of (water-limited) potential yields, as opposed to the 80% observed in SSA.
In a recent review of the literature, Beza et al. (2017) found that most agronomic yield gap studies consider management and edaphic factors as the main explanatory factors, whereas farm(er) characteristics and socioeconomic conditions are rarely considered. An exception is Tamene et al. (2015), who showed that soil nutrient content, socio-economic factors and agronomic practices together explain maize yield gaps in Malawi, highlighting the importance of an integrated approach. However, they used the highest farmers' yield as an estimate of potential yield, which is likely to underestimate the yield gap when all farmers face similar constraints and their practices are agronomically sub-optimal (Lobell et al. 2009). van Dijk et al. (2017) presented a framework for integrating agronomic and economic approaches to yield gap analysis, using stochastic frontier analysis and data from Tanzania. They, however, only included the first stage (explaining the production frontier) and not the second stage (explaining technical inefficiency), so only part of the yield gap could be explained. The contribution of this paper is to provide a comprehensive empirical evaluation of the drivers of maize yield and yield gaps at national level for Ethiopia, by decomposing yield gaps into efficiency, resource and technology components. Decomposing the yield gap in this way allows us to better assess opportunities for policy interventions and technological improvements. We use data from the "Sustainable intensification of Maize-Legume Cropping Systems for food security in Eastern and Southern Africa (SIMLESA)" and "Diffusion and Impact of Improved Varieties in Africa (DIIVA)" projects (Jaleta et al. 2018).
While we acknowledge that, given the large variation in biophysical and socio-economic conditions in Ethiopia, an analysis at national level is at times agronomically and socially coarse, we argue that this is the first study to give an overview of maize production and yield gaps at national level using a relatively large data set. We use Ethiopia as a case study, with data that allow us to consider the variation in biophysical and socio-economic conditions across the major maize producing areas of the country.

Conceptual framework
The size of the yield gap depends on the yield level used as benchmark (Lobell et al. 2009). The most widely used references are potential yield in irrigated systems or water-limited potential yield in rain-fed systems, maximum yields from controlled experiments, highest farmers' yields and technically efficient yield. Potential yield represents a yield that can be achieved in a specific climate without any water and nutrient limitations, and when pest, weed and disease problems are fully controlled (Evans 1993;van Ittersum & Rabbinge 1997). Water-limited potential yield represents the yield achieved under rain-fed conditions, but with no nutrient limitation and with pests, weeds and diseases well controlled. Potential and water-limited potential yields are estimated with crop growth simulation models using site-specific information on weather, soil and crop management practices (Lobell et al. 2009). Experimental yields are usually achieved under best management practices and favourable biophysical conditions; while they do not represent real farmers' conditions, they can be used as a benchmark for specific locations. Maximum farmers' yields can be represented by the upper percentile of farmers' yield distributions from surveys (Affholder et al. 2013;Lobell et al. 2009). Technically efficient yield is the maximum output that can be realized with a given set of production inputs. It can be calculated from individual farm data containing information on production resources, using economic models such as frontier analysis or boundary line analysis (van Dijk et al. 2017).
Our framework to estimate and explain maize yield gaps builds on Silva et al. (2017), in which the yield gap is decomposed into efficiency, resource and technology components (Fig. 1). We used stochastic frontier analysis to estimate the technically efficient yield because it separates inefficiency from data noise (Coelli et al. 2005). The stochastic production frontier depicts the maximum output that can be produced with a given vector of inputs, i.e. the technically efficient yield. The difference between the technically efficient yield (Y_TEx) and the actual yield (Ya) is defined as the efficiency yield gap. The efficiency yield gap shows the extra yield that could be attained with the same level of inputs if they were used optimally in production.
The efficiency yield gap measures the extent to which farmers could produce more using the same inputs under the same production conditions, but with improved practices regarding the timing, placement and form of the inputs applied. Following Silva et al. (2017) and Kihara et al. (2015), we conceptualize the efficiency yield gap as deriving from differences in crop management practices, specifically the timing, frequency and spatial application of inputs. This corresponds to an information constraint, i.e. a lack (or uncertainty) of knowledge about which practices lead to the most efficient production outcomes for a given technology and level of input use.
The resource yield gap is the difference between the highest observed yield for a population of fields (the highest farmers' yield, Y_HF) and the technically efficient yield for a particular field. It thus reflects the difference between the highest farmers' yield and the technically efficient yield at a lower input level, and points to a sub-optimal quantity of inputs applied (from a production perspective). We considered inputs that significantly influenced yield level (mainly nitrogen and pesticide) to explain the resource yield gap.
The technology yield gap is the difference between the water-limited potential yield (Yw) and the highest farmers' yield (Y_HF). It can be seen as an upward shift of the production frontier estimated using actual yields. This shift can be attributed to two aspects: a partial shift through reductions in resource yield gaps for specific inputs and/or a total shift due to the adoption of improved technologies. Note that the technology yield gap also includes resource and efficiency components, as Y_HF may be reached at input levels lower than those required for Yw, and with inefficient use of inputs.
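The three-way decomposition can be checked with a short calculation. The Python sketch below is illustrative only: the class and function names are hypothetical, the yields are invented numbers rather than survey results, and expressing each component as a share of Yw is an assumption of the sketch. Because the components telescope, they always sum to the total gap Yw - Ya.

```python
from dataclasses import dataclass


@dataclass
class YieldLevels:
    """Yield levels for one farming system or category, all in t/ha."""
    ya: float     # actual yield (Ya)
    y_tex: float  # technically efficient yield (Y_TEx)
    y_hf: float   # highest farmers' yield (Y_HF)
    yw: float     # water-limited potential yield (Yw)


def decompose(levels: YieldLevels) -> dict:
    """Split the total yield gap (Yw - Ya) into efficiency, resource and technology parts."""
    gaps = {
        "efficiency": levels.y_tex - levels.ya,  # same inputs, better management
        "resource": levels.y_hf - levels.y_tex,  # more or better-targeted inputs
        "technology": levels.yw - levels.y_hf,   # upward shift of the frontier
    }
    # Assumption of this sketch: each component is expressed as a share of Yw.
    shares = {name: gap / levels.yw for name, gap in gaps.items()}
    return {"gaps_t_ha": gaps, "total_t_ha": levels.yw - levels.ya, "shares_of_yw": shares}


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    result = decompose(YieldLevels(ya=2.5, y_tex=4.3, y_hf=5.5, yw=9.0))
    for name, gap in result["gaps_t_ha"].items():
        print(f"{name}: {gap:.2f} t/ha ({result['shares_of_yw'][name]:.0%} of Yw)")
```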

Study area and data source
The main data were obtained from the "Sustainable intensification of Maize-Legume Cropping Systems for food security in Eastern and Southern Africa (SIMLESA)" and "Diffusion and Impact of Improved Varieties in Africa (DIIVA)" projects (Jaleta et al. 2018). The data were collected by the Ethiopian Institute of Agricultural Research (EIAR) and the International Maize and Wheat Improvement Center (CIMMYT) in five regional states of Ethiopia (Oromia, Amhara, Benishangul Gumuz, Tigray and Southern Nations Nationalities and Peoples (SNNP)). A stratified sampling strategy was used to select the sample households: districts were classified as "high", "medium" and "low" based on their average maize productivity, using information from the Central Statistical Authority (CSA) and the International Food Policy Research Institute (IFPRI). Within each district, four communities ("peasant associations") were randomly selected, and from each of these communities 10-16 households were randomly selected for the final one-to-one interviews (Jaleta et al. 2018). In total, 2455 and 2287 households were surveyed in 2010 and 2013, respectively. However, after dropping households with missing observations for key model variables (and all observations for the Tigray region, which was not surveyed in 2013); excluding outliers above the 99th and below the 1st percentile (and, for some variables, above the 95th and below the 5th percentile when values remained implausibly low or high given agronomic insights); and excluding households that did not grow maize and plots with a maize yield of more than 5000 kg/ha without nitrogen application, which is

We used the Global Yield Gap Atlas (GYGA) (www.yieldgap.org) to obtain the water-limited potential yield, which was used to exclude calculated yields exceeding the water-limited potential yield and to calculate the technology yield gap.
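The plot-screening steps above can be expressed as a boolean mask over plot records. This is a minimal sketch under stated assumptions: `clean_plots` and its arguments are hypothetical, percentile trimming is applied to yield only (the survey screening also trimmed other variables, some at the 5th/95th percentiles), and all yields are in kg/ha.

```python
import numpy as np


def clean_plots(yield_kg, nitrogen_kg, yw_kg, lower_pct=1.0, upper_pct=99.0):
    """Return a boolean mask of plots kept after the screening steps.

    Hypothetical helper: argument names and the order of the filters are
    assumptions of this sketch, not the authors' code.
    """
    y = np.asarray(yield_kg, dtype=float)
    n = np.asarray(nitrogen_kg, dtype=float)
    yw = np.asarray(yw_kg, dtype=float)

    # 1. Drop plots with missing values in the key variables.
    keep = ~(np.isnan(y) | np.isnan(n) | np.isnan(yw))

    # 2. Trim yield outliers outside the chosen percentiles (computed on valid plots).
    lo, hi = np.percentile(y[keep], [lower_pct, upper_pct])
    keep &= (y >= lo) & (y <= hi)

    # 3. Drop implausible plots: more than 5000 kg/ha without any nitrogen.
    keep &= ~((y > 5000) & (n == 0))

    # 4. Drop yields that exceed the water-limited potential yield from GYGA.
    keep &= y <= yw
    return keep
```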
We also accessed growing degree days (with base temperature of 0°C), temperature seasonality (the standard deviation of monthly average temperature), and aridity index (annual total precipitation divided by annual total potential evapotranspiration) from GYGA as indicators of climatic conditions of individual plots (van Wart et al., 2013).
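For readers reproducing these climate indicators from raw weather data rather than taking them from GYGA, the definitions above translate roughly as follows. This is a sketch, not GYGA's procedure: computing growing degree days from daily mean temperature with a simple 0 °C floor, and using the population standard deviation for seasonality, are assumptions; some sources also scale temperature seasonality by 100.

```python
import numpy as np


def growing_degree_days(daily_mean_temp_c, base_c=0.0):
    """Sum of daily mean temperatures above the base temperature (0 °C in the text)."""
    t = np.asarray(daily_mean_temp_c, dtype=float)
    return float(np.clip(t - base_c, 0.0, None).sum())


def temperature_seasonality(monthly_mean_temp_c):
    """Standard deviation of the monthly mean temperatures (population SD, ddof=0)."""
    return float(np.std(np.asarray(monthly_mean_temp_c, dtype=float)))


def aridity_index(annual_precip_mm, annual_pet_mm):
    """Annual total precipitation divided by annual total potential evapotranspiration."""
    return annual_precip_mm / annual_pet_mm
```

With this definition of the aridity index, a higher value indicates wetter conditions.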

Spatial pattern of maize yield
We conducted a hot spot analysis to identify whether maize yields were clustered or dispersed in space. We used the Getis-Ord Gi* statistic to measure the intensity of clustering of high values ("hot spots") or low values ("cold spots") relative to the average of neighbouring values (Harris et al. 2017;Ord & Getis 1995). The null hypothesis is that there is no association between the maize yield at a given site i and the yields at its neighbouring sites j, defined on the basis of spatial proximity. We used 30 km as the distance threshold for defining neighbours. A statistically significant hot spot is characterized by a spatial cluster of observations with relatively high values.
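The Gi* statistic with a fixed 30 km neighbourhood can be sketched with NumPy as below. The sketch assumes binary weights (w_ij = 1 within the threshold, including the site itself, as Gi* requires), great-circle distances computed from latitude and longitude, and the standardized z-score form of Ord & Getis (1995). The software and weighting options actually used in the analysis are not stated here, so the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between points given in degrees."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def getis_ord_gi_star(values, lat, lon, threshold_km=30.0):
    """Gi* z-scores with binary distance-band weights (site i counts as its own neighbour)."""
    x = np.asarray(values, dtype=float)
    n = x.size
    w = (haversine_km(lat, lon) <= threshold_km).astype(float)  # diagonal is 1
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    w_sum = w.sum(axis=1)
    w_sq_sum = (w ** 2).sum(axis=1)
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator  # positive: hot spot, negative: cold spot
```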

Productivity and efficiency yield gap analysis
Stochastic and deterministic production frontier approaches are commonly used in efficiency analysis (Coelli et al. 2005). Stochastic frontier analysis was originally used to estimate the efficiency of firms (Aigner et al. 1977) and was later extended to other decision-making units, including farms (van Dijk et al. 2017). Contrary to the conventional average production function, the stochastic production frontier assumes the presence of technical inefficiency. Although it requires a distributional assumption for the inefficiency component, stochastic frontier analysis has the advantage of estimating the degree of inefficiency and its explanatory factors in a single step (Coelli 1995). Deterministic approaches do not depend on distributional assumptions, but they attribute any deviation from the frontier to inefficiency, which makes them sensitive to measurement error (Coelli et al. 2005). Consequently, we used a stochastic frontier model with a technical inefficiency component to quantify and explain technical efficiency.
As we have many variables, several of which are correlated, and many interactions are not agronomically meaningful, we did not use a full translog function but started with a Cobb-Douglas production function. Interaction terms of specific interest were added later. We specify yield (y_i) as a function of growth-defining, -limiting and -reducing factors x_i, a non-negative random error (u_i) capturing technical inefficiency, which is assumed to follow a normal distribution truncated at zero with mean μ and variance σ², and a random error term (v_i) (Eq. 1):

$$\ln y_i = \beta_0 + \sum_{k} \beta_k \ln x_{ki} + v_i - u_i \qquad (1)$$

where the subscript i represents the ith plot, k indexes the explanatory factors and β is a vector of unknown parameters.
We estimated the production frontier and technically efficient yields based on concepts of production ecology (van Ittersum & Rabbinge 1997). Production ecology determines yield as a function of growth-defining factors (radiation, temperature, CO2 concentration and crop genetics), growth-limiting factors (water and nutrients) and growth-reducing factors (diseases, pests, weeds and pollutants). In our analysis, we included the type of maize variety used, climatic variables (year, growing degree days and temperature seasonality) and the amount of seed as growth-defining factors. The growth-limiting factors considered were drought, aridity index, nitrogen (from the mineral fertilizers DAP and urea), soil fertility, intercropping (better water and nutrient use efficiency, and pest and disease suppression; Cong et al. 2015;Yu et al. 2015), crop residue from the previous season, the type of crop grown in the previous season (to capture crop rotation effects on soil fertility, soil erosion and the spread of pests, weeds and diseases; TerAvest et al. 2015), use of soil and water conservation methods, and ploughing frequency (Table 1). Growth-reducing factors are captured using weeding frequency (weeds reduce crop yields by competing for light, water and nutrients); disease and pest incidence and water logging; livestock (as a source of traction and/or manure); and labour used for land preparation, planting and weeding. We included maize plot area to capture unobserved labour intensity/quality.
The data covered two production years and followed households, not plots, over time. This prevented us from applying panel data estimators at the plot level. In addition, we are interested in some time-invariant independent variables, which precludes a fixed-effects or first-differencing approach. However, simply pooling the data would leave us vulnerable to possible endogeneity arising from unobserved time-invariant heterogeneity; for example, unobserved farmer experience in maize production and farm management skills could bias our model. To overcome this, we use the Mundlak-Chamberlain (MC) device (Wooldridge 2010):

$$c_i = \bar{x}_i \gamma + \eta_i \qquad (2)$$

where c_i is the transformation of the inefficiency term; x̄_i refers to the mean values of the time-varying explanatory variables (in our case, mean values of labour, seed, nitrogen and livestock); γ represents the coefficients on x̄_i; and η_i is an i.i.d. error term.
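In practice, the Mundlak-Chamberlain device amounts to adding household means of the time-varying regressors as extra covariates. A minimal NumPy sketch follows. The helper name is hypothetical, and whether the means are taken of the log-transformed variables (as here, if `x` is passed in logs) or of the raw variables is an assumption not settled by the text.

```python
import numpy as np


def add_household_means(household_ids, x):
    """Append household-level means of time-varying regressors (Mundlak device).

    household_ids: length-n array of household identifiers (one entry per plot-year)
    x: n-by-k array of time-varying regressors, e.g. log labour, seed, nitrogen, livestock
    Returns an n-by-2k array [x, x_bar], where x_bar repeats each household's mean.
    """
    ids = np.asarray(household_ids)
    x = np.asarray(x, dtype=float)
    _, group = np.unique(ids, return_inverse=True)
    counts = np.bincount(group)
    # Column-wise group sums divided by group sizes, then mapped back to rows.
    sums = np.vstack([np.bincount(group, weights=x[:, j]) for j in range(x.shape[1])]).T
    means = sums / counts[:, None]
    return np.hstack([x, means[group]])
```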
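The first specification of the stochastic production function (model A) is thus:

$$\ln y_i = \beta_0 + \sum_{k} \beta_k \ln x_{ki} + \sum_{m} \gamma_m \ln \bar{x}_{mi} + v_i - u_i \qquad (3)$$

The technical inefficiency (efficiency yield gap) effect is specified as:

$$u_i = Z_i \delta + w_i \qquad (4)$$

where Z_i represents the set of variables explaining sources of inefficiency, δ is a vector of parameters to be estimated, and w_i is a random error term, i.i.d. N(0, λ²).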
We estimated six stochastic frontier model specifications (Table 2). Model A represents the basic model without interactions (Eq. 3). We further tested several interactions in different models and report those between nitrogen application and other management variables. The observed response of maize yield to nitrogen application in farmers' fields in SSA is often lower than expected from on-farm trial data (e.g. Kihara et al. 2016;Vanlauwe et al. 2011). Our alternative specifications were designed to shed light on possible interactions underlying such unexpected effects. Model B considers the interaction of nitrogen application with maize variety, model C with seed density, model D with soil fertility, model E with crop residue management and model F with weeding frequency.
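With interactions in a log-log frontier, the effect of a dummy such as crop residue retention depends on the nitrogen rate. The sketch below shows that calculation for a hypothetical model fragment. The coefficients in the usage lines are invented, and adding an offset of 1 before taking the log of nitrogen (so that plots without nitrogen can be included) is an assumption, since the text does not say how zero applications were handled.

```python
import math


def conditional_dummy_effect(beta_dummy, beta_interaction, nitrogen_kg_ha, offset=1.0):
    """Log-yield effect of switching a dummy D from 0 to 1 at a given nitrogen rate.

    Assumed model fragment (hypothetical):
        ln y = ... + beta_dummy * D + beta_interaction * D * ln(N + offset) + ...
    """
    return beta_dummy + beta_interaction * math.log(nitrogen_kg_ha + offset)


def nitrogen_break_even(beta_dummy, beta_interaction, offset=1.0):
    """Nitrogen rate (kg/ha) at which the conditional effect changes sign."""
    if beta_interaction == 0:
        raise ValueError("no interaction: the effect does not depend on nitrogen")
    return math.exp(-beta_dummy / beta_interaction) - offset


if __name__ == "__main__":
    b_dummy, b_inter = -0.10, 0.05  # hypothetical coefficients, not Table 2 estimates
    for n in (0.0, 10.0, 40.0):
        effect = conditional_dummy_effect(b_dummy, b_inter, n)
        print(f"N = {n:>4} kg/ha: effect on ln(yield) = {effect:+.3f}")
    print(f"break-even N = {nitrogen_break_even(b_dummy, b_inter):.1f} kg/ha")
```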
A second stage was included to explain the efficiency yield gap. We considered education, other income, value of household assets and plot distance from the homestead to capture the socioeconomic factors that determine the efficiency yield gap. The education level of household heads can affect farmers' decisions in terms of input type and timing. We hypothesized that farmers with more years of schooling have lower efficiency yield gaps because they perform crop management operations more carefully and at the recommended times. Income other than from own farming (off-farm and non-farm business income) can also support farming activities by providing finance to access modern inputs and to hire extra labour for timely management activities. However, one could also argue that engaging in off-farm activities may shift labour away from farming, which could delay crop management operations and widen the efficiency yield gap. Household assets, which include farm implements (such as ploughs and sprayers), could allow farmers to carry out operations in a more timely manner (Beshir et al. 2012). In addition, household assets may give households access to credit to buy inputs. Plot distance from the homestead can also affect the efficiency yield gap, as nearer plots tend to receive more inputs and more frequent visits (Tittonell et al. 2005;Tittonell et al., 2008). We thus expect that the closer the plot is to the homestead, the lower the efficiency yield gap. Furthermore, we disaggregated the sample by maize variety, soil type, year and farming system and compared the efficiency yield gap across these categories.
The maximum likelihood method was used to estimate the stochastic frontier parameters and the technical inefficiency effects model together, using the "frontier" package in R (Coelli & Henningsen 2017). In all models, both dependent and independent variables were log-transformed prior to the analysis.
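The R "frontier" package returns plot-level technical efficiency scores; given those, the efficiency yield gap follows by simple arithmetic. The Python sketch below does not estimate the frontier itself. It assumes TE_i = Ya_i / Y_TEx_i = exp(-u_i), consistent with the log-linear frontier, and the helper name is hypothetical.

```python
import numpy as np


def efficiency_summary(actual_yield, technical_efficiency):
    """Derive Y_TEx and efficiency-gap statistics from plot-level TE scores.

    Assumes TE_i = Ya_i / Y_TEx_i = exp(-u_i), with 0 < TE_i <= 1.
    """
    ya = np.asarray(actual_yield, dtype=float)
    te = np.asarray(technical_efficiency, dtype=float)
    y_tex = ya / te           # technically efficient yield per plot (t/ha)
    gap_abs = y_tex - ya      # efficiency yield gap (t/ha)
    gap_rel = 1.0 - te        # efficiency yield gap as a share of Y_TEx
    return {
        "y_tex": y_tex,
        "mean_te": float(te.mean()),
        "mean_gap_t_ha": float(gap_abs.mean()),
        "share_plots_gap_over_half": float((gap_rel > 0.5).mean()),
    }


if __name__ == "__main__":
    # Hypothetical plots: actual yield (t/ha) and TE scores from a fitted frontier.
    summary = efficiency_summary([2.0, 3.0, 1.0, 4.0], [0.5, 0.75, 0.4, 0.8])
    print(summary["mean_te"], summary["mean_gap_t_ha"], summary["share_plots_gap_over_half"])
```

The relative gap (1 minus mean TE) and the mean absolute gap in t/ha are different summaries, which is why both are worth reporting.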

Estimation of the resource yield gap
We calculated the resource yield gap as the difference between the highest farmers' yield and the technically efficient yield. We used the average maize yield above the 90th percentile to represent the highest farmers' yield. This benchmarks each plot against the high-yielding plots in the sample, so the resource yield gap reflects resource limitations within the range of actual yields and is explained by the significant variables in the frontier function (Eq. 3 & Table 2).
We used significant dummy variables in the frontier to disaggregate the resource and technology yield gaps. These variables include soil type, maize variety and year. Their significance shows that different frontiers apply to different parts of the sample. We therefore calculated a separate highest farmers' yield for each category of these dummies to disaggregate the resource and technology yield gaps. We also disaggregated the resource and technology yield gaps by the farming systems described in Section 3.3.
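The highest farmers' yield and the resulting resource yield gap per category can be sketched as follows. The function names are hypothetical; the sketch uses NumPy's default linear interpolation for the 90th percentile and a strict "above" comparison, both assumptions. A category in which no yield exceeds the cutoff returns NaN (with a warning).

```python
import numpy as np


def highest_farmers_yield(yields, pct=90.0):
    """Mean of the yields strictly above the given percentile (Y_HF)."""
    y = np.asarray(yields, dtype=float)
    cutoff = np.percentile(y, pct)
    return float(y[y > cutoff].mean())


def resource_gap_by_group(groups, yields, y_tex):
    """Y_HF and mean resource yield gap (Y_HF - Y_TEx) per category of a dummy."""
    groups = np.asarray(groups)
    yields = np.asarray(yields, dtype=float)
    y_tex = np.asarray(y_tex, dtype=float)
    out = {}
    for g in np.unique(groups):
        sel = groups == g
        y_hf = highest_farmers_yield(yields[sel])  # category-specific benchmark
        out[g.item()] = {"y_hf": y_hf, "mean_resource_gap": float((y_hf - y_tex[sel]).mean())}
    return out
```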
We further calculated the resource yield gap based on the approach of van Dijk et al. (2017). In that approach, the maximum yield achieved without any resource limitation is termed the feasible yield (Y_f). The feasible yield was calculated using the coefficient estimates of the frontier function (Eq. 3), assuming maximum quantities for the major inputs in our function. We assumed a hybrid variety, 400 kg/ha nitrogen, a seeding rate of 33 kg/ha (http://www.yieldgap.org), a 50% increase in labour measured in person days and full pesticide application, and used average values for the remaining variables. The resource yield gap is then the difference between the feasible yield and the technically efficient yield. We estimated one feasible yield gap per farming system, correcting for spatial differences by including climatic variables in our frontier function.

By considering these two ways of estimating the resource yield gap, we can better compare the resource yield gap under two different scenarios. The first (using the highest farmers' yield) stays close to reality, as it is based on what farmers actually practised; the second (using the feasible yield) is based on a technically feasible yield, which may not be economically optimal but helps to evaluate how much yield can be enhanced if input constraints are minimized (van Dijk et al. 2017).

Table 2 Parameter estimates of the stochastic frontier models estimated for maize-based farming systems in Ethiopia. Stochastic frontier models were estimated without interactions (Model A) and with interactions between nitrogen and maize seed type, seeding rate, soil type, crop residue and weeding (Models B to F, respectively). P-values: *** p < 0.001, ** p < 0.01, * p < 0.05
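The feasible-yield idea can be illustrated without the frontier intercept by scaling a plot's technically efficient yield. In a log-log frontier, moving inputs from their observed to their assumed maximum levels multiplies frontier yield by exp(Σ β_k (ln x_k,max - ln x_k)), and dummy shifts (such as a hybrid variety or full pesticide application) add to the exponent. This ratio form is an algebraic shortcut chosen for the sketch, not necessarily how Y_f was computed here. The usage lines plug in the elasticities reported in the results (0.10 for nitrogen, 0.16 for seeding rate, 0.13 for labour) and the 2013 sample means from the descriptive statistics purely for illustration; the Y_TEx value is hypothetical and the printed numbers are not results of the study.

```python
import math


def frontier_ratio(elasticities, baseline, target, dummy_shift=0.0):
    """Multiplicative change in frontier yield when inputs move from baseline to target.

    For a log-log frontier ln Y = b0 + sum_k b_k ln x_k + (dummy terms), holding
    all other variables fixed:
        Y(target) / Y(baseline) = exp(sum_k b_k (ln target_k - ln baseline_k) + dummy_shift)
    Input levels must be strictly positive.
    """
    log_ratio = dummy_shift
    for name, beta in elasticities.items():
        log_ratio += beta * (math.log(target[name]) - math.log(baseline[name]))
    return math.exp(log_ratio)


if __name__ == "__main__":
    y_tex = 4.0  # hypothetical technically efficient yield, t/ha
    ratio = frontier_ratio(
        elasticities={"nitrogen": 0.10, "seed": 0.16, "labour": 0.13},
        baseline={"nitrogen": 40.0, "seed": 30.0, "labour": 62.0},
        target={"nitrogen": 400.0, "seed": 33.0, "labour": 62.0 * 1.5},
    )
    y_f = y_tex * ratio
    print(f"Y_f = {y_f:.2f} t/ha; resource yield gap = {y_f - y_tex:.2f} t/ha")
```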

Estimation of the technology yield gap
The technology yield gap was calculated as the difference between the water-limited potential yield from GYGA (www.yieldgap.org/Ethiopia) and the highest farmers' yield from the household survey data. We calculated a separate technology yield gap for each farming system. The technology yield gap can be explained by agricultural technologies (e.g. a higher-yielding variety) that help lift the highest farmers' yield towards the water-limited potential yield.

Farming systems
We disaggregated the different yield gaps by farming system to capture the variability in biophysical and socio-economic conditions in the country. We classified the sample based on the farming systems described in Amede et al. (2015), cross-checking with those of Dixon et al. (2001). Because Amede et al. (2015) distinguish many farming systems, we regrouped them into four broad categories: highland perennial, highland mixed, highland maize mixed and lowland maize mixed (see Supplementary Material).

Characteristics of the farming system
The mean farm size operated by farmers was about 2.2 ha (Table 1). On average, farmers used about 19% of their total farm area for maize cultivation in both 2010 and 2013. The mean maize yield was significantly higher in 2013 (2.83 t/ha) than in 2010 (2.50 t/ha). Improved varieties were used on ca. 71% of the maize plots. Hybrid maize varieties were dominant, at 57% in 2010 and 68% in 2013. The mean seed rate was 30 kg seed/ha, which is higher than the recommended rate of 25 kg/ha. Farmers applied nitrogen on 62 and 65% of their maize plots, with mean rates of 32 kg/ha and 40 kg/ha in 2010 and 2013, respectively. Intercropping was practiced twice as often in 2013 (18%) as in 2010 (9%). Crop residues from the previous season were left more often in 2010, on about a quarter of the plots. Soil and water conservation methods were practiced on a quarter of the plots in 2013 and on 16% in 2010. Pesticide application was very limited: only on 9 and 4% of the maize fields in 2010 and 2013, respectively. Disease and pest stress was lower in 2013 than in 2010. Only 1% of the plots were not weeded. More than half of the plots were weeded up to two times, and about 41% three to four times. The average number of livestock (in tropical livestock units, TLU) per household was higher in 2013 (10.8) than in 2010 (5.8). This difference could be due to the severe drought that occurred in 2009 (Viste et al. 2013). The average labour used for land preparation, weeding and planting was higher in 2013 (62 person days/ha) than in 2010 (53 person days/ha).
The concentration of lower yields (blue spots in Fig. 3a) in the West Gojam (2.52 t/ha) and Jimma (2.02 t/ha) zones was not always consistent with the water-limited potential yield in those areas. This could be explained by the unfavourable effects of socioeconomic conditions, management and farming systems. The yellow spots reflect no spatial clustering, i.e. they may have high or low values but are not surrounded by similar values in their neighbourhood.

Maize yield and growth defining, limiting and reducing factors
The likelihood ratio test rejected the null hypothesis of no inefficiency component in the model, implying that the stochastic frontier model was an appropriate framework for our analysis. The estimated gamma value was also high (Table 2), indicating that inefficiency effects accounted for much of the variation in the composite error term. Below, we explain the results from the stochastic frontier analysis following the conceptual framework presented in Section 2.
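For reference, the two quantities mentioned above can be computed as follows, assuming the common parameterization gamma = sigma_u² / (sigma_u² + sigma_v²). Because gamma = 0 lies on the boundary of the parameter space, the likelihood-ratio statistic is usually compared with critical values from a mixed chi-square distribution rather than the ordinary chi-square table. The function names are hypothetical.

```python
def gamma_parameter(sigma_u_sq: float, sigma_v_sq: float) -> float:
    """Share of the composite error variance attributed to inefficiency."""
    return sigma_u_sq / (sigma_u_sq + sigma_v_sq)


def lr_statistic(loglik_restricted: float, loglik_unrestricted: float) -> float:
    """LR statistic comparing a model without inefficiency (restricted) to the frontier."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)
```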

Growth defining factors
The effect of the Open Pollinated Variety (OPV) maize seed type on yield was statistically significant only when interacted with nitrogen (Table 2). Hybrid maize varieties showed on average 13% higher yield than local maize varieties. Moreover, the effect of hybrid maize on yield was larger with nitrogen application, i.e. about 18% higher maize yield (Model B). The amount of seed sown was included to capture planting density. The seeding rate (kg/ha) had a positive effect on maize yield: a 1% increase in the seeding rate resulted in 0.16% higher yield. The effect of the seeding rate on yield was not measurably influenced by nitrogen (Model C). The climatic variables year, growing degree days and temperature seasonality did not show a statistically significant effect on yield.
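In a model where both yield and inputs are log-transformed, slope coefficients are elasticities and dummy coefficients are log-point shifts. The sketch below shows how such numbers translate into percentage changes for larger input changes. Whether the percentages reported for dummies in this section (e.g. 13% for hybrids) were computed as b × 100 or as exp(b) - 1 is not stated, so the dummy conversion is shown only as one common option; the function names are hypothetical.

```python
import math


def pct_change_from_elasticity(elasticity, pct_input_change):
    """Exact % change in yield for a given % change in an input (log-log model)."""
    return ((1.0 + pct_input_change / 100.0) ** elasticity - 1.0) * 100.0


def pct_change_from_dummy(coefficient):
    """% yield difference implied by a dummy coefficient in a log-yield model."""
    return (math.exp(coefficient) - 1.0) * 100.0


if __name__ == "__main__":
    # Seeding-rate elasticity of 0.16 (reported above): effect of 1% and 10% more seed.
    print(pct_change_from_elasticity(0.16, 1.0))
    print(pct_change_from_elasticity(0.16, 10.0))
    # A hypothetical dummy coefficient of 0.13 read as exp(b) - 1 instead of b.
    print(pct_change_from_dummy(0.13))
```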

Growth limiting factors
The aridity index (an average climate variable) had a strong negative effect on yield, with a one-unit increase in the aridity index resulting in 4.8% lower yield on average. Drought (a yearly climate variable) also reduced maize yield; plots that experienced drought had 16% lower yield. Nitrogen and maize yield showed a positive relationship: a 1% increase in nitrogen application increased yield by 0.10%. Maize yield also depended on the soil fertility status of plots. On average, highly fertile plots had higher yields than plots with medium and poor soil fertility. We also tested whether the effect of soil fertility on yield interacts with nitrogen application or vice versa (Model D), but did not find a significant interaction. Yield (corrected for the share of area planted with maize) was 7% higher under intercropping than under sole cropping. Crop residues had a significant negative effect on yield. However, the effect of crop residues became positive and significant when nitrogen was applied (Model E). The number of ploughings was significantly and negatively related to yield. Soil and water conservation methods and previous crops grown did not show a significant association with yield.

Growth reducing factors
Maize yield was 14% higher when pesticide was applied, although pesticide was applied on only 6% of the plots. The effect of weeding frequency on yield was not statistically significant. It became significant and negative when interactions between weeding frequency and nitrogen were included (Model F): more weeding reduced the positive effect of nitrogen. Plots that experienced disease and waterlogging had, respectively, 11% and 13% lower maize yield than plots that did not experience these stresses. Pest incidence did not show a significant effect on yield. As expected, labour and yield were positively associated; yield increased by 0.13% when labour (days/ha) increased by 1%. Maize plot size, adjusted for intercropping, had a negative effect on maize yield: yield decreased by 0.5% when maize area increased by 1%.
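The plot-size elasticity can be restated for total maize production on the plot, P = Y × A, where Y is yield (t/ha) and A is maize area (ha). The short derivation below is only an arithmetic consequence of the reported -0.5 elasticity. It assumes that value holds locally around observed plot sizes, an extrapolation the text itself does not make:

```latex
P = Y \cdot A
\quad\Longrightarrow\quad
\ln P = \ln Y + \ln A
\quad\Longrightarrow\quad
\varepsilon_{P,A} = \frac{d \ln P}{d \ln A}
  = 1 + \frac{d \ln Y}{d \ln A}
  = 1 + (-0.5) = 0.5
```

A 1% larger maize area is therefore still associated with roughly 0.5% more maize output, not less. The inverse relationship concerns yield per hectare, not total production.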

Maize yield levels
Four maize yield levels were used to analyze maize yield gaps, each disaggregated by maize variety, soil type, year and farming system (Table 3; Fig. 4): actual yield (Ya), technically efficient yield (YTEx), highest farmers' yield (YHF) and water-limited potential yield (Yw). Feasible yield (Yf) was also considered to calculate the resource yield gap per farming system. The yield levels differed across categories. Ya, YTEx and YHF were greatest for hybrid varieties, for plots with "good" soil, in 2013 compared with 2010, and in the highland maize mixed and highland mixed farming systems (Table 3). The perennial farming system had lower Ya, YTEx and YHF than the other farming systems.
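These yield levels nest into the three yield gap components. The sketch below follows our reading of the framework the paper attributes to Silva et al. (2017). It takes the efficiency gap as YTEx - Ya, the resource gap as YHF - YTEx and the technology gap as Yw - YHF, so that the three parts sum to the total gap Yw - Ya. This benchmark assignment is an interpretation, and the yield values are hypothetical, not figures from Table 3.

```python
from dataclasses import dataclass


@dataclass
class YieldLevels:
    ya: float     # actual yield (t/ha)
    y_tex: float  # technically efficient yield (t/ha)
    y_hf: float   # highest farmers' yield (t/ha)
    yw: float     # water-limited potential yield (t/ha)


def decompose(y: YieldLevels) -> dict:
    """Split the total gap Yw - Ya into efficiency, resource and technology parts."""
    gaps = {
        "efficiency": y.y_tex - y.ya,   # closable with current inputs and variety
        "resource": y.y_hf - y.y_tex,   # closable with more or better inputs
        "technology": y.yw - y.y_hf,    # needs technologies beyond current practice
    }
    total = y.yw - y.ya
    shares = {name: 100.0 * gap / total for name, gap in gaps.items()}
    return {"gaps": gaps, "total": total, "shares_pct": shares}


# Hypothetical yield levels for one farming system (not values from Table 3).
result = decompose(YieldLevels(ya=2.5, y_tex=4.3, y_hf=6.4, yw=11.5))
for name, share in result["shares_pct"].items():
    print(f"{name}: {result['gaps'][name]:.2f} t/ha ({share:.0f}%)")
```

Expressing each component as a share of Yw - Ya is what allows the comparison across farming systems in percentage terms.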

Efficiency yield gap and its explaining factors
The frontier analysis showed that the mean technical efficiency of farmers was 56%, i.e. the efficiency yield gap was 44% (Fig. 5a). About 38% of the maize plots had an efficiency yield gap greater than 50% (Fig. 5a). These plots produced less than half of the yield that could have been realized with their existing inputs and variety. The average efficiency yield gap was 1.82 t/ha (Fig. 5b). Technically efficient and actual yields were positively related (Fig. 5c). Both low- and high-yielding fields were associated with a low efficiency yield gap (Fig. 5d).
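The link between efficiency and the efficiency yield gap can be made explicit at plot level. The sketch below assumes technical efficiency is TE = Ya / YTEx, so that the relative efficiency gap is 1 - TE. It also notes that, in a log-yield stochastic frontier with one-sided inefficiency u ≥ 0, TE = exp(-u). The plot data are hypothetical.

```python
import math


def efficiency_summary(actual, tech_efficient):
    """Summarise plot-level technical efficiency TE = Ya / YTEx."""
    te = [a / t for a, t in zip(actual, tech_efficient)]
    gap_pct = [100.0 * (1.0 - x) for x in te]                   # relative efficiency gap
    gap_t_ha = [t - a for a, t in zip(actual, tech_efficient)]  # absolute gap
    n = len(te)
    return {
        "mean_te_pct": 100.0 * sum(te) / n,
        "mean_gap_pct": sum(gap_pct) / n,
        "mean_gap_t_ha": sum(gap_t_ha) / n,
        # plots producing less than half of their technically efficient yield
        "share_gap_over_50_pct": 100.0 * sum(g > 50.0 for g in gap_pct) / n,
    }


def te_from_inefficiency(u):
    """In a log-yield stochastic frontier with inefficiency u >= 0, TE = exp(-u)."""
    return math.exp(-u)


# Hypothetical plots (t/ha), not the survey data.
actual = [1.2, 2.5, 3.0, 0.9, 4.1]
tech_efficient = [3.0, 4.0, 4.2, 2.5, 5.0]
print(efficiency_summary(actual, tech_efficient))
```

Because the relative gap is linear in TE, the mean gap in percent is exactly 100 minus the mean efficiency in percent. This is why 56% efficiency corresponds to a 44% gap.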
Disaggregating the data showed that the efficiency yield gap was larger for hybrid varieties than for OPV and local varieties (Table 3). This is because YTEx is higher for hybrid varieties. Ya was also greatest for hybrid varieties, but better management is needed to realize their full productivity. Plots with "poor" soils had a smaller efficiency yield gap than plots with "medium" and "good" soils (Table 3). The same reasoning applies: YTEx was higher on "medium" and "good" than on "poor" soil plots, so the efficiency gap is larger when management is imperfect. This also shows that correcting for variety and soil type is important, so that their effects are not attributed to inefficiency. There was little difference between the two survey years, 2010 and 2013 (Table 3). The highland maize mixed and highland mixed farming systems had the largest efficiency yield gaps, whereas the highland perennial system had the smallest (Table 3). This is mainly due to the higher YHF in the former two systems, and thus more potential to increase yields with current technologies. The determinants of the efficiency yield gap had the expected effects, and both plot-level and farm-level factors mattered (Table 2). Education reduced the efficiency yield gap, implying that educated farmers were more efficient. Plot distance from the homestead had a quadratic relationship with the efficiency gap. The efficiency yield gap was largest, and efficiency thus lowest, at a walking distance of ca. one hour. This mainly implies that efficiency was highest close to the homestead and that the effect of distance decreased at larger distances. Income from other sources (see footnote 5) and the value of household assets also reduced the efficiency yield gap.
Footnote 5: We acknowledge an endogeneity concern with including income as a control variable and therefore estimated the model with and without this variable. The coefficient estimates differed very little.
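The "peak at ca. one hour" follows from the turning point of a quadratic. The sketch below assumes the efficiency gap depends on distance d as g(d) = b0 + b1·d + b2·d², with b2 < 0, so that the peak lies at d* = -b1 / (2·b2). The coefficients are hypothetical, with distance taken in walking minutes and values chosen only so that d* is about 60. The paper's estimates and distance units are not given in this section.

```python
def turning_point(b1: float, b2: float) -> float:
    """Distance at which g(d) = b0 + b1*d + b2*d**2 reaches its extremum."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2.0 * b2)


def marginal_effect(b1: float, b2: float, d: float) -> float:
    """Slope dg/dd of the quadratic at distance d."""
    return b1 + 2.0 * b2 * d


# Hypothetical coefficients for distance in walking minutes, chosen so that
# the efficiency gap peaks at about one hour (b2 < 0 gives a maximum).
B1, B2 = 0.012, -0.0001
print(turning_point(B1, B2))
for minutes in (30, 60, 90):
    print(minutes, marginal_effect(B1, B2, minutes))
```

The marginal effect is positive before the turning point and negative after it. The gap therefore widens with distance near the homestead and narrows again beyond the peak.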

Resource yield gap
The resource yield gap was greatest for hybrid varieties (2.13 t/ha), for plots with "medium" (2.16 t/ha) and "good" (2.15 t/ha) soil, and in 2013 (2.32 t/ha). The highest farmers' yields were also greatest for these categories. The highland perennial farming system had the smallest resource yield gap (1.26 t/ha) (Table 3); it also had the lowest technically efficient yield. The resource yield gap was also compared using highest farmers' yield and feasible yield as benchmarks (see section 3.3.2). The resource yield gap based on the feasible yield was larger than that based on YHF (Fig. 6). On average, 52% of the technology yield gap (2.15/4.16) is explained by maximizing current inputs and practices.
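Both resource yield gap benchmarks are measured from the same technically efficient yield, so switching from YHF to Yf simply moves Yf - YHF out of the technology gap and into the resource gap. The sketch below illustrates this arithmetic. Reading 4.16 t/ha as the feasible-yield-based gap is our interpretation of the "2.15/4.16" ratio. The yield levels are hypothetical values chosen only to reproduce the two quoted gaps.

```python
def resource_gaps(y_tex: float, y_hf: float, y_f: float) -> dict:
    """Resource yield gap against two benchmarks (all in t/ha)."""
    gap_hf = y_hf - y_tex   # benchmark: highest farmers' yield
    gap_f = y_f - y_tex     # benchmark: feasible yield
    return {
        "vs_YHF": gap_hf,
        "vs_Yf": gap_f,
        # Moving the benchmark from YHF to Yf shifts exactly Yf - YHF
        # out of the technology gap and into the resource gap.
        "shifted_from_technology": gap_f - gap_hf,
        "ratio_YHF_to_Yf": gap_hf / gap_f,
    }


# Hypothetical yield levels chosen only so that the two gaps equal the
# 2.15 and 4.16 t/ha quoted in the text.
print(resource_gaps(y_tex=4.0, y_hf=6.15, y_f=8.16))
```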
We further explained the resource yield gap using the pooled data, with the significant variables in Table 2 (mainly pesticide and nitrogen) as illustrative examples. Mean maize yield differed significantly between plots with and without pesticide application (Fig. 7a). Nitrogen explained about one-fifth of the variation in maize yield (Fig. 7b).
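The two bivariate statistics behind these statements can be computed as follows. The first is the R² of a simple regression of yield on nitrogen, which equals the squared Pearson correlation. The second is a comparison of mean yields with and without pesticide. The test used in the paper is not named here. Welch's unequal-variance t statistic is our choice, and all data below are hypothetical. A p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (= squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df


# Hypothetical plot data (kg N/ha, t/ha), not the survey data.
nitrogen = [0, 10, 25, 40, 60, 80, 100, 120]
yield_t_ha = [1.9, 2.8, 2.1, 3.5, 2.6, 4.0, 3.1, 4.4]
print(round(r_squared(nitrogen, yield_t_ha), 2))

with_pesticide = [3.4, 4.1, 3.8, 4.5]
without_pesticide = [2.6, 3.0, 2.2, 3.3, 2.9, 2.5]
print(welch_t(with_pesticide, without_pesticide))
```

As the Fig. 7 caption notes, such one-to-one relationships are not corrected for other variables, unlike the multivariate estimates in Table 2.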

Technology yield gap
Disaggregating the technology yield gap by maize variety showed that local varieties had the largest and hybrid varieties the smallest technology yield gap. "Poor" soils also had a larger technology yield gap (7.3 t/ha) than "medium" (5.7 t/ha) and "good" (5.6 t/ha) soils. The technology yield gap was similar in 2010 (5.91 t/ha) and 2013 (5.57 t/ha). The largest and smallest technology yield gaps were found in the highland perennial (7.8 t/ha) and highland mixed (4.82 t/ha) farming systems, respectively (Table 3).

Drivers of maize yield
Most of our results were as expected from an agronomic point of view. Improved hybrid maize varieties and high seeding rates were associated with higher maize yields (Table 2). However, these may not be desirable from an economic perspective. Farmers need to purchase hybrid seed every cropping season, and it costs more than OPV or local seed (Zeng et al. 2015; Abate et al. 2017). The average nitrogen (37 kg N/ha) and phosphorus (5 kg P/ha) application rates are much lower than the recommended 110-130 kg/ha of N and P together (Abate et al. 2015). Applying more nitrogen can be a major opportunity to improve maize yield further (Table 2 & Fig. 7b), which is consistent with the literature (e.g. Kaizzi et al. 2012; Tittonell et al. 2005).
Fig. 7 Major inputs that explain the resource yield gap (p < 0.001): (a) pesticide application gave higher maize yield; (b) positive relationship between nitrogen and maize yield (r-squared = 0.22). Note that these one-to-one relationships between variables are not corrected for other variables.
Earlier studies already showed the variability in on-farm responses of maize yield to N; our data allowed us to relate the response to N to other management factors (i.e., improved varieties, crop residue management and weeding). This reflects the need to integrate agricultural technologies in order to improve and sustain maize productivity in the country (Abate et al. 2015). Pesticide application was limited, but had a significant positive effect on maize yields (Table 2). The incidence of pests and diseases was reported to be low in 2010 and 2013. However, the outbreak of the fall armyworm in 2017 showed the large potential impact of pests and the relevance of pesticide application (FAO 2018). Labour (measured in person-days) supplied for land preparation, planting and weeding was positively associated with maize yield. This is plausible, as agricultural production in Ethiopia relies heavily on manual labour for the key crop management operations during the growing season (Baudron et al. 2015; Silva et al. 2019). The inverse relationship between maize plot size and maize yield is in line with the highly contested "inverse size-productivity relationship" hypothesis. This hypothesis posits that smaller farms are more productive than larger farms in the context of market imperfections (Daniel & Klaus 2014; Assunção & Ghatak 2003). This may result from more intensive and/or timely crop management, and/or better-quality labour, on smaller plots than on larger ones. Some results must nevertheless be interpreted with caution, as they are counterintuitive from an agronomic standpoint.
For instance, leaving crop residues in the field had a positive effect on maize yield only when nitrogen was applied (Table 2). Empirical results showed that fertilizer application may stimulate the decomposition of crop residues, which have a high carbon/nitrogen ratio, and this may enhance yield through increased nitrogen availability (Gebrekidan et al. 1999). Another example was the negative interaction between weeding and nitrogen (Table 2). This may indicate that nitrogen application promotes faster plant growth and, if this happens early enough, makes weed management less critical. The dataset lacked information on the timing of weeding, so we could not test this further. Finally, we cannot rule out that these counterintuitive results are an artefact of the dataset or of the statistical analysis. Further local studies are therefore needed to investigate these relationships.

Yield levels, yield gap components and explanatory factors
The average values of water-limited potential, highest farmers', technically efficient and actual yields varied across maize varieties, soil types, year and farming systems (Table 3). This shows the relevance of disaggregated analysis as the different yield levels are determined by varying agroecological, institutional and socioeconomic contexts.
The national average efficiency yield gap, 1.8 t/ha, equals about two-thirds of the actual yield, showing the potential to increase maize yield by 68% and contribute to household food security. Tittonell et al. (2008) also showed that maize yield in Kenya was more than twice as high in well-managed fields as in farmers' fields. Education, non-farm income, the value of assets and a short plot distance from the homestead were associated with smaller efficiency yield gaps.
Education can make farmers more receptive to information and more willing to adopt new technologies, for example regarding input timing and application (Seyoum et al. 1998). Diversified income sources can help farmers hire more labour, carry out farming operations on time and use the available resources efficiently. Household assets may also allow farmers to carry out farming activities on time (such as sowing and weeding) by directly serving as farm equipment. In addition, household assets can serve as collateral to access formal or informal credit, in order to buy or hire inputs at the appropriate time. The positive and significant quadratic effect of plot distance suggests that plots close to the homestead may be prioritized in management. Such plots receive more inputs (such as manure) and more frequent visits (Tittonell et al. 2005; Tittonell et al. 2008), which reduces the efficiency yield gap.
Although the efficiency yield gap is substantial relative to the actual yield, the resource and in particular the technology yield gaps are large and of strategic importance. The resource yield gap based on the feasible yield was about twice the resource yield gap based on the highest farmers' yield (Fig. 6). This implies that part of the technology yield gap (42%) is related to a resource yield gap. Although resources are available, the farmers achieving the highest farmers' yield do not apply maximum input levels, which relates to allocative and economic efficiency (van Dijk et al. 2017). Major inputs, i.e. pesticides and nitrogen, could explain the resource yield gap. Mean maize yield differed significantly between plots with and without pesticide application (p < 0.001). Nitrogen explained about one-fifth of the variation in maize yield. This suggests potential gains from applying more nitrogen, and that the resource yield gap can partly be attributed to a lack of nitrogen. It is worth noting that the contribution of the resource yield gap to the total yield gap is relatively low, which could reflect the small variation in input use across plots (Fig. 7b). The average nitrogen rate used to achieve the highest farmers' yield was 115 kg/ha. This is lower than the minimum nitrogen rate required to realize 50% of the water-limited potential yield in Ethiopia, i.e. 120 kg N/ha (ten Berge et al. 2019).
The mean technology yield gap was larger than the mean efficiency and resource yield gaps in all farming systems (Fig. 8). It has been documented that the major technologies to improve maize productivity are modern inputs, mainly improved maize varieties and fertilizers (Kassie et al. 2014). Yet 71% of the plots in our data were already planted with improved maize varieties. Abate et al. (2017) also showed that improved maize varieties covered 77% of the maize area planted in the main cropping season of 2012/2013, which was higher than the SSA average (57%) in the same period. This points to the need to improve crop management practices in maize-based systems across Ethiopia, for instance through timely sowing and better fertilizer placement methods.
Effective policies should target technology packages (Abate et al. 2015). Moreover, technological interventions that aim to improve smallholder productivity need to consider the diversity in farming systems and socioeconomic conditions (Giller et al. 2011; Tamene et al. 2015; Tittonell et al. 2010; Tittonell et al. 2008). Increasing public expenditure on agricultural R&D as a share of GDP is required to find context-specific solutions and achieve long-term food security. This share is generally low in SSA (e.g. 0.6% in Ethiopia) (www.worldbank.org; Mogues et al. 2012).

Methodological considerations
Our analysis was conducted at national level using individual farm data, which brought new methodological challenges to the framework used in this paper (Fig. 1) and by Silva et al. (2017). From a theoretical perspective, the framework is only consistent if applied to unique genotype x environment combinations, i.e., the different yield levels and yield gaps should be estimated as disaggregated as possible for each climate zone, soil type and type of variety used by farmers. When variables related to these are included as control factors, their impacts can be partly captured. We have shown here that including such variables indeed influenced the levels of the different yield gap components. Often, information on climate zone, soil type and type of variety is not available, and economic studies rarely consider their impacts (Seyoum et al. 1998;Beshir et al. 2012;Beza et al. 2017). This analysis is thus a step forward, but more local studies are needed to analyze specific relationships in specific contexts.
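Estimating yield levels within genotype × environment combinations amounts to grouping plots before computing benchmarks. The sketch below computes a highest farmers' yield separately for each (variety, soil type) stratum. It uses the mean of the top fraction of actual yields in that stratum. The top-10% rule, the field names and the data are hypothetical assumptions for illustration, not the paper's stated procedure.

```python
from collections import defaultdict
from statistics import mean


def highest_farmers_yield(yields, top_fraction=0.1):
    """Mean of the top `top_fraction` of actual yields (at least one plot)."""
    ranked = sorted(yields, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return mean(ranked[:k])


def yhf_by_stratum(plots, top_fraction=0.1):
    """Estimate YHF separately for each (variety, soil) combination."""
    groups = defaultdict(list)
    for plot in plots:
        groups[(plot["variety"], plot["soil"])].append(plot["ya"])
    return {key: highest_farmers_yield(ys, top_fraction) for key, ys in groups.items()}


# Hypothetical plots; field names and values are illustrative only.
plots = [
    {"variety": "hybrid", "soil": "good", "ya": 3.0},
    {"variety": "hybrid", "soil": "good", "ya": 5.0},
    {"variety": "hybrid", "soil": "good", "ya": 4.0},
    {"variety": "local", "soil": "poor", "ya": 1.0},
    {"variety": "local", "soil": "poor", "ya": 2.0},
]
print(yhf_by_stratum(plots))
```

Pooling all plots instead would let hybrid plots on good soils set the benchmark for local varieties on poor soils. That would inflate their apparent resource yield gap, which is the attribution problem the text warns about.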

Conclusion
Conducting yield gap analysis at national level with individual farm data, disaggregated by variety, soil type, year and farming system, gave an overview of the yield gaps at different levels. Such an overview is essential when searching for strategies to narrow the yield gaps. The main maize production technologies used by the farmers were improved varieties and mineral fertilizer. Furthermore, maize yield was higher when these technologies were combined, indicating the relevance of integrating agricultural technologies. We also found that the effect of nitrogen depended on management practices, mainly crop residue retention and weeding frequency. Integrated crop management practices, rather than mere increases in mineral fertilizer application rates, are required to increase maize yields. The technology yield gap accounted for the largest share of the total maize yield gap, ranging from 54% to 73%. The resource and efficiency yield gaps accounted for 12-25% and 15-21% of the total yield gap, respectively. Research and development in maize production technologies that acknowledge the relevance of biophysical characteristics is essential to narrow the technology yield gap. However, providing agricultural technologies will not by itself reduce the maize yield gap unless maize growers use them successfully. Their profitability also needs to be considered if they are to contribute sustainably to long-term food security in the country. Reducing the resource yield gap also requires optimal use of inputs. In conclusion, we show that the maize yield gap is explained by various factors associated with the efficiency, resource and technology yield gaps. The size of the yield gap components varied across maize varieties, soil types, years and farming systems.
This implies that targeted (for the different yield gaps and farming systems) but integrated policy design and implementation is required to narrow the overall maize yield gap and improve food security.